Welcome to the AWS Glue Data Copy repository! This project provides a robust function for copying data in formats such as CSV, Parquet, and Avro from a source S3 bucket to a destination S3 bucket using AWS Glue. This repository includes the necessary setup for the Glue job, logging, and efficient data handling.
- Data Formats: Supports various data formats including CSV, Parquet, and Avro.
- AWS Glue Integration: Seamlessly integrates with AWS Glue for data processing.
- S3 Buckets: Easily read from and write to S3 buckets.
- Logging: Built-in logging for monitoring job execution.
- Efficient Data Handling: Optimized for performance and reliability.
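To give a rough idea of what the copy step looks like under the hood, here is a minimal sketch: list every object under a source prefix and copy each one to the destination bucket. The bucket names, prefixes, and the `dest_key` helper are illustrative assumptions, not the repository's actual API.

```python
def dest_key(src_key: str, src_prefix: str, dst_prefix: str) -> str:
    """Map a source object key to its destination key by swapping prefixes."""
    assert src_key.startswith(src_prefix)
    return dst_prefix + src_key[len(src_prefix):]


def copy_prefix(src_bucket: str, dst_bucket: str,
                src_prefix: str = "", dst_prefix: str = "") -> None:
    """Copy every object under src_prefix in src_bucket to dst_bucket.

    Requires boto3 and valid AWS credentials; nothing runs at import time.
    """
    import boto3  # imported here so dest_key stays usable without boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=src_prefix):
        for obj in page.get("Contents", []):
            # Server-side copy; no data passes through the local machine.
            s3.copy(
                {"Bucket": src_bucket, "Key": obj["Key"]},
                dst_bucket,
                dest_key(obj["Key"], src_prefix, dst_prefix),
            )
```

For example, `copy_prefix("my-source-bucket", "my-dest-bucket", "incoming/", "copied/")` would copy `incoming/data.csv` to `copied/data.csv` in the destination bucket.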
To get started with the AWS Glue Data Copy function, follow these steps:
- Clone the Repository:

  ```bash
  git clone https://github.com/muhd-minhaz/AWS-Glue--Data-Copy.git
  cd AWS-Glue--Data-Copy
  ```

- Install Dependencies: Make sure you have the necessary dependencies installed. You can install them using pip:

  ```bash
  pip install -r requirements.txt
  ```

- Configure AWS Credentials: Ensure your AWS credentials are configured. You can set them up in the `~/.aws/credentials` file or use environment variables.
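For reference, a minimal `~/.aws/credentials` file looks like this (the values are placeholders, not real keys):

```ini
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```

Alternatively, set the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION` environment variables.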
To use the AWS Glue Data Copy function, you need to create a Glue job in the AWS Management Console. Here’s how to do it:
- Create a Glue Job:
  - Navigate to the AWS Glue Console.
  - Click on "Jobs" and then "Add job".
  - Set the job name and choose the IAM role.
  - In the script path, point to your Glue script in the repository.
- Run the Job:
  - Start the job from the AWS Glue Console.
  - Monitor the job execution in the console.
- Check Logs:
  - View logs in CloudWatch to troubleshoot any issues.
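The console steps above can also be driven programmatically with boto3. The sketch below uses the real `start_job_run` API, but the job name and argument keys are assumptions; match them to the job you actually created.

```python
def build_job_arguments(source_bucket: str, destination_bucket: str,
                        data_format: str) -> dict:
    """Build the --key/value argument map Glue passes to the job script.

    The argument names here are hypothetical; use whatever keys your
    Glue script actually reads.
    """
    return {
        "--SOURCE_BUCKET": source_bucket,
        "--DESTINATION_BUCKET": destination_bucket,
        "--DATA_FORMAT": data_format,
    }


def start_copy_job(job_name: str, source_bucket: str, destination_bucket: str,
                   data_format: str = "csv") -> str:
    """Start the Glue job and return its run id.

    Requires boto3, AWS credentials, and an existing Glue job named job_name.
    """
    import boto3

    glue = boto3.client("glue")
    run = glue.start_job_run(
        JobName=job_name,
        Arguments=build_job_arguments(source_bucket, destination_bucket,
                                      data_format),
    )
    return run["JobRunId"]
```

For example, `start_copy_job("aws-glue-data-copy", "my-source-bucket", "my-dest-bucket")` would kick off a run using a hypothetical job name.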
You can customize the Glue job by modifying the parameters in the script. Here are some key parameters:
- Source Bucket: Specify the S3 bucket where the source data resides.
- Destination Bucket: Specify the S3 bucket where the copied data will be stored.
- Data Format: Choose the format of the data you are copying (CSV, Parquet, etc.).
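Inside a Glue script, these parameters typically arrive as job arguments; a real job would read them with `getResolvedOptions` from `awsglue.utils`. The stdlib sketch below mimics that behavior for illustration (the parameter names are assumptions, not the repository's exact keys):

```python
def resolve_options(argv: list, expected: list) -> dict:
    """Minimal stand-in for awsglue.utils.getResolvedOptions:
    pull a '--NAME value' pair out of argv for each expected name."""
    opts = {}
    for name in expected:
        flag = "--" + name
        if flag not in argv:
            raise KeyError(f"missing required argument {flag}")
        opts[name] = argv[argv.index(flag) + 1]
    return opts


# Example: the kind of argument vector Glue would hand the script.
argv = ["script.py",
        "--SOURCE_BUCKET", "my-source",
        "--DESTINATION_BUCKET", "my-dest",
        "--DATA_FORMAT", "parquet"]
args = resolve_options(argv, ["SOURCE_BUCKET", "DESTINATION_BUCKET",
                              "DATA_FORMAT"])
```

After this, `args["SOURCE_BUCKET"]` holds the source bucket name and so on, which the script can pass straight into its read/write calls.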
The AWS Glue Data Copy function includes logging capabilities to help you monitor job execution. The logs are sent to Amazon CloudWatch, where you can view them to troubleshoot issues or verify that the job ran successfully. The job logs the following stages:
- Job Started: Indicates when the job begins execution.
- Data Read: Confirms data has been read from the source bucket.
- Data Written: Confirms data has been written to the destination bucket.
- Job Completed: Indicates successful completion of the job.
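A minimal sketch of how such stage logging might be wired up with Python's standard `logging` module. The logger name and message wording are illustrative, not the repository's exact output; the example logs to an in-memory stream so it is self-contained, whereas a real Glue job's output is forwarded to CloudWatch.

```python
import io
import logging

# Log to an in-memory stream for the example; Glue forwards the real
# job's logger output to CloudWatch automatically.
log_stream = io.StringIO()
logger = logging.getLogger("glue-data-copy")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(log_stream))

logger.info("Job Started: copying from %s to %s",
            "my-source-bucket", "my-dest-bucket")
logger.info("Data Read: %d objects read from source bucket", 3)
logger.info("Data Written: %d objects written to destination bucket", 3)
logger.info("Job Completed: all objects copied successfully")
```

Each `logger.info` call corresponds to one of the stages listed above, so a CloudWatch search for "Job Completed" is enough to confirm a successful run.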
We welcome contributions to the AWS Glue Data Copy project. If you would like to contribute, please follow these steps:
- Fork the Repository: Click the "Fork" button on the top right of the page.
- Create a New Branch:

  ```bash
  git checkout -b feature/YourFeature
  ```

- Make Your Changes: Implement your feature or fix.
- Commit Your Changes:

  ```bash
  git commit -m "Add Your Feature"
  ```

- Push to Your Branch:

  ```bash
  git push origin feature/YourFeature
  ```

- Create a Pull Request: Go to the original repository and create a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
To download the latest release or a previous version, visit the Releases section, where you can find the files to download and run.
Thank you for checking out the AWS Glue Data Copy repository! We hope you find it useful for your data processing needs. If you have any questions or feedback, feel free to reach out. Happy coding!