The GeoReference Pipeline is a cloud-native, fully automated system for processing and analyzing geospatial map images. It compresses raw TIFF maps into optimized PNG files, extracts metadata using Amazon Bedrock LLMs, and pushes the resulting geospatial data to a GitHub repository.
- AWS Lambda & S3 Triggers: Handles automated processing of map files from S3 storage.
- PIL-Based Image Compression: Reduces TIFF file sizes while maintaining quality.
- AWS Bedrock Claude 3.5 Integration: Extracts map metadata using Large Language Models (LLMs).
- Automated GitHub Storage: Stores processed GeoJSON outputs in a GitHub repository.
- Error Handling & CloudWatch Logging: Ensures robust monitoring and debugging.
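The compression step can be pictured as a loop that keeps shrinking the image until the encoded file fits a size budget. The sketch below is only an illustration of that idea, not the pipeline's actual code: `compress_to_target` and the `encode` callback are hypothetical names, and the callback stands in for whatever PIL resize-and-save logic the Lambda uses.

```python
from typing import Callable, Tuple


def compress_to_target(
    encode: Callable[[float], bytes],
    target_bytes: int,
    min_scale: float = 0.1,
    step: float = 0.9,
) -> Tuple[float, bytes]:
    """Lower the scale factor until the encoded image fits target_bytes.

    `encode(scale)` is a hypothetical callback that resizes the source
    image by `scale` and returns the encoded PNG bytes (for example by
    saving a resized PIL image into an in-memory buffer).
    """
    scale = 1.0
    data = encode(scale)
    # Stop once the file fits, or once another step would drop below min_scale.
    while len(data) > target_bytes and scale * step >= min_scale:
        scale *= step
        data = encode(scale)
    return scale, data
```

With a `compression_target_mb` of 3, the budget passed in would be `3 * 1024 * 1024` bytes. If the image still does not fit at `min_scale`, the function returns the last attempt and leaves it to the caller to decide whether that counts as an error.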
- Upload TIFF Map to S3 (raw/ folder)
- Compression Lambda Converts TIFF to PNG
- Compressed Images Stored in compressed/ Folder
- Analysis Lambda Extracts Metadata Using AWS Bedrock
- GeoJSON & CSV Metadata Files Generated
- GeoJSON Data Pushed to GitHub Repository
- Error Handling & Logging in error/ Folder
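The first hop of this workflow, from an upload in raw/ to an output in compressed/, could be sketched as follows. This is a minimal illustration that assumes the standard S3 event notification payload; the function name `compression_jobs` and the exact key-naming scheme (same file name, `.png` suffix) are assumptions, not details taken from the pipeline's source.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote_plus


def compression_jobs(event: dict) -> list:
    """Return (bucket, source_key, dest_key) for each TIFF uploaded under raw/."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 events are URL-encoded, with "+" for spaces.
        key = unquote_plus(record["s3"]["object"]["key"])
        path = PurePosixPath(key)
        if not path.parts or path.parts[0] != "raw":
            continue  # ignore anything outside the raw/ folder
        if path.suffix.lower() not in {".tif", ".tiff"}:
            continue  # ignore non-TIFF objects and folder placeholders
        dest = PurePosixPath("compressed", *path.parts[1:]).with_suffix(".png")
        jobs.append((bucket, key, str(dest)))
    return jobs
```

For an upload of `raw/test-map.tif`, this yields the destination key `compressed/test-map.png`.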
Ensure the following are installed and configured:
- AWS CLI (with IAM permissions)
- AWS CDK (globally installed)
- Docker (running)
- GitHub Token (for repository access)
$ git clone https://github.com/YOUR_GITHUB_USERNAME/water_resources_geojson.git
$ cd water_resources_geojson
Bootstrap AWS CDK (First-Time Setup)
$ cdk bootstrap
Modify cdk.json to include:
"context": {
  "bucket_name": "my-geo-pipeline-bucket",
  "compression_function_name": "GeoCompressionLambda",
  "analysis_function_name": "GeoAnalysisLambda",
  "compression_layer_name": "GeoCompressionLayer",
  "analysis_layer_name": "GeoAnalysisLayer",
  "github_token": "YOUR_GITHUB_ACCESS_TOKEN",
  "github_repo_name": "water_resources_geojson",
  "bedrock_model_id": "anthropic.claude-3-5-sonnet-20241022-v2:0",
  "bedrock_region": "us-west-2",
  "max_lambda_memory_mb": 10240,
  "max_lambda_timeout_minutes": 15,
  "max_lambda_ephemeral_storage_mb": 10240,
  "compression_target_mb": 3,
  "prompt_file_name": "prompt.py"
}
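Before deploying, it can help to sanity-check these values. The sketch below is a hypothetical pre-flight helper (`check_context` is not part of the repository) that flags leftover placeholders and Lambda settings outside the service's limits: 128 to 10240 MB of memory, at most 15 minutes of timeout, and 512 to 10240 MB of ephemeral storage. The one-minute lower bound on the timeout is a choice made for this sketch.

```python
import json

# Allowed (min, max) ranges for the Lambda-related context keys.
LAMBDA_LIMITS = {
    "max_lambda_memory_mb": (128, 10240),
    "max_lambda_timeout_minutes": (1, 15),
    "max_lambda_ephemeral_storage_mb": (512, 10240),
}
REQUIRED_KEYS = [
    "bucket_name",
    "github_token",
    "github_repo_name",
    "bedrock_model_id",
    "bedrock_region",
]


def check_context(cdk_json_text: str) -> list:
    """Return a list of human-readable problems found in the cdk.json context."""
    context = json.loads(cdk_json_text).get("context", {})
    problems = []
    for key in REQUIRED_KEYS:
        value = context.get(key)
        if not value or str(value).startswith("YOUR_"):
            problems.append(f"{key} is missing or still a placeholder")
    for key, (low, high) in LAMBDA_LIMITS.items():
        value = context.get(key)
        if not isinstance(value, int) or not low <= value <= high:
            problems.append(f"{key} must be an integer between {low} and {high}")
    return problems
```

Calling `check_context(open("cdk.json").read())` returns an empty list when nothing looks wrong.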
$ cdk deploy --all
Once the deployment is complete, the necessary AWS services will be created, including:
- S3 Buckets (raw/, compressed/, error/, analysis/)
- Lambda Functions (Compression & Analysis)
- IAM Roles & Policies
- AWS Bedrock Model Integration
- GitHub Integration for GeoJSON Files
$ aws s3 cp test-map.tif s3://my-geo-pipeline-bucket/raw/
To check logs for the Compression Lambda:
$ aws logs tail /aws/lambda/GeoCompressionLambda --follow
To check logs for the Analysis Lambda:
$ aws logs tail /aws/lambda/GeoAnalysisLambda --follow
- Check the compressed/ folder for the converted PNG.
- Check the analysis/ folder for the generated CSV metadata.
- Verify the GitHub Repository for the stored GeoJSON file.
- Check the error/ folder if any errors occur during processing.
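When verifying the GeoJSON file, a quick structural check can catch truncated or malformed output. The sketch below only tests generic GeoJSON structure (a Feature or FeatureCollection with known geometry types). The helper name `geojson_problems` is hypothetical, and the property names the pipeline actually writes are not covered here.

```python
import json

GEOMETRY_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
}


def geojson_problems(text: str) -> list:
    """Return structural problems found in a GeoJSON document (empty if none)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top-level value is not a JSON object"]
    if doc.get("type") == "FeatureCollection":
        features = doc.get("features")
        if not isinstance(features, list):
            return ["FeatureCollection has no 'features' array"]
    elif doc.get("type") == "Feature":
        features = [doc]
    else:
        return [f"unexpected top-level type: {doc.get('type')!r}"]
    problems = []
    for index, feature in enumerate(features):
        if not isinstance(feature, dict) or feature.get("type") != "Feature":
            problems.append(f"feature {index} is not a Feature object")
            continue
        geometry = feature.get("geometry")
        # A null geometry is allowed; anything else must be a known type.
        if geometry is not None and (
            not isinstance(geometry, dict) or geometry.get("type") not in GEOMETRY_TYPES
        ):
            problems.append(f"feature {index} has an unknown geometry type")
        if "properties" not in feature:
            problems.append(f"feature {index} has no 'properties' member")
    return problems
```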
Check logs in CloudWatch:
$ aws logs tail /aws/lambda/GeoCompressionLambda --follow
$ aws logs tail /aws/lambda/GeoAnalysisLambda --follow
Ensure your GitHub Token is correct in cdk.json and has repo access.
Make sure S3 notifications are enabled:
$ aws s3api get-bucket-notification-configuration --bucket my-geo-pipeline-bucket
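The JSON printed by that command can be inspected programmatically. The sketch below lists the Lambda ARNs that fire on object creation for keys under raw/. The helper `lambda_triggers` is hypothetical, and whether the deployed stack attaches a prefix filter at all is an assumption, so an entry without any filter is also treated as a match.

```python
import json


def lambda_triggers(config_text: str, prefix: str = "raw/") -> list:
    """Return ARNs of Lambdas triggered by object creation under `prefix`."""
    # The CLI prints nothing when the bucket has no notification configuration.
    config = json.loads(config_text.strip() or "{}")
    arns = []
    for entry in config.get("LambdaFunctionConfigurations", []):
        rules = entry.get("Filter", {}).get("Key", {}).get("FilterRules", [])
        prefixes = [
            rule.get("Value", "")
            for rule in rules
            if rule.get("Name", "").lower() == "prefix"
        ]
        fires_on_create = any(
            name.startswith("s3:ObjectCreated:") for name in entry.get("Events", [])
        )
        covers_prefix = not prefixes or any(prefix.startswith(p) for p in prefixes)
        if fires_on_create and covers_prefix:
            arns.append(entry["LambdaFunctionArn"])
    return arns
```

An empty result suggests that no function is wired to uploads in raw/, which would explain a Lambda that never runs.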
This GeoReference Pipeline provides a scalable, cloud-native solution for processing and analyzing geospatial maps. It automates compression, metadata extraction, and structured data storage while leveraging AWS services for seamless execution.
Happy coding! 🚀