python utils/syntetic_data_energy.py
python utils/syntetic_data_weather.py
aws s3 sync ./data/output/ s3://<BUCKET_NAME>/electricity-forecasting/data/input/
This example provides a utility class, HDFSManager, for moving data to and from HDFS.
hdfs_manager.copy_full_to_hdfs(<LOCAL_PATH>)
hdfs_manager.copy_from_hdfs(<LOCAL_FILE_SYSTEM>, <HDFS_FILE_NAME>)
df = hdfs_manager.load_df(<SPARK_CONTEXT>, <FILE_NAME>)
hdfs_manager.save_df(df, <FILE_NAME>)
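The implementation of HDFSManager is not shown here. As a rough sketch of what the copy helpers above could do internally, the following assumes they wrap the standard `hdfs dfs -put` and `hdfs dfs -get` commands. The function names `hdfs_put_command`, `hdfs_get_command`, and `run` are hypothetical and are not part of HDFSManager.

```python
import shlex
import subprocess
from typing import List


def hdfs_put_command(local_path: str, hdfs_dir: str = "/") -> List[str]:
    """Build the command that copies a local path into HDFS (hypothetical helper)."""
    # -f overwrites the destination if it already exists.
    return ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir]


def hdfs_get_command(hdfs_file: str, local_dir: str) -> List[str]:
    """Build the command that copies a file from HDFS to the local file system."""
    return ["hdfs", "dfs", "-get", hdfs_file, local_dir]


def run(command: List[str], dry_run: bool = True) -> str:
    """Return the command as a shell string; execute it only when dry_run is False."""
    printable = shlex.join(command)
    if not dry_run:
        # Assumption: the `hdfs` binary is on PATH, as it is inside a Spark container.
        subprocess.run(command, check=True)
    return printable


if __name__ == "__main__":
    print(run(hdfs_put_command("/opt/ml/processing/input", "/data")))
    print(run(hdfs_get_command("/data/output.csv", "/tmp")))
```

Keeping `dry_run=True` as the default lets you inspect the generated commands on a machine without a Hadoop installation.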
aws ecr create-repository --repository-name <REPOSITORY_NAME>
- REPOSITORY_NAME: ECR Repository name
Example:
aws ecr create-repository --repository-name sagemaker-spark-custom-container
aws ecr get-login-password --region <REGION> | docker login --username AWS --password-stdin <SPARK_IMAGE_URI>
- REGION: AWS Region
- SPARK_IMAGE_URI: Public ECR URI for the Spark image
For the available Spark image URIs, refer to the official AWS GitHub repository.
Example:
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 571004829621.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-spark-processing:3.3-cpu-py39-v1.2
docker build -t <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/${REPOSITORY_NAME}:${IMAGE_TAG} -f Dockerfile .
Example:
docker build -t <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/sagemaker-spark-custom-container:latest -f Dockerfile .
- ACCOUNT_ID: AWS Account ID
- REGION: AWS Region
- REPOSITORY_NAME: ECR Repository name
- IMAGE_TAG: Tag associated with the image
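The four parameters above combine into a single image URI. The sketch below shows one way to assemble it in Python, for example before passing it to a processing job. The helper name `ecr_image_uri` is hypothetical, and the URI layout assumes the standard `amazonaws.com` domain used in the commands above.

```python
def ecr_image_uri(account_id: str, region: str, repository_name: str,
                  image_tag: str = "latest") -> str:
    """Assemble <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/<REPOSITORY_NAME>:<IMAGE_TAG>."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository_name}:{image_tag}"


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID, not a real one.
    print(ecr_image_uri("123456789012", "eu-west-1", "sagemaker-spark-custom-container"))
```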
aws ecr get-login-password --region <REGION> | docker login --username AWS --password-stdin <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/${REPOSITORY_NAME}:${IMAGE_TAG}
docker push <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/${REPOSITORY_NAME}:${IMAGE_TAG}
- ACCOUNT_ID: AWS Account ID
- REGION: AWS Region
- REPOSITORY_NAME: ECR Repository name
- IMAGE_TAG: Tag associated with the image
Example:
aws ecr get-login-password --region <REGION> | docker login --username AWS --password-stdin <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/sagemaker-spark-custom-container:latest
docker push <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/sagemaker-spark-custom-container:latest
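The login and push steps can also be generated from a single image URI. The sketch below only builds the command strings and does not execute them. It passes just the registry host to `docker login`, which is the form used in the AWS ECR documentation; the helper names `registry_host` and `push_commands` are hypothetical.

```python
from typing import List


def registry_host(image_uri: str) -> str:
    """Return the registry part of an image URI (everything before the first '/')."""
    return image_uri.split("/", 1)[0]


def push_commands(image_uri: str, region: str) -> List[str]:
    """Build the login and push commands, in the order they should be run."""
    host = registry_host(image_uri)
    login = (
        f"aws ecr get-login-password --region {region} "
        f"| docker login --username AWS --password-stdin {host}"
    )
    push = f"docker push {image_uri}"
    # Login must finish before the push starts, so these are two separate commands.
    return [login, push]


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID, not a real one.
    uri = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-spark-custom-container:latest"
    for command in push_commands(uri, "eu-west-1"):
        print(command)
```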