Hi Team,
We have three data.all environments: Dev, Test, and Prod. Dev and Test share a common code repository, while Prod has a separate repo. I agree that downgrading is not recommended, but we had to do it in the Dev environment for the following reason:
We started upgrading Dev from v1.6.2 to v2.3, but during the upgrade the Test environment started failing because of the ECR image retention limit of 200, and it stopped showing Admin tools.
To fix this v1.6 issue in the Test environment, we had no option other than downgrading the shared repo code from v2.3 to v1.6. We did that, but now we are facing strange issues in the Dev environment: it seems the code is downgraded but the AWS infrastructure is not (CDK version, Python version, Lambda layer, etc.).
Issues: All environment stacks have gone from UPDATE_COMPLETE to UPDATE_ROLLBACK_COMPLETE after stack updates.
Error from CloudWatch Logs: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
Root cause identified: The Lambda layer dataallGlueProflingJobDeploymentAwsCliLayer2ABEAF10, attached to the Lambda function dataall-environment-CustomCDKBucketDeployment, still contains a build targeting Python 3.10+, but the downgraded code expects Python 3.9.
** The layer contains an AWS CLI version with a module that is missing or incompatible with Python 3.9.
Fix attempted: We tried manually creating a new Python 3.9-compatible Lambda layer.
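For context, the TypeError above is what Python 3.9 raises when it evaluates a PEP 604 union such as `str | None` at runtime; that syntax is only supported from Python 3.10. A minimal sketch of the language feature involved (it does not reproduce the AWS CLI itself, and `union_annotation_works` / `find_layer` are hypothetical names):

```python
import sys
from typing import Optional


def union_annotation_works() -> bool:
    """Return True if `X | None` can be evaluated at runtime (Python 3.10+)."""
    try:
        # On Python <= 3.9 this raises:
        # TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
        _ = str | None
    except TypeError:
        return False
    return True


# Hypothetical 3.9-compatible signature: typing.Optional works on every
# Python 3 version, unlike the `str | None` form.
def find_layer(name: Optional[str] = None) -> Optional[str]:
    return name


print(sys.version_info[:2], "PEP 604 unions supported:", union_annotation_works())
```

Note that `from __future__ import annotations` only defers evaluation of annotations; any runtime use of `X | None` (type aliases, `isinstance` checks) still fails on 3.9, so a layer built for 3.10+ generally has to be rebuilt rather than patched.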
Current blockers: The environment stack update is still failing with the following errors:
Blocker No. 1: The S3 copy from the CDK assets bucket to the environment-specific bucket is failing.
The resource dataallGlueProflingJobDeploymentCustomResourceC9CF42F0 is in a CREATE_FAILED state
This Custom::CDKBucketDeployment resource is in a CREATE_FAILED state.
Received response status [FAILED] from custom resource. Message returned: Command '['/opt/awscli/aws', 's3', 'cp', 's3://cdk-hnb659fds-assets-985757210942-eu-west-1/bda8704101aab5f506bbd80c208427b6d7beb2f685b712197d0b181c1c521e41.zip', '/tmp/tmpsgm_eu1j/2a4eaa29-61ad-4185-8771-426818ad919f']' returned non-zero exit status 1. (RequestId: 909e7e1a-a77b-43e5-8f1b-a926ef47353f)
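The message in Blocker No. 1 has the shape of Python's `subprocess.CalledProcessError`, which reports only the command and exit status, not the CLI's own error output. A sketch for surfacing stderr when re-running the same command by hand in the function's environment; `run_and_capture` is a hypothetical helper, not part of data.all or CDK:

```python
import subprocess
from typing import List, Tuple


def run_and_capture(cmd: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit_code, stderr) instead of raising.

    CalledProcessError's message only includes the command and exit status;
    capturing stderr shows *why* the AWS CLI failed (for example an import
    or syntax error from a layer built for a newer Python).
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr
```

For example, calling `run_and_capture(["/opt/awscli/aws", "--version"])` from a test invocation of the Lambda function should show whether the bundled CLI even starts under Python 3.9, before looking at S3 permissions.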
Blocker No. 2: Behavior differs between creating a new environment stack and updating an existing one.
The manually created Python 3.9-compatible Lambda layer works for new environment stacks, but not for updates to existing environment stacks: the CloudFormation stack is still looking for the old CLI layer version.
The resource CustomCDKBucketDeployment8693BB64968944B69AAFB0CC9EB8756C81C01536 is in a UPDATE_FAILED state
This AWS::Lambda::Function resource is in a UPDATE_FAILED state.
Resource handler returned message: "Layer version arn:aws:lambda:eu-west-1:582667241002:layer:dataallGlueProflingJobDeploymentAwsCliLayer2ABEAF10:38 does not exist. (Service: Lambda, Status Code: 400, Request ID: a1bbac76-2719-4028-9177-b5e4ec449149) (SDK Attempt Count: 1)" (RequestToken: 8c7f35dd-772e-d2cc-658c-50f2d1b0fe83, HandlerErrorCode: InvalidRequest)
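One plausible reading of Blocker No. 2 is that the existing stack still references layer version 38, which no longer exists. A sketch for checking which versions of the layer actually exist and which runtimes they declare, using the boto3 Lambda `list_layer_versions` paginator; the helper names are hypothetical and the client is passed in so it can target the affected account and region (eu-west-1):

```python
from typing import Any, Dict, List


def existing_layer_versions(lambda_client: Any, layer_name: str) -> List[Dict[str, Any]]:
    """Return [{'version': int, 'runtimes': [...]}] for each existing layer version.

    `lambda_client` is assumed to be a boto3 Lambda client, e.g.
    boto3.client("lambda", region_name="eu-west-1").
    """
    paginator = lambda_client.get_paginator("list_layer_versions")
    versions = []
    for page in paginator.paginate(LayerName=layer_name):
        for lv in page.get("LayerVersions", []):
            versions.append({
                "version": lv["Version"],
                # CompatibleRuntimes is optional in the API response
                "runtimes": lv.get("CompatibleRuntimes", []),
            })
    return sorted(versions, key=lambda v: v["version"])


def layer_version_exists(lambda_client: Any, layer_name: str, version: int) -> bool:
    """True if the given version number is still published for the layer."""
    return any(v["version"] == version for v in existing_layer_versions(lambda_client, layer_name))
```

If version 38 is missing, the update of the existing stack will presumably keep failing until its template points at a layer version that does exist.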
Blocker No. 3: CodePipeline stage "dataall-dev-backend-stage" failure
The resource S3ResourcesNestedStackS3ResourcesNestedStackResourceEF3D2964 is in a UPDATE_FAILED state
This AWS::CloudFormation::Stack resource is in a UPDATE_FAILED state.
Embedded stack arn:aws:cloudformation:eu-west-1:244469940082:stack/dataall-dev-backend-stage-backend-stack-S3ResourcesNestedStackS3ResourcesNestedStackRe-1BIDTIXL3YIAE/d6b284a0-fc0e-11ef-a4db-029b476e7481 was not successfully updated. Currently in UPDATE_ROLLBACK_IN_PROGRESS with reason: The following resource(s) failed to create: [PivotRoleDeploymentdevCustomResource90F356DC, CDKExecutionPolicyDeploymentdevCustomResourceA5234CC9].
Fix attempted: Commented out the code for these two nested-stack resources to bypass the error.
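Rather than commenting out resources, it may help to read the actual failure reasons for PivotRoleDeploymentdevCustomResource90F356DC and CDKExecutionPolicyDeploymentdevCustomResourceA5234CC9 from the nested stack's events. A sketch using the boto3 CloudFormation `describe_stack_events` paginator; `failed_resource_events` is a hypothetical helper, and `stack_name` would be the embedded stack ARN from the error above:

```python
from typing import Any, List, Tuple

FAILED_STATUSES = {"CREATE_FAILED", "UPDATE_FAILED", "DELETE_FAILED"}


def failed_resource_events(cfn_client: Any, stack_name: str) -> List[Tuple[str, str, str]]:
    """Return (logical_id, status, reason) for each *_FAILED event.

    `cfn_client` is assumed to be boto3.client("cloudformation", region_name="eu-west-1").
    The API returns events newest first; that order is preserved here.
    """
    paginator = cfn_client.get_paginator("describe_stack_events")
    failures = []
    for page in paginator.paginate(StackName=stack_name):
        for event in page.get("StackEvents", []):
            if event.get("ResourceStatus") in FAILED_STATUSES:
                failures.append((
                    event.get("LogicalResourceId", ""),
                    event["ResourceStatus"],
                    event.get("ResourceStatusReason", ""),
                ))
    return failures
```

The reasons for the two custom resources may well point back to the same AWS CLI layer problem as Blockers 1 and 2.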