ci: Enable AWS runners#1388
Conversation
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
There was a problem hiding this comment.
Additional Comments (1)
-
.github/actions/test-template/action.yml, line 90 (link)logic: references undefined step output
steps.azure-fileshare.outputs.mnt_path- theazure-filesharestep was removed in this PR, so this variable will be empty and may cause issues
3 files reviewed, 2 comments
| @@ -427,9 +409,9 @@ jobs: | |||
| all-install-import-test: | |||
| needs: [pre-flight] | |||
There was a problem hiding this comment.
logic: missing cuda-pre-flight in needs array - line 414 references needs.cuda-pre-flight.outputs.cuda_base_image but the job doesn't declare it as a dependency
| needs: [pre-flight] | |
| needs: [pre-flight, cuda-pre-flight] |
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
There was a problem hiding this comment.
Additional Comments (1)
-
.github/actions/test-template/action.yml, line 100 (link)logic: references deleted step output
steps.azure-fileshare.outputs.mnt_path- this step was removed but the variable assignment remains
3 files reviewed, 1 comment
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
There was a problem hiding this comment.
Additional Comments (1)
-
.github/actions/test-template/action.yml, line 100 (link)style: Dead code - references removed
azure-filesharestep andMNT_PATHis never used
3 files reviewed, 1 comment
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
There was a problem hiding this comment.
Additional Comments (1)
-
.github/actions/test-template/action.yml, line 100 (link)style: unused variable assignment referencing removed step
3 files reviewed, 1 comment
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
There was a problem hiding this comment.
Additional Comments (1)
-
.github/actions/test-template/action.yml, line 100 (link)style: unused variable references removed step
MNT_PATHreferencessteps.azure-fileshare.outputs.mnt_pathbut theazure-filesharestep was removed. This line should be deleted.
3 files reviewed, 1 comment
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
| - name: Install uuidgen | ||
| shell: bash -x -e -u -o pipefail {0} | ||
| if: ${{ contains(inputs.runner, 'aws') }} | ||
| run: | | ||
| sudo apt update | ||
| sudo apt install -y cifs-utils | ||
|
|
||
| RESOURCE_GROUP_NAME="azure-gpu-vm-runner_group" | ||
| STORAGE_ACCOUNT_NAME="nemocistorageaccount2" | ||
| FILE_SHARE_NAME="fileshare" | ||
|
|
||
| MNT_ROOT="/media" | ||
| MNT_PATH="$MNT_ROOT/$STORAGE_ACCOUNT_NAME/$FILE_SHARE_NAME" | ||
|
|
||
| echo "MNT_PATH=$MNT_PATH" | tee -a "$GITHUB_OUTPUT" | ||
|
|
||
| sudo mkdir -p $MNT_PATH | ||
|
|
||
| # Create a folder to store the credentials for this storage account and | ||
| # any other that you might set up. | ||
| CREDENTIAL_ROOT="/etc/smbcredentials" | ||
| sudo mkdir -p "/etc/smbcredentials" | ||
|
|
||
| # Get the storage account key for the indicated storage account. | ||
| # You must be logged in with az login and your user identity must have | ||
| # permissions to list the storage account keys for this command to work. | ||
| STORAGE_ACCOUNT_KEY=$(az storage account keys list \ | ||
| --resource-group $RESOURCE_GROUP_NAME \ | ||
| --account-name $STORAGE_ACCOUNT_NAME \ | ||
| --query "[0].value" --output tsv | tr -d '"') | ||
|
|
||
| # Create the credential file for this individual storage account | ||
| SMB_CREDENTIAL_FILE="$CREDENTIAL_ROOT/$STORAGE_ACCOUNT_NAME.cred" | ||
| if [ ! -f $SMB_CREDENTIAL_FILE ]; then | ||
| echo "username=$STORAGE_ACCOUNT_NAME" | sudo tee $SMB_CREDENTIAL_FILE > /dev/null | ||
| echo "password=$STORAGE_ACCOUNT_KEY" | sudo tee -a $SMB_CREDENTIAL_FILE > /dev/null | ||
| else | ||
| echo "The credential file $SMB_CREDENTIAL_FILE already exists, and was not modified." | ||
| fi | ||
|
|
||
| # Change permissions on the credential file so only root can read or modify the password file. | ||
| sudo chmod 600 $SMB_CREDENTIAL_FILE | ||
|
|
||
| # This command assumes you have logged in with az login | ||
| HTTP_ENDPOINT=$(az storage account show --resource-group $RESOURCE_GROUP_NAME --name $STORAGE_ACCOUNT_NAME --query "primaryEndpoints.file" --output tsv | tr -d '"') | ||
| SMB_PATH=$(echo $HTTP_ENDPOINT | cut -c7-${#HTTP_ENDPOINT})$FILE_SHARE_NAME | ||
|
|
||
| STORAGE_ACCOUNT_KEY=$(az storage account keys list --resource-group $RESOURCE_GROUP_NAME --account-name $STORAGE_ACCOUNT_NAME --query "[0].value" --output tsv | tr -d '"') | ||
|
|
||
| sudo mount -t cifs $SMB_PATH $MNT_PATH -o credentials=$SMB_CREDENTIAL_FILE,serverino,nosharesock,actimeo=30,mfsymlinks | ||
|
|
||
| ls -al $MNT_PATH/TestData | ||
| apt-get update | ||
| apt-get install -y uuid-runtime |
There was a problem hiding this comment.
running apt-get without sudo while shell doesn't specify root privileges - this may fail on non-root runners
| - name: Install uuidgen | |
| shell: bash -x -e -u -o pipefail {0} | |
| if: ${{ contains(inputs.runner, 'aws') }} | |
| run: | | |
| sudo apt update | |
| sudo apt install -y cifs-utils | |
| RESOURCE_GROUP_NAME="azure-gpu-vm-runner_group" | |
| STORAGE_ACCOUNT_NAME="nemocistorageaccount2" | |
| FILE_SHARE_NAME="fileshare" | |
| MNT_ROOT="/media" | |
| MNT_PATH="$MNT_ROOT/$STORAGE_ACCOUNT_NAME/$FILE_SHARE_NAME" | |
| echo "MNT_PATH=$MNT_PATH" | tee -a "$GITHUB_OUTPUT" | |
| sudo mkdir -p $MNT_PATH | |
| # Create a folder to store the credentials for this storage account and | |
| # any other that you might set up. | |
| CREDENTIAL_ROOT="/etc/smbcredentials" | |
| sudo mkdir -p "/etc/smbcredentials" | |
| # Get the storage account key for the indicated storage account. | |
| # You must be logged in with az login and your user identity must have | |
| # permissions to list the storage account keys for this command to work. | |
| STORAGE_ACCOUNT_KEY=$(az storage account keys list \ | |
| --resource-group $RESOURCE_GROUP_NAME \ | |
| --account-name $STORAGE_ACCOUNT_NAME \ | |
| --query "[0].value" --output tsv | tr -d '"') | |
| # Create the credential file for this individual storage account | |
| SMB_CREDENTIAL_FILE="$CREDENTIAL_ROOT/$STORAGE_ACCOUNT_NAME.cred" | |
| if [ ! -f $SMB_CREDENTIAL_FILE ]; then | |
| echo "username=$STORAGE_ACCOUNT_NAME" | sudo tee $SMB_CREDENTIAL_FILE > /dev/null | |
| echo "password=$STORAGE_ACCOUNT_KEY" | sudo tee -a $SMB_CREDENTIAL_FILE > /dev/null | |
| else | |
| echo "The credential file $SMB_CREDENTIAL_FILE already exists, and was not modified." | |
| fi | |
| # Change permissions on the credential file so only root can read or modify the password file. | |
| sudo chmod 600 $SMB_CREDENTIAL_FILE | |
| # This command assumes you have logged in with az login | |
| HTTP_ENDPOINT=$(az storage account show --resource-group $RESOURCE_GROUP_NAME --name $STORAGE_ACCOUNT_NAME --query "primaryEndpoints.file" --output tsv | tr -d '"') | |
| SMB_PATH=$(echo $HTTP_ENDPOINT | cut -c7-${#HTTP_ENDPOINT})$FILE_SHARE_NAME | |
| STORAGE_ACCOUNT_KEY=$(az storage account keys list --resource-group $RESOURCE_GROUP_NAME --account-name $STORAGE_ACCOUNT_NAME --query "[0].value" --output tsv | tr -d '"') | |
| sudo mount -t cifs $SMB_PATH $MNT_PATH -o credentials=$SMB_CREDENTIAL_FILE,serverino,nosharesock,actimeo=30,mfsymlinks | |
| ls -al $MNT_PATH/TestData | |
| apt-get update | |
| apt-get install -y uuid-runtime | |
| - name: Install uuidgen | |
| shell: bash -x -e -u -o pipefail {0} | |
| if: ${{ contains(inputs.runner, 'aws') }} | |
| run: | | |
| sudo apt-get update | |
| sudo apt-get install -y uuid-runtime |
| cat > ${{ github.workspace }}/.run_install.sh << 'SCRIPT' | ||
| set -e | ||
| apt-get update && apt-get install -y --no-install-recommends git curl | ||
| curl -LsSf https://astral.sh/uv/install.sh | sh |
There was a problem hiding this comment.
piping remote script directly to shell (curl | sh) creates security risk - if connection is interrupted or URL is compromised, arbitrary code could execute
Additional Comments (1)
|
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
| apt-get update | ||
| apt-get install -y uuid-runtime |
There was a problem hiding this comment.
apt-get commands run without sudo but shell doesn't specify root context - may fail on runners without root privileges
| uv run python -c "import sys; print('Python:', sys.version);" | ||
| uv run python -c "from nemo_curator import package_info; print(f'nemo_curator v{package_info.__version__}')" | ||
| uv run python -c "import cudf, cuml; print('cudf, cuml');" | ||
| uv run python -c "import torch; print(f'torch {torch.__version__} (CUDA: {assert not torch.cuda.is_available()})');" |
There was a problem hiding this comment.
invalid Python syntax - assert is a statement, not an expression; this will cause a SyntaxError
| uv run python -c "import torch; print(f'torch {torch.__version__} (CUDA: {assert not torch.cuda.is_available()})');" | |
| uv run python -c "import torch; print(f'torch {torch.__version__} (CUDA: {torch.cuda.is_available()})');" |
| cat > ${{ github.workspace }}/.run_install.sh << 'SCRIPT' | ||
| set -e | ||
| apt-get update && apt-get install -y --no-install-recommends git curl | ||
| curl -LsSf https://astral.sh/uv/install.sh | sh |
There was a problem hiding this comment.
piping remote script to shell (curl | sh) without verification creates security risk - if connection is interrupted or compromised, arbitrary code could execute
Additional Comments (1)
|
This reverts commit 09fdc5f. Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Description
ci: Enable AWS runners
The install test had to be written a little differently because running the GHA job directly in the container does not work yet on our self-hosted runners. So I had to rewrite to manually start the container and install from there.
Usage
# Add snippet demonstrating usageChecklist