-
Notifications
You must be signed in to change notification settings - Fork 173
feat(csharp/src/Drivers/Databricks): Add CI for E2E Databricks tests #3115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(csharp/src/Drivers/Databricks): Add CI for E2E Databricks tests #3115
Conversation
|
We don't have access to things like that. We could ask INFRA, but I'm not sure if they would add non-committers to the approval list or not. |
502ec81 to
c64ea70
Compare
e43ea36 to
6785dc1
Compare
1ee8037 to
5901ee0
Compare
fc78074 to
a66afa3
Compare
9a0b830 to
01997af
Compare
| environments: | ||
| databricks-e2e: | ||
| wait_timer: 0 | ||
| required_reviewers: | ||
| - alexguo-db | ||
| - jadewang-db | ||
| deployment_branch_policy: | ||
| protected_branches: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably add these to a collaborators key https://github.com/apache/infrastructure-asfyaml/blob/main/README.md#assigning-the-github-triage-role-to-external-collaborators
| -d "subject_token_type=urn:ietf:params:oauth:token-type:jwt" \ | ||
| -d "scope=sql") | ||
| DATABRICKS_TOKEN=$(echo "$OAUTH_RESPONSE" | jq -r '.access_token') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to use ::add-mask:: in some areas here to ensure that the access tokens don't show up in the logs
|
This seems fine to me in general, but appears to be non-functional as the |
Yeah, apparently this is not populated on forks which is why I changed the target from |
zeroshade
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| with: | ||
| dotnet-version: ${{ matrix.dotnet }} | ||
| - name: Checkout ADBC | ||
| uses: actions/checkout@v4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| uses: actions/checkout@v4 | |
| uses: actions/checkout@v5 |
| ref: ${{ github.event.pull_request.head.sha || github.sha }} | ||
| fetch-depth: 0 | ||
| submodules: recursive | ||
| - name: Build | ||
| shell: bash | ||
| run: ci/scripts/csharp_build.sh $(pwd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that we may run ci/scripts/csharp_build.sh in forked repository with the pull_request_target context (that has write access to apache/arrow-adbc)?
I think that it's not acceptable based on the ASF GitHub Actions policy: https://infra.apache.org/github-actions-policy.html
Triggers
You MUST NOT use pull_request_target as a trigger on ANY action that exports ANY confidential credentials or tokens such as GITHUB_TOKEN or NPM_TOKEN.
Can we run this on fork not apache/arrow-adbc by removing branches: [main] from on.push?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I suppose the point is that the manual approval for the environment protects against this. But Infra may not have intended for environments to be used this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe @zeroshade checked with ASF Infra about this use case. Can you confirm with ASF Infra that we can bypass the triggers policy if we have an environment with required reviewers? Otherwise, I don't see how other Apache repos can implement this click-approval pattern
Can we run this on fork not apache/arrow-adbc by removing branches: [main] from on.push?
@kou Sorry, I'm not following why excluding it from being run on the upstream repo would improve the security
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we run this job on apache/arrow-adbc, evil pull requests can get GITHUB_TOKEN that has id-token: write permission for apache/arrow-adbc. They may abuse it.
If we run this job on fork repository, evil developers can get only GITHUB_TOKEN for their fork repositories. It's not a problem because they already have permissions that these GITHUB_TOKEN have. They can't get additional permissions for apache/arrow-adbc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Within our own repo we found out that our GITHUB_TOKEN would only be valid for 5 minutes thus the exchanged Databricks token is also 5 mins long and any test case passing that limit will fail because of invalidation.
We will probably need to supply a Databricks PAT token instead, which will bypass the GITHUB_TOKEN entirely, would that address the concern here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kou That won't work for us, the Databricks service principal is set up to only allow OIDC token exchange when the Github OIDC token originates from the main repo.
If we run this job on fork repository, evil developers can get only GITHUB_TOKEN for their fork repositories.
If we allowed the service principal to accept any Github OIDC token then anybody can create a fork and run malicious queries against the Databricks workspace.
The alternative is to use a Databricks personal access token secret but that runs into the same problem where we can only store it only on the main repo (so only people with main branch permissions can copy the branch to main and run the E2E tests)
How is this click approval pattern implemented elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will probably need to supply a Databricks PAT token instead, which will bypass the GITHUB_TOKEN entirely, would that address the concern here?
Can we avoid using pull_request_target by this? If so, it addresses the concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we allowed the service principal to accept any Github OIDC token then anybody can create a fork and run malicious queries against the Databricks workspace.
Can we accept only trusted fork repositories not any fork repositories in Databricks side?
How is this click approval pattern implemented elsewhere?
I haven't seen the implementation in apache/* repositories...
apache/airflow-publish may have an implementation of it...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, can Databricks provide local test tool like MinIO for AWS S3, Azurite fhttps://github.com/Azure/Azurite or Azure Storage and Storage Testbench https://github.com/googleapis/storage-testbench for Google Cloud Storage?
.asf.yaml
Outdated
| required_reviewers: | ||
| - alexguo-db | ||
| - jadewang-db | ||
| - eric-wang-1990 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this syntax is wrong:
| required_reviewers: | |
| - alexguo-db | |
| - jadewang-db | |
| - eric-wang-1990 | |
| required_reviewers: | |
| - id: alexguo-db | |
| type: User | |
| - id: jadewang-db | |
| type: User | |
| - id: eric-wang-1990 | |
| type: User |
| - krlmlr | ||
| - nbenn | ||
| - alexguo-db | ||
| - jadewang-db | ||
| - eric-wang-1990 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you keep this list in alphabetical order?
Before merging:
databricks-e2eenvironment, so that approval is required