-
Notifications
You must be signed in to change notification settings - Fork 86
PLUGIN-1413 Added argument for BQ temporary staging bucket names #1152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| private static final Map<Schema.Type, Set<LegacySQLTypeName>> TYPE_MAP = ImmutableMap.<Schema.Type, | ||
| Set<LegacySQLTypeName>>builder() | ||
| Set<LegacySQLTypeName>>builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change isn't needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted
| if (Integer.parseInt(chunkSize) % MediaHttpUploader.MINIMUM_CHUNK_SIZE != 0) { | ||
| collector.addFailure( | ||
| String.format("Value must be a multiple of %s.", MediaHttpUploader.MINIMUM_CHUNK_SIZE), null) | ||
| String.format("Value must be a multiple of %s.", MediaHttpUploader.MINIMUM_CHUNK_SIZE), null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted this change
| private static final Logger LOG = LoggerFactory.getLogger(BigQueryUtil.class); | ||
|
|
||
| private static final String DEFAULT_PARTITION_COLUMN_NAME = "_PARTITIONTIME"; | ||
| private static final String BIGQUERY_BUCKET_PREFIX_PROPERTY_NAME = "io.cdap.plugin.bigquery.bucket.prefix"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do a similar thing for cmek, where the argument key is 'gcp.cmek.key.name' (see CmekUtils). Let's follow a similar pattern and name it something like 'gcp.bigquery.bucket.prefix'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
| * We use this to ensure location name length is constant (only 8 characters). | ||
| * | ||
| * @param location location to checksum | ||
| * @return checksum value as@ an 8 character string (hex). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: as@
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| hashValues.add(hash); | ||
| } | ||
|
|
||
| System.out.println(hashValues); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need this
| if (bucketName == null && bucketPrefix != null) { | ||
| // Check if the destination dataset exists. | ||
| DatasetId datasetId = DatasetId.of(config.getDatasetProject(), config.getDataset()); | ||
| Dataset dataset = bigQuery.getDataset(datasetId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we should already be making this call as part of the existing validation/automatic bucket creation. If so, we should do some refactoring to avoid duplicate calls. It's become kind of a mess now, but ideally we do all the I/O in a single place and pass the return objects around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see an easy way to refactor this without making 2 calls to get the dataset.
The problem is the bucket name is needed even when in preview. This makes it difficult to refactor the code in a way that doesn't change the entire method signature for the abstract bigquery sink and other callers (such as the BigQuery Pushdown implementation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have refactored the code a bit, hope this helps
No description provided.