Add batch DELETE/UPDATE samples for datasets exceeding 3k row limit by rmconstantin · Pull Request #698 · aws-samples/aurora-dsql-samples

rmconstantin · 2026-03-17T02:14:56Z

Demonstrates sequential and parallel batch processing patterns for Aurora DSQL with OCC retry logic and recommended connection management. Includes Python (psycopg2), Java (pgJDBC), and Node.js (node-postgres) implementations.
Fixes #693.

By submitting this pull request, I confirm that my contribution is made under
the terms of the MIT-0 license.

Thank you for your contribution!

Benjscho · 2026-03-17T20:26:28Z

Can you add the pycache path to gitignore?

Benjscho · 2026-03-17T20:28:39Z

+        while (true) {
+            try (Connection conn = pool.getConnection()) {
+                conn.setAutoCommit(false);
+                String sql = "UPDATE " + table + " SET " + setClause + ", updated_at = NOW()"


How does this ensure progress over all of the source items?

Benjscho · 2026-03-17T20:29:10Z

+ *   gradle run --args="--endpoint <cluster-endpoint> [--user admin]
+ *              [--batch-size 1000] [--num-workers 4]"
+ */
+public class Main {


Could you add an integ test that runs these batch ops?

Benjscho · 2026-03-17T20:30:54Z

+);
+
+-- Create an asynchronous index on the category column.
+-- Aurora DSQL requires CREATE INDEX ASYNC for tables with existing rows.


This applies to all tables, not only ones with existing rows. Maybe delete this comment.

Benjscho · 2026-03-17T20:32:07Z

@@ -0,0 +1,52 @@
+# Aurora DSQL Batch Operations


I think we might be better off organizing these examples under the specific language/driver pairing instead of having them in a top-level dir.

Can we also add integ tests for each example? There should be patterns for how to do that in each language

Benjscho · 2026-03-17T20:40:59Z

+     * @param connection a JDBC connection (autoCommit should be false)
+     * @param operation  the database operation to execute
+     * @param maxRetries maximum retry attempts (default 3)
+     * @param baseDelay  base delay in seconds for backoff (default 0.1)


Nit: can we make baseDelay milliseconds instead?

Benjscho · 2026-03-17T20:41:28Z

+ */
+public class Repopulate {
+
+    private static final String INSERT_SQL =


What's going on with the repopulate fn vs the batch setup script?

rmconstantin · 2026-03-18T17:31:31Z

Updated the code to address all comments.

Batch operations are now in standalone directories under each language (java/batch_operations/, javascript/batch_operations/, python/batch_operations/).
baseDelay is now base_delay_ms.
Initial table+index setup script comments updated.
Got rid of the Repopulate fn.
Integ tests added for each language.
Added an outer retry to make sure all rows are processed (keep batching until done, and if OCC conflicts persist on a single batch, get a fresh connection and try again).

Ready for another look.
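
The "keep batching until done" loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: it uses the standard-library sqlite3 module as a stand-in for Aurora DSQL, the `DELETE ... WHERE id IN (SELECT ... LIMIT n)` shape is an assumption about how a batch is bounded, and the OCC retry and fresh-connection handling are left out. The `batch_delete` name is hypothetical.

```python
import sqlite3


def batch_delete(conn, condition, batch_size=1000):
    """Delete matching rows in bounded batches, one commit per batch.

    Hypothetical helper: the LIMIT subquery is an assumed way to cap the
    number of rows touched by each transaction.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM batch_test WHERE id IN "
            f"(SELECT id FROM batch_test WHERE {condition} LIMIT {batch_size})"
        )
        conn.commit()  # each batch is its own transaction
        if cur.rowcount == 0:
            break  # nothing left to delete: the loop has covered every row
        total += cur.rowcount

    # Post-check: confirm no matching rows were left behind.
    remaining = conn.execute(
        f"SELECT COUNT(*) FROM batch_test WHERE {condition}"
    ).fetchone()[0]
    if remaining:
        raise RuntimeError(f"{remaining} matching rows were not processed")
    return total


# Demo on an in-memory database (sqlite3 stands in for the real cluster).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch_test (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO batch_test (category) VALUES (?)",
    [("food" if i % 2 == 0 else "toys",) for i in range(2500)],
)
conn.commit()

deleted = batch_delete(conn, "category = 'food'", batch_size=500)
left = conn.execute("SELECT COUNT(*) FROM batch_test").fetchone()[0]
```

Progress is guaranteed here because each iteration either removes at least one matching row or ends the loop, and the final count catches anything that slipped through.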

Benjscho · 2026-03-20T20:22:03Z

What's this jar file for? Should we be shipping it?

The gradle-wrapper.jar is the Gradle Wrapper bootstrap JAR. It allows anyone to build the project without having Gradle pre-installed — the wrapper downloads the correct Gradle version automatically. Shipping it in version control is the recommended Gradle convention. The other Java projects in this repo (java/pgjdbc, java/spring_boot) follow the same pattern.

I also updated java/.gitignore to stop ignoring these wrapper files (previously they were gitignored but force-tracked, which was inconsistent).

Benjscho · 2026-03-20T20:34:51Z

Should this and gradlew be checked in or gitignored?

Yes, gradlew and gradlew.bat should be checked in. They are the Gradle Wrapper scripts (Unix and Windows respectively) that bootstrap the build — users run ./gradlew build instead of installing Gradle globally. This is the standard Gradle convention and matches java/pgjdbc and java/spring_boot in this repo.

I cleaned up java/.gitignore in the latest commit to remove the **/gradlew, **/gradlew.bat, and **/gradle/ patterns that were incorrectly ignoring these files.

- Add SELECT COUNT(*) post-check after each batch loop to verify all matching rows were processed (sequential and parallel, all 3 languages)
- Update integration tests to seed data via psql -f batch_test_setup.sql
- Add connect_timeout to Python pool creation for IPv6 fallback

The gradle wrapper (gradle-wrapper.jar, gradlew, gradlew.bat) should be committed to version control per Gradle convention. This allows anyone to build the project without pre-installing Gradle. Consistent with existing java/pgjdbc and java/spring_boot projects in this repo. Removed **/gradle/, **/gradlew, and **/gradlew.bat from .gitignore. The .gradle/ (build cache) pattern remains correctly ignored.

rmconstantin · 2026-05-14T19:24:41Z

Rebased on main — conflicts resolved.

CI has two failures:

deps-review — Transitive npm dependencies from Jest (e.g. color-name, co, ci-info) score below the repo's OpenSSF Scorecard threshold of 3. These are all standard, widely-used packages pulled in by Jest. Is there an allow-list or policy exception for test dependencies?
javascript-node-postgres / create-cluster — Expected failure since cluster creation requires maintainer AWS credentials.

Ready for another look when you get a chance.

amaksimo · 2026-05-27T22:27:34Z

+
+```bash
+pip install -r requirements.txt
+```


Missing requirements.txt.

This README instructs pip install -r requirements.txt, but the PR doesn't add one. The sample imports aurora_dsql_psycopg2 and psycopg2, but we cannot install from that dir.

Can you add requirements.txt or switch to pyproject.toml?

amaksimo · 2026-05-27T22:27:34Z

+disjoint subset, avoiding OCC conflicts between workers.
+"""
+
+import threading


Unused import: threading isn't referenced in this module (the parallel* modules do use it).

amaksimo · 2026-05-27T22:27:34Z

+        total_deleted = 0
+        consecutive_failures = 0
+        partition_condition = (
+            f"{condition} AND abs(hashtext(id::text)) % {num_workers} = {worker_id}"


condition gets inserted without parentheses before appending the partition filter:

... {condition} AND abs(hashtext(...)) % N = i

That becomes a problem if someone passes a condition with an OR, like:

category = 'food' OR status = 'expired'

Because AND has higher precedence than OR, it can cause multiple workers to process the same rows again, defeating the purpose of partitioning and bringing back OCC conflicts.

The current demo is fine since it only uses simple equality conditions, but this is easy for users to run into when adapting the sample. We can do this:

partition_condition = (
    f"({condition}) AND abs(hashtext(id::text)) % {num_workers} = {worker_id}"
)

The same fix should also be applied in:

parallel_batch_update.py
BatchDelete.java
BatchUpdate.java
batchDelete.js
batchUpdate.js
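
To make the precedence problem concrete, here is a small self-contained sketch. It is an illustration under assumptions: sqlite3 stands in for Aurora DSQL, `id % N` stands in for `abs(hashtext(id::text)) % N` (SQLite has no hashtext), and the `items` table and helper names are hypothetical.

```python
import sqlite3


def partition_filter(condition, num_workers, worker_id, wrap):
    """Build a per-worker WHERE clause.

    `id % N` is a stand-in for the sample's abs(hashtext(id::text)) % N.
    """
    base = f"({condition})" if wrap else condition
    return f"{base} AND id % {num_workers} = {worker_id}"


def rows_claimed(conn, condition, num_workers, wrap):
    """Return, for each worker, the set of ids its filter matches."""
    claimed = []
    for worker_id in range(num_workers):
        where = partition_filter(condition, num_workers, worker_id, wrap)
        ids = {row[0] for row in conn.execute(f"SELECT id FROM items WHERE {where}")}
        claimed.append(ids)
    return claimed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (i, "food" if i % 2 == 0 else "toys", "expired" if i % 3 == 0 else "ok")
        for i in range(12)
    ],
)

condition = "category = 'food' OR status = 'expired'"
unwrapped = rows_claimed(conn, condition, 2, wrap=False)
wrapped = rows_claimed(conn, condition, 2, wrap=True)

# Without parentheses the filter parses as
#   category = 'food' OR (status = 'expired' AND id % 2 = w)
# so every worker claims every 'food' row.
overlap_unwrapped = unwrapped[0] & unwrapped[1]
# With parentheses the partitions are disjoint.
overlap_wrapped = wrapped[0] & wrapped[1]
```

In this toy data set both workers claim all six 'food' rows when the condition is not wrapped, while the wrapped version splits the eight matching rows cleanly between the two workers.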

amaksimo · 2026-05-27T22:27:34Z

+            if attempt >= max_retries:
+                raise MaxRetriesExceeded(max_retries)
+            delay_ms = base_delay_ms * (2 ** attempt)
+            logger.warning(


The backoff has no jitter: delay_ms = base_delay_ms * (2 ** attempt) is plain exponential backoff. What about:

import random
delay_ms = base_delay_ms * (2 ** attempt) * (0.5 + random.random())

The same applies to OccRetry.java and occRetry.js.
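
For context, a retry helper with this jittered delay might look roughly like the sketch below. It is not the PR's implementation: `SerializationFailure` is a hypothetical stand-in for a driver error carrying SQLSTATE 40001, and the injectable `sleep` parameter is an assumption added so the delays can be observed without waiting.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error with SQLSTATE 40001."""


class MaxRetriesExceeded(Exception):
    pass


def jittered_delay_ms(base_delay_ms, attempt):
    """Exponential backoff scaled by a random factor in [0.5, 1.5)."""
    return base_delay_ms * (2 ** attempt) * (0.5 + random.random())


def execute_with_retry(operation, max_retries=3, base_delay_ms=100, sleep=time.sleep):
    """Run operation(), retrying on SerializationFailure with jittered backoff."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except SerializationFailure:
            if attempt >= max_retries:
                raise MaxRetriesExceeded(max_retries)
            sleep(jittered_delay_ms(base_delay_ms, attempt) / 1000.0)


# Demo: fail twice, then succeed; record the delays instead of sleeping.
sleeps = []
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise SerializationFailure()
    return "done"


result = execute_with_retry(flaky, sleep=sleeps.append)
```

The random factor spreads out workers that hit a conflict at the same moment, so they are less likely to collide again on the next attempt.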

amaksimo · 2026-05-27T22:27:34Z

+-- =============================================================================
+
+INSERT INTO batch_test (category, status, value)
+SELECT


Nit: this 1,000-row INSERT block is repeated 25x, and the whole file is duplicated under java/batch_operations/ and javascript/batch_operations/ (three copies in total). Options:

Replace the 25 copy-pasted INSERTs with a psql \set loop, or seed from the host language driver (committing each batch as a separate transaction to respect the 3k mutation limit).

Keep one copy and reference it from each language's README.
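
Seeding from the host language could look something like the following sketch. It is only an illustration: sqlite3 stands in for the real driver (psycopg2 would use `%s` placeholders rather than `?`), the generated values are made up, and `seed_in_batches` is a hypothetical name. The column list matches the INSERT in the setup script.

```python
import sqlite3


def seed_in_batches(conn, total_rows, batch_size=1000):
    """Insert total_rows rows, committing after each batch.

    Returns the number of commits, i.e. the number of transactions used.
    """
    commits = 0
    for start in range(0, total_rows, batch_size):
        end = min(start + batch_size, total_rows)
        # Illustrative row values only; the real script generates its own data.
        rows = [(f"cat-{i % 5}", "active", i) for i in range(start, end)]
        conn.cursor().executemany(
            "INSERT INTO batch_test (category, status, value) VALUES (?, ?, ?)",
            rows,
        )
        conn.commit()  # one transaction per batch keeps each one under the limit
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch_test (category TEXT, status TEXT, value INTEGER)")
commits = seed_in_batches(conn, total_rows=2500, batch_size=1000)
row_count = conn.execute("SELECT COUNT(*) FROM batch_test").fetchone()[0]
```

A loop like this would replace the repeated INSERT blocks with one parameterized statement, at the cost of needing a driver connection instead of plain psql.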

amaksimo · 2026-05-27T22:27:34Z

+ *
+ * @param {import('pg').PoolClient} client - A node-postgres pool client.
+ * @param {(client: import('pg').PoolClient) => Promise<*>} operation - Async
+ *   function that performs database work. Should NOT commit.


JSDoc says the operation "Should NOT commit," but the operations passed in batchDelete.js and batchUpdate.js call BEGIN inside the operation (and the caller does COMMIT after executeWithRetry returns). It works — each retry begins a fresh tx after the rollback — but the responsibility split is inconsistent. Either move BEGIN into executeWithRetry, or update the doc to say "operation must call BEGIN; caller commits after success."
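
The first option, where the retry helper owns the whole transaction, could be sketched as below. This is an assumption-laden illustration in Python rather than the PR's JavaScript: sqlite3 in autocommit mode stands in for a pool client, `RetryableConflict` is a hypothetical stand-in for an OCC conflict, and backoff is omitted for brevity.

```python
import sqlite3


class RetryableConflict(Exception):
    """Hypothetical stand-in for an OCC conflict (SQLSTATE 40001)."""


def run_in_transaction(conn, operation, max_retries=3):
    """Own the transaction: BEGIN, run operation, COMMIT.

    The operation neither begins nor commits; on a conflict the helper
    rolls back and starts a fresh transaction for the next attempt.
    """
    for attempt in range(max_retries + 1):
        conn.execute("BEGIN")
        try:
            result = operation(conn)
            conn.execute("COMMIT")
            return result
        except RetryableConflict:
            conn.execute("ROLLBACK")
            if attempt >= max_retries:
                raise
        except Exception:
            conn.execute("ROLLBACK")
            raise


# isolation_level=None puts sqlite3 in autocommit mode so BEGIN is explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE batch_test (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO batch_test (status) VALUES ('old')")

attempts = []


def flaky_update(c):
    attempts.append(1)
    cur = c.execute("UPDATE batch_test SET status = 'new'")
    if len(attempts) == 1:
        raise RetryableConflict()  # simulate a conflict on the first try
    return cur.rowcount


updated = run_in_transaction(conn, flaky_update)
status = conn.execute("SELECT status FROM batch_test").fetchone()[0]
```

With this split the doc comment can simply say that the operation must not issue BEGIN or COMMIT, and the caller no longer has to remember to commit after the helper returns.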

amaksimo · 2026-05-27T22:27:34Z

+public class OccRetry {
+
+    private static final Logger logger = Logger.getLogger(OccRetry.class.getName());
+    private static final String SERIALIZATION_FAILURE = "40001";


This class logs via java.util.logging.Logger, but BatchDelete, BatchUpdate, and Main use System.out.println everywhere. Pick one?

amaksimo · 2026-05-27T22:27:34Z

+
+    @Test
+    public void testBatchOperations() {
+        assertAll(() -> Main.main(new String[]{


Main.main calls System.exit(1) on any SQLException / OccRetry.MaxRetriesExceededException.

If the integration test hits a db error, the JVM exits before JUnit can report it as a failure (exit code 1 with no test output). Could you move the main body into a helper that throws, and have both the CLI entry point and the test call it?

Benjscho reviewed Mar 17, 2026

rmconstantin force-pushed the batch-operations branch from 8bc247c to a3525c4 Compare March 18, 2026 16:22

rmconstantin requested a review from Benjscho March 18, 2026 17:31

rmconstantin force-pushed the batch-operations branch from 63b7d16 to 94efc68 Compare March 19, 2026 23:44

Benjscho reviewed Mar 27, 2026

rmconstantin requested a review from Benjscho May 14, 2026 19:05

ralconst added 9 commits May 14, 2026 12:19

Add batch DELETE/UPDATE samples for datasets exceeding 3k row limit

3cfd8b5

Demonstrates sequential and parallel batch processing patterns for Aurora DSQL with OCC retry logic and hashtext() partitioning. Includes Python (psycopg2), Java (pgJDBC), and Node.js (node-postgres) implementations.

Added cleanup instructions to READMEs.

89cdc1e

switch Java to DSQL JDBC connector

adf6782

Use AuroraDSQLPool for Node.js, HikariCP for Java

c63cc83

Reorganized batch operations into batch_operations/ subdirectories. A…

22590c7

…dded tests.

Move batch operations to standalone directories under each language

39ec1fb

Add __pycache__ and *.pyc to .gitignore

1278d91

rmconstantin force-pushed the batch-operations branch from 9d687d7 to d67e3ff Compare May 14, 2026 19:20

rmconstantin requested a review from amaksimo May 22, 2026 04:56

amaksimo reviewed May 27, 2026
