duckdb and postgres connection management and thread safety in a multithreaded, long lived process #319
Unanswered
provinzkraut
asked this question in
Q&A
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
I'm trying to optimise DuckDB + Postgres connection handling in a long-lived, multithreaded application, but, after reading the docs, I'm not quite sure what the best way is to put all the moving parts together, so I'm looking for some advice.
Preface
I only ever want to perform reads, through DuckDB into postgres, e.g.
conn.sql("SELECT * FROM postgres_scan(...);").fetchall()Postgres side
The two methods to connect to postgres:
ATTACHpostgrespostgres_scanFrom what I can tell,
ATTACHing a postgres DB will establish a connection and keep it open, making it not a suitable method for this application, since it would not allow concurrent reads because they'd all share the same postgres connection.I'm assuming (couldn't find anything about this in the docs, and a cursory look into the source also didn't yield a result) that
postgres_scancreates an ad-hoc connection, so it feels like that's what I should use.DuckDB side
According to this, the connection is inherently not thread safe. It does mention
.cursor, but also says this about the cursor:This seems to imply that within the same
DuckDBPyRelation, effectively no multithreading is possible or safe.However, there's also this guide, which does show using threads and
.cursorfor concurrent reads / writes, leaving me a bit confused.Baring this guide, I would have concluded that what I actually want is some sort of connection pool, with one
DuckDBPyRelationper thread, but now I'm not sure what to think.I would appreciate any insights in how this is supposed to be handled, or corrections if I'm missing something really obvious :)
Beta Was this translation helpful? Give feedback.
All reactions