
Conversation

@wtgee
Member

@wtgee wtgee commented Dec 24, 2025

  • Add bulk COPY command for insert dataframe

For inserting 65k rows into cobra_target, this offers a ~10x speedup.


Copilot AI left a comment


Pull request overview

This PR implements bulk insert optimization using PostgreSQL's COPY command, providing significant performance improvements (claimed 10x speedup for 65k rows) over the previous multi-row INSERT approach.

Key Changes:

  • Added new psql_insert_copy function that uses PostgreSQL's COPY command for bulk data loading
  • Added use_copy parameter (default True) to insert_dataframe method to enable/disable COPY optimization
  • Added performance timing to track insert operations
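For context on the pattern: `DataFrame.to_sql` accepts a `method` callable, and the pandas documentation includes a COPY-based recipe along these lines. Below is a minimal sketch of such a function, assuming psycopg2 underneath a SQLAlchemy connection; the actual `psql_insert_copy` in this PR may differ in details.

```python
import csv
from io import StringIO

def psql_insert_copy(table, conn, keys, data_iter):
    """Insert method for DataFrame.to_sql that streams rows through
    PostgreSQL's COPY instead of multi-row INSERTs.

    Sketch based on the recipe in the pandas documentation; the
    function in this PR may differ in details.
    """
    # Raw DBAPI (psycopg2) connection behind the SQLAlchemy connectable.
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        # Serialize the rows into an in-memory CSV buffer.
        buf = StringIO()
        writer = csv.writer(buf)
        writer.writerows(data_iter)
        buf.seek(0)

        columns = ', '.join(f'"{k}"' for k in keys)
        table_name = f'{table.schema}.{table.name}' if table.schema else table.name

        sql = f'COPY {table_name} ({columns}) FROM STDIN WITH CSV'
        cur.copy_expert(sql=sql, file=buf)
```

Used as, e.g., `df.to_sql('cobra_target', engine, if_exists='append', index=False, method=psql_insert_copy)`. COPY avoids per-statement parsing and network round trips, which is where the bulk of the speedup comes from.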


Member

@CraigLoomis CraigLoomis left a comment


I'll suggest making that WITH (FORMAT csv, HEADER MATCH) (and whatever the equivalent of df.to_csv(..., header=True) is) to ward against the worst mistakes.
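In sketch form, that suggestion amounts to writing a header row into the buffer and letting the server verify it (HEADER MATCH is available for COPY FROM since PostgreSQL 15). The function name here is hypothetical, for illustration only:

```python
import csv
from io import StringIO

def psql_insert_copy_checked(table, conn, keys, data_iter):
    """Variant of the COPY insert method that writes a header row and
    asks PostgreSQL to verify it against the column list.
    Hypothetical name; HEADER MATCH requires PostgreSQL 15+."""
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        buf = StringIO()
        writer = csv.writer(buf)
        writer.writerow(keys)        # header row, the df.to_csv(..., header=True) analogue
        writer.writerows(data_iter)
        buf.seek(0)

        columns = ', '.join(f'"{k}"' for k in keys)
        sql = f'COPY {table.name} ({columns}) FROM STDIN WITH (FORMAT csv, HEADER MATCH)'
        cur.copy_expert(sql=sql, file=buf)
```

With this, a misnamed or out-of-order column makes COPY fail outright rather than silently loading values into the wrong columns.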


Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@wtgee wtgee force-pushed the tickets/INSTRM-2821 branch 2 times, most recently from c9f780b to d05bc09 Compare December 24, 2025 19:54
@wtgee
Member Author

wtgee commented Dec 24, 2025

I'll suggest making that WITH (FORMAT csv, HEADER MATCH) (and whatever the equivalent of df.to_csv(..., header=True) is) to ward against the worst mistakes.

Since this is coming from the to_sql command itself, it has the keys parameter that specifies the exact columns, so it should be fine to rely on an explicit HEADER FALSE and the ordering of the keys.

I've added some explicit parameters to the csv writer and some other dataframe scrubbing checks that shouldn't interfere with the data. It's actually running even faster with these explicit parameters, I guess because it doesn't have to do an initial pass or conversions.
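The kind of explicit writer settings and scrubbing described here might look like the following sketch; the exact parameters and checks in the PR are not reproduced, and scrub_dataframe is a hypothetical name:

```python
import csv
from io import StringIO

import numpy as np
import pandas as pd

def scrub_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pre-insert scrubbing: map NaN to None so the CSV
    writer emits empty fields, which COPY ... CSV reads as NULL."""
    return df.replace({np.nan: None})

buf = StringIO()
writer = csv.writer(
    buf,
    delimiter=',',              # explicit, matching COPY's CSV default
    quoting=csv.QUOTE_MINIMAL,  # quote only when a field requires it
    lineterminator='\n',        # fixed newline, independent of platform
)
```

Pinning the dialect down explicitly (csv.writer defaults to lineterminator='\r\n', for example) removes any ambiguity about what COPY receives on the other end.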


Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 9 comments.



@wtgee wtgee force-pushed the tickets/INSTRM-2821 branch from d05bc09 to c96de62 Compare December 24, 2025 20:12

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@wtgee wtgee force-pushed the tickets/INSTRM-2821 branch 2 times, most recently from 15b0653 to 13579a7 Compare December 24, 2025 20:18
* Add bulk COPY command for insert dataframe
* Scrub the dataframe before inserting.
@wtgee wtgee force-pushed the tickets/INSTRM-2821 branch from 13579a7 to 2025a74 Compare December 24, 2025 21:07