
feat: add option to enable Redshift OIDs #1291

Open
niger-prequel wants to merge 6 commits into lib:master from prequel-co:master

Conversation

@niger-prequel

Add option for Redshift-specific OIDs mapping

Description

This PR introduces support for Redshift-specific OID mapping within the lib/pq driver, enabled via a new redshift_oids connection string parameter.

Currently, there is no dedicated, robust, and database/sql-compatible driver for Amazon Redshift in the Go ecosystem. Because Redshift is heavily based on a fork of PostgreSQL 8.0.2, many developers rely on lib/pq to connect to Redshift databases. While this works well for standard system types, Redshift introduces several custom data types (such as _spectrum_array, _spectrum_map, etc.) with Object Identifiers (OIDs) not found in Postgres.

When these types are queried through Rows.ColumnTypes(), they return an empty rows.DatabaseTypeName() string since lib/pq does not natively recognize them. By giving developers a way to map these Redshift-specific OIDs to their actual type names, we can drastically improve the experience for developers using lib/pq as a Redshift driver.

Changes

  • Introduced a redshift_oids parameter that can be toggled via the standard DSN configuration.
  • Added a redshiftTypeName dictionary mapping Redshift's unique OIDs to their human-readable type names.
  • Updated rows.ColumnTypeDatabaseTypeName() to fall back to the custom Redshift OID mapping if the column's type string evaluates to empty (unrecognized by standard Postgres OIDs) and the redshift_oids feature is explicitly enabled.
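A minimal sketch of that fallback logic (the names and shapes here are illustrative, not the PR's actual diff; OIDs 4000 for super and 6551 for varbyte are the Redshift-specific types discussed in this thread):

```go
// Illustrative sketch only; does not mirror the PR's actual code.
// Maps Redshift-specific OIDs to the names DatabaseTypeName should return.
var redshiftTypeName = map[uint32]string{
	4000: "SUPER",
	6551: "VARBYTE",
}

// databaseTypeName falls back to the Redshift map only when the flag is
// enabled and the standard Postgres lookup produced an empty name.
func databaseTypeName(stdName string, colOID uint32, redshiftOIDs bool) string {
	if stdName == "" && redshiftOIDs {
		return redshiftTypeName[colOID]
	}
	return stdName
}
```

With the flag off, or whenever the standard lookup already found a name, behavior is unchanged.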

Safe by Default

To guarantee that this feature introduces zero regressions for existing PostgreSQL users, this OID mapping sits strictly behind an explicit config boolean (redshift_oids). Unless explicitly enabled in the connection string (redshift_oids=1), the driver operates exactly as it did before.

Furthermore, because Redshift inherited its system types directly from PostgreSQL 8.0.2, there are no overlapping OIDs assigned to radically different types. As an extra layer of caution, the Redshift map acts purely as a fallback mechanism: it takes effect only when the standard Postgres lookup evaluates the type name to "".
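For illustration, a toy check of the keyword/value DSN form described above (this is not lib/pq's real DSN parser, which also handles quoting and URL-style DSNs; only the redshift_oids=1 spelling comes from this PR, and the host/user values are placeholders):

```go
import "strings"

// dsnHasRedshiftOIDs reports whether a space-separated keyword/value DSN
// enables the redshift_oids flag this PR introduces. Toy parser for
// illustration only.
func dsnHasRedshiftOIDs(dsn string) bool {
	for _, kv := range strings.Fields(dsn) {
		if kv == "redshift_oids=1" {
			return true
		}
	}
	return false
}
```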

@arp242
Collaborator

arp242 commented Mar 18, 2026

Isn't that redshiftTypeName map identical to what's in the oid package? Or a subset of it? And isn't Redshift supposed to be compatible with PostgreSQL (with some limitations)? Why does it need to map this, especially since the entries seem identical to what's in oid?

It's not really clear to me what exactly this fixes; did you get an error? Some unexpected behaviour? Something else? Having a small test case and the output of it being run with PQGO_DEBUG=1 would be helpful.

@arp242
Collaborator

arp242 commented Mar 18, 2026

Looking at this code a bit more, your new code maps e.g. 16 to BOOL, which is exactly the same as what the code currently does (via oid.TypeName).

So it seems it's probably just some specific type that's missing? It could be that the oid package just needs updating: it was last updated for PostgreSQL 9.4.
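For context, the existing path described here is a plain map lookup. A local stand-in for lib/pq's generated oid.TypeName map (real entries, but only a few of them, and defined here rather than imported so the snippet is self-contained):

```go
// Stand-in for lib/pq's oid.TypeName map (oid/types.go): wire OID to
// the upper-case name DatabaseTypeName returns. 16 → BOOL is the
// example from the comment above.
var typeName = map[uint32]string{
	16: "BOOL",
	23: "INT4",
	25: "TEXT",
}
```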

@niger-prequel
Author

niger-prequel commented Mar 18, 2026

@arp242 it's not identical. I updated the map to contain just the OIDs that Redshift has and Postgres doesn't. SUPER and VARBYTE are the ones we see the most.

> Having a small test case and the output of it being run with PQGO_DEBUG=1 would be helpful.

The hard part is that you can't run Redshift locally; you need an instance in the cloud to repro what the OIDs will be. I got them by running the following in 2 of our different Redshift instances to confirm they are stable across instances.

SELECT oid, typname
FROM pg_type WHERE oid < 7000 ORDER BY oid DESC;

@arp242
Collaborator

arp242 commented Mar 18, 2026

Okay, so Redshift has a few extra types.

None of these oids are currently used, so I think we can just add them to oid/types.go? If there is a conflict in the future then we can cross that bridge when we get there, which we may never need to – oids are not added very often: once each in 2026 and 2025, zero in 2022 through 2024, two in 2021, etc.

Possibly we could only look up these oids based on the version string Redshift sends, or some other signal that identifies it as such (if any). I would like to avoid adding connection parameters for this sort of thing.

@arp242
Collaborator

arp242 commented Mar 18, 2026

Updated oid package with: #1292

I think all that's needed with that is:

diff --git i/oid/gen.go w/oid/gen.go
index 57fbbdd..a920042 100644
--- i/oid/gen.go
+++ w/oid/gen.go
@@ -36,6 +36,28 @@ var extra = []OID{
        {1024, "_reltime", ""},
        {1025, "_tinterval", ""},
        {2282, "opaque", ""},
+
+       {0, "", "Amazon Redshift (March 2026)"},
+       {86, "pg_shadow", ""},
+       {87, "pg_group", ""},
+       {88, "pg_database", ""},
+       {90, "pg_tablespace", ""},
+       {635, "_spectrum_array", ""},
+       {636, "_spectrum_map", ""},
+       {637, "_spectrum_struct", ""},
+       {1188, "intervaly2m", ""},
+       {1189, "_intervaly2m", ""},
+       {1190, "intervald2s", ""},
+       {1191, "_intervald2s", ""},
+       {2935, "hllsketch", ""},
+       {3000, "geometry", ""},
+       {3001, "geography", ""},
+       {4000, "super", ""},
+       {4600, "useritem", ""},
+       {4601, "_useritem", ""},
+       {4602, "roleitem", ""},
+       {4603, "_roleitem", ""},
+       {6551, "varbyte", ""},
 }

 func main() {

And that should be enough?

@arp242
Collaborator

arp242 commented Mar 18, 2026

Actually, never mind – there are a few duplicates. I should have actually tried it before posting 😅

So it does need a separate map, which is okay. But I'd really like to avoid adding a connection string. It already gets the PostgreSQL version in conn.processParameterStatus(); what is that for Redshift?

You can see all the startup parameters with something like:

% PQGO_DEBUG=1 go test -run 'TestSimpleQuery$' |& grep ParamStatus
SERVER ← (S) ParamStatus         19  "in_hot_standby\x00off\x00"
SERVER ← (S) ParamStatus         21  "integer_datetimes\x00on\x00"
SERVER ← (S) ParamStatus         17  "TimeZone\x00Etc/UTC\x00"
SERVER ← (S) ParamStatus         23  "IntervalStyle\x00postgres\x00"
SERVER ← (S) ParamStatus         28  "search_path\x00\"$user\", public\x00"
SERVER ← (S) ParamStatus         16  "is_superuser\x00on\x00"
SERVER ← (S) ParamStatus         18  "application_name\x00\x00"
SERVER ← (S) ParamStatus         34  "default_transaction_read_only\x00off\x00"
SERVER ← (S) ParamStatus         22  "scram_iterations\x004096\x00"
SERVER ← (S) ParamStatus         19  "DateStyle\x00ISO, MDY\x00"
SERVER ← (S) ParamStatus         31  "standard_conforming_strings\x00on\x00"
SERVER ← (S) ParamStatus         27  "session_authorization\x00pqgo\x00"
SERVER ← (S) ParamStatus         21  "client_encoding\x00UTF8\x00"
SERVER ← (S) ParamStatus         45  "server_version\x0018.3 (Debian 18.3-1.pgdg13+1)\x00"
SERVER ← (S) ParamStatus         21  "server_encoding\x00UTF8\x00"

Or anything else that connects; PQGO_DEBUG=1 should work from applications too.

@niger-prequel
Author

@arp242 happy to make it dynamic. I used PQGO_DEBUG to identify that Redshift sends a version string referencing ParAccel Database, its original name before Amazon acquired it. Let me know if that looks better. Thanks again for taking the time to look at this 🙏
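A sketch of the version-based detection being discussed (the exact substrings Redshift sends are assumptions based on the ParAccel observation above; verify against real PQGO_DEBUG output before relying on them):

```go
import "strings"

// isRedshift guesses whether the server is Redshift from the
// server_version parameter sent at connection startup. The matched
// substrings are assumptions; confirm against an actual Redshift
// connection.
func isRedshift(serverVersion string) bool {
	return strings.Contains(serverVersion, "Redshift") ||
		strings.Contains(serverVersion, "ParAccel")
}
```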

@arp242
Collaborator

arp242 commented Mar 18, 2026

So I haven't looked at any of this code at all since I started maintaining this. Looking at it a bit more, I think this entire strategy of generating a list might be incorrect. What it should do instead is query pg_type dynamically.

For example custom types will fail too:

func TestXXX(t *testing.T) {
	db := pqtest.MustDB(t)

	pqtest.Exec(t, db, `create type bug_status as enum ('new', 'open', 'closed')`)
	pqtest.Exec(t, db, `create temp table bugs (id serial, status bug_status)`)
	pqtest.Exec(t, db, `insert into bugs (status) values ($1), ($2)`, "new", "open")

	rows, err := db.Query(`select * from bugs`)
	if err != nil {
		t.Fatal(err)
	}
	defer rows.Close()

	coltypes, err := rows.ColumnTypes()
	if err != nil {
		t.Fatal(err)
	}

	for _, c := range coltypes {
		fmt.Printf("%s (%s, %s)\t", c.Name(), c.DatabaseTypeName(), c.ScanType())
	}
	fmt.Println()

	for rows.Next() {
		var (
			id     int
			status string
		)
		err := rows.Scan(&id, &status)
		if err != nil {
			t.Fatal(err)
		}
		fmt.Printf("%v\t%v\n", id, status)
	}
}

Outputs:

id (INT4, int32)    status (, interface {})
1                   new
2                   open

No type for the status column, as that's a custom enum type.

If we query pg_type, then it should always be correct: for any PostgreSQL version, for any Redshift, cockroachdb, etc. etc.


Need to think about it; I want to hold off merging new things for a week or two anyway, because I did a release today and would rather not merge anything so it's easier to do a new bugfix release for regressions (if needed). Also have some other projects I want to work on.

@niger-prequel
Author

@arp242 appreciate you taking the time to think on this. Some things to consider about your proposed rewrite:

  1. it's not guaranteed that every wire-compatible DB gives the user pg_type permissions by default. For example, I believe in vanilla Postgres you automatically get access to information_schema, whereas in Redshift that has to be granted. But Redshift has separate tables like SVV_COLUMNS that contain the same information and are automatically available.
  2. Users can add custom types, so without a refresh mechanism the list can become stale. Forcing the client to know when to refresh seems like a footgun, as there isn't an easy way to signal that a refresh is needed, and checking on every query is probably too expensive. Introducing a refresh mechanism also makes implementations on top of lib/pq less portable. There are cases that could make this a bit of a rabbit hole, like creating a type inside a transaction and then selecting rows.
  3. in theory OIDs are unsigned 32-bit, so up to 2^32 k:v pairs; a 4-byte key and a max 64-byte type name means the worst case is nearly 300GB. It's unlikely anyone actually has a pg_type that large, but there is a chance of surprise OOMs or timeouts from pulling it all from the DB.
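The worst-case figure in point 3 holds up as rough arithmetic (assuming the 4-byte OID key plus PostgreSQL's 64-byte NAMEDATALEN name per entry, and ignoring per-entry map overhead):

```go
// Back-of-the-envelope check: 2^32 possible OIDs times (4-byte key +
// 64-byte name) per entry. Ignores hash-map overhead, which would only
// make the real number larger.
func worstCaseBytes() uint64 {
	const entries = uint64(1) << 32 // all possible OIDs
	const perEntry = 4 + 64         // key + typname
	return entries * perEntry
}
```

That works out to roughly 292 GB, so "nearly 300GB" is about right.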

@arp242
Collaborator

arp242 commented Mar 19, 2026

It should load and cache on use, rather than loading all of them up-front. I think that's also what e.g. libpq or pgx do. It already does the same with timezone names.

I think what I'll do is close my "update to PostgreSQL 18.3" PR, merge this (in 2-3 weeks as mentioned) as it solves a practical problem, and create an issue for improving this in the future. AFAIK no one else has reported any problems with this, there's more pressing matters to work on, and this PR won't create any problems for further improvements down the line.
