Skip to content

Truncate large names, with an opaleye tag to avoid collisions#400

Merged
TeofilC merged 2 commits intocircuithub:masterfrom
artificialio:cristovao/upstream-select-truncated
May 7, 2026
Merged

Truncate large names, with an opaleye tag to avoid collisions#400
TeofilC merged 2 commits intocircuithub:masterfrom
artificialio:cristovao/upstream-select-truncated

Conversation

@cristovaogferreira-artificial
Copy link
Copy Markdown
Contributor

See issue #366.

This PR adds a function selectTruncated that limits the alias names to a specific size (> 29 characters), using SHA1 to generate a unique (modulo SHA1 collisions and a column with a deliberately colliding name) suffix.

This allows us to make Postgres happy about column name size while at the same time keeping some debug information around.

@TeofilC
Copy link
Copy Markdown
Contributor

TeofilC commented May 5, 2026

Did you see this comment? #366 (comment) My impression is that this is already fixed, since we truncate and then add a unique tag

@cristovaogferreira-artificial
Copy link
Copy Markdown
Contributor Author

cristovaogferreira-artificial commented May 5, 2026

Did you see this comment? #366 (comment) My impression is that this is already fixed, since we truncate and then add a unique tag

I did see the comment, it doesn't fix another issue which is select aliases still being generated with too long names (we still see a bunch of warnings in our service, that's why I have been working on this). See the tests for details.

(edit: I realized the tests don't show the behaviour, because they're using a 42 chars identifier, but it's there. I'll add another test.)

@TeofilC
Copy link
Copy Markdown
Contributor

TeofilC commented May 5, 2026

Would it be possible to take the same approach as before here too then, ie, use a Opaleye.fresh? Rather than using SHA1?

@cristovaogferreira-artificial cristovaogferreira-artificial force-pushed the cristovao/upstream-select-truncated branch from c7c050c to d265e5a Compare May 5, 2026 16:06
@cristovaogferreira-artificial
Copy link
Copy Markdown
Contributor Author

Would it be possible to take the same approach as before here too then, ie, use a Opaleye.fresh? Rather than using SHA1?

Done

@TeofilC
Copy link
Copy Markdown
Contributor

TeofilC commented May 5, 2026

Could you use the same helper here:

suffix = Opaleye.tagWith tag (Opaleye.tagWith subtag "")
to avoid duplication.

Also is it necessary to make the size user configurable? I think using the same constant as before would be good, and then we wouldn't need to add extra stuff to the interface?

@cristovaogferreira-artificial
Copy link
Copy Markdown
Contributor Author

Could you use the same helper here:

suffix = Opaleye.tagWith tag (Opaleye.tagWith subtag "")

to avoid duplication.

Just to be sure: You mean extract the helper from that function and using that, right? Something like

statementReturning :: Table Expr a 
  => State Opaleye.Tag Doc -> Statement (Query a)
statementReturning pp = Statement $ do
  (binding, query) <- lift $ do
    doc <- pp
    tag <- Opaleye.fresh
    let
      relation = Opaleye.tagWith tag "statement"
      names = namesFromLabelsWithA (symbol tag) `evalState` Opaleye.start
      columns = Just $ showNames names
      query =
        fromCols <$> each
          TableSchema
            { name = fromString relation
            , columns = names
            }
      returning = Returning (countRows query)
      binding = Binding {..}
    pure (binding, query)
  tell (Endo (binding :))
  pure $ Unmodified query

symbol :: Tag -> NonEmpty String -> State Opaleye.Tag String
symbol tag labels = do
  subtag <- Opaleye.fresh
  let
    suffix = Opaleye.tagWith tag (Opaleye.tagWith subtag "")
  pure $ take (63 - length suffix) label ++ suffix
  where
    label = fold (intersperse "/" labels)

and use symbol in the implementation of a namesFromLabelsTruncated?

Also is it necessary to make the size user configurable? I think using the same constant as before would be good, and then we wouldn't need to add extra stuff to the interface?

That's fine, 63 seems to be the number in most postgres implementations that people are worried about.

@TeofilC
Copy link
Copy Markdown
Contributor

TeofilC commented May 5, 2026

Yep that's the sort of thing. I'm not sure if these are exactly the same, but they seem quite similar. Whatever you think is best

@cristovaogferreira-artificial cristovaogferreira-artificial force-pushed the cristovao/upstream-select-truncated branch from d265e5a to 18423bb Compare May 5, 2026 16:50
@cristovaogferreira-artificial
Copy link
Copy Markdown
Contributor Author

Done. I also renamed selectTruncated to selectShort because "truncate" is a very overloaded term in DB-land.

@cristovaogferreira-artificial cristovaogferreira-artificial changed the title Truncate large names, with a hash to avoid collisions Truncate large names, with an opaleye tag to avoid collisions May 5, 2026
@TeofilC
Copy link
Copy Markdown
Contributor

TeofilC commented May 5, 2026

Sorry one last thing. Could you just make this the default (or maybe there's a reason not to?)? So don't change the interface of Rel8 at all on this PR.

@cristovaogferreira-artificial
Copy link
Copy Markdown
Contributor Author

Sorry one last thing. Could you just make this the default (or maybe there's a reason not to?)? So don't change the interface of Rel8 at all on this PR.

Two concerns:

  • Is it possible that someone actually changed their postgres installation to keep long names possible for debugging reasons? If so this change would break their usage.

    • My guess is "no-one is", because it requires a recompile with a config change, and even if it were "yes" anyone who's recompiling postgres probably doesn't shy away from patching and recompiling rel8, but you know the consumers of the library better than me.
  • Do I change the top-level namesFromLabels function, then?

    • What is the expectation of someone calling namesFromLabels, is it that it does the same scheme that Rel8 uses internally?
    • Even if the expectation is just "generate some names that are good to use as postgres column names", it should change.

@TeofilC
Copy link
Copy Markdown
Contributor

TeofilC commented May 6, 2026

I agree that it's unlikely that someone will change the default length.

Yeah I think the nameFromLabels function should match internal usage.

AI disclosure: Generated with Claude Code to figure out how to test the
SQL string.
@cristovaogferreira-artificial cristovaogferreira-artificial force-pushed the cristovao/upstream-select-truncated branch from 18423bb to 1e58aa9 Compare May 6, 2026 09:18
@cristovaogferreira-artificial
Copy link
Copy Markdown
Contributor Author

Done. Let me know if there's anything else!

Copy link
Copy Markdown
Contributor

@TeofilC TeofilC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great thanks!

I'll try to cut a release soon btw

@TeofilC TeofilC merged commit cb0bdab into circuithub:master May 7, 2026
1 check passed
cristovaogferreira-artificial added a commit to artificialio/rel8 that referenced this pull request May 8, 2026
…thub#400)

* Truncate large names, with a tag to avoid collisions

* Add test for long label case

AI disclosure: Generated with Claude Code to figure out how to test the
SQL string.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants