
fix truncating of HNY ids #2220

Merged
damonmcc merged 1 commit into main from dm-fix-hpd-ids
Feb 9, 2026

@damonmcc (Member) commented Feb 5, 2026

resolves #2213

all builds on this branch

checks

Used the new query in DBeaver and looked at records in the modified subquery where hny_id_old_length != hny_id_new_length. There are 789 such records, and they all went from 12 to 13 characters.
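A minimal Python sketch of that length check. It assumes the fix pads building IDs to at least 6 digits without truncating (the exact SQL of the modified query isn't reproduced here), and the helper names are hypothetical. With the 5-digit project IDs seen in the comparison output, the old and new keys differ by exactly one character: 5 + 1 + 6 = 12 versus 5 + 1 + 7 = 13.

```python
# Hypothetical reconstruction of the old vs. new building-ID padding;
# the actual SQL of the modified query is not reproduced in this PR.


def old_building_key(building_id: str) -> str:
    # PostgreSQL lpad(building_id, 6, '0'): pads short IDs with zeros,
    # but truncates anything longer than 6 characters on the right
    return building_id.rjust(6, "0")[:6]


def new_building_key(building_id: str) -> str:
    # Assumed fix: pad to at least 6 digits, never truncate
    return building_id.rjust(6, "0")


def length_changes(rows):
    """Return (old_id, new_id, old_len, new_len) for every
    (project_id, building_id) pair whose hny_id length changed."""
    changed = []
    for project_id, building_id in rows:
        old_id = f"{project_id}/{old_building_key(building_id)}"
        new_id = f"{project_id}/{new_building_key(building_id)}"
        if len(old_id) != len(new_id):
            changed.append((old_id, new_id, len(old_id), len(new_id)))
    return changed


# The first two pairs come from the comparison output; the third is made up
sample = [("69582", "1015477"), ("74169", "1005039"), ("70000", "4321")]
for row in length_changes(sample):
    print(row)
# ('69582/101547', '69582/1015477', 12, 13)
# ('74169/100503', '74169/1005039', 12, 13)
```

Short IDs such as the made-up `4321` pad to `004321` under both versions, so only 7-digit building IDs show up as changed.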

Also used the in-development table comparison tool on the devdb_hny_lookup table created by the modified SQL script. It shows the expected diffs in the hny_id column:

❯ python3 -m dcpy lifecycle scripts compare_build_tables compare db-devdb dm_fix_hpd_ids devdb_hny_lookup
11:30:21 INFO:dcpy:Comparing dm_fix_hpd_ids.devdb_hny_lookup (left) to nightly_qa.devdb_hny_lookup (right) ...
________________________________________________________________________________
tables
    left: devdb_hny_lookup
    right: devdb_hny_lookup
________________________________________________________________________________
row_count
    left: 8361
    right: 8438
________________________________________________________________________________
column_comparison
    both
        all_hny_units
        classa_hnyaff
        hny_id
        hny_jobrelate
        job_number
    left_only: None
    right_only: None
    type_differences: None
________________________________________________________________________________
data_comparison
    compared_columns
        all_hny_units
        classa_hnyaff
        hny_id
        hny_jobrelate
        job_number
    ignored_columns: None
    columns_coerced_to_numeric: None
    left_only
        1319 rows. First 20 shown
           classa_hnyaff job_number         hny_id hny_jobrelate all_hny_units                          row_hash  match_count  dev_count  prod_count
        0              0  321596803       Multiple   one-to-many             2  a60df2018094a1bca0399aee1cdf8446            1          1           0
        1              1  S00555943  69582/1015477    one-to-one             1  ad681317e627cdb03f016013d4b078bb            1          1           0
        2              1  S00570356  69582/1015478    one-to-one             1  ea0245559eb5c4e25c9cc0423d588985            1          1           0
        3              1  S00570405  69582/1015476    one-to-one             1  b6d374fedd7d7a6058a08c208281cef2            1          1           0
        4              1  S00570422  69582/1015474    one-to-one             1  777aa93345f2405e0692431d3e631472            1          1           0
        5              1  S00570509  69582/1015480    one-to-one             1  3b1a8545d89de1bbece37ccc15537683            1          1           0
        6              1  S00570523  69582/1015483    one-to-one             1  5dc4e75124f220ad1943bee276025eda            1          1           0
        7             10  220152242  74169/1005039    one-to-one            10  81e177b4b600e6df2c0d519635a92cc8            1          1           0
        8             10  220640251  74578/1010365    one-to-one            32  c605f74b9798a769d83f00381725f2a3            1          1           0
        9             10  220672742  73246/1009578    one-to-one            31  c8efaca5f67ad1fa50f80c0e4c8f65ec            1          1           0
        10            10  321590578  70913/1017549    one-to-one            10  fcb3d9ecda54d75577228b9a5f3116ef            1          1           0
        11            10  321594388  73698/1007968    one-to-one            33  dfe80edfdc9abff1cdaa7aa0c4d123fe            1          1           0
        12            10  321600772  68222/1005398    one-to-one            10  9c596befa1d5b10247410e512c4ca4dc            1          1           0
        13            10  321954364  74772/1013466    one-to-one            51  a89fedaf46887c90b20cdfed2d6f912b            1          1           0
        14            10  321995917  73763/1005140    one-to-one            33  f480fe1832149f7ba27b8f0416c68666            1          1           0
        15            10  340754945  71939/1006281    one-to-one            32  b65ef2d80655173280cf5014a2245492            1          1           0
        16            10  421133026  70072/1014128    one-to-one            35  a6a945a8e5f9993b307264eb6901d49b            1          1           0
        17           100  X00696576  76752/1016054    one-to-one           100  d3b54554ddb5a6b45b565013a88b5d71            1          1           0
        18           101  X00554868  74615/1009388    one-to-one           101  b268b3889264f01718207db06378f067            1          1           0
        19           103  321592763  69611/1017598    one-to-one           103  0dbb18152dcd538865b87edc6fd5d71c            1          1           0
    right_only
        1396 rows. First 20 shown
           classa_hnyaff job_number        hny_id hny_jobrelate all_hny_units                          row_hash  match_count  dev_count  prod_count
        0              0  321596778      Multiple  many-to-many             2  7ed5528245d9d25cfba8f6811ad7ff48            1          0           1
        1              1  S00555943  69582/101547   many-to-one             1  2d86f0739ad1b395db0e9e423d10e552            1          0           1
        2             10  220152242  74169/100503    one-to-one            10  4f3494eaa68a9a92637fadebe7209e03            1          0           1
        3             10  220640251  74578/101036    one-to-one            32  088c19fee3aec359291da5fb50a38eb0            1          0           1
        4             10  220672742  73246/100957    one-to-one            31  dea67baa2570865726422dad2954d140            1          0           1
        5             10  321590578  70913/101754    one-to-one            10  3061093328357794ad9908985b581d58            1          0           1
        6             10  321594388  73698/100796    one-to-one            33  3ac3d69b2b80c92d509fed15fa93809b            1          0           1
        7             10  321600772  68222/100539   many-to-one            10  aa6ff47071d02982d15f18a14a3413d9            1          0           1
        8             10  321954364  74772/101346    one-to-one            51  4c189b2f028854c0e0cf64b3235935a5            1          0           1
        9             10  321995917  73763/100514    one-to-one            33  40e02cb9618e21f778d98c506b6e2cfb            1          0           1
        10            10  340754945  71939/100628    one-to-one            32  d180d27f48576c6e35dfc5ee601ac513            1          0           1
        11            10  421133026  70072/101412    one-to-one            35  4a5e6cc748419f1dc94cf38f981b207b            1          0           1
        12            10  421249615  73211/100912    one-to-one            31  301f22b55dee5ae71983d359f422ba0d            1          0           1
        13           100  X00696576  76752/101605    one-to-one           100  16fe78ec07f238f4ac376385b419b9d6            1          0           1
        14           101  X00554868  74615/100938    one-to-one           101  6eb9584e81fa73008f3fd85951f26b34            1          0           1
        15           103  321592763  69611/101759    one-to-one           103  0ef2c0a41f5b13b19908483d492cf719            1          0           1
        16           105  210182309  69565/100458    one-to-one           105  3bb8c988df8b5991c746b8caaf86b4df            1          0           1
        17           106  Q08012514  72051/100874    one-to-one           106  eb41e81cc137c063d608eb3961656b96            1          0           1
        18           108  421133151  70115/100285    one-to-one           614  e475d7d70718124d71070d702f6a9e82            1          0           1
        19           109  B00568816  74457/101012    one-to-one           109  d300152d134742d045eb60c8794a6a48            1          0           1
    are_equal: False
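The internals of dcpy's compare_build_tables aren't shown in this PR. As a hypothetical sketch only, a hash-based comparison that yields left_only / right_only sets like the ones above could look like this; the `row_hash` scheme (MD5 over delimited values) and both function names are assumptions.

```python
import hashlib
from collections import Counter


def row_hash(row: tuple) -> str:
    # Hypothetical scheme: MD5 over the row's values in a fixed column order
    joined = "\x1f".join("" if v is None else str(v) for v in row)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def compare_tables(left: list, right: list) -> dict:
    """Rows whose hash appears more often on one side than the other."""
    left_counts = Counter(row_hash(r) for r in left)
    right_counts = Counter(row_hash(r) for r in right)
    by_hash = {row_hash(r): r for r in left + right}
    # Counter subtraction keeps only positive counts, so duplicates are respected
    left_only = [by_hash[h] for h, n in (left_counts - right_counts).items() for _ in range(n)]
    right_only = [by_hash[h] for h, n in (right_counts - left_counts).items() for _ in range(n)]
    return {
        "left_only": left_only,
        "right_only": right_only,
        "are_equal": not left_only and not right_only,
    }


# Columns: classa_hnyaff, job_number, hny_id, hny_jobrelate, all_hny_units
shared = (5, "000000000", "00000/000000", "one-to-one", 5)  # made-up row
left = [shared, (1, "S00555943", "69582/1015477", "one-to-one", 1)]
right = [shared, (1, "S00555943", "69582/101547", "many-to-one", 1)]
result = compare_tables(left, right)
print(result["left_only"])  # [(1, 'S00555943', '69582/1015477', 'one-to-one', 1)]
print(result["are_equal"])  # False
```

In any scheme like this, the matched rows equal each side's total minus its one-sided rows, which is consistent with the output above: 8361 - 1319 = 8438 - 1396 = 7042.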

789 out of ~22K IDs were being truncated: their building IDs were cut from 7 to 6 digits.
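PostgreSQL's `lpad` pads on the left, but it also truncates input that is already longer than the target length (on the right), which is how a 7-digit building ID lost its last digit. A small Python emulation of that behavior and of the `hny_id` expression from the reviewed query; the function names are illustrative, not from the codebase.

```python
from typing import Optional


def pg_lpad(value: Optional[str], length: int, fill: str = " ") -> Optional[str]:
    """Python emulation of PostgreSQL's lpad(string, length, fill)."""
    if value is None:
        return None  # lpad(NULL, ...) is NULL
    if len(value) >= length:
        # Longer input is truncated on the right: '1005039' -> '100503'
        return value[:length]
    needed = length - len(value)
    # Repeat the fill text and cut it to exactly the number of missing chars
    # (an empty fill adds nothing, so the input comes back unchanged)
    return (fill * needed)[:needed] + value


def old_hny_id(project_id: str, building_id: Optional[str]) -> str:
    # Mirrors: project_id || '/' || coalesce(lpad(building_id, 6, '0'), '')
    padded = pg_lpad(building_id, 6, "0")
    return project_id + "/" + (padded if padded is not None else "")


print(old_hny_id("74169", "1005039"))  # 74169/100503 (last digit lost)
print(old_hny_id("70000", "4321"))     # 70000/004321 (hypothetical input)
print(old_hny_id("70000", None))       # 70000/ (NULL building_id)
```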
@codecov bot commented Feb 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.24%. Comparing base (eb4bc59) to head (c025a12).
⚠️ Report is 5 commits behind head on main.


@damonmcc damonmcc marked this pull request as ready for review February 6, 2026 19:47
@damonmcc damonmcc requested a review from a team February 6, 2026 19:47
@alexrichey (Contributor) left a comment

👍

-- 1) Merge with geocoding results and create a unique ID
WITH hny AS (
SELECT
a.project_id || '/' || coalesce(lpad(a.building_id, 6, '0'), '') AS hny_id,
Contributor

Is there a big reason not to just lpad to 7 digits? Or not lpad at all and make this a variable-length field?
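To make the trade-off concrete, here is a small Python comparison of the options on three hypothetical building IDs. `pad_min_6` is a fourth option added for illustration, not necessarily what this PR does.

```python
# Hypothetical building IDs of 4, 6 and 7 digits
SAMPLE_IDS = ["4321", "123456", "1005039"]


def lpad_6(building_id: str) -> str:
    # Current SQL: lpad(building_id, 6, '0'), which truncates 7-digit IDs
    return building_id.rjust(6, "0")[:6]


def lpad_7(building_id: str) -> str:
    # Suggested alternative: lpad(building_id, 7, '0')
    return building_id.rjust(7, "0")[:7]


def no_pad(building_id: str) -> str:
    # Suggested alternative: keep the raw, variable-length value
    return building_id


def pad_min_6(building_id: str) -> str:
    # Another option: pad to 6 but never truncate, e.g. in PostgreSQL
    # lpad(building_id, greatest(length(building_id), 6), '0')
    return building_id.rjust(6, "0")


OPTIONS = {"lpad_6": lpad_6, "lpad_7": lpad_7, "no_pad": no_pad, "pad_min_6": pad_min_6}

for building_id in SAMPLE_IDS:
    print(building_id, {name: fn(building_id) for name, fn in OPTIONS.items()})
```

Only `pad_min_6` both keeps 7-digit IDs intact and pads short IDs to 6 digits. `lpad_7` turns a 6-digit ID like `123456` into `0123456`, and `no_pad` leaves `4321` unpadded; either would miss a join if the HPD side stores 6-digit, zero-padded IDs (an assumption here).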

Member Author

I should've checked whether the lpad is even needed at all. I just assumed it was there because the other keys have historically been 6 digits, and Housing didn't seem to see any issues other than the 7-to-6 problem.

Member Author

@fvankrieken these are the input and output lengths of building_id values in the subquery. Looks like the lpad is needed: Sam described the "underlying HPD data" that they're trying to join to with their corrections as having 6- and 7-digit building IDs.

source_id_length  subquery_id_length  count
3                 6                   2
4                 6                   41
5                 6                   402
6                 6                   6907
7                 7                   788

Wish the queries weren't so hard to untangle, so I could point to (and even add tests for) those assumptions. Shall refactor soon!
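One way the assumptions in this thread could be pinned down in a test: tally (source length, key length) pairs the way the table above does, and fail if any key is truncated or isn't 6 or 7 characters long. This is a hypothetical Python sketch; `pad_min_6` is an assumed stand-in for the subquery's logic, not the actual SQL.

```python
from collections import Counter


def pad_min_6(building_id: str) -> str:
    # Hypothetical key logic: pad to 6 digits, never truncate
    return building_id.rjust(6, "0")


def length_distribution(building_ids, key_fn) -> Counter:
    """Count (source length, key length) pairs, like the table in this thread."""
    return Counter((len(b), len(key_fn(b))) for b in building_ids)


def check_key_lengths(dist: Counter) -> None:
    for src_len, key_len in dist:
        # Assumption 1: keys are never shorter than the source ID (no truncation)
        if key_len < src_len:
            raise ValueError(f"truncated: {src_len} -> {key_len} chars")
        # Assumption 2: keys are 6 or 7 characters, like HPD's building IDs
        if key_len not in (6, 7):
            raise ValueError(f"unexpected key length: {key_len}")


ids = ["123", "4321", "54321", "123456", "1005039"]  # made-up sample
dist = length_distribution(ids, pad_min_6)
check_key_lengths(dist)  # passes for pad-to-at-least-6 logic
for (src_len, key_len), count in sorted(dist.items()):
    print(src_len, key_len, count)
```

Run against the old truncating logic (`b.rjust(6, "0")[:6]`), the `(7, 6)` pair would raise a `ValueError`, which is exactly the 7-to-6 problem this PR fixes.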

@damonmcc damonmcc merged commit bd7e2ad into main Feb 9, 2026
39 checks passed
@damonmcc damonmcc deleted the dm-fix-hpd-ids branch February 9, 2026 16:52


Development

Successfully merging this pull request may close these issues.

HPD project IDs in DevDB

3 participants