
perf: replaces list of postcodes in ja/address.yml with a 7-digit format#3201

Merged
thdaraujo merged 2 commits into main from ta/locales/change-ja-address-locale-to-a-format
Jan 31, 2026


@thdaraujo
Contributor

@thdaraujo thdaraujo commented Jan 31, 2026

(fixes #3200)

This PR replaces the full list of postcodes in ja/address.yml with a 7-digit format.

The motivation is performance: generating a postcode in this locale
is too expensive.

Japanese postal codes were changed from a simple pattern to a list of real postal codes in #2297.

The problem is that when running Faker with locale = 'ja', about 60% of the time spent loading Faker and calling Faker::Address.postalcode goes to reading and parsing the ja/address.yml file, which is 2.2 MB.
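To see why a long list is costly, the sketch below uses only Ruby's standard library (`yaml`, `benchmark`) to time parsing a synthetic locale holding 100,000 generated postcodes against one holding a single `"###-####"` format. The key path `ja.faker.address.postcode` and the generated codes are assumptions for illustration, not the real contents of ja/address.yml, and timings will vary by machine.

```ruby
require "yaml"
require "benchmark"

# Hypothetical locale layout; the ja.faker.address.postcode key path is an
# assumption for illustration, not copied from ja/address.yml.
def locale_yaml(postcodes)
  { "ja" => { "faker" => { "address" => { "postcode" => postcodes } } } }.to_yaml
end

# Synthetic stand-in for a list of real postcodes: "000-0000", "000-0001", ...
def synthetic_postcodes(count)
  Array.new(count) { |i| format("%03d-%04d", (i / 10_000) % 1000, i % 10_000) }
end

big = locale_yaml(synthetic_postcodes(100_000))
small = locale_yaml(["###-####"])

# Time a single parse of each document.
big_time = Benchmark.realtime { YAML.safe_load(big) }
small_time = Benchmark.realtime { YAML.safe_load(small) }

puts format("full list:     %9d bytes, %.4f s", big.bytesize, big_time)
puts format("single format: %9d bytes, %.6f s", small.bytesize, small_time)
```

The point is the shape of the cost: parse time grows with the number of entries, while a format string stays the same size no matter how many postcodes it can produce.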

Benchmark:
```
Postcode - Smaller Address File:  1.2 i/s
Postcode:                         0.7 i/s - 1.65x  slower

1.34 s/i vs 811.55 ms/i
```

This has the potential to make generating Japanese postcodes about
65% faster (1.65x the throughput), and to speed up locale loading in general.
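The two per-iteration times from the benchmark output above can be turned into the headline numbers directly. This quick check (values copied from that output) shows that "65%" is the throughput gain, which corresponds to roughly 39% less time per call.

```ruby
# Per-iteration times from the benchmark output in this PR.
old_seconds = 1.34     # full list of real postcodes
new_seconds = 0.81155  # 7-digit "###-####" format

speedup = old_seconds / new_seconds         # ratio of iterations per second
time_saved = 1 - new_seconds / old_seconds  # fraction of each call's time removed

puts format("speedup: %.2fx", speedup)                        # prints "speedup: 1.65x"
puts format("time saved per call: %.0f%%", time_saved * 100)  # prints "time saved per call: 39%"
```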

The gem size also decreases by about 200 KB (1.6M -> 1.4M). 🎉

@thdaraujo thdaraujo self-assigned this Jan 31, 2026
@thdaraujo thdaraujo force-pushed the ta/locales/change-ja-address-locale-to-a-format branch from da6d5e1 to f216fbf January 31, 2026 20:36
…ormat

Replaces the full list of Japanese postcodes with a simple 7-digit format:
"###-####".
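A format like "###-####" only needs each `#` replaced with a random digit. The sketch below does that with plain `String#gsub`; the method name `expand_postcode_format` is hypothetical, and Faker's own pattern helpers may handle digits differently (this is not their implementation).

```ruby
# Hypothetical helper: replace every "#" in a format string with a digit 0-9.
# Passing a seeded Random makes the output reproducible in tests.
def expand_postcode_format(pattern, rng: Random.new)
  pattern.gsub("#") { rng.rand(10).to_s }
end

rng = Random.new(42)
3.times { puts expand_postcode_format("###-####", rng: rng) }
```

Every result matches `/\A\d{3}-\d{4}\z/` and costs a few string operations, whereas a stored list has to be parsed in full before a single entry can be sampled.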

The motivation is performance: generating a postcode in this locale
is too expensive.

Japanese postal codes were changed from a simple pattern to a list of real postal codes in #2297.

The problem is that when running Faker with `locale = 'ja'`, about 60% of the time spent loading Faker and calling `Faker::Address.postalcode` goes to reading and parsing the [ja/address.yml](https://github.com/faker-ruby/faker/blob/main/lib/locales/ja/address.yml) file, which is 2.2 MB.

Benchmark:
```
Postcode - Smaller Address File:  1.2 i/s
Postcode:                         0.7 i/s - 1.65x  slower

1.34 s/i vs 811.55 ms/i
```

This has the potential to make generating Japanese postcodes about
65% faster (1.65x the throughput).
@thdaraujo thdaraujo force-pushed the ta/locales/change-ja-address-locale-to-a-format branch from f216fbf to 08f2ade January 31, 2026 20:37
```ruby
assert_kind_of String, Faker::Subscription.status
assert_not_english(Faker::Subscription.status)
assert_kind_of String, Faker::Subscription.payment_method
assert Array.new(10) { Faker::Subscription.payment_method }.any? { |word| !word.match?(/[a-zA-Z]/) }
```
Contributor Author

This test would randomly generate words that often wouldn't match the filter.

The weird thing is that it should have been happening on main too.
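This is the classic flaky-test pattern of asserting a property of a single random draw. If a fraction `p` of the possible values contained Latin letters and draws were independent, a one-sample check would fail with probability `p`, while the ten-sample `any?` check fails only when all ten draws contain Latin letters, i.e. with probability `p**10`. The sketch below computes that and cross-checks it with a small simulation; the word list and `p = 0.5` are hypothetical, not taken from the ja locale.

```ruby
# Failure probability of a check that passes if ANY of `samples` independent
# draws lacks Latin letters, when a fraction `latin_fraction` of values has them.
def failure_probability(latin_fraction, samples)
  latin_fraction**samples
end

[1, 10].each do |n|
  # prints "samples= 1  p(fail)=0.500000" and "samples=10  p(fail)=0.000977"
  puts format("samples=%2d  p(fail)=%.6f", n, failure_probability(0.5, n))
end

# Monte Carlo cross-check with a hypothetical word list (half Latin, half Japanese).
words = ["PayPal", "Visa", "銀行振込", "コンビニ払い"]
rng = Random.new(7)
trials = 100_000
failures = trials.times.count do
  # The check fails when none of the ten draws is free of Latin letters.
  Array.new(10) { words.sample(random: rng) }.none? { |w| !w.match?(/[a-zA-Z]/) }
end
puts format("simulated p(fail), 10 samples: %.5f", failures.fdiv(trials))
```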

@thdaraujo thdaraujo force-pushed the ta/locales/change-ja-address-locale-to-a-format branch from b2cf8df to 4dfe746 January 31, 2026 21:37
Contributor

@stefannibrasil stefannibrasil left a comment

:shipit:

We don't need to generate every single possible postcode out there, since we are not Wikipedia 😅 We just need to generate a few samples. It's good to have this done, thanks!

🐡 🅾️

@thdaraujo thdaraujo merged commit aa70c59 into main Jan 31, 2026
9 checks passed
@thdaraujo thdaraujo deleted the ta/locales/change-ja-address-locale-to-a-format branch January 31, 2026 21:50

Development

Successfully merging this pull request may close these issues.

Speed up ja/address.yml loading
