Add shared FakeValuesService for multi-threaded Faker instances#1819
Add shared FakeValuesService for multi-threaded Faker instances#1819mferretti wants to merge 2 commits into
Conversation
Closes datafaker-net#1814 This allows multiple `Faker` instances to share a single, pre-initialized `FakeValuesService` per locale. Each `Faker` can then supply its own `Random` instance, avoiding redundant YAML loading and improving performance in concurrent scenarios.
PR Summary
|
There was a problem hiding this comment.
Pull request overview
Adds an opt-in way to share a single FakeValuesService across threads to avoid repeated YAML loading, plus a Faker factory that pairs the shared service with a per-thread Random. New concurrency tests exercise the singleton identity, behavior under load, and parity with a normally constructed Faker.
Changes:
- New
FakeValuesService.getShared(Locale)lazily caches per-locale instances in aConcurrentHashMap. - New
Faker.withSharedService(FakeValuesService, Locale, Random)factory wires a shared service with a per-threadRandomService. - New
SharedFakeValuesServiceTestcovering singleton identity, concurrent usage, and output parity.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/main/java/net/datafaker/service/FakeValuesService.java | Adds SHARED_INSTANCES map and getShared(Locale) factory. |
| src/main/java/net/datafaker/Faker.java | Adds withSharedService static factory for thread-local Fakers reusing a shared service. |
| src/test/java/net/datafaker/SharedFakeValuesServiceTest.java | New tests for concurrent identity, no-errors under load, and output parity. |
| private static final ConcurrentHashMap<Locale, FakeValuesService> SHARED_INSTANCES = new ConcurrentHashMap<>(); | ||
|
|
||
| private final Map<RegExpContext, ValueResolver> REGEXP2SUPPLIER_MAP = new CopyOnWriteMap<>(HashMap::new); | ||
|
|
||
| /** | ||
| * Returns a lazily-initialized per-locale singleton. Safe to share across threads: | ||
| * all mutable instance state uses idempotent copy-on-write caches. | ||
| */ | ||
| public static FakeValuesService getShared(Locale locale) { | ||
| return SHARED_INSTANCES.computeIfAbsent(locale, l -> new FakeValuesService()); |
There was a problem hiding this comment.
The locale isn't passed to the constructor — it doesn't need to be. Looking at updateFakeValuesInterfaceMap (line 85–88), locale-specific YAML is loaded lazily into fakeValuesInterfaceMap after construction, triggered by the BaseFaker constructor. The Locale key in SHARED_INSTANCES does exactly what it needs to: it ensures all threads using Locale.ENGLISH share one FakeValuesService instance which then lazily accumulates the English data once. Locale.FRENCH gets a separate one. The whole point is that the n-thread scenario from the issue — same locale, different Random — gets one service loaded once.
addPath and addUrl are non-idempotent mutators: calling them on a shared instance would silently affect every consumer of that singleton. Address this by adding a volatile boolean `shared` flag set by getShared(), and guarding both methods with an UnsupportedOperationException fail-fast. Also rewrites the getShared Javadoc to accurately describe the design: the locale parameter is a cache-partition key (not a constructor arg), YAML is loaded lazily by BaseFaker after construction, and mixing locales between getShared and withSharedService is unsupported. Adds sharedInstanceRejectsAddPathAndAddUrl test to cover both guards.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1819 +/- ##
============================================
+ Coverage 92.24% 92.46% +0.21%
- Complexity 3479 3518 +39
============================================
Files 341 343 +2
Lines 6838 6970 +132
Branches 670 684 +14
============================================
+ Hits 6308 6445 +137
+ Misses 362 359 -3
+ Partials 168 166 -2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@mferretti Please don't take my words as critics, I am just researching the problem. I feel that this solution heavily depends on the internal structure of DF. Ideally, the end-user should use only The bottleneck is redundant YAML loading, right? If we want to avoid redundant YAML loading, then shouldn't we just extract the YML loading to some smaller class, and cache its results? |
Summary
FakeValuesService.getShared(Locale)— a lazily-initialised, per-locale singleton backed by aConcurrentHashMap, safe to share across threads because all mutable instance state uses idempotent copy-on-write caches.Faker.withSharedService(FakeValuesService, Locale, Random)— a factory that lets multiple threads share oneFakeValuesServicewhile each supplying its ownRandom, avoiding redundant YAML loading.CyclicBarrier), no errors under heavy concurrent load (16 threads × 10 000 iterations), and output parity with a normally constructedFaker.Test plan
SharedFakeValuesServiceTest#getSharedReturnsSameInstanceUnderConcurrency— all 8 concurrent callers get the same instanceSharedFakeValuesServiceTest#concurrentFakersWithSharedServiceProduceNoErrors— 16 threads × 10 000 iterations with no exceptionsSharedFakeValuesServiceTest#withSharedServiceOutputMatchesNormalFaker— deterministic output matches a seed-equivalent normalFaker🤖 Generated with Claude Code