FakeValuesService sharing across Faker instances — by design or possible enhancement? #1814

@mferretti


Hi,
I use DataFaker heavily in SeedStream, where I run multiple parallel threads to generate test data. I noticed significant CPU overhead and traced it back to Faker instantiation and per-instance YAML loading. I fixed most of it with a thread-local Faker cache:

private static final ThreadLocal<Map<Locale, Faker>> CACHE = ThreadLocal.withInitial(HashMap::new);

// One Faker per thread and locale; the Random from the first call for a locale is the one kept.
public static Faker getOrCreate(Locale locale, Random random) {
    return CACHE.get().computeIfAbsent(locale, loc -> new Faker(loc, random));
}

Profiling again after this change, I can see that each Faker instance still builds its own FakeValuesService with its own fakeValuesInterfaceMap. With 8 threads all using the same Locale, the same YAML is loaded 8 times into 8 separate maps. The same goes for expression templates, which are compiled and cached independently inside each FakeValuesService via EXPRESSION_2_SPLITTED and expression2generex.
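To make the effect concrete, here is a minimal sketch of why a thread-local cache still pays the load cost once per thread. It does not use DataFaker: `CountingFaker` and `ThreadLocalCacheDemo` are hypothetical stand-ins, and the counter only represents the per-instance YAML loading described above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Faker that loads its own locale data on construction.
// This is NOT DataFaker code; it only counts how often the "load" happens.
class CountingFaker {
    static final AtomicInteger LOADS = new AtomicInteger();

    CountingFaker(Locale locale) {
        LOADS.incrementAndGet(); // stands in for parsing the locale's YAML files
    }
}

class ThreadLocalCacheDemo {
    private static final ThreadLocal<Map<Locale, CountingFaker>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    static CountingFaker getOrCreate(Locale locale) {
        return CACHE.get().computeIfAbsent(locale, CountingFaker::new);
    }

    // Starts `threads` workers that each request the same locale many times,
    // then returns the total number of loads that happened.
    static int loadsFor(int threads) {
        CountingFaker.LOADS.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int call = 0; call < 100; call++) {
                    getOrCreate(Locale.ENGLISH);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return CountingFaker.LOADS.get();
    }

    public static void main(String[] args) {
        // One load per thread, regardless of how many calls each thread makes.
        System.out.println("loads with 8 threads: " + loadsFor(8));
    }
}
```

Each thread gets its own map from the `ThreadLocal`, so repeated calls within a thread are free, but the first call on every thread constructs a new instance.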
Digging further, I noticed that BaseFaker has this constructor:
public BaseFaker(FakeValuesService fakeValuesService, FakerContext context)

This makes me think that sharing a single FakeValuesService across multiple Faker instances is at least architecturally possible.
In my typical scenario, all threads use the same locale and differ only in their Random, so a shared, read-only FakeValuesService with a per-instance FakerContext could work.
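The ownership split I have in mind can be sketched roughly as follows. This is an illustration only, not DataFaker's API: `SharedValues`, `LightGenerator`, and the `name.first_name` key are hypothetical stand-ins for a read-only service, a Faker that owns only its Random, and a YAML lookup key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for a pre-warmed, read-only values service.
// Not DataFaker's FakeValuesService; it only illustrates the ownership split.
final class SharedValues {
    private final Map<String, List<String>> values;

    SharedValues(Map<String, List<String>> source) {
        // Copy into immutable collections so no caller can mutate shared state.
        Map<String, List<String>> copy = new HashMap<>();
        source.forEach((key, list) -> copy.put(key, List.copyOf(list)));
        this.values = Map.copyOf(copy);
    }

    // The caller supplies its own Random, so this object keeps no per-caller state.
    String pick(String key, Random random) {
        List<String> options = values.get(key);
        return options.get(random.nextInt(options.size()));
    }
}

// Hypothetical lightweight generator: owns only its Random, borrows the shared data.
final class LightGenerator {
    private final SharedValues shared;
    private final Random random;

    LightGenerator(SharedValues shared, Random random) {
        this.shared = shared;
        this.random = random;
    }

    String firstName() {
        return shared.pick("name.first_name", random);
    }
}

class SharedValuesDemo {
    static SharedValues sampleValues() {
        return new SharedValues(
                Map.of("name.first_name", List.of("Ada", "Grace", "Linus", "Barbara")));
    }

    public static void main(String[] args) {
        SharedValues shared = sampleValues();
        LightGenerator a = new LightGenerator(shared, new Random(42));
        LightGenerator b = new LightGenerator(shared, new Random(42));
        // Same seed and same shared data: both generators make the same choice.
        System.out.println(a.firstName().equals(b.firstName()));
    }
}
```

Because the shared object is immutable and all randomness comes from the caller, concurrent reads need no locking in this sketch. Whether the real FakeValuesService can be treated the same way depends on its internal caches, which is what question 2 below is about.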

At this point I have a few questions:

  1. Is the current design — one FakeValuesService per Faker — intentional for thread-safety or isolation reasons I might be missing?
  2. Are there concurrency issues in sharing a FakeValuesService across Faker instances with different Random instances?
  3. Would a factory method like Faker.withSharedService(FakeValuesService shared, Locale locale, Random random) be something you would consider? It would let callers manage a shared, pre-warmed service and pass it in, with each Faker only owning its FakerContext.
  4. Alternatively, would a FakeValuesServiceFactory.getShared(Locale) singleton pattern fit the library's design philosophy?
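For option 4, a per-locale shared instance could look roughly like the sketch below. Everything here is hypothetical: `LocaleService` stands in for the expensive-to-build service, and `SharedServiceFactory.getShared` only mirrors the shape of the proposed method. It relies on `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive-to-build, per-locale service.
final class LocaleService {
    final Locale locale;

    LocaleService(Locale locale) {
        this.locale = locale;
    }
}

// Hypothetical factory mirroring the shape of the proposed getShared(Locale).
final class SharedServiceFactory {
    private static final Map<Locale, LocaleService> SERVICES = new ConcurrentHashMap<>();
    static final AtomicInteger BUILDS = new AtomicInteger();

    private SharedServiceFactory() {}

    static LocaleService getShared(Locale locale) {
        // The mapping function runs at most once per locale, even under contention.
        return SERVICES.computeIfAbsent(locale, loc -> {
            BUILDS.incrementAndGet(); // stands in for loading and pre-warming
            return new LocaleService(loc);
        });
    }
}

class SharedServiceDemo {
    // Requests the same locale from several threads and counts the distinct
    // instances that were handed out (identity-based, since equals is not overridden).
    static int distinctInstances(int threads, Locale locale) {
        Set<LocaleService> seen = Collections.newSetFromMap(new ConcurrentHashMap<>());
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> seen.add(SharedServiceFactory.getShared(locale)));
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct instances: " + distinctInstances(8, Locale.ENGLISH));
    }
}
```

The trade-off compared with option 3 is that a static registry keeps every locale's data alive for the lifetime of the class loader, whereas a caller-managed shared service leaves lifetime and eviction to the application.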

Happy to contribute if the approach sounds reasonable and there are no blockers I'm missing.
Thanks for any insight.
