Skip to content

JAMES-4182 Proposal for changes in BlobStoreDAO interface#2960

Draft
chibenwa wants to merge 10 commits intoapache:masterfrom
chibenwa:JAMES-4182-api
Draft

JAMES-4182 Proposal for changes in BlobStoreDAO interface#2960
chibenwa wants to merge 10 commits intoapache:masterfrom
chibenwa:JAMES-4182-api

Conversation

@chibenwa
Copy link
Contributor

@chibenwa chibenwa commented Mar 3, 2026

Opened to collect initial feedback

See https://issues.apache.org/jira/browse/JAMES-4182 for context

Remaining points:

  • better validation for metadata
  • Store metadata in blob stores
    • xattr for files
    • extra collumn for PG, cassandra
    • Object metadata for S3
    • Direct storage for memory
    • Passing those around for other blobStore DAO

@chibenwa chibenwa changed the title James 4182 api JAMES-4182 Proposal for changes in BlobStoreDAO interface Mar 3, 2026
@chibenwa chibenwa requested a review from jeantil March 3, 2026 16:25
@jeantil
Copy link
Contributor

jeantil commented Mar 10, 2026

* [ ]  better validation for metadata 
* [ ]  Store metadata in blob stores       
  * [ ]  xattr for files
  * [ ]  extra collumn for PG, cassandra
  * [ ]  Object metadata for S3
  * [ ]  Direct storage for memory
  * [ ]  Passing those around for other blobStore DAO

I think we should only use dedicated properties to store acitvely managed metadata, the remaining metadata as a Map<String,String> can be serializaed with the payload. this avoids possible capability conflicts between blog stores.

Copy link
Contributor

@jeantil jeantil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good direction to me, I left some comments, I'm sure there are which you are aware of and will addres, feel free to close on sight

class ReactiveByteSource {
private final long size;
private final Publisher<ByteBuffer> content;
record BlobMetadataName(String name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about using an enum instead of a record ?
it would mean we support less metadata fields but we would have stronger semantics,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fan of closing the data model and limiting the ability for people to do what they like in extensions.

Saying "we gonna get all use cases so we can close this is presomptuous. Exemple I did not get until yesterday that this mechanism could be used for AES key rotation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about the mixed strategy : known/actively used by james metadata is materialized as such, the rest can be added as unstructured data ?

Copy link
Contributor Author

@chibenwa chibenwa Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OR we could document known values as constants in the interface and keep the underlying map.

Edited. Please see a better proposal below that allign both visions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at the very least do you mind hiding the Map type in a Metadata type ? and exposing a readonly api

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the map shall be immutable

}

sealed interface Blob {
Map<BlobMetadataName, BlobMetadataValue> metadata();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentionned above it would be better for keys to be known,
alternatively this could be an object with kmown fields which james cares about and a map for extra fields which are not interpreted by james


@Override
public ByteSourceBlob asByteSource() throws IOException {
try (FileBackedOutputStream fileBackedOutputStream = new FileBackedOutputStream(100 * 1024)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100*1024 magic number
if I understand correctly this will write the stream to a file if the length is about 100kbits

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I can add JVM args in order to easily control this if need be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant more naming the constant IN_MEMORY_PROCESSING_SIZE_TRESHOLD or something

* or an IOObjectStoreException when an unexpected IO error occurs
*/
Publisher<byte[]> readBytes(BucketName bucketName, BlobId blobId);
Publisher<BytesBlob> readBytesBlob(BucketName bucketName, BlobId blobId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while the description was useless the details in the return had some value
the whole blob library probably deserves more documentation instead of less 😬

public Publisher<BytesBlob> readBytesBlob(BucketName bucketName, BlobId blobId) {
return Mono.fromCallable(() -> blobs.get(bucketName, blobId))
.switchIfEmpty(Mono.error(() -> new ObjectNotFoundException(String.format("blob '%s' not found in bucket '%s'", blobId.asString(), bucketName.asString()))))
.map(BytesBlob::of);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 shouldn't the in memory be adapted to actually store the metadata ? the MemoryBlobStoreDao will always loose all metadata.

I know we can derive the size from the content but why have a Map<X,Y> for metadatas if all the stores canx extract is the size>?

same comment applied to all leafs implementations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 shouldn't the in memory be adapted to actually store the metadata ? the MemoryBlobStoreDao will always loose all metadata.

Short answer: later. I don't like spending time writing non consensual code that do not get integrated.

return Mono.from(underlying.readBlobReactive(bucketName, blobId))
.map(InputStreamBlob::payload)
.map(Throwing.function(this::decrypt))
.map(InputStreamBlob::of);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is stripping any metadata returned by the underlying store

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed. I focussed so far on adding the metadata to the interface, not on making it work correctly.

Given an agreement on the design, I'd be happy to spend the 1 week to complete the work along with decent amount of testing.

private final long size;
private final Publisher<ByteBuffer> content;
record BlobMetadataName(String name) {
// TODO validation (a-z,A-Z,0-9 -_) &length 128 char & non empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which store's constraint is that ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather be strict, none precisely but I believe with this we shall be good with almost all...

@chibenwa
Copy link
Contributor Author

@jeantil do we have an agreement on continuing this work with an open metadata model?

@jeantil
Copy link
Contributor

jeantil commented Mar 12, 2026

@jeantil do we have an agreement on continuing this work with an open metadata model?

I'm not comfortable working with only the open metadata model. I understand the extension point argument but
I would much prefer an hybrid approach where metadata which is actively used by james is materialized as such in the metadata object and accessed through structured accessors instead of iterating through a map comparing strings every time we need the information
The rest of available metadata can be left as an unstructured map as some kind of extension point

@chibenwa
Copy link
Contributor Author

chibenwa commented Mar 12, 2026

I would much prefer an hybrid approach where metadata which is actively used by james is materialized as such in the metadata object and accessed through structured accessors instead of iterating through a map comparing strings every time we need the information

I propose:

  • Keeping the undlying data as (typed) Map<String, String>
  • Adding convenience factory methods for manipulating known value

Essentially:

record BlobMetadata(Map<BlobMetadataName, BlobMetadataValue> metadata) {
   public static BlobMetadata empty() {...}
   
   public ContentTransferEncoding contentTransferEncoding() {
       return ContentTransferEncoding.fromBlobMetadataValue(metatdata.get(BlobMetadataName.CONTENT_TRANSFER_ENCODING));
   }
   
   public BlobMetadata withMetadata(BlobMetadataName name, BlobMetadataValue value) {
       return new BlobMetadata(ImmutableList.builder()
           .putAll(metadata)
           .put(name, value)
           .build());
   }
   
   public BlobMetadata withContentTransferEncoding(ContentTransferEncoding encoding) {
       return withMetadata(BlobMetadataName.CONTENT_TRANSFER_ENCODING, encoding.asBlobMetadataValue());
   }
}

I think it keeps the underlying data model simple, extensible, easy to work with in DAOs, and it address your concerns and makes James-used metadata a prime citizen.

@jeantil
Copy link
Contributor

jeantil commented Mar 12, 2026

this looks ok for me, ideally we can enforce usage of an immutable map especially if it remains accessible from outside

@chibenwa
Copy link
Contributor Author

ideally we can enforce usage of an immutable map especially if it remains accessible from outside

Of course ;-)

@chibenwa
Copy link
Contributor Author

I'd like to thank you for the review effort @jeantil

I will put together the effort to align the code with the comments, and continue the implementation likely next week.

@chibenwa
Copy link
Contributor Author

Hello @jeantil

I did rework the metadata model and wrote memory implementation.

I'll carry other with other impelementation when I have time, but wanted to push now to enable early review, in case people have time.


}

record BlobMetadata(Map<BlobMetadataName, BlobMetadataValue> metadata) {
Copy link
Contributor

@jeantil jeantil Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

record BlobMetadata(ImmutableMap<BlobMetadataName, BlobMetadataValue> value) {
  • java.util.Map allows for mutable implementations and doesn't inform consumers that the property is immutable
  • naming it metadata results in weird looking assertions calling .metadata().metadata(), I think .metadata().value() would be clearer. (ideally the containsAllOf would be applicablle to type BlobMetadata :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not using a mutable map here but explicitly immutable one which is a common pattern for ASF james.

As for naming, do you have suggestions? I fall a bit sort here... Happy to update if we can come up with something relevant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry if I was unclear, for some reason githib's UI didn't want to handle diff mode yesterday somehow

for naming my suggestion is

Suggested change
record BlobMetadata(Map<BlobMetadataName, BlobMetadataValue> metadata) {
record BlobMetadata(Map<BlobMetadataName, BlobMetadataValue> value) {

The tests are in the next commit but that's where I noticed the .metadata().metadata() changing that to value ( turns it in .metadata().value())` which is a pattern I use ahdn have seen used for ValueObjectś or objects with a similar intent.
I'm quite sure there are already cases of such a pattern in james

would you mind adding a get(BlobMetadataName name) directly on the BlobMetadata which delegates to the map obviously ?
This would reduce the need for consumers to dereference the full underlying map in order to get an attribute other than content encoding? and improve encapsulation as well as promote the immutability of the api.

Copy link
Contributor Author

@chibenwa chibenwa Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

record BlobMetadata(Map<BlobMetadataName, BlobMetadataValue> value) {

Ok to me

Or even record BlobMetadata(Map<BlobMetadataName, BlobMetadataValue> underlyingMap) ... ?

would you mind adding a get(BlobMetadataName name) directly on the BlobMetadata which delegates to the map obviously ?

+1 very good idea it will avoid most access to the underlying map. Will be cleaner indeed.


Mono.from(testee.save(TEST_BUCKET_NAME, TEST_BLOB_ID, bytesBlob)).block();

assertThat(Mono.from(testee.readBytes(TEST_BUCKET_NAME, TEST_BLOB_ID)).block().metadata().metadata())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll use this one as an example to let you pick the variant you prefer, I'm fine with both :D

Suggested change
assertThat(Mono.from(testee.readBytes(TEST_BUCKET_NAME, TEST_BLOB_ID)).block().metadata().metadata())
assertThat(Mono.from(testee.readBytes(TEST_BUCKET_NAME, TEST_BLOB_ID)).block().metadata().values())
assertThat(Mono.from(testee.readBytes(TEST_BUCKET_NAME, TEST_BLOB_ID)).block().metadata().underlyingMap())

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants