JAMES-4182 Proposal for changes in BlobStoreDAO interface#2960
chibenwa wants to merge 10 commits into apache:master from
Conversation
I think we should only use dedicated properties to store actively managed metadata; the remaining metadata can be serialized with the payload as a Map<String, String>. This avoids possible capability conflicts between blob stores.
jeantil
left a comment
Looks like a good direction to me. I left some comments; I'm sure there are some which you are aware of and will address, feel free to close on sight.
    class ReactiveByteSource {
        private final long size;
        private final Publisher<ByteBuffer> content;

    record BlobMetadataName(String name) {
How about using an enum instead of a record? It would mean we support fewer metadata fields, but we would have stronger semantics.
I'm not a fan of closing the data model and limiting the ability for people to do what they like in extensions.
Saying "we are going to cover all use cases so we can close this" is presumptuous. Example: I did not realize until yesterday that this mechanism could be used for AES key rotation.
What about the mixed strategy: metadata known/actively used by James is materialized as such, the rest can be added as unstructured data?
Or we could document known values as constants in the interface and keep the underlying map.
Edited: please see a better proposal below that aligns both visions.
At the very least, do you mind hiding the Map type in a Metadata type and exposing a read-only API?
IMO the map shall be immutable
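A read-only wrapper along these lines could enforce the immutability being asked for at construction time. This is only a sketch with a hypothetical `Metadata` type, relying on `Map.copyOf` (JDK 10+), not the actual James API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the suggestion above: hide the raw Map behind a Metadata type
// that takes an immutable snapshot. The class name is hypothetical.
record Metadata(Map<String, String> value) {
    Metadata {
        // Map.copyOf returns an unmodifiable copy, so later mutations of
        // the caller's map cannot leak in, and consumers cannot mutate it.
        value = Map.copyOf(value);
    }
}

class MetadataDemo {
    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("content-encoding", "gzip");
        Metadata metadata = new Metadata(source);
        source.put("sneaky", "mutation"); // does not affect the snapshot
        System.out.println(metadata.value().size()); // prints 1
        try {
            metadata.value().put("x", "y");
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable"); // mutation through the accessor fails too
        }
    }
}
```

Both the defensive copy and the unmodifiable accessor come for free from `Map.copyOf`, without depending on Guava's ImmutableMap in the API signature.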
    }
    sealed interface Blob {
        Map<BlobMetadataName, BlobMetadataValue> metadata();
As mentioned above, it would be better for keys to be known.
Alternatively, this could be an object with known fields which James cares about, plus a map for extra fields which are not interpreted by James.
server/blob/blob-api/src/main/java/org/apache/james/blob/api/BlobStoreDAO.java
    @Override
    public ByteSourceBlob asByteSource() throws IOException {
        try (FileBackedOutputStream fileBackedOutputStream = new FileBackedOutputStream(100 * 1024)) {
100 * 1024 is a magic number.
If I understand correctly, this will write the stream to a file if the length is above 100 KiB.
Yes, I can add JVM args in order to easily control this if need be.
I meant more: naming the constant IN_MEMORY_PROCESSING_SIZE_THRESHOLD or something.
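A sketch of that naming suggestion: a documented constant instead of the bare `100 * 1024`, here also made overridable via a system property as the author suggested. The property name is hypothetical:

```java
// Sketch of the reviewer's naming suggestion. The system property name
// "james.blob.inMemoryThreshold" is illustrative, not an existing James option.
class BlobProcessingDefaults {
    // Streams larger than this are spooled to a temporary file instead of
    // being buffered in memory (FileBackedOutputStream's threshold, in bytes).
    static final int IN_MEMORY_PROCESSING_SIZE_THRESHOLD =
        Integer.getInteger("james.blob.inMemoryThreshold", 100 * 1024);

    public static void main(String[] args) {
        System.out.println(IN_MEMORY_PROCESSING_SIZE_THRESHOLD); // 102400 unless overridden
    }
}
```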
     * or an IOObjectStoreException when an unexpected IO error occurs
     */
    Publisher<byte[]> readBytes(BucketName bucketName, BlobId blobId);
    Publisher<BytesBlob> readBytesBlob(BucketName bucketName, BlobId blobId);
While the description was useless, the details in the return documentation had some value.
The whole blob library probably deserves more documentation instead of less 😬
    public Publisher<BytesBlob> readBytesBlob(BucketName bucketName, BlobId blobId) {
        return Mono.fromCallable(() -> blobs.get(bucketName, blobId))
            .switchIfEmpty(Mono.error(() -> new ObjectNotFoundException(String.format("blob '%s' not found in bucket '%s'", blobId.asString(), bucketName.asString()))))
            .map(BytesBlob::of);
🤔 shouldn't the in-memory implementation be adapted to actually store the metadata? The MemoryBlobStoreDAO will always lose all metadata.
I know we can derive the size from the content, but why have a Map<X,Y> for metadata if all the store can extract is the size?
The same comment applies to all leaf implementations.
🤔 shouldn't the in-memory implementation be adapted to actually store the metadata? The MemoryBlobStoreDAO will always lose all metadata.

Short answer: later. I don't like spending time writing non-consensual code that does not get integrated.
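For illustration, retaining metadata in an in-memory store mostly means keeping payload and metadata together under the same key. This is a deliberately simplified sketch (no reactive types, hypothetical names), not the real MemoryBlobStoreDAO:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: payload and metadata stored side by side so a
// read returns both. Names (InMemoryBlobStore, StoredBlob) are hypothetical.
class InMemoryBlobStore {
    record StoredBlob(byte[] payload, Map<String, String> metadata) {}

    private final Map<String, StoredBlob> blobs = new ConcurrentHashMap<>();

    void save(String bucket, String blobId, byte[] payload, Map<String, String> metadata) {
        // Map.copyOf keeps the stored metadata immutable; clone defends the payload
        blobs.put(bucket + "/" + blobId, new StoredBlob(payload.clone(), Map.copyOf(metadata)));
    }

    StoredBlob read(String bucket, String blobId) {
        StoredBlob blob = blobs.get(bucket + "/" + blobId);
        if (blob == null) {
            throw new IllegalStateException("blob '" + blobId + "' not found in bucket '" + bucket + "'");
        }
        return blob;
    }
}
```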
    return Mono.from(underlying.readBlobReactive(bucketName, blobId))
        .map(InputStreamBlob::payload)
        .map(Throwing.function(this::decrypt))
        .map(InputStreamBlob::of);
this is stripping any metadata returned by the underlying store
Yes indeed. So far I focused on adding the metadata to the interface, not on making it work correctly.
Given an agreement on the design, I'd be happy to spend a week to complete the work along with a decent amount of testing.
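The fix discussed here amounts to re-wrapping the transformed payload together with the metadata from the underlying store, instead of rebuilding the blob from the payload alone. A minimal sketch with simplified stand-in types (the Base64 step is a placeholder for the real AES decryption):

```java
import java.util.Base64;
import java.util.Map;

// Sketch only: Blob, decrypt and read are hypothetical stand-ins for the
// real reactive, AES-backed implementation.
class MetadataPreservingRead {
    record Blob(byte[] payload, Map<String, String> metadata) {}

    static byte[] decrypt(byte[] encrypted) {
        // placeholder transformation standing in for real decryption
        return Base64.getDecoder().decode(encrypted);
    }

    static Blob read(Blob underlying) {
        // keep underlying.metadata() instead of dropping it when re-wrapping
        return new Blob(decrypt(underlying.payload()), underlying.metadata());
    }
}
```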
    private final long size;
    private final Publisher<ByteBuffer> content;

    record BlobMetadataName(String name) {
        // TODO validation (a-z, A-Z, 0-9, -, _) & length <= 128 chars & non empty
which store's constraint is that ?
I'd rather be strict. None precisely, but I believe with this we shall be good with almost all stores...
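The TODO above fits naturally in a record's compact constructor. A sketch, where the `[a-zA-Z0-9_-]` alphabet and 128-character cap are the deliberately strict common denominator discussed here, not a documented constraint of any particular store:

```java
import java.util.regex.Pattern;

// Sketch implementing the validation TODO: non-empty, at most 128 chars,
// restricted to [a-zA-Z0-9_-]. Limits are a strict common denominator.
record BlobMetadataName(String name) {
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z0-9_-]{1,128}");

    BlobMetadataName {
        // matches() anchors the whole string, so partial matches are rejected
        if (name == null || !ALLOWED.matcher(name).matches()) {
            throw new IllegalArgumentException("Invalid metadata name: " + name);
        }
    }
}
```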
@jeantil do we have an agreement on continuing this work with an open metadata model?

I'm not comfortable working with only the open metadata model. I understand the extension point argument but
I propose:
Essentially: I think it keeps the underlying data model simple, extensible, and easy to work with in DAOs, and it addresses your concerns while making James-used metadata a first-class citizen.
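One way to picture the mixed model discussed earlier (typed fields for metadata James actively manages, an open map for everything else) is a shape like the following. All names and the choice of typed fields are purely illustrative, not the agreed proposal:

```java
import java.util.Map;
import java.util.Optional;

// Illustrative sketch of the mixed strategy: metadata actively managed by
// James materialized as typed fields, the rest kept in an open,
// uninterpreted map available to extensions. Names are hypothetical.
record BlobMetadata(long size,
                    Optional<String> contentEncoding,
                    Map<String, String> extra) {
    BlobMetadata {
        extra = Map.copyOf(extra); // keep the open part immutable too
    }
}
```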
This looks OK to me; ideally we can enforce usage of an immutable map, especially if it remains accessible from outside.

Of course ;-)
I'd like to thank you for the review effort @jeantil. I will put in the effort to align the code with the comments, and continue the implementation, likely next week.
f813896 to afa20c1
|
Hello @jeantil, I did rework the metadata model and wrote the memory implementation. I'll carry on with the other implementations when I have time, but wanted to push now to enable early review, in case people have time.
    }
    record BlobMetadata(Map<BlobMetadataName, BlobMetadataValue> metadata) {
    record BlobMetadata(ImmutableMap<BlobMetadataName, BlobMetadataValue> value) {

- java.util.Map allows for mutable implementations and doesn't inform consumers that the property is immutable
- naming it metadata results in weird-looking assertions calling .metadata().metadata(); I think .metadata().value() would be clearer (ideally, containsAllOf would be applicable to the BlobMetadata type :D)
I am not using a mutable map here but an explicitly immutable one, which is a common pattern in ASF James.
As for naming, do you have suggestions? I fall a bit short here... Happy to update if we can come up with something relevant.
Sorry if I was unclear; for some reason GitHub's UI didn't want to handle diff mode yesterday.
For naming, my suggestion is:
    record BlobMetadata(Map<BlobMetadataName, BlobMetadataValue> metadata) {
    record BlobMetadata(Map<BlobMetadataName, BlobMetadataValue> value) {
The tests are in the next commit, but that's where I noticed the .metadata().metadata(); changing that to value turns it into .metadata().value(), which is a pattern I have used and seen used for value objects or objects with a similar intent.
I'm quite sure there are already cases of such a pattern in James.
Would you mind adding a get(BlobMetadataName name) directly on BlobMetadata, which delegates to the map obviously?
This would reduce the need for consumers to dereference the full underlying map in order to get an attribute other than content encoding, and it would improve encapsulation as well as promote the immutability of the API.
    record BlobMetadata(Map<BlobMetadataName, BlobMetadataValue> value) {

OK to me. Or even record BlobMetadata(Map<BlobMetadataName, BlobMetadataValue> underlyingMap)...?

Would you mind adding a get(BlobMetadataName name) directly on the BlobMetadata which delegates to the map obviously?

+1, very good idea: it will avoid most access to the underlying map. Will be cleaner indeed.
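Putting the agreed points together, the record could look roughly like this: the component named `value`, an immutable copy at construction, and a `get(...)` delegate so consumers rarely touch the underlying map. A sketch with a simplified stand-in for BlobMetadataValue:

```java
import java.util.Map;
import java.util.Optional;

// Sketch combining the agreements above; BlobMetadataValue is simplified.
record BlobMetadataName(String name) {}
record BlobMetadataValue(String value) {}

record BlobMetadata(Map<BlobMetadataName, BlobMetadataValue> value) {
    BlobMetadata {
        value = Map.copyOf(value); // immutable snapshot
    }

    // delegate so consumers don't need to dereference the whole map
    Optional<BlobMetadataValue> get(BlobMetadataName name) {
        return Optional.ofNullable(value.get(name));
    }
}
```

With this, the test assertion below reads `.metadata().value()` and single-attribute lookups go through `get(...)`.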
    Mono.from(testee.save(TEST_BUCKET_NAME, TEST_BLOB_ID, bytesBlob)).block();

    assertThat(Mono.from(testee.readBytes(TEST_BUCKET_NAME, TEST_BLOB_ID)).block().metadata().metadata())
I'll use this one as an example to let you pick the variant you prefer, I'm fine with both :D

    assertThat(Mono.from(testee.readBytes(TEST_BUCKET_NAME, TEST_BLOB_ID)).block().metadata().metadata())
    assertThat(Mono.from(testee.readBytes(TEST_BUCKET_NAME, TEST_BLOB_ID)).block().metadata().values())
    assertThat(Mono.from(testee.readBytes(TEST_BUCKET_NAME, TEST_BLOB_ID)).block().metadata().underlyingMap())
Opened to collect initial feedback
See https://issues.apache.org/jira/browse/JAMES-4182 for context
Remaining points: