Efficiency in dask applications. #213

@bnlawrence

We clearly want to avoid pyactivestorage opening the file inside each dask chunk if every such open requires a remote index read.

A quick hack to fix this (in the pyfive branch) would be to avoid keeping the File instance open (the optimal_kerchunk branch already does this). With that one change, users could at least use active storage instances many times without worrying about the file open count.
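To make the cost concrete, here is a minimal sketch (standard library only) that counts simulated index reads when every per-chunk task opens the file itself, versus when the index is read once and handed to each task. All names here (`read_index`, `process_chunk_naive`, `process_chunk_with_index`, the URI) are hypothetical and are not the pyactivestorage API; the sketch only illustrates the access pattern, not how either branch actually implements it.

```python
INDEX_READS = 0  # counts simulated remote index reads


def read_index(uri):
    """Pretend to fetch a chunk index from remote storage (hypothetical)."""
    global INDEX_READS
    INDEX_READS += 1
    # Fake index: chunk number -> (byte offset, length)
    return {i: (i * 100, 100) for i in range(4)}


def process_chunk_naive(uri, chunk):
    """Each task opens the file itself, so each task pays for an index read."""
    index = read_index(uri)
    return index[chunk]


def process_chunk_with_index(index, chunk):
    """The task receives an index that was read once, outside the task."""
    return index[chunk]


uri = "s3://bucket/data.nc"  # placeholder URI, never contacted

# Pattern 1: one index read per chunk
naive = [process_chunk_naive(uri, c) for c in range(4)]
naive_reads = INDEX_READS

# Pattern 2: one index read in total
INDEX_READS = 0
index = read_index(uri)
shared = [process_chunk_with_index(index, c) for c in range(4)]
shared_reads = INDEX_READS

print(naive_reads, shared_reads)
```

With dask, the second pattern would correspond to passing a small, picklable index (rather than an open file object) into each chunk task.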

A better long-term solution may be to lift the internal s3fs instance outside pyactivestorage, so that it can be shared and we can take advantage of s3fs caching.
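A rough sketch of what "lifting the filesystem outside" could look like: the caller creates one filesystem object and passes it in, so its cache is shared across instances. `CountingFS` is a stand-in written for this example (in real use the object would presumably be an `s3fs.S3FileSystem`), and `ActiveSketch` with its `fs` parameter is hypothetical, not the actual pyactivestorage class or signature.

```python
class CountingFS:
    """Stand-in for a caching filesystem object; counts uncached lookups."""

    def __init__(self):
        self.metadata_cache = {}
        self.remote_lookups = 0

    def info(self, path):
        # Only a cache miss counts as a (simulated) remote request
        if path not in self.metadata_cache:
            self.remote_lookups += 1
            self.metadata_cache[path] = {"name": path, "size": 400}
        return self.metadata_cache[path]


class ActiveSketch:
    """Hypothetical class that accepts an externally owned filesystem."""

    def __init__(self, path, fs=None):
        # Fall back to a private filesystem when none is supplied
        self.fs = fs if fs is not None else CountingFS()
        self.path = path

    def size(self):
        return self.fs.info(self.path)["size"]


# Shared filesystem: the cache is reused across instances
shared_fs = CountingFS()
sizes = [ActiveSketch("bucket/data.nc", fs=shared_fs).size() for _ in range(3)]

# Private filesystems: every instance starts with an empty cache
private = [ActiveSketch("bucket/data.nc") for _ in range(3)]
private_sizes = [a.size() for a in private]
private_lookups = sum(a.fs.remote_lookups for a in private)

print(shared_fs.remote_lookups, private_lookups)
```

The design choice being illustrated is plain dependency injection: whoever owns the filesystem object also owns its cache, so repeated instances do not each start cold.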

Labels: enhancement