Towards an HDF5 type provider in FSharp.Data.Toolbox? #34

@dsyme

There has been some discussion about an HDF5 type provider; for example, see the discussion and links here: https://twitter.com/RodhernACT/status/662202241768665088

FSharp.Data.Toolbox would be a natural eventual home for this, given that the project already includes a SAS type provider.

Here are some relevant resources. If you know of more, please chime in below.

  • You can find out more about HDF5 here: https://www.hdfgroup.org/HDF5/
  • The HDF5 User Guide is long but very comprehensive.
  • There is an existing wrapper of the HDF5 native binaries for .NET. It looks a little old (circa 2005?), some users have reported issues using it, and it is not clear whether the library is still under active development.
  • There is also a relevant, more recent (circa 2011?) PowerShell adaptor for HDF5. It uses PowerShell's "provider" model to give a file-system-like embedding of the information space into the programming model. This may provide inspiration for what an HDF5 type provider for F# could look like.
  • Robert Nielsen (Twitter) has built a prototype HDF5 provider and is looking at getting the source onto GitHub. Having the source available as a compiling project would be a great starting point for the next round of work on this. Robert has blogged about F# and the prototype HDF5 provider; the source is linked from the blog.
  • @memura is also interested in participating in this work, and we're opening this issue as a place to discuss the design of the type provider: for example, what the experience of using typical HDF5 data files would be, and what it would mean for the provider to be high-performance, complete, etc.
  • There is a Python pandas reader for HDF5 that may serve as inspiration; Deedle, for example, may want to include HDF5 access. (Coincidentally, the other data formats read by pandas that are mentioned on that page may also be of interest for FSharp.Data and Deedle over time.)
  • There is an HDF5 interface to R.
