JohnLCaron · JohnLCaron · Jul 16, 2025 · Jul 15, 2025 · Jul 16, 2025
diff --git a/.idea/artifacts/netchdf_0_5_0.xml b/.idea/artifacts/netchdf_0_5_0.xml
diff --git a/Readme.md b/Readme.md
@@ -1,12 +1,12 @@
 # netchdf
-_last updated: 7/13/2025_
+_last updated: 7/16/2025_
 
 This is a rewrite in Kotlin of parts of the devcdm and netcdf-java libraries. 
 
-The intention is to create a maintainable, read-only, pure JVM library allowing full access to 
+The intention is to create a maintainable, read-only, thread-safe, pure JVM library allowing full access to 
 netcdf3, netcdf4, hdf4, hdf5, hdf-eos2, and hdf-eos5 data files. 
 
-Evaluating if support for superblock 4 is feasible.
+The library is close to feature complete. We are now testing it extensively and comparing results against the reference libraries.
 
 Please contact me if you'd like to help out. Especially needed are test datasets from all the important data archives!!
 
@@ -28,25 +28,25 @@ Also see:
 
 ### Why this library? 
 
-There is so much important scientific data stored in the NetCDF and HDF file formats, that those formats will 
-never go away. It is important that there be maintainable, independent libraries to read these files forever.
+The scientific data stored in NetCDF and HDF file formats must remain forever readable. 
 
 The Netcdf-Java library prototyped a "Common Data Model" (CDM) to provide a single API to access various file formats. 
 The netcdf* and hdf* file formats are similar enough to make a common API a practical and useful goal. 
-By focusing on read-only access to just these formats, the API and the code are kept simple.
+By focusing on read-only access to just these formats, the API and the code are kept simple. In short, a library that 
+focuses on simplicity and clarity is a safeguard for the irreplaceable investment in these scientific datasets.
 
-In short, a library that focuses on simplicity and clarity is a safeguard for the irreplaceable investment in these
-scientific datasets.
+A second motivation is to allow multithreaded access to these files. The lack of thread safety in the HDF5-C library
+is a major failing that needs to be fixed.
 
 ### Why do we need another library besides the standard reference libraries?
 
 It's necessary to have independent implementations of any standard. If you don't have multiple implementations, it's
 easy for the single implementer to mistake the implementation for the actual standard. It's easy to hide problems 
-that are actually in the standard by adding work-arounds in the code, instead of documenting problems and creating new
+that are actually in the standard by adding workarounds in the code, instead of documenting problems and creating new
 versions of the standard with clear fixes. For Netcdf/Hdf, the standard is the file formats, along with their semantic 
 descriptions. The API is language and library specific, and is secondary to the standard.
 
-More subtly, its very hard to see the elegance (or otherwise) of your own design, you need independent review of your
+More subtly, it's very hard to see the elegance (or otherwise) of your own design; you need independent review of your
 data structures and API by people truly invested in getting them right.
 
 Having multiple implementations is a huge win for the reference library, in that bugs are more quickly found, and 
@@ -59,7 +59,7 @@ and keep bug free, with implications for memory safety and security. The librari
 toolchains. Shifts in funding could wipe out much of the institutional knowledge needed to maintain them.
 
 The HDF file formats are overly complicated, which impacts code complexity and clarity. The data structures do not
-always map to a user understandable data model. Semantics are left to data-writers to document (or not). 
+always map to a user-understandable data model. Semantics are left to data-writers to document (or not). 
 While this problem isn't specific to HDF file users, it is exacerbated by a "group of messages" design approach. 
 
 The HDF4 C library is a curious hodgepodge of disjointed APIs. The HDF5 API is better and the Netcdf4 API much better.
@@ -73,7 +73,7 @@ library for data access is less clear. For now, we will provide a "best-effort"
 contents of the file.
 
 Currently, the Netcdf-4 and HDF5 libraries are not thread safe, not even for read-only applications. 
-This is a serious limitation for high performance, scalable applications, and it is disappointing that it hasnt been fixed.
+This is a serious limitation for high-performance, scalable applications, and it is disappointing that it hasn't been fixed.
 See [Toward Multi-Threaded Concurrency in HDF5](https://www.hdfgroup.org/wp-content/uploads/2022/05/Toward-MT-HDF5.pdf),
 and [RFC:Multi-Thread HDF5](https://support.hdfgroup.org/releases/hdf5/documentation/rfc/RFC_multi_thread.pdf) for more information.
 
@@ -96,45 +96,44 @@ other parts of the code faster.
 We will investigate using Kotlin coroutines to speed up performance bottlenecks.
 
 
-### What version of the JVM, Kotlin, and Gradle?
-
-We will always use the latest LTS (long term support) Java version, and will not be explicitly supporting older versions.
-Currently that is Java 21.
-
-We also use the latest stable version of Kotlin that is compatible with the Java version. Currently that is Kotlin 2.1.
-
-Gradle is our build system. We will use the latest stable version of Gradle compatible with our Java and Kotlin versions.
-Currently that is Gradle 8.14.
-
-For now, you must download and build the library yourself. Eventually we will publish it to Maven Central. 
-The IntelliJ IDE is highly recommended for all JVM development.
-
-
 ### Goals and scope
 
 Our goal is to give read access to all the content in NetCDF, HDF5, HDF4, and HDF-EOS files.
 
-The library will be thread-safe for reading multiple files concurrently.
+The library will be thread-safe for reading multiple files concurrently. We are also exploring concurrent reading within
+the same file.
 
 We are focussing on earth science data, and don't plan to support other uses except as a byproduct.
 
-The core module will remain pure Kotlin with very minimal dependencies and no write capabilities. In particular, 
+The core module will remain pure Kotlin with very minimal library dependencies and no write capabilities. In particular, 
 there will be no dependency on the reference C libraries (except for testing). 
 
 There will be no dependencies on native libraries in the core module, but other modules or
-projects that use the core are free to use dependencies as needed. We will likely add runtime discovery to facilitate this, 
-for example, to use HDF5 filters that link to native libraries.
+projects that use the core are free to use dependencies as needed, for example to use HDF5 filters that link to native libraries.
 
 
 ### Non-goals
 
-Its not a goal to duplicate netcdf-java functionality.
+It's not a goal to duplicate netcdf-java functionality.
 
-Its not a goal to duplicate Netcdf-C library functionality.
+It's not a goal to duplicate Netcdf-C library functionality.
 
-Its not a goal to provide remote access to files.
+It's not a goal to provide remote access to files.
 
 
+### What version of the JVM, Kotlin, and Gradle?
+
+We will always use the latest LTS (long term support) Java version, and will not be explicitly supporting older versions.
+Currently that is Java 21.
+
+We also use the latest stable version of Kotlin that is compatible with the Java version. Currently that is Kotlin 2.1.
+
+Gradle is our build system. We will use the latest stable version of Gradle compatible with our Java and Kotlin versions.
+Currently that is Gradle 8.14.
+
+For now, you must download and build the library yourself. Eventually we will publish it to Maven Central.
+The IntelliJ IDE is highly recommended for all JVM development.
+
 
 ### Testing
 
@@ -157,7 +156,7 @@ Currently we have this test coverage from core/test:
 
 The core library has ~6500 LOC.
 
-More and deeper test coverage is provided in the clibs module, which compares netchdf metadata and data against
+More and deeper test coverage is provided in the testclibs module, which compares netchdf metadata and data against
 the Netcdf, HDF5, and HDF4 C libraries. The clibs module is not part of the released netchdf library and is 
 only supported for test purposes.
 

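The README change above adds thread safety as a goal: reading multiple files concurrently. A minimal sketch of that usage pattern follows, assuming one reader per file with no shared mutable state. `readMagic` and `readAllConcurrently` are hypothetical names used only for illustration and are not part of the netchdf API; `readMagic` simply returns the first four bytes of a file as a stand-in for opening it with a read-only library.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical stand-in for opening a file with a read-only library.
// Each call touches only local state, so calls for different files cannot interfere.
fun readMagic(path: Path): String =
    Files.newInputStream(path).use { input ->
        String(input.readNBytes(4), Charsets.ISO_8859_1)
    }

// Submit one task per file to a fixed-size pool and return the results in input order.
fun readAllConcurrently(paths: List<Path>, nthreads: Int = 4): List<String> {
    val pool = Executors.newFixedThreadPool(nthreads)
    try {
        val tasks = paths.map { p -> Callable { readMagic(p) } }
        // invokeAll blocks until every task has finished.
        return pool.invokeAll(tasks).map { it.get() }
    } finally {
        pool.shutdown()
    }
}
```

Because `invokeAll` returns its futures in the same order as the submitted tasks, the results line up with the input list regardless of which thread finishes first.
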
diff --git a/core/src/commonMain/kotlin/com/sunya/netchdf/hdf5/FilterPipeline.kt b/core/src/commonMain/kotlin/com/sunya/netchdf/hdf5/FilterPipeline.kt
@@ -2,7 +2,7 @@ package com.sunya.netchdf.hdf5
 
 import com.sunya.cdm.iosp.decode
 
-// should work even when filterType unknowm as long as its registered
+// should work even when filterType is unknown, as long as the filter is registered with the correct id.
 enum class FilterType(val id: Int) {
     none(0), deflate(1), shuffle(2), fletcher32(3), szip(4), nbit(5), scaleoffset(6),
     bzip2(307),

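The comment in the hunk above says decoding should work for a filter type the enum does not know, provided a filter is registered under the right id. One way that could look is sketched below, as an illustration only. `FilterKind`, `FilterRegistry`, and the id `40000` are hypothetical and are not taken from the netchdf source; only the ids visible in the diff are reused.

```kotlin
// Hypothetical, trimmed-down enum mirroring the ids visible in the diff.
enum class FilterKind(val id: Int) {
    none(0), deflate(1), shuffle(2), fletcher32(3), szip(4), nbit(5), scaleoffset(6),
    bzip2(307), unknown(Int.MAX_VALUE);

    companion object {
        private val byId = values().associateBy { it.id }

        // Ids that are not in the enum map to `unknown` instead of throwing.
        fun fromId(id: Int): FilterKind = byId[id] ?: unknown
    }
}

// Hypothetical registry keyed by the numeric filter id rather than by the enum,
// so a decoder can be supplied for an id the enum has never heard of.
class FilterRegistry {
    private val decoders = mutableMapOf<Int, (ByteArray) -> ByteArray>()

    fun register(id: Int, decoder: (ByteArray) -> ByteArray) {
        decoders[id] = decoder
    }

    fun decode(id: Int, data: ByteArray): ByteArray {
        val decoder = decoders[id]
            ?: throw IllegalArgumentException(
                "no filter registered for id $id (${FilterKind.fromId(id)})"
            )
        return decoder(data)
    }
}
```

Keying the registry by id means the enum is only needed for display and diagnostics; a lookup miss in the enum does not block decoding.
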
diff --git a/core/src/commonMain/kotlin/com/sunya/netchdf/hdf5/H5builder.kt b/core/src/commonMain/kotlin/com/sunya/netchdf/hdf5/H5builder.kt
@@ -12,8 +12,8 @@ import com.fleeksoft.charset.Charsets
 import com.sunya.cdm.array.makeString
 import com.sunya.cdm.util.InternalLibraryApi
 
-private const val debugStart = true
-private const val debugSuperblock = true
+private const val debugStart = false
+private const val debugSuperblock = false
 internal const val debugTypedefs = false
 internal const val debugFlow = false
 

diff --git a/core/src/commonMain/kotlin/com/sunya/netchdf/hdf5/MessageDataType.kt b/core/src/commonMain/kotlin/com/sunya/netchdf/hdf5/MessageDataType.kt
@@ -16,7 +16,6 @@ import kotlin.math.min
 // the dataspace message is used for that purpose. Datatype messages that are part of a committed datatype (formerly
 // named datatype) message describe a common datatype that can be shared by multiple datasets in the file.
 
-@InternalLibraryApi
 enum class Datatype5(val num : Int) {
     Fixed(0), Floating(1), Time(2), String(3), BitField(4), Opaque(5),
     Compound(6), Reference(7), Enumerated(8), Vlen(9), Array(10), Unknown(999);

diff --git a/gradle/libs.versions.toml b/gradle/libs.versions.toml
@@ -6,6 +6,7 @@ kotlin = "2.1.21"
 kotlinx-cli = "0.3.6"
 kotlinx-coroutines = "1.10.2"
 
+jhdf-version = "0.9.4"
 fleeksoft-version = "0.0.4"
 lzf-version = "1.1.2"
 lz4-version = "1.8.0"
@@ -49,6 +50,7 @@ kotlinx-coroutines-test = { module = "org.jetbrains.kotlinx:kotlinx-coroutines-t
 junit-jupiter-params = { module = "org.junit.jupiter:junit-jupiter-params", version.ref = "junit-jupiter-params" }
 kotest-property = { module = "io.kotest:kotest-property", version.ref = "kotest" }
 mockk = { module = "io.mockk:mockk", version.ref = "mockk" }
+jhdf = { module = "io.jhdf:jhdf", version.ref = "jhdf-version" }
 
 [bundles]
 jvmtest = ["junit-jupiter-params", "kotlin-test-junit5", "logback-classic", "mockk"]

diff --git a/testfiles/build.gradle.kts b/testfiles/build.gradle.kts
@@ -3,17 +3,17 @@ plugins {
 }
 
 dependencies {
-    implementation(project(":core"))
-    implementation(libs.oshai.logging)
+    api(project(":core"))
+    implementation(project(":cli"))
+    implementation(libs.jhdf)
     implementation(libs.kotlinx.coroutines.core)
     implementation(libs.okio)
     implementation(libs.fleeksoft)
+    implementation(libs.oshai.logging)
+    implementation(libs.logback.classic)
 
     testImplementation(kotlin("test"))
     testImplementation(libs.bundles.jvmtest)
-
-    testImplementation(libs.oshai.logging)
-    testImplementation(libs.kotlinx.coroutines.core)
 }
 
 kotlin {
@@ -59,4 +59,4 @@ tasks {
 }
 
 // Declare an explicit dependency on ':core:allMetadataJar' from ':testclibs:compileJava' using Task#dependsOn
-project.tasks["compileJava"].dependsOn(":core:allMetadataJar")
+// project.tasks["compileJava"].dependsOn(":core:allMetadataJar")