# Docling Java – Copilot Instructions

## Project Overview

`docling-java` is a multi-module Java library providing a Java API for [Docling](https://github.com/docling-project), an IBM Research project for AI-based document processing (PDF, DOCX, PPTX, images, audio, etc.).

The project is published to Maven Central under the `ai.docling` group ID.

## Repository Structure

```
docling-java/
├── buildSrc/                               # Convention plugins (Gradle Kotlin DSL)
│   └── src/main/kotlin/
│       ├── docling-shared.gradle.kts       # group/version resolution
│       ├── docling-java-shared.gradle.kts  # java-library + JaCoCo + JUnit 5 test config
│       ├── docling-lombok.gradle.kts       # Lombok setup
│       └── docling-release.gradle.kts      # maven-publish setup
├── gradle/libs.versions.toml               # Version catalog (single source of truth for deps)
├── gradle.properties                       # Gradle settings (java.version=17, parallel, caching)
├── settings.gradle.kts                     # Module declarations
├── docling-core/                           # Core DoclingDocument model (no runtime deps)
├── docling-serve/
│   ├── docling-serve-api/                  # Framework-agnostic API interfaces + request/response types
│   └── docling-serve-client/               # Reference HTTP client (Java HttpClient + Jackson)
├── docling-testcontainers/                 # Testcontainers module for Docling Serve
├── docling-testing/
│   └── docling-version-tests/              # Quarkus/Picocli CLI for version-compatibility testing
├── docs/                                   # MkDocs documentation site
├── test-report-aggregation/                # Aggregated JaCoCo + JUnit reports
└── .github/
    ├── project.yml                         # release.current-version (source of truth for version)
    └── workflows/                          # CI: build.yml, release.yml, docs.yml, version-tests.yml
```

### Module ↔ Gradle project name mapping

| Directory path | Gradle project name |
|---|---|
| `docling-core` | `:docling-core` |
| `docling-serve/docling-serve-api` | `:docling-serve-api` |
| `docling-serve/docling-serve-client` | `:docling-serve-client` |
| `docling-testcontainers` | `:docling-testcontainers` |
| `docling-testing/docling-version-tests` | `:docling-version-tests` |

## Build System

- **Gradle** with **Kotlin DSL** (`build.gradle.kts`, `settings.gradle.kts`).
- **Java 17** is the baseline; CI also tests Java 21 and 25.
- Convention plugins in `buildSrc/` keep module `build.gradle.kts` files minimal.
- All dependency versions live in `gradle/libs.versions.toml` (version catalog).
- The project version is read from `.github/project.yml` (`release.current-version`) by `docling-shared.gradle.kts`.

## Common Build Commands

```bash
# Build and test a single module (recommended during development)
./gradlew :docling-serve-api:clean :docling-serve-api:build
./gradlew :docling-serve-client:clean :docling-serve-client:build
./gradlew :docling-testcontainers:clean :docling-testcontainers:build
./gradlew :docling-core:clean :docling-core:build

# Run tests for a specific module
./gradlew :docling-serve-api:test
./gradlew :docling-serve-client:test

# Specify a different Java version
./gradlew -Pjava.version=21 :docling-serve-client:build

# Generate aggregated test report
./gradlew :test-report-aggregation:check

# Build the documentation site
./gradlew :docs:build

# Run the version-compatibility CLI (dev mode, requires Docker)
./gradlew :docling-version-tests:quarkusDev
```

> **Note:** Docker requirements differ by module. Tests in `docling-serve-client` need a running Docker daemon because the WireMock-based tests also start a `DoclingServeContainer` via Testcontainers. Tests in `docling-testcontainers` use Testcontainers and need Docker for the same reason. Tests in `docling-serve-api` do not use WireMock and run without Docker.

## Java Code Conventions

### Lombok

All model/value types use **Lombok** annotations. The standard pattern for immutable value objects is:

```java
import org.jspecify.annotations.Nullable;

@lombok.Builder(toBuilder = true)
@lombok.Getter
@lombok.ToString
@lombok.extern.jackson.Jacksonized // lets Jackson deserialize through the generated builder
public class MyType {
    @Nullable
    private String optionalField;
    private String requiredField;
}
```

- Use `@lombok.Singular` on collection fields to get singular-adder methods on the builder.
- A `lombok.config` file exists in module source roots; do not remove it.

### Nullability

Use **JSpecify** annotations for nullability:

```java
import org.jspecify.annotations.Nullable;

@Nullable
private String mayBeNull; // field, parameter, or return type that may be null

private String neverNull; // no annotation: treated as non-null
```

Reading unannotated types as non-null relies on JSpecify's `@NullMarked` being in effect for the enclosing package or module. Outside a `@NullMarked` scope, JSpecify treats unannotated types as having unspecified nullness.

### Jackson (dual Jackson 2 & 3 support)

The project supports **both Jackson 2** (`com.fasterxml.jackson`) and **Jackson 3** (`tools.jackson`). When annotating models:

- Use `com.fasterxml.jackson.annotation.*` for annotations that work with both Jackson 2 and 3 (`@JsonProperty`, `@JsonInclude`, `@JsonSetter`, etc.).
- Use `@tools.jackson.databind.annotation.JsonDeserialize` for Jackson 3-specific deserializer wiring.
- Jackson is a `compileOnly` / `testImplementation` dependency in most modules; it must not leak as a transitive `api` dependency.

### Module System (JPMS)

Most library modules have a `module-info.java`. Some non-library or test modules, such as `docling-version-tests`, do not. When you add new packages to a library module, export them in its `module-info.java`. Jackson and Lombok are declared `requires static` (optional at runtime).

### Javadoc

All public types and methods require Javadoc. The build runs Javadoc with `-Xdoclint:syntax,html`, but the `Javadoc` task is configured with `isFailOnError = false`, so Javadoc issues do not currently fail the build. When contributing, treat Javadoc warnings and errors as fatal anyway, and keep Javadoc accurate and complete.

### Code Style

- `.editorconfig` at the repository root defines formatting rules. Follow it.
- Follow [Conventional Commits](https://www.conventionalcommits.org/) for **all** commit messages and PR titles.
- Commits should be atomic and squashed before merging.

#### Conventional Commits format

```
<type>[optional scope]: <short description>

[optional body]

[optional footer(s)]
```

Common types used in this project:

| Type | When to use |
|---|---|
| `feat` | A new feature or capability |
| `fix` | A bug fix |
| `docs` | Documentation-only changes (including `copilot-instructions.md`) |
| `chore` | Build scripts, CI config, dependency bumps, tooling |
| `refactor` | Code restructuring with no behaviour change |
| `test` | Adding or updating tests |
| `style` | Formatting / code style (no logic change) |

Examples:

```
docs: add copilot-instructions.md for coding agent onboarding
feat(serve-client): add retry support to DoclingServeClient
fix(core): handle null RefItem in DoclingDocument resolution
chore: bump jackson to 2.18.3
```

> **Important:** Every commit pushed to this repository, including automated commits made by coding agents, **must** follow this format. PRs with non-conforming commit messages will be asked to reword or squash before merging.

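The header rule can be checked mechanically. Below is a minimal sketch, not tooling the project ships. `CommitHeader` is a hypothetical class, and the accepted types are the ones from the table above. The regex covers only the first line (`type(scope)!: description`); the optional `!` breaking-change marker comes from the Conventional Commits 1.0.0 specification.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical validator for the first line of a Conventional Commits message
final class CommitHeader {
    // Types listed in this project's table; widen if more are adopted
    static final Set<String> TYPES =
            Set.of("feat", "fix", "docs", "chore", "refactor", "test", "style");

    // <type>[optional (scope)][optional !]: <non-empty description>
    private static final Pattern HEADER =
            Pattern.compile("^([a-z]+)(\\([\\w.-]+\\))?(!)?: (\\S.*)$");

    static boolean isValid(String message) {
        // Only the header (first line) is checked; body and footers are ignored
        String header = message.lines().findFirst().orElse("");
        Matcher m = HEADER.matcher(header);
        return m.matches() && TYPES.contains(m.group(1));
    }
}
```

The table lists *common* types, not an exhaustive set. The specification also allows others such as `build` or `ci`, so extend `TYPES` if the project uses them.
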
## Testing Conventions

- **JUnit 5** (`org.junit.jupiter`) is the test framework for all modules.
- **AssertJ** is used for assertions (`assertThat(...)`, `assertThatThrownBy(...)`).
- **WireMock** is used to mock HTTP servers in `docling-serve-client` tests.
- **Testcontainers** is used for integration tests that need a real Docling Serve container (requires Docker).
- Avoid adding new testing frameworks; use the ones already present.

Test class naming convention: `*Tests.java` (e.g., `DoclingServeClientTests.java`).

Run a single test class:

```bash
./gradlew :docling-serve-client:test --tests "ai.docling.serve.client.DoclingServeJackson3ClientTests"
```

## Dependency Management

- **Never hardcode dependency versions in `build.gradle.kts`.** Add versions to `gradle/libs.versions.toml` (or use the BOM) and reference dependencies through the `libs.*` version catalog.
- Check for security advisories before adding new libraries.
- Keep Jackson as `compileOnly` wherever it is already `compileOnly`; it must not become a transitive API dependency.

## Key APIs

### `DoclingServeApi` (docling-serve-api)

The main entry point for consumers. It uses SPI (`ServiceLoader`) to discover the client implementation at runtime:

```java
DoclingServeApi api = DoclingServeApi.builder() // discovers docling-serve-client on the classpath
    .baseUrl("http://localhost:5001")
    .build();

// `request` is a conversion request built beforehand (construction omitted here)
ConvertDocumentResponse response = api.convertSource(request);
```

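`ServiceLoader` discovery works roughly as follows. `DoclingClientProvider` and `ClientDiscovery` are hypothetical stand-ins, not the actual SPI types in `docling-serve-api`.

- A provider JAR such as `docling-serve-client` registers its implementation in one of two ways. On the classpath it uses `META-INF/services/<fully-qualified interface name>`. On the module path it uses a `provides ... with ...` clause in `module-info.java`.
- On the module path, the consuming module must also declare `uses` for the service type.
- The lookup returns the first registered provider, or fails with a clear message when none is present.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical SPI contract that a client implementation module would provide
interface DoclingClientProvider {
    String name();
}

final class ClientDiscovery {
    // First provider registered via META-INF/services or a module-info `provides` clause
    static Optional<DoclingClientProvider> findProvider() {
        return ServiceLoader.load(DoclingClientProvider.class).findFirst();
    }

    // Fail fast with an actionable message when no implementation is on the classpath
    static DoclingClientProvider requireProvider() {
        return findProvider().orElseThrow(() -> new IllegalStateException(
                "No DoclingClientProvider found; add an implementation "
                        + "(e.g. docling-serve-client) to the classpath"));
    }
}
```

This design keeps `docling-serve-api` free of HTTP and JSON dependencies: consumers compile against the API, and the implementation is chosen by whatever is on the classpath at runtime.
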
### `DoclingServeContainer` (docling-testcontainers)

Wraps a Testcontainers `GenericContainer` for `ghcr.io/docling-project/docling-serve`:

```java
DoclingServeContainer container = new DoclingServeContainer(
    DoclingServeContainerConfig.builder()
        .image(DoclingServeContainerConfig.DOCLING_IMAGE) // default image
        .enableUi(false)
        .startupTimeout(Duration.ofMinutes(2))
        .build()
);
// Default port: 5001 (automatically mapped)
// container.getApiUrl() → "http://localhost:<mapped_port>"
```

The default image constant `DoclingServeContainerConfig.DOCLING_IMAGE` follows the pattern `ghcr.io/docling-project/docling-serve:<DOCLING_IMAGE_VERSION>`, where `<DOCLING_IMAGE_VERSION>` is a concrete tag such as `v1.13.0`.

## CI/CD

GitHub Actions workflows in `.github/workflows/`:

| Workflow | Trigger | What it does |
|---|---|---|
| `build.yml` | push/PR to `main` | Builds and tests all modules on Java 17, 21, 25 |
| `release.yml` | manual/tag | Publishes to Maven Central via JReleaser |
| `docs.yml` | push to `main` | Publishes MkDocs site to GitHub Pages |
| `version-tests.yml` | scheduled / manual | Runs `docling-version-tests` CLI against GHCR image tags |
| `dependabot-automerge.yml` | Dependabot PRs | Auto-merges minor/patch dependency updates |

The `build.yml` matrix runs each module independently (`docling-serve-api`, `docling-serve-client`, `docling-testcontainers`, `docling-version-tests`) across Java versions.

## Versioning and Release

- The current version is in `.github/project.yml` under `release.current-version`.
- Releases follow semantic versioning and are managed by [JReleaser](https://jreleaser.org/) (`jreleaser.yml`).
- **Do not bump the version manually** in build files; it is read dynamically from `.github/project.yml`.

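The real lookup lives in `docling-shared.gradle.kts`. The Java sketch below only illustrates the idea of pulling `release.current-version` out of `project.yml`. It assumes a simple two-level layout, a top-level `release:` key with an indented `current-version:` child, and it is not a general YAML parser. The sample content in the test is hypothetical.

```java
import java.util.Optional;

// Illustrative only: extracts release.current-version from a simple project.yml layout
final class ProjectVersion {
    static Optional<String> currentVersion(String yaml) {
        boolean inRelease = false;
        for (String line : yaml.split("\\R")) {
            if (line.isBlank() || line.stripLeading().startsWith("#")) {
                continue; // skip blank lines and comments
            }
            boolean indented = Character.isWhitespace(line.charAt(0));
            if (!indented) {
                // A new top-level key: track whether it is the `release:` block
                inRelease = line.strip().equals("release:");
            } else if (inRelease && line.strip().startsWith("current-version:")) {
                String value = line.strip().substring("current-version:".length()).strip();
                // Drop optional surrounding single or double quotes
                return Optional.of(value.replaceAll("^[\"']|[\"']$", ""));
            }
        }
        return Optional.empty();
    }
}
```
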
## Documentation

- Docs live in `docs/src/doc/docs/` as Markdown files, built with [MkDocs Material](https://squidfunk.github.io/mkdocs-material/).
- When adding a new public API or module, add or update the corresponding `.md` file.
- Template variables such as `{{ gradle.project_version }}` are substituted during the Gradle docs build.

## Known Errors and Workarounds

- **`delombok` and `module-info.java`**: The Lombok Gradle plugin cannot process `module-info.java`. The `docling-lombok.gradle.kts` convention plugin temporarily renames `module-info.java` to `module-info.java.bak` during delombok and restores it afterward. Do not remove this workaround.
- **Testcontainers tests need Docker**: The `docling-serve-client` and `docling-testcontainers` modules include Testcontainers-based integration tests. These require a Docker daemon and may be skipped in environments without one, such as a local or sandboxed agent environment. The `build.yml` CI workflow runs on `ubuntu-latest`, which has Docker available.
- **Jackson 2 vs Jackson 3**: The project supports both Jackson 2 and Jackson 3. Some annotations (`@JsonProperty`, `@JsonInclude`) work with both; others are version-specific. Use `com.fasterxml.jackson.annotation` for shared annotations and `tools.jackson.*` for Jackson 3-specific ones.
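
The rename-and-restore logic of the delombok workaround can be sketched in plain Java with `java.nio.file.Files`. This illustrates the idea only. The real implementation is Gradle Kotlin DSL in `docling-lombok.gradle.kts`, and `runDelombok` is a hypothetical placeholder for the delombok step. The important part is the `finally` block, which restores `module-info.java` even if delombok fails.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch of the delombok workaround (not the actual Gradle plugin code)
final class ModuleInfoWorkaround {
    static void withModuleInfoHidden(Path sourceRoot, Runnable runDelombok) throws IOException {
        Path moduleInfo = sourceRoot.resolve("module-info.java");
        Path backup = sourceRoot.resolve("module-info.java.bak");
        boolean renamed = false;
        if (Files.exists(moduleInfo)) {
            Files.move(moduleInfo, backup); // hide the descriptor from delombok
            renamed = true;
        }
        try {
            runDelombok.run();
        } finally {
            if (renamed) {
                Files.move(backup, moduleInfo); // always restore, even if delombok throws
            }
        }
    }
}
```
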