Commit 7a25e4e

Authored by Copilot and edeandrea
feat: Add .github/copilot-instructions.md for coding agent onboarding (#362)
* Initial plan
* Add .github/copilot-instructions.md for coding agent onboarding
  Co-authored-by: edeandrea <363447+edeandrea@users.noreply.github.com>
* docs: expand Conventional Commits guidance in copilot-instructions.md
  Co-authored-by: edeandrea <363447+edeandrea@users.noreply.github.com>
* docs: Update .github/copilot-instructions.md
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
  Signed-off-by: Eric Deandrea <eric@ericdeandrea.dev>
* docs: Update .github/copilot-instructions.md
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
  Signed-off-by: Eric Deandrea <eric@ericdeandrea.dev>
* docs: fix accuracy issues in copilot-instructions.md
  Co-authored-by: edeandrea <363447+edeandrea@users.noreply.github.com>

---------

Signed-off-by: Eric Deandrea <eric@ericdeandrea.dev>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: edeandrea <363447+edeandrea@users.noreply.github.com>
Co-authored-by: Eric Deandrea <eric.deandrea@ibm.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 188b0d4 commit 7a25e4e

1 file changed

Lines changed: 254 additions & 0 deletions

.github/copilot-instructions.md

@@ -0,0 +1,254 @@
# Docling Java – Copilot Instructions

## Project Overview

`docling-java` is a multi-module Java library providing a Java API for [Docling](https://github.com/docling-project), an IBM Research project for AI-based document processing (PDF, DOCX, PPTX, images, audio, etc.).

The project is published to Maven Central under the `ai.docling` group ID.

## Repository Structure

```
docling-java/
├── buildSrc/                          # Convention plugins (Gradle Kotlin DSL)
│   └── src/main/kotlin/
│       ├── docling-shared.gradle.kts        # group/version resolution
│       ├── docling-java-shared.gradle.kts   # java-library + jacoco + JUnit 5 test config
│       ├── docling-lombok.gradle.kts        # Lombok setup
│       └── docling-release.gradle.kts       # maven-publish setup
├── gradle/libs.versions.toml          # Version catalog (single source of truth for deps)
├── gradle.properties                  # Gradle settings (java.version=17, parallel, caching)
├── settings.gradle.kts                # Module declarations
├── docling-core/                      # Core DoclingDocument model (no runtime deps)
├── docling-serve/
│   ├── docling-serve-api/             # Framework-agnostic API interfaces + request/response types
│   └── docling-serve-client/          # Reference HTTP client (Java HttpClient + Jackson)
├── docling-testcontainers/            # Testcontainers module for Docling Serve
├── docling-testing/
│   └── docling-version-tests/         # Quarkus/Picocli CLI for version-compatibility testing
├── docs/                              # MkDocs documentation site
├── test-report-aggregation/           # Aggregated JaCoCo + JUnit reports
└── .github/
    ├── project.yml                    # release.current-version (source of truth for version)
    └── workflows/                     # CI: build.yml, release.yml, docs.yml, version-tests.yml
```
35+
36+
### Module ↔ Gradle project name mapping
37+
38+
| Directory path | Gradle project name |
39+
|---|---|
40+
| `docling-core` | `:docling-core` |
41+
| `docling-serve/docling-serve-api` | `:docling-serve-api` |
42+
| `docling-serve/docling-serve-client` | `:docling-serve-client` |
43+
| `docling-testcontainers` | `:docling-testcontainers` |
44+
| `docling-testing/docling-version-tests` | `:docling-version-tests` |

## Build System

- **Gradle** with **Kotlin DSL** (`build.gradle.kts`, `settings.gradle.kts`).
- **Java 17** is the baseline; CI also tests Java 21 and 25.
- Convention plugins in `buildSrc/` keep module `build.gradle.kts` files minimal.
- All dependency versions live in `gradle/libs.versions.toml` (version catalog).
- Project version is read from `.github/project.yml` (`release.current-version`) by `docling-shared.gradle.kts`.

## Common Build Commands

```bash
# Build and test a single module (recommended during development)
./gradlew :docling-serve-api:clean :docling-serve-api:build
./gradlew :docling-serve-client:clean :docling-serve-client:build
./gradlew :docling-testcontainers:clean :docling-testcontainers:build
./gradlew :docling-core:clean :docling-core:build

# Run tests for a specific module
./gradlew :docling-serve-api:test
./gradlew :docling-serve-client:test

# Specify a different Java version
./gradlew -Pjava.version=21 :docling-serve-client:build

# Generate aggregated test report
./gradlew :test-report-aggregation:check

# Build the documentation site
./gradlew :docs:build

# Run the version-compatibility CLI (dev mode, requires Docker)
./gradlew :docling-version-tests:quarkusDev
```

> **Note:** The WireMock-based tests in `docling-serve-client` also start a `DoclingServeContainer` via Testcontainers, so they require a running Docker daemon. The Testcontainers-based tests in `docling-testcontainers` likewise require Docker. Tests in `docling-serve-api` do not use WireMock and run without Docker.

## Java Code Conventions

### Lombok

All model/value types use **Lombok** annotations. The standard pattern for immutable value objects is:

```java
import org.jspecify.annotations.Nullable;

@lombok.Builder(toBuilder = true)
@lombok.Getter
@lombok.ToString
@lombok.extern.jackson.Jacksonized // for Jackson deserialization via builder
public class MyType {
    @Nullable
    private String optionalField;
    private String requiredField;
}
```

- Use `@lombok.Singular` on collection fields for builder singular-adder methods.
- A `lombok.config` file exists in module source roots; do not remove it.
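
To make the `@Builder(toBuilder = true)` pattern above more concrete, the following is a hand-written approximation of the members Lombok generates for such a type. It is a sketch for orientation only: the class name `MyTypeSketch` is hypothetical, the Jackson wiring added by `@Jacksonized` is omitted, and the code Lombok really generates differs in detail.

```java
/** Hand-written approximation of the Lombok-generated members; not the generated code itself. */
final class MyTypeSketch {
    private final String optionalField; // may be null (@Nullable in the annotated version)
    private final String requiredField;

    private MyTypeSketch(String optionalField, String requiredField) {
        this.optionalField = optionalField;
        this.requiredField = requiredField;
    }

    // Roughly what @lombok.Getter provides
    String getOptionalField() { return optionalField; }
    String getRequiredField() { return requiredField; }

    // Roughly what @lombok.Builder provides
    static Builder builder() { return new Builder(); }

    // Roughly what toBuilder = true adds: a builder pre-filled from this instance
    Builder toBuilder() {
        return new Builder().optionalField(optionalField).requiredField(requiredField);
    }

    // Roughly what @lombok.ToString provides
    @Override
    public String toString() {
        return "MyTypeSketch(optionalField=" + optionalField
            + ", requiredField=" + requiredField + ")";
    }

    static final class Builder {
        private String optionalField;
        private String requiredField;

        Builder optionalField(String value) { this.optionalField = value; return this; }
        Builder requiredField(String value) { this.requiredField = value; return this; }
        MyTypeSketch build() { return new MyTypeSketch(optionalField, requiredField); }
    }
}
```

Because the instance is immutable, a modified copy is obtained with `original.toBuilder().optionalField("y").build()` rather than by calling a setter.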

### Nullability

Use **JSpecify** annotations for nullability:

```java
import org.jspecify.annotations.Nullable;

@Nullable
private String mayBeNull; // field/param/return that may be null
// no annotation = non-null by default
```

### Jackson (dual Jackson 2 & 3 support)

The project supports **both Jackson 2** (`com.fasterxml.jackson`) and **Jackson 3** (`tools.jackson`). When annotating models:

- Use `com.fasterxml.jackson.annotation.*` for Jackson 2/3-compatible annotations (`@JsonProperty`, `@JsonInclude`, `@JsonSetter`, etc.).
- Use `@tools.jackson.databind.annotation.JsonDeserialize` for Jackson 3-specific deserializer wiring.
- Jackson is a `compileOnly` / `testImplementation` dependency in most modules; it must not leak as a transitive `api` dependency.

### Module System (JPMS)

Most library modules have a `module-info.java` (some non-library/test modules, such as `docling-version-tests`, do not). When adding new packages to a library module, export them in its `module-info.java`. Jackson and Lombok are `requires static` (optional at runtime).

### Javadoc

All public types and methods require Javadoc. The build runs Javadoc with `-Xdoclint:syntax,html`, but the `Javadoc` task is configured with `isFailOnError = false`, so Javadoc issues do not currently fail the build. Treat Javadoc warnings and errors as if they were fatal when contributing and keep Javadoc accurate and complete.

### Code Style

- `.editorconfig` at the repository root defines formatting rules. Follow it.
- Follow [Conventional Commits](https://www.conventionalcommits.org/) for **all** commit messages and PR titles.
- Commits should be atomic and squashed before merging.

#### Conventional Commits format

```
<type>[optional scope]: <short description>

[optional body]

[optional footer(s)]
```

Common types used in this project:

| Type | When to use |
|---|---|
| `feat` | A new feature or capability |
| `fix` | A bug fix |
| `docs` | Documentation-only changes (including `copilot-instructions.md`) |
| `chore` | Build scripts, CI config, dependency bumps, tooling |
| `refactor` | Code restructuring with no behaviour change |
| `test` | Adding or updating tests |
| `style` | Formatting / code style (no logic change) |

Examples:

```
docs: add copilot-instructions.md for coding agent onboarding
feat(serve-client): add retry support to DoclingServeClient
fix(core): handle null RefItem in DoclingDocument resolution
chore: bump jackson to 2.18.3
```

> **Important:** Every commit pushed to this repository, including automated commits made by coding agents, **must** follow this format. Authors of PRs with non-conforming commit messages will be asked to reword or squash them before merging.
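
A quick way to sanity-check a message against the format above is a small header check. The sketch below is an illustration only and not part of the repository: the class name `CommitHeaderCheck` is hypothetical, the accepted types are limited to the seven listed in the table, and the optional `!` marker for breaking changes is taken from the Conventional Commits specification.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: checks the first line of a commit message against the format above. */
final class CommitHeaderCheck {
    // <type>[optional scope][optional !]: <short description>
    private static final Pattern HEADER = Pattern.compile(
        "^(feat|fix|docs|chore|refactor|test|style)(\\([^()\\s]+\\))?!?: \\S.*$");

    static boolean isConventional(String commitMessage) {
        // Only the header line is validated; body and footers are free-form here
        String header = commitMessage.lines().findFirst().orElse("");
        return HEADER.matcher(header).matches();
    }
}
```

With this check, `feat(serve-client): add retry support to DoclingServeClient` passes, while a message such as `Initial plan` is rejected because it lacks a type prefix.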

## Testing Conventions

- **JUnit 5** (`org.junit.jupiter`) is the test framework for all modules.
- **AssertJ** is used for assertions (`assertThat(...)`, `assertThatThrownBy(...)`).
- **WireMock** is used to mock HTTP servers in `docling-serve-client` tests.
- **Testcontainers** is used for integration tests that need a real Docling Serve container (requires Docker).
- Avoid adding new testing frameworks; use the ones already present.

Test class naming convention: `*Tests.java` (e.g., `DoclingServeClientTests.java`).

Run a single test class:

```bash
./gradlew :docling-serve-client:test --tests "ai.docling.serve.client.DoclingServeJackson3ClientTests"
```

## Dependency Management

- **Never hardcode dependency versions in `build.gradle.kts`**; add versions to `gradle/libs.versions.toml` (or use the BOM) and reference dependencies via the `libs.*` version catalog.
- Check for security advisories before adding new libraries.
- Keep Jackson as `compileOnly` where it is already `compileOnly`; it must not become a transitive API dependency.

## Key APIs

### `DoclingServeApi` (docling-serve-api)

The main entry point for consumers. Uses SPI (`ServiceLoader`) to discover the client implementation at runtime:

```java
DoclingServeApi api = DoclingServeApi.builder()  // discovers docling-serve-client on classpath
    .baseUrl("http://localhost:5001")
    .build();

ConvertDocumentResponse response = api.convertSource(request);
```
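
For readers unfamiliar with `ServiceLoader`, the discovery step mentioned above can be sketched with the JDK alone. This is a generic illustration under stated assumptions, not the project's code: the `ClientFactory` interface and the `SpiLookup` class are hypothetical names, and the actual SPI types in `docling-serve-api` may look different.

```java
import java.util.ServiceLoader;

/** Hypothetical service interface; stands in for whatever SPI type the API module defines. */
interface ClientFactory {
    String describe();
}

final class SpiLookup {
    /** Returns the first registered provider, or fails with a hint about the missing dependency. */
    static ClientFactory load() {
        return ServiceLoader.load(ClientFactory.class)
            .findFirst()
            .orElseThrow(() -> new IllegalStateException(
                "No ClientFactory provider found; is an implementation on the classpath?"));
    }
}
```

A provider becomes visible to `ServiceLoader` when its JAR lists the implementation class in `META-INF/services/<fully qualified interface name>` or, for JPMS modules, declares `provides ... with ...` in `module-info.java`. Without any registered provider, the lookup above throws `IllegalStateException`.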

### `DoclingServeContainer` (docling-testcontainers)

Wraps a Testcontainers `GenericContainer` for `ghcr.io/docling-project/docling-serve`:

```java
DoclingServeContainer container = new DoclingServeContainer(
    DoclingServeContainerConfig.builder()
        .image(DoclingServeContainerConfig.DOCLING_IMAGE)  // default image
        .enableUi(false)
        .startupTimeout(Duration.ofMinutes(2))
        .build()
);
// Default port: 5001 (automatically mapped)
// container.getApiUrl() → "http://localhost:<mapped_port>"
```

Default image constant pattern: `DoclingServeContainerConfig.DOCLING_IMAGE` resolves to `ghcr.io/docling-project/docling-serve:<DOCLING_IMAGE_VERSION>` (where `<DOCLING_IMAGE_VERSION>` is a concrete tag such as `v1.13.0`).

## CI/CD

GitHub Actions workflows in `.github/workflows/`:

| Workflow | Trigger | What it does |
|---|---|---|
| `build.yml` | push/PR to `main` | Builds and tests all modules on Java 17, 21, 25 |
| `release.yml` | manual/tag | Publishes to Maven Central via JReleaser |
| `docs.yml` | push to `main` | Publishes MkDocs site to GitHub Pages |
| `version-tests.yml` | scheduled / manual | Runs `docling-version-tests` CLI against GHCR image tags |
| `dependabot-automerge.yml` | Dependabot PRs | Auto-merges minor/patch dependency updates |

The `build.yml` matrix runs each module independently (`docling-serve-api`, `docling-serve-client`, `docling-testcontainers`, `docling-version-tests`) across Java versions.

## Versioning and Release

- Current version is in `.github/project.yml` under `release.current-version`.
- Releases follow semantic versioning and are managed by [JReleaser](https://jreleaser.org/) (`jreleaser.yml`).
- **Do not bump version manually** in build files; the version is read dynamically from `.github/project.yml`.

## Documentation

- Docs live in `docs/src/doc/docs/` as Markdown files, built with [MkDocs Material](https://squidfunk.github.io/mkdocs-material/).
- When adding a new public API or module, add or update the corresponding `.md` file.
- Template variables like `{{ gradle.project_version }}` are substituted during the Gradle docs build.
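
To illustrate what substituting a variable such as `{{ gradle.project_version }}` means for a docs page, here is a minimal plain-Java sketch. It is only an illustration of the idea: the real substitution is performed by the Gradle docs build, whose mechanism is not shown in this file, and the `TemplateSketch` class and its behaviour of leaving unknown variables untouched are assumptions.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of {{ name }} substitution; not the mechanism the docs build uses. */
final class TemplateSketch {
    // Matches {{ some.dotted.name }} with optional whitespace inside the braces
    private static final Pattern VAR = Pattern.compile("\\{\\{\\s*([\\w.]+)\\s*\\}\\}");

    static String render(String text, Map<String, String> values) {
        return VAR.matcher(text).replaceAll(match -> {
            String value = values.get(match.group(1));
            // Unknown variables are left exactly as written
            return Matcher.quoteReplacement(value != null ? value : match.group());
        });
    }
}
```

For example, rendering `Version {{ gradle.project_version }}` with a map containing `gradle.project_version` set to `1.2.3` yields `Version 1.2.3`.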

## Known Errors and Workarounds

- **`delombok` and `module-info.java`**: The Lombok Gradle plugin cannot process `module-info.java`. The `docling-lombok.gradle.kts` convention plugin temporarily renames `module-info.java` to `module-info.java.bak` during delombok and restores it afterward. Do not remove this workaround.
- **Docker not available in CI for Testcontainers tests**: The `docling-serve-client` and `docling-testcontainers` modules include Testcontainers-based integration tests. These require Docker and may be skipped in environments without a Docker daemon. The `build.yml` CI workflow runs on `ubuntu-latest`, which has Docker available.
- **Jackson 2 vs Jackson 3**: The project supports both Jackson 2 and Jackson 3. Some annotations (`@JsonProperty`, `@JsonInclude`) work with both; others are version-specific. Use `com.fasterxml.jackson.annotation` for shared annotations and `tools.jackson.*` for Jackson 3-specific ones.
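
The rename-and-restore idea behind the `delombok` workaround above can be sketched in plain Java. The convention plugin itself is a Gradle Kotlin DSL script and is not reproduced here; the class and method names below (`HideFileDuring`, `run`) are hypothetical, and the sketch only shows why the restore step belongs in a `finally` block.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: hides a file under a .bak name while an action runs. */
final class HideFileDuring {
    static void run(Path file, Runnable action) throws IOException {
        // e.g. module-info.java -> module-info.java.bak
        Path backup = file.resolveSibling(file.getFileName() + ".bak");
        Files.move(file, backup, StandardCopyOption.REPLACE_EXISTING);
        try {
            action.run(); // the step that cannot cope with the file, such as delombok
        } finally {
            // Restore the original name even if the action throws
            Files.move(backup, file, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

If the restore were not in `finally`, a failed run would leave the source tree with `module-info.java.bak` and no `module-info.java`, which would break the next compilation.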