TIKA-4703: Add Docker CI pipelines for tika-server and tika-grpc#2715
nddipiazza wants to merge 3 commits into main
Move Docker build infrastructure into the main tika repo so that Docker image releases are tied directly to Tika releases rather than requiring cross-repo coordination with tika-docker/tika-grpc-docker.

Snapshot workflow (main branch push):
- Builds tika-server minimal and full images from Maven output
- Builds tika-grpc image from Maven output
- Pushes snapshot tags to Docker Hub (e.g. 4.0.0-SNAPSHOT)

Release workflow (version tag push):
- Builds tika-server minimal/full from Apache mirror JARs with GPG verification (multi-arch: amd64, arm64, arm/v7, s390x)
- Builds tika-grpc from Maven output (multi-arch: amd64, arm64)
- Pushes versioned + latest tags to Docker Hub

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- tika-server snapshot Dockerfiles: use assembly tgz (thin JAR + lib/) instead of the thin JAR alone, matching the 4.x packaging model
- tika-grpc: bundle default-tika-config.json so the server starts without requiring a config volume mount
- tika-grpc: pass -c, -p, and --plugin-roots as CLI args instead of system properties so TikaGrpcServer actually picks them up
- tika-grpc: default port is now 9090 (configurable via TIKA_GRPC_PORT)

Tested locally: all three images (minimal, full, grpc) build and start successfully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TikaGrpcServer now falls back to a bundled default-tika-config.json from the classpath when no -c flag is provided, matching normal Java application conventions. The default config is empty (no pre-configured fetchers/emitters); users configure these at runtime.

This removes the need for a separate config file in the Docker image. The entrypoint only passes -c when the TIKA_CONFIG env var is explicitly set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
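The argument handling described in the last two commits can be sketched roughly as follows. This is a minimal illustration, not the entrypoint from this PR: the function name `build_grpc_args`, the `TIKA_PLUGIN_ROOTS` variable, and the `/tika/plugins` default are assumptions, while `TIKA_GRPC_PORT`, `TIKA_CONFIG`, the 9090 default, and the `-c`, `-p`, `--plugin-roots` flags come from the commit messages above.

```shell
#!/bin/sh
# Hypothetical sketch of an entrypoint helper. build_grpc_args and
# TIKA_PLUGIN_ROOTS are illustrative names, not taken from the PR.

# Print the CLI arguments for the gRPC server, one per line.
build_grpc_args() {
    # Port: defaults to 9090 unless TIKA_GRPC_PORT overrides it.
    printf '%s\n' "-p" "${TIKA_GRPC_PORT:-9090}"

    # Plugin roots: assumed default path, matching the COPY target
    # /tika/plugins/ in the Dockerfile under review.
    printf '%s\n' "--plugin-roots" "${TIKA_PLUGIN_ROOTS:-/tika/plugins}"

    # Config: pass -c only when TIKA_CONFIG is set and non-empty, so the
    # server can otherwise fall back to its bundled classpath default.
    if [ -n "${TIKA_CONFIG:-}" ]; then
        printf '%s\n' "-c" "$TIKA_CONFIG"
    fi
}
```

Printing one argument per line keeps the helper easy to inspect; a real entrypoint would more likely build the positional parameters with `set --` and `exec` the Java process directly so that values containing whitespace survive intact.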
Adding clarification: the Docker images are published to the existing Docker Hub repositories:
These are the same Docker Hub repos currently used by |
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

EXPOSE 9090
Use the ARG suggested previously to run as a non-root user.
Suggested change
- EXPOSE 9090
+ USER $UID_GID
+ EXPOSE 9090
# License for the specific language governing permissions and limitations under
# the License.

FROM ubuntu:plucky
The tika-server Dockerfiles run as USER 35002:35002 (matching the upstream tika-docker convention), but this Dockerfile has no USER directive, so the gRPC server runs as root. docker-tool.sh even asserts 35002:35002 in its test function.
It should just need ARG UID_GID="35002:35002" added, as in the tika-server Dockerfile, with that ARG referenced in a USER directive.
Suggested change
- FROM ubuntu:plucky
+ # "random" uid/gid hopefully not used anywhere else
+ # This needs to be set globally and then referenced in
+ # the subsequent stages -- see TIKA-3912
+ ARG UID_GID="35002:35002"
+ FROM ubuntu:plucky
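One way to confirm the non-root setting after a build is to read the user recorded in the image metadata. The sketch below is only an illustration of that check: `image_user` and `assert_image_user` are hypothetical names and are not taken from docker-tool.sh; the only external call is `docker inspect` with the `.Config.User` field.

```shell
#!/bin/sh
# Hypothetical helpers; not the test function from docker-tool.sh.

# Print the user configured for an image (empty when no USER is set).
image_user() {
    docker inspect --format '{{.Config.User}}' "$1"
}

# assert_image_user IMAGE [EXPECTED]
# Returns 0 when the image's configured user matches EXPECTED
# (default 35002:35002), 1 on mismatch, 2 if the inspect call fails.
assert_image_user() {
    image=$1
    expected=${2:-35002:35002}
    actual=$(image_user "$image") || return 2
    if [ "$actual" != "$expected" ]; then
        # An empty value means the image has no USER and runs as root.
        shown=$actual
        [ -n "$shown" ] || shown="root (no USER set)"
        echo "FAIL: $image runs as $shown, expected $expected" >&2
        return 1
    fi
    echo "OK: $image runs as $actual"
}
```

For example, `assert_image_user apache/tika-grpc:4.0.0-SNAPSHOT` would be expected to fail against an image built without the USER directive and pass once the suggestion above is applied; the tag shown here is only an example.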
COPY plugins/ /tika/plugins/
COPY config/ /tika/config/
COPY bin/ /tika/bin
ARG JRE='openjdk-17-jre-headless'
The tika-server images default to openjdk-21-jre-headless. Any reason to pin grpc to 17? If intentional, might be worth a comment explaining why, otherwise someone will "fix" it later and potentially break something.
Summary
Moves Docker build infrastructure into the main tika repo so that Docker image releases are tied directly to Tika releases, eliminating the need for cross-repo coordination with tika-docker and tika-grpc-docker.
- […] apache/tika, apache/tika-full, and apache/tika-grpc snapshot images to Docker Hub
- […] latest tags for all three images
- […] tika-docker repo (source of truth), plus new Dockerfile.snapshot variants that use the Maven assembly output instead of downloading from Apache mirrors
- […] default-tika-config.json from classpath when no -c flag is provided, matching standard Java application conventions
Required Setup
DOCKERHUB_USERNAME and DOCKERHUB_TOKEN secrets must be configured in the repo settings for the workflows to push images.
Test plan
🤖 Generated with Claude Code