A curated, production-grade collection of Spring Boot modules demonstrating AI and LLM integrations — covering multi-provider chat, RAG pipelines, tool calling, MCP servers/clients, chat memory, multimodality, structured outputs, prompt engineering, observability, and much more.
- Overview
- Repository Structure
- Modules
- Prerequisites
- Getting Started
- Configuration
- HTTP Request Examples
- Technologies & Topics
Spring AI Integration is a hands-on, modular reference repository for developers who want to learn, explore, and build AI-powered applications using the Spring AI framework on top of Spring Boot.
Each sub-project is a self-contained Spring Boot application that showcases a specific Spring AI feature or integration pattern. The modules range from beginner-friendly basic chat completions all the way to advanced topics like Model Context Protocol (MCP) security, financial RAG pipelines, prompt caching, observability metrics, and Docker-based local model execution.
- Multi-provider support — OpenAI (GPT-4), Anthropic (Claude), and Ollama (local models) with easy provider swapping
- Production-focused patterns — advisors, memory management, vector stores, structured outputs, and prompt caching
- MCP ecosystem — full MCP server and client implementations across Stdio, WebFlux, and WebMVC transports
- No vendor lock-in — Spring AI's unified API abstracts away provider specifics
- Learning-first design — every module is focused, well-scoped, and independently runnable
```
Spring-AI-Integration/
│
├── spring-with-ai/                    # Introductory Spring AI basics
├── spring-ai-02-chat-with-llms/       # Chat with OpenAI (GPT-4)
├── spring-ai-03-chat-with-claude/     # Chat with Anthropic Claude
├── spring-ai-04-chat-with-ollama/     # Chat with local Ollama models
├── spring-ai-chat-options/            # Runtime chat configuration options
│
├── prompt-templates/                  # Prompt templating with variables
├── prompt-stuffing/                   # Prompt stuffing / in-context document injection
│
├── structured-output/                 # Bean/Map structured LLM outputs
├── native-structured-output/          # Native structured output (JSON mode)
│
├── chat-memory/                       # In-memory and persistent chat history
├── compacting-chat-memory-advisor/    # Memory compaction with advisors
│
├── tool-calling/                      # Function / tool calling integration
│
├── spring-ai-rag-vector-store/        # RAG with vector store (PGVector/simple)
├── spring-ai-financial-rag/           # Financial domain RAG pipeline
│
├── multimodality/                     # Image + text multimodal inputs
│
├── mcp-server-stdio/                  # MCP server via standard I/O
├── mcp-server-webflux/                # MCP server via WebFlux SSE
├── mcp-client-stdio/                  # MCP client for stdio transport
├── spring-ai-mcp-client/              # Spring AI MCP client integration
├── spring-io-mcp-server/              # Spring.io MCP server example
├── spring-ai-mcp-elicitation/         # MCP elicitation patterns
├── spring-ai-mcp-security/            # Secured MCP with OAuth2/JWT
│
├── spring-ai-metrics/                 # Observability & Micrometer metrics
├── spring-ai-prompt-caching/          # Prompt caching for cost/latency
├── spring-ai-web-search/              # Web search tool integration
├── docker-model-runner/               # Docker-based local model runner
│
└── http-requests.http                 # HTTP client sample requests
```
The entry point into Spring AI. Demonstrates basic ChatClient usage, autoconfiguration, and simple request/response patterns. Ideal starting point for beginners.
Chat with OpenAI GPT-4 using Spring AI's ChatClient API. Shows how to configure the OpenAI starter, send prompts, handle responses, and stream tokens.
Key concepts: OpenAiChatModel, ChatClient, Flux<String> streaming, system/user message roles.
Chat with Anthropic Claude (claude-3-5-sonnet / claude-3-haiku). Demonstrates Spring AI's Anthropic integration including prompt configuration and response handling.
Key concepts: AnthropicChatModel, multi-turn conversation, system prompt configuration.
Chat with locally running LLMs via Ollama (e.g., Llama 3, Mistral, Phi-3). Zero cloud dependency — everything runs on your machine.
Key concepts: OllamaChatModel, local inference, Ollama Docker container setup.
Demonstrates runtime configuration of chat parameters — temperature, top-p, max tokens, frequency penalty, etc. — both at startup and per-request level.
Key concepts: ChatOptions, OpenAiChatOptions, OllamaChatOptions, per-call overrides.
Shows how to use Spring AI's PromptTemplate with parameterized variables, allowing dynamic prompt construction from templates and input maps.
Key concepts: PromptTemplate, Message types, variable interpolation, system vs. user templates.
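Under the hood, templating boils down to variable interpolation: placeholders in the template string are replaced with values from an input map. A minimal, dependency-free sketch of the idea (this is an illustration, not Spring AI's `PromptTemplate` implementation, though the `{variable}` syntax and `render` name mirror the real API):

```java
import java.util.Map;

// Simplified stand-in for Spring AI's PromptTemplate: replaces
// {variable} placeholders with values from an input map.
public class MiniPromptTemplate {
    private final String template;

    public MiniPromptTemplate(String template) {
        this.template = template;
    }

    public String render(Map<String, Object> model) {
        String result = template;
        for (Map.Entry<String, Object> e : model.entrySet()) {
            result = result.replace("{" + e.getKey() + "}", String.valueOf(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        MiniPromptTemplate t = new MiniPromptTemplate(
                "Tell me a {adjective} joke about {topic}.");
        System.out.println(t.render(Map.of("adjective", "short", "topic", "Java")));
        // → Tell me a short joke about Java.
    }
}
```

The real `PromptTemplate` adds message-role awareness (system vs. user templates) and resource-based template loading on top of this core substitution step.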
Demonstrates the prompt stuffing pattern — injecting external document content directly into the prompt context rather than using a vector store, useful for smaller documents or quick prototyping.
Key concepts: Document content injection, context window usage, in-context retrieval.
Shows how to extract structured Java objects (POJOs, records, Maps, Lists) from LLM responses using BeanOutputConverter, MapOutputConverter, and ListOutputConverter.
Key concepts: OutputConverter, BeanOutputConverter<T>, format instructions, JSON parsing.
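Conceptually, `BeanOutputConverter` works in two steps: it appends format instructions derived from the target type to the prompt, then parses the model's JSON reply into that type. A rough, dependency-free sketch of the first step (the real converter generates a proper JSON schema via Jackson; this naive reflection-based version is only an illustration):

```java
import java.util.StringJoiner;

// Naive illustration of deriving format instructions from a target record's
// components. Spring AI's BeanOutputConverter generates a full JSON schema.
public class FormatInstructions {
    public record Movie(String title, int year, double rating) {}

    static String forRecord(Class<? extends Record> type) {
        StringJoiner fields = new StringJoiner(", ", "{", "}");
        for (var component : type.getRecordComponents()) {
            fields.add("\"" + component.getName() + "\": <"
                    + component.getType().getSimpleName() + ">");
        }
        return "Respond only with a JSON object of the form: " + fields;
    }

    public static void main(String[] args) {
        System.out.println(forRecord(Movie.class));
        // → Respond only with a JSON object of the form:
        //   {"title": <String>, "year": <int>, "rating": <double>}
    }
}
```

This is why structured output degrades gracefully across providers: the instructions travel inside the prompt itself, so no provider-specific JSON mode is required (the `native-structured-output` module below covers the provider-enforced alternative).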
Uses native JSON mode (where supported by the provider) for guaranteed-valid JSON output from the LLM, bypassing prompt-based format instructions.
Key concepts: responseFormat, native JSON mode (OpenAI structured outputs), schema enforcement.
Implements conversation memory to maintain chat history across turns. Covers both in-memory (for development) and persistent storage strategies.
Key concepts: MessageChatMemoryAdvisor, InMemoryChatMemory, ChatMemory, conversationId.
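At its core, in-memory chat history is just a message list keyed by `conversationId`, trimmed to a sliding window so the context never grows unbounded. A minimal sketch of that idea (class and method names here are illustrative, not Spring AI's `InMemoryChatMemory` API):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative in-memory chat history keyed by conversation id,
// keeping only the most recent maxMessages entries per conversation.
public class SimpleChatMemory {
    private final int maxMessages;
    private final Map<String, Deque<String>> store = new ConcurrentHashMap<>();

    public SimpleChatMemory(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    public void add(String conversationId, String message) {
        Deque<String> history = store.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        history.addLast(message);
        while (history.size() > maxMessages) {
            history.removeFirst(); // evict the oldest turn once the window is full
        }
    }

    public List<String> get(String conversationId) {
        return List.copyOf(store.getOrDefault(conversationId, new ArrayDeque<>()));
    }
}
```

`MessageChatMemoryAdvisor` wraps exactly this pattern into the `ChatClient` call chain: it prepends the stored history before each request and records the new exchange afterwards.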
Demonstrates how to handle long conversations using a memory compaction advisor that summarizes older messages when the context window limit approaches.
Key concepts: AbstractChatMemoryAdvisor, compaction strategy, token-aware summarization.
Full example of Spring AI's function/tool calling — registering Java methods as callable tools that the LLM can invoke during a conversation to fetch real-time data or execute logic.
Key concepts: @Tool, FunctionCallback, FunctionCallbackWrapper, tool registration, result handling.
```java
@Bean
public FunctionCallback weatherFunction() {
    return FunctionCallbackWrapper.builder(new WeatherService())
            .withName("getWeather")
            .withDescription("Get the current weather for a given city")
            .withInputType(WeatherRequest.class)
            .build();
}
```

Full RAG pipeline implementation: document ingestion, chunking, embedding generation, vector store persistence, and similarity-based retrieval at query time.
Key concepts: VectorStore, SimpleVectorStore, TokenTextSplitter, EmbeddingModel, QuestionAnswerAdvisor, document readers (PDF, text).
Architecture:

```
Document → Splitter → EmbeddingModel → VectorStore
                                            ↓
User Query → EmbeddingModel → Similarity Search → Retrieved Chunks
                                            ↓
                       ChatClient + Context → LLM → Answer
```
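The retrieval step in this pipeline boils down to a nearest-neighbor search over embedding vectors. A small, dependency-free sketch of cosine-similarity ranking, which is essentially what `SimpleVectorStore` does in memory (names below are illustrative):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative cosine-similarity search: the core of in-memory RAG retrieval.
public class MiniVectorSearch {

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns chunk ids ranked by similarity to the query embedding.
    static List<String> topK(Map<String, double[]> chunks, double[] query, int k) {
        return chunks.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> cosine(e.getValue(), query)).reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

PGVector performs the same ranking server-side, using an HNSW index to avoid the linear scan this sketch implies.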
A domain-specific RAG application focused on financial documents. Ingests financial reports, filings, or market data and enables natural language Q&A over the content.
Key concepts: Domain-specific chunking strategies, finance-tuned prompts, retrieval confidence, source attribution.
Demonstrates vision + text multimodal capabilities — sending images alongside text prompts to multimodal models (e.g., GPT-4o, Claude 3, LLaVA via Ollama).
Key concepts: UserMessage with media attachments, Media type, image URL and base64 inputs, vision model configuration.
```java
UserMessage userMessage = new UserMessage(
        "Describe what you see in this image.",
        List.of(new Media(MimeTypeUtils.IMAGE_PNG, imageResource))
);
```

Spring AI Integration provides a comprehensive set of MCP modules covering server implementations, client integrations, security, and advanced patterns.
A Stdio-transport MCP server — communicates with the client via standard input/output streams. Ideal for local tool use with AI assistants like Claude Desktop.
Key concepts: StdioServerTransport, tool registration, MCP spec compliance.
A reactive MCP server using WebFlux SSE (Server-Sent Events) transport — suited to HTTP-based, cloud-hosted deployments.
Key concepts: WebFluxSseServerTransport, reactive streams, SSE endpoint, MCP tool exposure.
An MCP client that connects to a Stdio-based MCP server and invokes its registered tools through the Spring AI chat flow.
Key concepts: StdioClientTransport, McpSyncClient, tool discovery, function callback bridging.
A full Spring AI MCP client integration using the high-level Spring AI abstractions — connects to any MCP-compatible server and exposes its tools automatically to the ChatClient.
Key concepts: McpFunctionCallback, auto-tool-registration, Spring Boot autoconfiguration for MCP.
An MCP server modeled after the Spring.io content structure — exposes tools for querying Spring ecosystem resources, projects, and documentation.
Demonstrates the MCP elicitation pattern — the server proactively requests additional information from the user/client during a tool call.
Key concepts: Elicitation requests, dynamic input prompting, conversation-aware tool calls.
Implements OAuth2 / JWT-secured MCP — demonstrates how to protect MCP server endpoints with Spring Security, requiring proper bearer token authentication from MCP clients.
Key concepts: Spring Security OAuth2, JWT validation, SecurityFilterChain, protected tool endpoints.
Integrates Micrometer observability into Spring AI — tracking token usage, latency, model calls, and errors via meters and traces. Compatible with Prometheus, Grafana, and Zipkin.
Key concepts: ObservationRegistry, ChatClientObservation, custom metrics, Spring Boot Actuator, Micrometer.
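A typical Actuator setup for surfacing these observations to Prometheus might look like the following (a hedged example of standard Spring Boot Actuator properties; the exact meter names emitted vary by Spring AI version):

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health, metrics, prometheus
  tracing:
    sampling:
      probability: 1.0   # sample every trace; lower this in production
```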
Demonstrates prompt caching (supported by Anthropic Claude and other providers) to reduce latency and API cost when the same system prompt or context is reused across requests.
Key concepts: Cache control headers, Anthropic cache_control API, cost optimization, cache hit/miss metrics.
Integrates real-time web search as a tool available to the LLM — allowing the model to fetch up-to-date information from the internet during a conversation.
Key concepts: Web search tool registration, search result injection, citation handling, Brave Search / Tavily integration.
Shows how to use Docker's built-in Model Runner (available in Docker Desktop 4.40+) to run LLMs locally via a Docker-native endpoint, bypassing the need for a separate Ollama installation.
Key concepts: Docker Model Runner endpoint, spring.ai.openai.base-url override, local model execution, zero-dependency local AI.
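Because Model Runner exposes an OpenAI-compatible API, pointing the OpenAI starter at it is a small configuration override. A sketch of the idea — the host, port, path, and model name below are assumptions that depend on your Docker Desktop setup, so verify them against your local Model Runner endpoint:

```yaml
spring:
  ai:
    openai:
      base-url: http://localhost:12434/engines   # assumed Model Runner endpoint; confirm for your Docker version
      api-key: not-needed-for-local-models
      chat:
        options:
          model: ai/llama3.2                     # example local model tag
```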
| Requirement | Version | Notes |
|---|---|---|
| Java | 17+ | JDK 21 recommended |
| Maven | 3.8+ | Or use included ./mvnw wrapper |
| Spring Boot | 3.x | Auto-configured via Spring AI starters |
| Spring AI | 1.x | See individual module pom.xml |
| Docker | 24+ | Required for vector DBs, Ollama, Model Runner |
| OpenAI API Key | — | Required for OpenAI modules |
| Anthropic API Key | — | Required for Claude modules |
| Ollama | Latest | Required for local model modules |
```bash
git clone https://github.com/drissiOmar98/Spring-AI-Integration.git
cd Spring-AI-Integration
```

Create a `.env` file or export environment variables:
```bash
# OpenAI (GPT-4, embeddings)
export OPENAI_API_KEY=sk-your-openai-key

# Anthropic (Claude)
export ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
```

For modules using Ollama:
```bash
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama pull llama3
```

For modules using PGVector (RAG):
```bash
docker run -d \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=vectordb \
  -p 5432:5432 \
  pgvector/pgvector:pg16
```

Navigate to any module and start it:
```bash
cd spring-ai-02-chat-with-llms
./mvnw spring-boot:run
```

Or build and run the JAR:
```bash
./mvnw clean package -DskipTests
java -jar target/*.jar
```

Each module has its own `application.properties` or `application.yml`. Common configuration patterns:
```yaml
# OpenAI
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7
```

```yaml
# Anthropic
spring:
  ai:
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
      chat:
        options:
          model: claude-3-5-sonnet-20241022
```

```yaml
# Ollama (local)
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3
```

```yaml
# Vector Store (PGVector)
spring:
  ai:
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1536
```

The root `http-requests.http` file contains ready-to-use REST client examples for all modules. They can be run directly in IntelliJ IDEA, or in VS Code with the REST Client extension.
```http
### Chat with OpenAI
POST http://localhost:8080/api/chat
Content-Type: application/json

{
  "message": "What is Spring AI?",
  "conversationId": "session-1"
}

### RAG Query
POST http://localhost:8080/api/rag/query
Content-Type: application/json

{
  "question": "What were the Q3 financial results?"
}

### Tool Calling
POST http://localhost:8080/api/chat/tools
Content-Type: application/json

{
  "message": "What is the weather like in Paris right now?"
}

### Multimodal (image + text)
POST http://localhost:8080/api/multimodal
Content-Type: application/json

{
  "message": "Describe this chart",
  "imageUrl": "https://example.com/chart.png"
}
```

| Category | Technologies |
|---|---|
| Core Framework | Spring Boot 3.x, Spring AI 1.x, Spring WebFlux |
| LLM Providers | OpenAI (GPT-4o), Anthropic (Claude 3.5), Ollama (Llama 3, Mistral, Phi-3) |
| Vector Stores | PGVector, SimpleVectorStore, In-Memory |
| Embeddings | OpenAI text-embedding-3-small/large, Ollama nomic-embed-text |
| MCP | Stdio, WebFlux SSE, WebMVC SSE transports |
| Security | Spring Security, OAuth2, JWT |
| Observability | Micrometer, Spring Boot Actuator, Prometheus, Zipkin |
| Persistence | PostgreSQL, JDBC Chat Memory |
| Build | Apache Maven, Spring Boot Maven Plugin |
| Infrastructure | Docker, Docker Model Runner, Docker Compose |
GitHub Topics: java spring-ai springboot llms rag mcp mcp-server mcp-client mcp-security tool-calling prompt-engineering advisors multimodality structured-output vector-stores embedding open-ai ollama web-search docker-model-runner
Built with ❤️ using Spring AI — the portable, provider-agnostic AI framework for Java developers.