48 changes: 37 additions & 11 deletions spring-batch-file-examples/README.md
@@ -1,22 +1,41 @@
# Spring Batch Examples | DB And Async
# Spring Batch File Readers

This project is a **Spring Boot** application demonstrating a **fully asynchronous Spring Batch job**, designed with a focus on **performance** and **scalability**.
This project is a **Spring Boot** application that demonstrates how to build **custom file readers using Spring Batch**, with a strong focus on **performance**, **scalability**, and **clean design**.

The main goal of this repository is to showcase **different strategies for reading files** depending on their size and characteristics, following **real-world batch processing patterns**.

---

## 🚀 Overview

The example showcases how to configure and run an **asynchronous Spring Batch job** that processes a large dataset efficiently.
The job reads **10,000 records** from a database table, simulating item processing by printing
`"item processed"` for each entry.
The project currently provides **custom Spring Batch `ItemReader` implementations** for reading Excel files, using **different approaches for small and large files**:

- **Small Excel files**: loaded and processed entirely in memory
- **Large Excel files**: streamed row by row to minimize memory usage

The architecture is intentionally extensible, allowing additional file formats (such as **CSV**) to be added in the future without changing the core batch flow.
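The two reading strategies above can be sketched with plain JDK types. This is a minimal illustration and not the project's actual readers: `SimpleReader`, `InMemoryReader`, and `StreamingLineReader` are hypothetical names, and text lines stand in for Excel rows. The only thing borrowed from Spring Batch is the reader contract, where returning `null` signals the end of the input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the ItemReader contract: null means "no more items". */
interface SimpleReader<T> {
    T read();
}

/** Small-file strategy: all items are loaded up front and served from memory. */
final class InMemoryReader<T> implements SimpleReader<T> {
    private final Iterator<T> items;

    InMemoryReader(List<T> allItems) {
        // Defensive copy so later changes to the caller's list do not affect the reader.
        this.items = new ArrayList<>(allItems).iterator();
    }

    @Override
    public T read() {
        return items.hasNext() ? items.next() : null;
    }
}

/** Large-file strategy: one row is pulled from the source per call; blank rows are skipped. */
final class StreamingLineReader implements SimpleReader<String> {
    private final BufferedReader source;

    StreamingLineReader(BufferedReader source) {
        this.source = source;
    }

    @Override
    public String read() {
        try {
            String line;
            while ((line = source.readLine()) != null) {
                if (!line.isBlank()) {
                    return line; // first non-empty row wins
                }
            }
            return null; // end of input
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because both classes expose the same `read()` method, the code that consumes items does not need to know which strategy is in use. That is one plausible way a new format such as CSV could be added without touching the batch flow.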

---

## ⚙️ How It Works
## 📂 Supported File Types

### ✅ Currently Implemented

- **Small Excel files (`.xlsx`)**
  - Suitable for files that fit comfortably in memory
  - Simple and fast processing

- **Large Excel files (`.xlsx`)**
  - Streaming-based reader
  - Designed for large datasets
  - Low memory footprint
  - Handles empty rows gracefully

### 🕒 Planned

- **CSV files**
- Other structured file formats (as needed)

- The job leverages Spring Batch’s asynchronous capabilities to read and process data concurrently.
- An **H2 in-memory database** is used to store the sample data.
- The asynchronous behavior is enabled through a specific Spring profile.

---

@@ -25,6 +44,13 @@ The job reads **10,000 records** from a database table, simulating item processi
- **Java 21**
- **Spring Batch**
- **Spring Boot**
- **H2 Database**
- **Apache POI (Streaming API)**
- **excel-streaming-reader (pjfanning)**

---

## 🎯 Project Goals

---
- Demonstrate **production-ready Spring Batch readers**
- Show how to handle **large files efficiently**
- Provide clean, extensible examples without framework overengineering
35 changes: 21 additions & 14 deletions spring-batch-file-examples/pom.xml
@@ -28,6 +28,7 @@
        <spring.batch.excel.version>0.2.0</spring.batch.excel.version>
        <h2.version>2.4.240</h2.version>
        <jacoco.version>0.8.14</jacoco.version>
        <excel.streaming.reader.version>5.2.0</excel.streaming.reader.version>
    </properties>

    <dependencies>
@@ -69,20 +70,6 @@
            <version>${instancio.version}</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <version>${spring.boot.version}</version>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-test</artifactId>
            <scope>test</scope>
            <version>${spring.batch.version}</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.batch.extensions</groupId>
            <artifactId>spring-batch-excel</artifactId>
@@ -102,6 +89,26 @@
            <version>${poi.ooxml.version}</version>
        </dependency>

        <dependency>
            <groupId>com.github.pjfanning</groupId>
            <artifactId>excel-streaming-reader</artifactId>
            <version>${excel.streaming.reader.version}</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <version>${spring.boot.version}</version>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-test</artifactId>
            <scope>test</scope>
            <version>${spring.batch.version}</version>
        </dependency>

    </dependencies>

    <build>
38 changes: 0 additions & 38 deletions spring-batch-file-examples/src/main/java/com/io/example/README.md

This file was deleted.

spring-batch-file-examples/src/main/java/com/io/example/config/LargeExcelReadBatchConfig.java
@@ -0,0 +1,69 @@
package com.io.example.config;

import com.io.example.dto.StudentDto;
import com.io.example.reader.StreamingExcelItemReader;
import com.io.example.service.StudentService;
import lombok.RequiredArgsConstructor;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.transaction.PlatformTransactionManager;

import java.time.LocalDate;

@Configuration
@RequiredArgsConstructor
public class LargeExcelReadBatchConfig {

    private final StudentService studentService;

    @Bean
    @StepScope
    public StreamingExcelItemReader<StudentDto> largeExcelReader(
            @Value("#{jobParameters['filePath']}") String filePath
    ) {
        return new StreamingExcelItemReader<>(
                new ClassPathResource(filePath),
                row -> new StudentDto(
                        row.getCell(0).getStringCellValue(),
                        row.getCell(1).getStringCellValue(),
                        LocalDate.parse(row.getCell(2).getStringCellValue())
                )
Comment on lines +34 to +38
⚠️ Potential issue | 🟡 Minor

Add defensive checks for cell values and date parsing.

The inline row mapper is fragile:

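  1. row.getCell(n) returns null for a missing cell, so the chained getStringCellValue() call throws a NullPointerException; the call can also fail if the cell is not a string type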
  2. LocalDate.parse() assumes ISO-8601 format and will throw DateTimeParseException on invalid input

The SmallExcelReadBatchConfig uses a dedicated StudentMapper class, which likely handles these edge cases. Consider reusing that mapper or adding similar defensive logic here.

Proposed defensive implementation
-                row -> new StudentDto(
-                        row.getCell(0).getStringCellValue(),
-                        row.getCell(1).getStringCellValue(),
-                        LocalDate.parse(row.getCell(2).getStringCellValue())
-                )
+                row -> {
+                    var cell0 = row.getCell(0);
+                    var cell1 = row.getCell(1);
+                    var cell2 = row.getCell(2);
+                    return new StudentDto(
+                            cell0 != null ? cell0.getStringCellValue() : null,
+                            cell1 != null ? cell1.getStringCellValue() : null,
+                            cell2 != null ? LocalDate.parse(cell2.getStringCellValue()) : null
+                    );
+                }

Based on learnings, this is a demonstration project, so minimal error handling may be acceptable if intentional.
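The proposed change above guards against null cells but still lets `LocalDate.parse` throw on malformed input. A minimal sketch of mapping logic that covers both points, using only the JDK, could look like the following. The names `StudentRow` and `DefensiveStudentMapping` are hypothetical, the real `StudentDto` field names are not shown in this PR, and the cell values are assumed to have been extracted as strings already.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

/** Hypothetical DTO; the field names are placeholders, not the project's StudentDto. */
record StudentRow(String first, String second, LocalDate date) {}

final class DefensiveStudentMapping {
    private DefensiveStudentMapping() {}

    /** Returns the trimmed text, or null when the cell text is missing or blank. */
    static String textOrNull(String raw) {
        if (raw == null || raw.isBlank()) {
            return null;
        }
        return raw.trim();
    }

    /** Parses an ISO-8601 date (yyyy-MM-dd); returns null instead of throwing on bad input. */
    static LocalDate dateOrNull(String raw) {
        String text = textOrNull(raw);
        if (text == null) {
            return null;
        }
        try {
            return LocalDate.parse(text);
        } catch (DateTimeParseException e) {
            return null; // assumption: a bad date is tolerated rather than failing the row
        }
    }

    /** Maps up to three cell values; a null array or missing trailing cells become null fields. */
    static StudentRow map(String[] cells) {
        return new StudentRow(
                textOrNull(cellAt(cells, 0)),
                textOrNull(cellAt(cells, 1)),
                dateOrNull(cellAt(cells, 2)));
    }

    private static String cellAt(String[] cells, int index) {
        return cells != null && index < cells.length ? cells[index] : null;
    }
}
```

Returning null for a bad date keeps the demo running but hides invalid data. In a production job it may be preferable to throw and let the step's skip or fault-tolerance configuration decide what happens to the row.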

🤖 Prompt for AI Agents
In
@spring-batch-file-examples/src/main/java/com/io/example/config/LargeExcelReadBatchConfig.java
around lines 34 - 38, The inline row mapper in LargeExcelReadBatchConfig that
creates StudentDto from row.getCell(...).getStringCellValue() and
LocalDate.parse(...) is fragile; replace it with reuse of the existing
StudentMapper from SmallExcelReadBatchConfig or implement equivalent defensive
logic: check for null cells and cell types (use fallback to cell.toString() or
empty string), catch DateTimeParseException around LocalDate.parse and apply a
safe parse or default value, and ensure StudentDto construction uses the
sanitized values; locate the lambda that constructs new StudentDto and either
call new StudentMapper().map(row) or add the null/type checks and try/catch
around date parsing before creating StudentDto.

        );
    }

    @Bean
    public ItemProcessor<StudentDto, StudentDto> largeExcelProcessor() {
        return student -> student;
    }

    @Bean
    public ItemWriter<StudentDto> largeExcelWriter() {
        return items -> items.forEach(studentService::print);
    }

    @Bean
    public Step largeExcelStep(
            JobRepository jobRepository,
            PlatformTransactionManager transactionManager,
            StreamingExcelItemReader<StudentDto> largeExcelReader,
            ItemProcessor<StudentDto, StudentDto> largeExcelProcessor,
            ItemWriter<StudentDto> largeExcelWriter,
            @Value("${spring.batch.chunk-size}") int chunkSize
    ) {
        return new StepBuilder("largeExcelStep", jobRepository)
                .<StudentDto, StudentDto>chunk(chunkSize, transactionManager)
                .reader(largeExcelReader)
                .processor(largeExcelProcessor)
                .writer(largeExcelWriter)
                .build();
    }

}
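To make the reader, processor, writer, and chunk-size wiring in `largeExcelStep` easier to follow, here is a simplified, JDK-only sketch of a chunk-oriented loop. It is an illustration, not Spring Batch's implementation: `ChunkLoopSketch` and its `run` method are hypothetical, and transactions, restart state, and error handling are left out. It mirrors two documented Spring Batch behaviors: a reader signals the end of input by returning `null`, and a processor filters an item by returning `null`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

final class ChunkLoopSketch {
    private ChunkLoopSketch() {}

    /**
     * Reads until the reader returns null and hands items to the writer in groups of chunkSize.
     * A null processor means items go straight from reader to writer.
     * Returns the number of chunks written.
     */
    static <T> int run(Supplier<T> reader,
                       UnaryOperator<T> processor,
                       Consumer<List<T>> writer,
                       int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        List<T> buffer = new ArrayList<>(chunkSize);
        T item;
        while ((item = reader.get()) != null) {
            T processed = processor == null ? item : processor.apply(item);
            if (processed == null) {
                continue; // the processor filtered this item out
            }
            buffer.add(processed);
            if (buffer.size() == chunkSize) {
                writer.accept(List.copyOf(buffer)); // full chunk
                buffer.clear();
                chunks++;
            }
        }
        if (!buffer.isEmpty()) {
            writer.accept(List.copyOf(buffer)); // final partial chunk
            chunks++;
        }
        return chunks;
    }
}
```

With five items and a chunk size of two, the writer in this sketch is called three times. Passing an identity processor such as `student -> student` produces the same chunks as passing no processor at all.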
spring-batch-file-examples/src/main/java/com/io/example/config/SmallExcelReadBatchConfig.java
@@ -0,0 +1,63 @@
package com.io.example.config;

import com.io.example.dto.StudentDto;
import com.io.example.mapper.StudentMapper;
import com.io.example.service.StudentService;
import lombok.RequiredArgsConstructor;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.extensions.excel.poi.PoiItemReader;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
@RequiredArgsConstructor
public class SmallExcelReadBatchConfig {

    private final StudentService studentService;

    @Bean
    @StepScope
    public PoiItemReader<StudentDto> smallExcelReader(
            @Value("#{jobParameters['filePath']}") String filePath
    ) {
        PoiItemReader<StudentDto> reader = new PoiItemReader<>();
        reader.setResource(new ClassPathResource(filePath));
        reader.setLinesToSkip(1);
        reader.setRowMapper(new StudentMapper());
        return reader;
    }

    @Bean
    public ItemProcessor<StudentDto, StudentDto> smallExcelProcessor() {
        return student -> student;
    }
Comment on lines +38 to +41
🧹 Nitpick | 🔵 Trivial

Consider removing the identity processor.

The processor returns the input unchanged (student -> student). While this pattern is consistent with LargeExcelReadBatchConfig, you could simplify the step configuration by omitting the processor entirely if no transformation is needed.

♻️ Simplification option

If no processing is required, remove the processor bean and update the step configuration:

-    @Bean
-    public ItemProcessor<StudentDto, StudentDto> smallExcelProcessor() {
-        return student -> student;
-    }
-
     @Bean
     public Step smallExcelStep(JobRepository jobRepository,
                      PlatformTransactionManager transactionManager,
                      PoiItemReader<StudentDto> smallExcelReader,
-                     ItemProcessor<StudentDto, StudentDto> smallExcelProcessor,
                      ItemWriter<StudentDto> smallExcelWriter,
                      @Value("${spring.batch.chunk-size}") int chunkSize) {

         return new StepBuilder("smallExcelStep", jobRepository)
                 .<StudentDto, StudentDto>chunk(chunkSize, transactionManager)
                 .reader(smallExcelReader)
-                .processor(smallExcelProcessor)
                 .writer(smallExcelWriter)
                 .build();
     }

Note: Apply the same simplification to LargeExcelReadBatchConfig for consistency.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
@spring-batch-file-examples/src/main/java/com/io/example/config/SmallExcelReadBatchConfig.java
around lines 38 - 41, Remove the unnecessary identity ItemProcessor bean in
SmallExcelReadBatchConfig by deleting the smallExcelProcessor() method and
update the step configuration that currently references it to build the step
without a processor (i.e., use .<chunk/reader/writer> configuration without
.processor(...)); apply the same change to LargeExcelReadBatchConfig for
consistency so both configs omit the identity processor bean and steps run
directly from reader to writer.


    @Bean
    public ItemWriter<StudentDto> smallExcelWriter() {
        return items -> items.forEach(studentService::print);
    }

    @Bean
    public Step smallExcelStep(JobRepository jobRepository,
                     PlatformTransactionManager transactionManager,
                     PoiItemReader<StudentDto> smallExcelReader,
                     ItemProcessor<StudentDto, StudentDto> smallExcelProcessor,
                     ItemWriter<StudentDto> smallExcelWriter,
                     @Value("${spring.batch.chunk-size}") int chunkSize) {

        return new StepBuilder("smallExcelStep", jobRepository)
                .<StudentDto, StudentDto>chunk(chunkSize, transactionManager)
                .reader(smallExcelReader)
                .processor(smallExcelProcessor)
                .writer(smallExcelWriter)
                .build();
    }
}