"Scale Intelligence, Accelerate Insights"
Building on 2025's achievements in vector search and indexing capabilities, Apache Doris continues to deepen its AI support in 2026. This roadmap focuses on advancing AI & Hybrid Search capabilities while enhancing query performance, storage efficiency, and data lake integration.
AI & Hybrid Search Innovation:
- Scale vector index to support 10 billion vectors per table with disk-based ANN
- Enhance full-text search with query expressions, scoring, and multi-index support
- Extend hybrid search to Iceberg for unified analytics
Core Enhancements:
- Query engine optimization for complex data types and ETL processing
- Storage improvements for ultra-large tablets and compute-storage separation
- Data lake integration with Iceberg V3 and Paimon support
Roadmap 2025
Roadmap 2024
Roadmap 2023
Roadmap 2022
AI & Hybrid Search
Vector Index
- Implement index-only scan for vector index
- Implement disk-based ANN (Approximate Nearest Neighbor) for vector index
- Optimize compaction policy for vector index
- Enhance vector index capability to support 10 billion vectors in a single table
- Introduce vector index support for Iceberg tables
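The disk-based ANN item above can be illustrated with a minimal IVF-style sketch (all names here are hypothetical, not Doris internals): vectors are grouped into clusters whose centroids stay in memory, and a query scans only the posting lists of the nearest clusters, so the bulk of the vectors can live on disk.

```python
import math

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class IVFIndex:
    """Toy inverted-file (IVF) ANN index: centroids in memory, posting lists on 'disk'."""
    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = {i: [] for i in range(len(centroids))}  # cluster id -> [(row_id, vector)]

    def add(self, row_id, vec):
        # Assign each vector to the posting list of its nearest centroid.
        cid = min(range(len(self.centroids)), key=lambda i: l2(vec, self.centroids[i]))
        self.lists[cid].append((row_id, vec))

    def search(self, query, k=1, nprobe=1):
        # Probe only the nprobe nearest clusters instead of scanning everything.
        order = sorted(range(len(self.centroids)), key=lambda i: l2(query, self.centroids[i]))
        candidates = [item for cid in order[:nprobe] for item in self.lists[cid]]
        return sorted(candidates, key=lambda rv: l2(query, rv[1]))[:k]
```

Trading recall for I/O via `nprobe` is what makes the disk-resident case practical: only a small fraction of posting lists is read per query.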
Full-Text Search
- Introduce more query expressions: query string and Boolean query
- Implement scoring functionality in the text index
- Introduce multi-index support for a single column
- Add text index support for Iceberg tables
- Integrate scoring with global lazy materialization
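The Boolean-query and scoring items above can be sketched with a toy inverted index (hypothetical code, not the Doris text index): postings map each term to per-document term frequencies, `AND`/`OR` combine posting sets, and a score ranks the surviving documents.

```python
from collections import defaultdict

class TextIndex:
    """Toy inverted index with AND/OR Boolean queries and term-frequency scoring."""
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term_frequency}

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def query(self, terms, mode="OR"):
        doc_sets = [set(self.postings[t]) for t in terms]
        docs = set.intersection(*doc_sets) if mode == "AND" else set.union(*doc_sets)
        # Score = summed term frequency; real engines use BM25 or similar.
        scored = {d: sum(self.postings[t].get(d, 0) for t in terms) for d in docs}
        return sorted(scored.items(), key=lambda kv: -kv[1])
```

A production index would replace the raw term-frequency score with a normalized model such as BM25, but the shape of the problem is the same.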
Query Engine
Performance
- Optimize column pruning for complex data types (struct, array, map)
- Optimize expression execution for cases such as `CASE WHEN` and non-constant `LIKE`
- Enhance partition pruning capability
- Optimize broadcast join performance
- Implement query condition cache functionality
- Enhance zonemap evaluation to support expressions
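Zone maps keep per-block min/max statistics so blocks can be skipped without being read; extending evaluation to expressions means a predicate like `x + 10 > 100` can also be bounded from the block's min/max when the expression is monotonic. A minimal sketch (hypothetical helper, monotonically increasing expressions only):

```python
def can_skip_block(zmin, zmax, expr, threshold):
    """Skip a block for predicate expr(x) > threshold: for a monotonically
    increasing expr, its maximum over the block is expr(zmax)."""
    return expr(zmax) <= threshold

# Blocks described by their (min, max) column stats; predicate: x + 10 > 100
blocks = [(0, 50), (40, 95), (90, 200)]
kept = [b for b in blocks if not can_skip_block(*b, expr=lambda x: x + 10, threshold=100)]
```

Here the first block is pruned without a read because even its maximum value cannot satisfy the predicate.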
ETL/Incremental Processing
- Enhance spill-to-disk capability to support TPC-DS 10TB workload using 16GB memory
- Implement `MERGE INTO` statement
- Implement binlog and incremental materialized view functionality
- Implement global query buffer management to reduce per-query memory usage and make memory consumption more predictable
- Implement progress bar for long-running queries
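`MERGE INTO` upserts a source into a target: rows that match on the key are updated, the rest are inserted. A minimal sketch of those semantics in plain Python (illustration only, not Doris syntax):

```python
def merge_into(target, source):
    """Emulate MERGE INTO over {key: row_dict} tables:
    WHEN MATCHED THEN UPDATE, WHEN NOT MATCHED THEN INSERT."""
    merged = dict(target)
    for key, row in source.items():
        merged[key] = {**merged.get(key, {}), **row}  # update existing row or insert new one
    return merged
```

The real statement additionally supports `WHEN MATCHED THEN DELETE` and per-branch conditions; this sketch covers only the core update-or-insert path.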
New Features
- Implement UNNEST functionality
- Implement recursive CTE (Common Table Expression)
- Implement ASOF join functionality
- Introduce Python UDF (User-Defined Function) support
- Introduce nested variant data type support
- Enhance function compatibility with Snowflake
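An ASOF join matches each left row with the most recent right row whose timestamp is not later than the left's, a common pattern when aligning events such as trades with quotes. A minimal sketch using binary search (hypothetical helper, not Doris syntax):

```python
import bisect

def asof_join(left, right):
    """left: [(ts, row)]; right: [(ts, row)] sorted by ts.
    Each left row joins the latest right row with right_ts <= left_ts."""
    right_ts = [ts for ts, _ in right]
    out = []
    for ts, row in left:
        i = bisect.bisect_right(right_ts, ts) - 1  # rightmost right_ts <= ts
        out.append((ts, row, right[i][1] if i >= 0 else None))
    return out
```

Left rows earlier than every right row get no match (`None`), mirroring the NULL a SQL ASOF join would produce.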
New Data Types
- Introduce timestamp with timezone data type
- Introduce binary data type
Enhancement
- Unify predicate and expression framework between external tables and internal tables
- Implement short-circuit expression evaluation
- Unify local exchange and global exchange, and move local exchange to FE planner
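Short-circuit evaluation of a conjunction applies each predicate only to rows that survived the earlier ones, so expensive expressions run on a shrinking selection. A minimal columnar sketch (hypothetical, with call counting to make the effect visible):

```python
def evaluate_and(predicates, rows):
    """Evaluate AND-ed predicates with short-circuiting over a selection vector;
    calls[i] records how many rows predicate i actually evaluated."""
    calls = [0] * len(predicates)
    selected = list(range(len(rows)))
    for i, pred in enumerate(predicates):
        survivors = []
        for r in selected:
            calls[i] += 1
            if pred(rows[r]):
                survivors.append(r)
        selected = survivors  # later predicates see only surviving rows
    return selected, calls
```

Ordering cheap, selective predicates first maximizes the saving, which is why this interacts with the unified predicate framework above.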
Data Storage
Storage Format
- Optimize compression ratio for string data
- Enhance storage format to support 10k columns in a single file
- Optimize column metadata management for random access
- Optimize nullable column read performance
- Optimize storage for sparse columns in variant data type
- Implement partial update functionality for variant sub-fields
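Sparse sub-columns of a variant type are mostly null, so storing only (row id, value) pairs is far cheaper than a dense column. A minimal sketch of such an encoding (hypothetical, illustration only):

```python
def encode_sparse(values, null=None):
    """Store only non-null entries as (row_id, value) pairs plus the total row count."""
    return len(values), [(i, v) for i, v in enumerate(values) if v is not null]

def decode_sparse(length, pairs, null=None):
    """Rebuild the dense column by filling nulls and placing stored values."""
    out = [null] * length
    for i, v in pairs:
        out[i] = v
    return out
```

Random access stays cheap because the pair list is ordered by row id and can be binary-searched.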
Data Management
- Enhance tablet management to support ultra-large tablets (100GB+)
- Optimize MOW (Merge-On-Write) import performance for large tablets
File Cache
- Implement table-level cross-compute group synchronized preheating
- Implement partition time-based TTL (Time-To-Live) support
- Enhance SQL query capability for more granular and reliable cache usage statistics
- Optimize diskless/slow disk scenarios to prevent local disk from becoming a file cache throughput bottleneck
- Implement cache blacklist/whitelist policy for fine-grained cache management
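Partition time-based TTL means cached data expires a configured lifetime after it is loaded. A minimal sketch with an injectable clock (hypothetical names, not the Doris file cache):

```python
class TTLCache:
    """Toy TTL cache: entries expire ttl time units after insertion.
    now_fn is injected so the sketch is deterministic and testable."""
    def __init__(self, ttl, now_fn):
        self.ttl = ttl
        self.now = now_fn
        self.store = {}  # key -> (value, inserted_at)

    def put(self, key, value):
        self.store[key] = (value, self.now())

    def get(self, key):
        item = self.store.get(key)
        if item is None:
            return None
        value, inserted_at = item
        if self.now() - inserted_at >= self.ttl:
            del self.store[key]  # lazy eviction on access
            return None
        return value
```

A real file cache would also evict proactively and track per-partition TTLs; lazy eviction keeps the sketch short.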
Compute-Storage Separation
- Implement ultra-fast elastic balance scheduling
- Enhance read-write separation: bind compaction to write compute groups
- Implement distributed cache support for sharing cache across multiple compute groups
- Enhance persistent metadata caching to reduce dependency on metadata service and improve performance
Data Import
- Optimize memory management for large imports with many active tablets, which can otherwise produce many small files: implement memtable disk spill
- Optimize memory control for scenarios with very large single-row single-column data
- Introduce support for more data import sources, such as AWS Kinesis
Data Lakes
Lake Format Performance
- Implement Parquet format Page Cache capability
- Enable Data Cache by default
- Enhance metadata parsing, planning, and caching for ultra-large scale Iceberg and Paimon
- Implement Condition Cache for Iceberg and Paimon
Materialized View
- Implement snapshot-level incremental refresh for materialized views based on Iceberg and Paimon
- Implement materialized view construction based on Paimon and Iceberg
Data Interoperability
- Implement comprehensive Iceberg V3 support
- Implement Iceberg data sorting functionality
- Implement Iceberg Data Rewrite functionality
- Implement Iceberg Delete/Update functionality
- Implement Iceberg/Parquet Variant data type support
- Implement Paimon data write
- Implement native reader for Paimon MOR (Merge-On-Read) tables
- Implement Fluss integration
- Implement Paimon Vector and Blob data type support
- Implement standardized Arrow Flight Data Catalog
Metadata Interoperability
- Implement unified permission management for Iceberg REST Catalog
- Implement integration with third-party authentication and authorization systems
- Implement Open Metadata API
Security
- Enhance object storage support for IAM role-based authentication from more cloud vendors
Others
- Refactor all third-party builds to use CMake
- Implement hermetic build support