[Discuss] Doris Roadmap 2026 #60036

@morningman

"Scale Intelligence, Accelerate Insights"

Building on 2025's achievements in vector search and indexing capabilities, Apache Doris continues to deepen its AI support in 2026. This roadmap focuses on advancing AI & Hybrid Search capabilities while enhancing query performance, storage efficiency, and data lake integration.

AI & Hybrid Search Innovation:

  • Scale vector index to support 10 billion vectors per table with disk-based ANN
  • Enhance full-text search with query expressions, scoring, and multi-index support
  • Extend hybrid search to Iceberg for unified analytics

Core Enhancements:

  • Query engine optimization for complex data types and ETL processing
  • Storage improvements for ultra-large tablets and compute-storage separation
  • Data lake integration with Iceberg V3 and Paimon support

Roadmap 2025
Roadmap 2024
Roadmap 2023
Roadmap 2022

AI & Hybrid Search

Vector Index

  • Implement index-only scan for vector index
  • Implement disk-based ANN (Approximate Nearest Neighbor) for vector index
  • Optimize compaction policy for vector index
  • Enhance vector index capability to support 10 billion vectors in a single table
  • Introduce vector index support for Iceberg tables

Full-Text Search

  • Introduce more query expressions: query string and Boolean query
  • Implement scoring functionality in the text index
  • Introduce multi-index support for a single column
  • Add text index support for Iceberg tables
  • Integrate scoring with global lazy materialization

Query Engine

Performance

  • Optimize column pruning for complex data types (struct, array, map)
  • Optimize expression execution for cases such as CASE WHEN and non-const LIKE
  • Enhance partition pruning capability
  • Optimize broadcast join performance
  • Implement query condition cache functionality
  • Enhance zonemap evaluation to support expressions
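To make the zonemap item above concrete, here is a minimal sketch of how min/max block summaries could be used to evaluate a predicate on an expression such as `x + c > threshold` rather than on a bare column. This only illustrates the general technique and is not Doris's implementation; the `Zone` class, the helper names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Min/max summary for one block of a column (hypothetical layout)."""
    lo: int
    hi: int


def add_const(zone: Zone, c: int) -> Zone:
    # x + c is monotonic in x, so both bounds simply shift by c.
    return Zone(zone.lo + c, zone.hi + c)


def may_match_greater_than(zone: Zone, threshold: int) -> bool:
    # A block can hold a qualifying row only if its upper bound exceeds the threshold.
    return zone.hi > threshold


def blocks_to_scan(zones, c, threshold):
    """Indices of blocks that may contain rows where x + c > threshold."""
    return [
        i for i, z in enumerate(zones)
        if may_match_greater_than(add_const(z, c), threshold)
    ]


# Hypothetical per-block summaries for column x.
zones = [Zone(0, 50), Zone(40, 95), Zone(90, 200)]
print(blocks_to_scan(zones, 10, 100))  # → [1, 2]
```

The first block is skipped without reading any rows because its largest possible value of `x + 10` is 60. Non-monotonic expressions would need a different bound derivation, or fall back to scanning.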

ETL/Incremental Processing

  • Enhance spill-to-disk capability to support TPC-DS 10TB workload using 16GB memory
  • Implement MERGE INTO statement
  • Implement binlog and incremental materialized view functionality
  • Implement global query buffer management to reduce the memory usage of individual queries and make per-query memory usage more predictable
  • Implement progress bar for long-running queries
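For readers unfamiliar with MERGE INTO, the sketch below models its usual upsert semantics (update matched rows, insert unmatched ones) on plain Python dictionaries. It assumes the common "WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT" form; the syntax and clause set Doris will support are not stated in this roadmap, and the function and sample tables are hypothetical.

```python
def merge_into(target, source):
    """Merge source rows into target; both are dicts keyed by primary key.

    Matched keys are updated column by column; unmatched keys are inserted.
    The input dicts are left unmodified.
    """
    result = {key: dict(row) for key, row in target.items()}
    for key, row in source.items():
        if key in result:
            result[key].update(row)   # WHEN MATCHED THEN UPDATE
        else:
            result[key] = dict(row)   # WHEN NOT MATCHED THEN INSERT
    return result


# Hypothetical target table and incoming batch.
orders = {1: {"status": "new", "amount": 10}, 2: {"status": "new", "amount": 20}}
changes = {2: {"status": "paid"}, 3: {"status": "new", "amount": 30}}

merged = merge_into(orders, changes)
print(merged[2])  # → {'status': 'paid', 'amount': 20}
print(merged[3])  # → {'status': 'new', 'amount': 30}
```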

New Features

  • Implement UNNEST functionality
  • Implement recursive CTE (Common Table Expression)
  • Implement ASOF join functionality
  • Introduce Python UDF (User-Defined Function) support
  • Introduce nested variant data type support
  • Enhance function compatibility with Snowflake
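As an illustration of the ASOF join item above, the following sketch pairs each left row with the most recent right row whose timestamp is not later than the left row's. This is the common "backward" ASOF semantics; the exact semantics and syntax planned for Doris are not specified here, and the function name and sample data are hypothetical.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Join each (ts, value) in left to the latest right row with ts <= left ts.

    right must be sorted by timestamp. Returns (left_ts, left_value, right_value),
    with right_value set to None when no earlier right row exists.
    """
    right_ts = [ts for ts, _ in right]
    out = []
    for ts, value in left:
        i = bisect_right(right_ts, ts)  # count of right rows with ts <= left ts
        match = right[i - 1][1] if i > 0 else None
        out.append((ts, value, match))
    return out


# Hypothetical trades joined to the latest preceding quote.
trades = [(1, "t1"), (5, "t2"), (10, "t3")]
quotes = [(2, 100.0), (4, 101.0), (9, 102.0)]
print(asof_join(trades, quotes))
# → [(1, 't1', None), (5, 't2', 101.0), (10, 't3', 102.0)]
```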

New Data Types

  • Introduce timestamp with timezone data type
  • Introduce binary data type

Enhancement

  • Unify predicate and expression framework between external tables and internal tables
  • Implement short-circuit expression evaluation
  • Unify local exchange and global exchange, and move local exchange to FE planner
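The short-circuit evaluation item above can be pictured with a small column-at-a-time sketch: the right-hand side of an `AND` is evaluated only for rows that the left-hand side has not already rejected. This shows the general idea only, not the planned Doris design; the function and the sample predicates are hypothetical.

```python
def short_circuit_and(column, left, right):
    """Evaluate `left AND right` over a column of values.

    left runs on every row; right runs only on rows where left returned True,
    so an expensive or unsafe right-hand side is skipped for rejected rows.
    """
    mask = [bool(left(v)) for v in column]
    for i, selected in enumerate(mask):
        if selected:
            mask[i] = bool(right(column[i]))
    return mask


calls = []  # records which values reached the right-hand side


def is_nonzero(v):
    return v != 0


def ratio_above_three(v):
    calls.append(v)
    return 10 // v > 3  # would raise ZeroDivisionError for v == 0


values = [0, 2, 5]
result = short_circuit_and(values, is_nonzero, ratio_above_three)
print(result)  # → [False, True, False]
print(calls)   # → [2, 5]
```

Without short-circuiting, the division would be attempted for the row containing 0 and fail, which is why this optimization affects correctness as well as speed.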

Data Storage

Storage Format

  • Optimize compression ratio for string data
  • Enhance storage format to support 10k columns in a single file
  • Optimize column metadata management for random access
  • Optimize nullable column read performance
  • Optimize storage for sparse columns in variant data type
  • Implement partial update functionality for variant sub-fields

Data Management

  • Enhance tablet management to support ultra-large tablets (100GB+)
  • Optimize MOW (Merge-On-Write) import performance for large tablets

File Cache

  • Implement table-level cross-compute group synchronized preheating
  • Implement partition time-based TTL (Time-To-Live) support
  • Expose more granular and reliable cache usage statistics through SQL queries
  • Optimize diskless and slow-disk scenarios so that the local disk does not become a file cache throughput bottleneck
  • Implement cache black/white list policy for fine-grained cache management

Compute-Storage Separation

  • Implement ultra-fast elastic balance scheduling
  • Enhance read-write separation: bind compaction to write compute groups
  • Implement distributed cache support for sharing cache across multiple compute groups
  • Enhance persistent metadata caching to reduce dependency on metadata service and improve performance

Data Import

  • Implement memtable disk spill to optimize memory management for large imports with many active tablets, which may otherwise produce many small files
  • Optimize memory control for scenarios with very large single-row single-column data
  • Introduce support for more data import sources, such as AWS Kinesis

Data Lakes

Lake Format Performance

Materialized View

  • Implement snapshot-level incremental refresh for materialized views based on Iceberg and Paimon
  • Implement materialized view construction based on Paimon and Iceberg

Data Interoperability

Metadata Interoperability

  • Implement unified permission management for Iceberg REST Catalog
  • Implement integration with third-party authentication and authorization systems
  • Implement Open Metadata API

Security

  • Enhance object storage support for IAM role-based authentication from more cloud vendors

Others

  • Refactor all third-party builds to use CMake
  • Implement hermetic build support
