Ingestion re-implement on updated Elastic.Ingest.Elasticsearch #2755
Open
…mappings
Replace manual channel orchestration with IncrementalSyncOrchestrator<T> and source-generated ElasticsearchTypeContext from Elastic.Mapping 0.4.0. Add field type attributes ([Keyword], [Text], [Object], etc.) directly on DocumentationDocument to drive the mapping source generator, replacing verbose manual JSON mappings.
- Update Elastic.Ingest.Elasticsearch 0.17.1 → 0.19.0, add Elastic.Mapping 0.4.0
- Add mapping attributes to DocumentationDocument and IndexedProduct
- Create DocumentationMappingConfig.cs with two Entity variants (lexical/semantic)
- Rewrite ElasticsearchMarkdownExporter to use orchestrator for dual-index mode
- Delete ElasticsearchIngestChannel.cs and ElasticsearchIngestChannel.Mapping.cs
- Remove unused ReindexAsync from ElasticsearchOperations
- Update SearchBootstrapFixture to use IngestChannel with semantic type context
Replaces `ElasticsearchOptions` with `DocumentationEndpoints` as the single source of truth for
Elasticsearch configuration across all API apps, MCP server, and integration tests.
- Adds `IndexName` property to `ElasticsearchEndpoint` with a field-backed getter defaulting to
`{IndexNamePrefix}-dev-latest`.
- Creates `ElasticsearchEndpointFactory` in `ServiceDefaults` to centralize user-secrets and
environment variable reading, eliminating the duplicated `72f50f33` secrets ID pattern.
- Registers `DocumentationEndpoints` as a singleton in `AddDocumentationServiceDefaults`.
- Updates `ElasticsearchClientAccessor` to accept `DocumentationEndpoints` instead of
`ElasticsearchOptions`, supporting both API key and basic authentication.
- Updates all gateway consumers (`NavigationSearchGateway`, `FullSearchGateway`,
`DocumentGateway`, `ElasticsearchAskAiMessageFeedbackGateway`) to use endpoint properties.
- Simplifies all three integration test files (`SearchRelevanceTests`,
`McpToolsIntegrationTestsBase`, `SearchBootstrapFixture`) to use `ElasticsearchEndpointFactory`
and `ElasticsearchTransportFactory`, removing manual config construction.
- Deletes `ElasticsearchOptions.cs` and removes `Microsoft.Extensions.Configuration.UserSecrets`
from the Search project.
Move mapping context (DocumentationMappingContext, LexicalConfig, SemanticConfig, DocumentationAnalysisFactory) from Elastic.Markdown to Elastic.Documentation so both indexing and search derive index names from the same source. Add ContentHash helper to avoid Elastic.Ingest.Elasticsearch dependency in Elastic.Documentation. Remove IndexName from ElasticsearchEndpoint, add Namespace to DocumentationEndpoints. ElasticsearchEndpointFactory resolves namespace from DOCUMENTATION_ELASTIC_INDEX env var (backward compat), DOTNET_ENVIRONMENT, ENVIRONMENT, or falls back to "dev". ElasticsearchClientAccessor derives SearchIndex and RulesetName from namespace instead of parsing the old IndexName string. Remove ExtractRulesetName and all hardcoded "semantic-docs-dev-latest" assignments from tests and config files.
Enable IndexPatternUseBatchDate now that Elastic.Mapping supports it, and pass batchTimestamp to IngestChannelOptions in the lexical-only path so the channel uses the exporter's timestamp for index name computation.
…meter Simplify DocumentationTooling endpoint resolution by delegating to ElasticsearchEndpointFactory. Add missing skipOpenApi parameter to IsolatedIndexService.Index call.
The lexical-only code path manually reimplemented drain, delete-stale, refresh, and alias logic that the orchestrator handles automatically. Remove the flag end-to-end: CLI parameters, configuration, exporter branching, and CLI documentation.
Add .jina-embeddings-v5-text-small inference on 6 fields (title, abstract, ai_rag_optimized_summary, ai_questions, ai_use_cases, stripped_body) to enable hybrid sparse+dense retrieval. Rename InferenceId to ElserInferenceId for clarity.
Use source-generated IStaticMappingResolver delegates for auto-stamping BatchIndexDate and LastUpdated instead of manual assignment. Replace DocumentationAnalysisFactory.CreateContext with direct context customization via WithIndexName() and record-with expressions. Pass IndexSettings for default_pipeline conditionally at runtime.
…nment
Rename indexNamespace to buildType throughout the exporter pipeline so
callers pass the build type (assembler, isolated, codex) instead of the
environment name. Search services now hardcode "assembler" as the type
since they always target assembler indices.
ResolveNamespace renamed to ResolveEnvironment and updated to parse the
old production index format ({variant}-docs-{env}-{timestamp}) to
extract the environment name.
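As a rough illustration of the parsing described above, the sketch below pulls the environment segment out of a name shaped like `{variant}-docs-{env}-{timestamp}`. The class and method names are hypothetical and are not taken from the PR. The regular expression assumes that neither the variant nor the environment contains a hyphen; the real `ResolveEnvironment` may differ.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only (not the PR's ResolveEnvironment).
public static class LegacyIndexName
{
    // Assumed legacy shape: {variant}-docs-{env}-{timestamp},
    // e.g. "semantic-docs-dev-latest".
    private static readonly Regex Pattern = new Regex(
        @"^(?<variant>[a-z]+)-docs-(?<env>[a-z0-9]+)-(?<timestamp>.+)$",
        RegexOptions.CultureInvariant);

    // Returns the environment segment, or null when the name does not match.
    public static string? TryExtractEnvironment(string? indexName)
    {
        if (string.IsNullOrWhiteSpace(indexName))
            return null;

        var match = Pattern.Match(indexName.Trim());
        return match.Success ? match.Groups["env"].Value : null;
    }
}
```

With this sketch, `LegacyIndexName.TryExtractEnvironment("semantic-docs-dev-latest")` yields `"dev"`, while a name in the newer `docs-assembler.semantic-dev-latest` style does not match and yields `null`, so a caller could fall through to the next source.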
… to simplify index naming logic. Update Elasticsearch dependencies to version 0.28.0.
reakaleek
approved these changes
Feb 24, 2026
reakaleek
approved these changes
Feb 25, 2026
…entOrchestrator
Upgrade Elastic.Ingest.Elasticsearch and Elastic.Mapping to 0.30.0 which includes source-generated AI enrichment support (elastic/elastic-ingest-dotnet#151).
- Annotate DocumentationDocument with [AiInput]/[AiField] attributes
- Add [AiEnrichment<DocumentationDocument>] to DocumentationMappingContext
- Replace ElasticsearchEnrichmentCache + ElasticsearchLlmClient + EnrichPolicyManager with a single AiEnrichmentOrchestrator that runs post-indexing
- Remove 7 hand-rolled enrichment files (~1650 lines) and associated tests
Made-with: Cursor
Made-with: Cursor # Conflicts: # src/api/Elastic.Documentation.Mcp.Remote/Program.cs
- AiEnrichmentOrchestrator now takes (ITransport, ElasticsearchTypeContext) instead of (ITransport, IAiEnrichmentProvider)
- EnrichAsync uses streaming IAsyncEnumerable<AiEnrichmentProgress> API with per-phase progress logging
- Fix bug: AI enrichment pipeline only set on semantic (secondary) index, no longer wastefully applied to lexical (primary) index
- Add OnReindexProgress and OnDeleteByQueryProgress logging callbacks
- IConfigureElasticsearch<T> now requires ConfigureAnalysis and IndexSettings
- AI enrichment enabled by default; CLI flag flipped to --no-ai-enrichment
Made-with: Cursor
…etry config Configure AI enrichment with 2-minute completion timeout (down from 5m default) and explicit 2 retries for ES|QL COMPLETION calls that fail with HTTP 408/429/5xx. Made-with: Cursor
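The retry policy described in this commit can be pictured with a minimal sketch. It assumes a caller-supplied delegate that performs one attempt and returns the HTTP status code; `CompletionRetry`, `IsRetryable`, and `SendAsync` are hypothetical names, and this is not how the library implements `CompletionTimeout` or `CompletionMaxRetries` internally.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative only: names and structure are assumptions, not the library's API.
public static class CompletionRetry
{
    // The commit names HTTP 408, 429, and 5xx as the retried failures.
    public static bool IsRetryable(int status) =>
        status == 408 || status == 429 || (status >= 500 && status <= 599);

    // Runs one initial attempt plus up to maxRetries retries.
    // Each attempt gets its own cancellation token bound to the timeout.
    public static async Task<int> SendAsync(
        Func<CancellationToken, Task<int>> attempt,
        int maxRetries = 2,
        TimeSpan? timeout = null)
    {
        var limit = timeout ?? TimeSpan.FromMinutes(2);
        var status = 0;

        for (var i = 0; i <= maxRetries; i++)
        {
            using var cts = new CancellationTokenSource(limit);
            status = await attempt(cts.Token);
            if (!IsRetryable(status))
                return status;
        }

        // Retries exhausted: hand the last retryable status back to the caller.
        return status;
    }
}
```

A production version would likely add a backoff delay between attempts; it is left out here to keep the sketch short.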
…tions Log bulk response details (HTTP status, item/error counts, buffer size) on every response. Emit diagnostics error when max retries are exhausted. Made-with: Cursor
Replace raw HTTP status/item dump with cumulative indexed count using per-channel Interlocked counters. Also bump default BufferSize to 100. Made-with: Cursor
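A per-channel cumulative count kept with `Interlocked` might look roughly like the following. `ChannelProgress` and its members are hypothetical names; the sketch only shows the counting technique and does not reproduce how the exporter wires it into the bulk response callbacks.

```csharp
using System.Threading;

// Hypothetical per-channel counter; names are illustrative, not from the PR.
public sealed class ChannelProgress
{
    private long _indexed;

    // Intended to be called from each bulk response callback.
    // Interlocked.Add keeps the running total correct under concurrent callers
    // and returns the new total for logging.
    public long AddIndexed(int itemsInResponse) =>
        Interlocked.Add(ref _indexed, itemsInResponse);

    // Atomic read of the running total.
    public long Indexed => Interlocked.Read(ref _indexed);
}
```

One instance per channel (for example one for lexical and one for semantic) keeps the totals separate without any locking.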
Replace the ad-hoc `buildType` string parameter with `DocumentationEndpoints.DataSource` (resolved from `DOCS_BUILD_TYPE` env var, default "isolated") and rename `Namespace` to `Environment` (resolved from `DOTNET_ENVIRONMENT`/`ENVIRONMENT`, default "dev"). This ensures both write (indexing) and read (search) paths use a single source of truth for index naming. Remove legacy `DOCUMENTATION_ELASTIC_INDEX` env var parsing. API logging now uses `ElasticsearchClientAccessor.SearchIndex` instead of duplicating `CreateContext`. Made-with: Cursor
Add IndexVariant = "Semantic" to [AiEnrichment] so the provider attaches only to the semantic context and derives its AI cache name from the semantic write alias. Switch AiEnrichmentOrchestrator to use the semantic type context. Also gains binary-split batch reduction on COMPLETION timeouts and "dev" default namespace fallback from the upstream release. Made-with: Cursor
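The binary-split batch reduction mentioned here comes from the upstream library, and its actual implementation is not shown in this PR. As a sketch of the general technique only: when a batch times out, halve it and retry each half, until a single item that still times out is recorded as failed. All names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch of binary-split batch reduction; not the library's code.
public static class BinarySplit
{
    // run: processes one batch and throws TimeoutException when it takes too long.
    // failed: collects single items that still time out on their own.
    public static void Process<T>(
        IReadOnlyList<T> batch,
        Action<IReadOnlyList<T>> run,
        List<T> failed)
    {
        if (batch.Count == 0)
            return;

        try
        {
            run(batch);
        }
        catch (TimeoutException)
        {
            if (batch.Count == 1)
            {
                // Cannot split further; report the item instead of looping.
                failed.Add(batch[0]);
                return;
            }

            var mid = batch.Count / 2;
            Process<T>(batch.Take(mid).ToList(), run, failed);
            Process<T>(batch.Skip(mid).ToList(), run, failed);
        }
    }
}
```

The trade-off is extra round trips on timeout in exchange for not discarding a whole batch because of a few slow documents.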
…ng workarounds Bump to 0.34.3 which fixes secondary index rollover (now creates new backing index when hash changes) and exposes IndexRolloverInfo diagnostics. Wire up OnRolloverDecision callback for per-index hash logging. Add explicit AddField declarations for ai_questions, ai_use_cases (base text type), and product/related_products sub-fields (keyword with normalizer) to work around source generator gaps with dot-path merge and [Object] sub-type traversal. Made-with: Cursor
…st diagnostics Pass env: explicitly to CreateContext() instead of relying on ResolveDefaultNamespace() which reads DOTNET_ENVIRONMENT raw (returning "Development" instead of "dev"). Add environment parameter to ElasticsearchEndpointFactory.Create() so tests can pin the environment. Add diagnostic output (endpoint, index, doc count) to all integration tests for easier debugging when results are empty. Made-with: Cursor
Replace manually constructed SearchConfiguration with ConfigurationFileProvider.CreateSearchConfiguration() to keep tests in sync with the real config/search.yml. Made-with: Cursor
…ude environment in resource names Rename DocumentationEndpoints.DataSource to BuildType to match the DOCS_BUILD_TYPE env var. Update Elastic.Ingest.Elasticsearch and Elastic.Mapping to 0.34.5 which fixes [Text]+[JsonIgnore(WhenWritingNull)] and [Object] sub-type attribute traversal, removing workaround AddField calls for ai_questions, ai_use_cases, product.*, and related_products.*. Include environment in synonym set and ruleset names for proper isolation (e.g. docs-assembler-dev, docs-ruleset-assembler-dev). Made-with: Cursor
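To make the naming scheme concrete, here is a small sketch that composes the resource names from build type and environment. The synonym set and ruleset shapes follow the examples in this commit message; the index and alias shapes are inferred from the example names in the PR summary (`docs-assembler.lexical-dev-2025.10.23.120521`, `docs-assembler.lexical-dev-latest`). `ResourceNames` is a hypothetical helper: in the PR itself, index names come from the source-generated `CreateContext(type:, env:)` factory rather than from string formatting like this.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of the PR.
public static class ResourceNames
{
    // Timestamp layout assumed from the example "2025.10.23.120521".
    private const string TimestampFormat = "yyyy.MM.dd.HHmmss";

    public static string Index(string type, string variant, string env, DateTime batchUtc)
    {
        var stamp = batchUtc.ToString(TimestampFormat, CultureInfo.InvariantCulture);
        return $"docs-{type}.{variant}-{env}-{stamp}";
    }

    public static string Alias(string type, string variant, string env) =>
        $"docs-{type}.{variant}-{env}-latest";

    public static string SynonymSet(string type, string env) =>
        $"docs-{type}-{env}";

    public static string Ruleset(string type, string env) =>
        $"docs-ruleset-{type}-{env}";
}
```

For instance, `ResourceNames.SynonymSet("assembler", "dev")` gives `docs-assembler-dev` and `ResourceNames.Ruleset("assembler", "dev")` gives `docs-ruleset-assembler-dev`, matching the examples above.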
reakaleek
approved these changes
Mar 4, 2026
reakaleek
reviewed
Mar 4, 2026
  /// Build type identifier (assembler, isolated, codex). Controlled by DOCS_BUILD_TYPE env var.
  /// </summary>
- public string DataSource { get; set; } = "isolated";
+ public string BuildType { get; set; } = "isolated";
We have an existing BuildType enum. Wondering if we should reuse it.
The synonymSetName for analysis config was updated but the setName in PublishSynonymsAsync was still using the old format without environment. Made-with: Cursor
Summary
Migrate Elasticsearch indexing to source-generated mappings via `Elastic.Mapping` and the `IncrementalSyncOrchestrator` from `Elastic.Ingest.Elasticsearch`, replacing ~2200 lines of hand-rolled ingest/enrichment code. Introduces a clear separation between build type (`{type}`) and environment (`{env}`) in all index and resource naming, with consistent resolution across write (indexing) and read (search) paths.
Key changes
- `DocumentationMappingConfig.cs` declares index structure, field mappings, and analysis settings using `[Index<T>]` attributes. The source generator produces a typed `CreateContext(type:, env:)` factory, eliminating manual index name construction.
- `IncrementalSyncOrchestrator` replaces the two manually managed `ElasticsearchLexicalIngestChannel` / `ElasticsearchSemanticIngestChannel` classes. Dual-index writes, alias rotation, and hash-based rollover detection are now handled by the library.
- `AiEnrichmentOrchestrator`: replaces the hand-rolled LLM client implementation (~1600 lines removed). Uses ES|QL `COMPLETION` for server-side inference with configurable `CompletionTimeout` (2 min) and `CompletionMaxRetries` (2). AI enrichment is now the default for all index commands (opt out via `--no-ai-enrichment`).
- `ElasticsearchEndpointFactory` resolves Elasticsearch URL, credentials, `BuildType` (`DOCS_BUILD_TYPE` env var, default `"isolated"`), and `Environment` (from `DOTNET_ENVIRONMENT`/`ENVIRONMENT`, default `"dev"`) in one place. Both write and read paths use `endpoints.BuildType` and `endpoints.Environment` consistently.
- `env:` parameter to `CreateContext`: all `CreateContext` calls now pass `env: endpoints.Environment` to avoid the library's `ResolveDefaultNamespace()` picking up raw `DOTNET_ENVIRONMENT=Development` without lowercasing.
Resource naming convention
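All Elasticsearch resources now follow a structured naming convention that includes both build type and environment:
| Resource | Example (assembler, env=dev) |
| --- | --- |
| Lexical index | `docs-assembler.lexical-dev-2025.10.23.120521` |
| Lexical alias | `docs-assembler.lexical-dev-latest` |
| Semantic index | `docs-assembler.semantic-dev-2025.10.23.120521` |
| Semantic alias | `docs-assembler.semantic-dev-latest` |
| Synonym set | `docs-assembler-dev` |
| Query ruleset | `docs-ruleset-assembler-dev` |
`DocumentationEndpoints` configuration
| Property | Environment variable | Default | Values |
| --- | --- | --- | --- |
| `BuildType` | `DOCS_BUILD_TYPE` | `"isolated"` | `assembler`, `isolated`, `codex` |
| `Environment` | `DOTNET_ENVIRONMENT` / `ENVIRONMENT` | `"dev"` | |
Library versions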
`Elastic.Ingest.Elasticsearch` and `Elastic.Mapping`: 0.17.1 → 0.34.5
Key capabilities from the library upgrades:
- `IncrementalSyncOrchestrator<T>`: dual-index writes with coordinated alias rotation and hash-based rollover
- `AiEnrichmentOrchestrator`: streaming `IAsyncEnumerable<AiEnrichmentProgress>` API for AI enrichment lifecycle
- `OnRolloverDecision` callback: exposes `IndexRolloverInfo` (label, local/remote hash, rolled over status) for diagnostics
- `ExportResponseCallback` / `ExportMaxRetriesCallback`: bulk response logging with atomic running totals per channel
- `CompletionTimeout` / `CompletionMaxRetries` on `AiEnrichmentOptions`: configurable ES|QL completion parameters
- `[ElasticsearchMappingContext]`: typed mapping builders with `CreateContext(type:, env:)` factory
- `IndexVariant` on `[AiEnrichment]`: target AI enrichment to specific mapping contexts (semantic only)
- `[Text]` + `[JsonIgnore(WhenWritingNull)]`: source generator now emits `type: "text"` so dot-path sub-fields merge as multi-fields, not object properties
- `[Object]` sub-type attribute traversal: `[Keyword(Normalizer)]` on nested object types now correctly emitted in mapping
Integration test improvements
- Integration tests (`Search.IntegrationTests`, `Mcp.Remote.IntegrationTests`) log endpoint URL, search index, and ruleset name upfront
- `CountAsync` reports whether the index has documents at all
- `ElasticsearchEndpointFactory.Create()` accepts optional `buildType` and `environment` parameters for explicit test configuration
- `SearchRelevanceTests` uses `ConfigurationFileProvider.CreateSearchConfiguration()` instead of manually constructing search config
Deleted code
- `ElasticsearchIngestChannel.cs` and `ElasticsearchIngestChannel.Mapping.cs` (~420 lines)
- `ElasticsearchLlmClient.cs` and related AI enrichment hand-rolled implementation (~1600 lines)
- `ElasticsearchLlmClientTests.cs` (304 lines)
- `DOCUMENTATION_ELASTIC_INDEX` environment variable
Net effect: 104 files changed, 2258 insertions, 3590 deletions
Test plan
- `dotnet build` passes
- `./build.sh unit-test` passes
- `env:` parameter (no more `Development` in index names)
- `docs-assembler.semantic-dev-*` index
- `document_parsing_exception` errors
- `docs-assembler-dev`
- `docs-ruleset-assembler-dev`
- `docs-assembler.semantic-{env}-latest-ai-cache`