Enable internal text embedding API #3441
Open
ajtiwari07 wants to merge 70 commits intoAzure:mainfrom
Open
Conversation
Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
…configuration Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
… and telemetry integration Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
…ate empty embeddings Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
…oint/health sub-objects Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
…orization Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
…lthCheckConfig in converter
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…dding controller tests, switching the dab schema for the embedding system to default to false
embeddings endpoint is now permanently fixed to /embed with no user-configurable path option. This removes unnecessary configuration surface since the feature has not been released yet, eliminating the need for backward compatibility. Changes: - Remove path property from dab.draft.schema.json - Remove Path, UserProvidedPath, and EffectivePath from EmbeddingsEndpointOptions - Remove EffectiveEndpointPath from EmbeddingsOptions - Remove path deserialization from EmbeddingsOptionsConverterFactory - Remove --runtime.embeddings.endpoint.path CLI option - Remove path configuration logic from ConfigGenerator - Remove endpoint path validation from RuntimeConfigValidator - Update Startup.cs logging to use DEFAULT_PATH constant - Update all tests to remove path references
|
Azure Pipelines successfully started running 6 pipeline(s). |
…om/ajtiwari07/data-api-builder into add-internal-text-embedding-system
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 6 pipeline(s). |
…om/ajtiwari07/data-api-builder into add-internal-text-embedding-system
Author
|
/azp run |
|
Commenter does not have sufficient privileges for PR 3441 in repo Azure/data-api-builder |
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 6 pipeline(s). |
Aniruddh25
requested changes
May 2, 2026
Collaborator
Aniruddh25
left a comment
There was a problem hiding this comment.
Main question around need for embedding cache options. Cant we reuse runtime cache options to decide what we do for embeddings, does it need to be granular?
…om/ajtiwari07/data-api-builder into add-internal-text-embedding-system
Aniruddh25
approved these changes
May 5, 2026
Collaborator
|
/azp run |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
|
/azp run |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
* Fix logs still appearing even when LogLevel is set to `none` bug (Azure#3318) ## Why make this change? - Closes issue Azure#3262 The logger for the Startup class is not initialized properly, since this logger is special due to the nature of the Startup class it needs to be continuously updated as DAB initializes. This causes two problems: - Some logs appear even when LogLevel is set to some value that would impede those logs to appear. - Some logs don't appear at all, even when LogLevel is set to a value that should allow them to be logged. - Closes issue Azure#3256 & Azure#3255 The CLI logger still outputs some logs even when the LogLevel is set to `none`. It is expected that if the LogLevel set is `none` or some other level that shouldn't output the `information` level, the logs will not appear. ## What is this change? Important Note: These changes currently only allow us to change the LogLevel from the CLI with the `default` namespace in the config file. An task was created to solve this issue: Azure#3451 In order to solve issue Azure#3262: - We removed the LogBuffer from the services inside of `Startup.cs`, this is necessary since we wanted each class to have its own LogBuffer so that we are able to tell from which logger the logs are being outputted. - Then, we also correctly initialized the `Startup` logger by changing the method that it was using to initialize the logger, it now uses `CreateLoggerFactoryForHostedAndNonHostedScenario` which checks if there are any LogLevel namespaces from the config file that can be applicable for the specific logger. It is important to note that there are multiple places where the logs are flushed in order to cover for the cases in which an exception is found and causes DAB to end abruptly, and when we there is an IsLateConfigured scenario. - We also changed the logger for the LogBuffer in all the missing places where it creates logs before the logger is able to properly initialize to add those logs to the LogBuffer and only flush them after the loggers are initialized. In order to solve issue Azure#3256 & Azure#3255: - We changed the CLI so that we add all the logs go to a single global LogBuffer that is created inside the `StartOptions.cs` until it is able to deserialize the RuntimeConfig and find which level to set the `LogLevel` in order to flush all the logs. - This is something that we only want to happen when we use the `dab start` command, which is why we only make this change in the `StartOptions.cs` file, on the function `TryStartEngineWithOptions` inside of `ConfigGenerator.cs`, and a few functions from `Utils.cs` and `ConfigMerger.cs` that are used inside the `TryStartEngine` function. ## How was this tested? - [ ] Integration Tests - [x] Unit Tests ## Sample Request(s) - dab start --LogLevel none - dab start --LogLevel error --------- Co-authored-by: Aniruddh Munde <anmunde@microsoft.com> * Update config validation logic for entities (Azure#3306) ## Why make this change? Closes Azure#3267 ## What is this change? Alters the validation logic in the following way. Is top-level config with data-source-files? (we call this a `Root` config file) ├── YES │ ├── Has datasource? → ValidateEntityPresence (same rules as non-root) │ ├── No datasource but has entities/autoentities? → ERROR │ └── No datasource, no entities → VALID (children provide everything) │ └── For each child → ValidateNonRootConfig(child, filename) │ └── NO (standalone or child config) ├── No datasource? → ERROR: "data source is required" └── Has datasource → ValidateEntityPresence Note: A top-level config file without any children data-source files is NOT considered a root. And an intermediary config file, ie: is a child, that also has child configs is NOT a root. Only a top-level config with children configs is a Root. #### ValidateEntityPresence Count resolved autoentities from AutoentityResolutionCounts total = manual entities + resolved autoentities total == 0? → ERROR: "No entities found" total > 0 but autoentities discovered nothing? → WARN: "Autoentities configured but none discovered" No double messaging. If total is 0, only the error is recorded, not the warning. ## How was this tested? ### Truth table — top-level config Variables (`1` = present / non-empty, `0` = absent / empty): - **DSF** — `data-source-files` present - **DS** — `data-source` present - **E** — manual `entities` count > 0 - **AE** — `autoentities` count > 0 (presence, *not* resolved count) Path is determined by `IsRootConfig = (DSF == 1) && !IsChildConfig`. | # | DSF | DS | E | AE | AE resolved | Path | Expected | Test | |---|:---:|:--:|:-:|:--:|:-----------:|------|----------|------| | 1 | 0 | 0 | 0 | 0 | — | Non-root | **Error**: "data source is required" | `TestNonRootWithNoDataSourceProducesError` | | 2 | 0 | 0 | 0 | 1 | — | Non-root | **Error**: "data source is required" | _covered by #1 — DS check fires first_ | | 3 | 0 | 0 | 1 | 0 | — | Non-root | **Error**: "data source is required" | _covered by #1_ | | 4 | 0 | 0 | 1 | 1 | — | Non-root | **Error**: "data source is required" | _covered by #1_ | | 5 | 0 | 1 | 0 | 0 | — | Non-root | **Error**: "No entities found" | `TestNonRootWithDataSourceAndNoEntitiesProducesError` | | 6a | 0 | 1 | 0 | 1 | 0 | Non-root | **Error**: "No entities found" | `TestNonRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` | | 6b | 0 | 1 | 0 | 1 | >0 | Non-root | **Valid** | `TestNonRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` | | 7 | 0 | 1 | 1 | 0 | — | Non-root | **Valid** | `TestNonRootWithDataSourceAndEntitiesIsValid` | | 8a | 0 | 1 | 1 | 1 | 0 | Non-root | **Valid** + **Warn** | `TestNonRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` | | 8b | 0 | 1 | 1 | 1 | >0 | Non-root | **Valid** | _covered by #7 / #6b combined_ | | 9 | 1 | 0 | 0 | 0 | — | Root | **Valid** (children carry the load) | `TestRootWithNoDataSourceAndNoEntitiesIsValid`, `TestRootConfigWithNoDataSourceAndNoEntitiesParses` | | 10 | 1 | 0 | 0 | 1 | — | Root | **Error**: "must not define entities or autoentities" | `TestRootWithNoDataSourceButAutoentitiesProducesError` | | 11 | 1 | 0 | 1 | 0 | — | Root | **Error**: "must not define entities" | `TestRootWithNoDataSourceButEntitiesProducesError` | | 12 | 1 | 0 | 1 | 1 | — | Root | **Error** | _covered by #11_ | | 13 | 1 | 1 | 0 | 0 | — | Root (with own DS) | **Error**: "No entities found" | `TestRootWithDataSourceAndNoEntitiesProducesError` | | 14a | 1 | 1 | 0 | 1 | 0 | Root (with own DS) | **Error**: "No entities found" | `TestRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` | | 14b | 1 | 1 | 0 | 1 | >0 | Root (with own DS) | **Valid** | `TestRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` | | 15 | 1 | 1 | 1 | 0 | — | Root (with own DS) | **Valid** | `TestRootWithDataSourceAndEntitiesIsValid` | | 16a | 1 | 1 | 1 | 1 | 0 | Root (with own DS) | **Valid** + **Warn** | `TestRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` | | 16b | 1 | 1 | 1 | 1 | >0 | Root (with own DS) | **Valid** | _covered by Azure#15 / #14b combined_ | ### Truth table — child config (validated when iterating `root.ChildConfigs`) Children are always treated as non-root regardless of their own `data-source-files`. | # | DS | E | AE | AE resolved | Expected | Test | |---|:--:|:-:|:--:|:-----------:|----------|------| | C1 | 0 | 0 | 0 | — | **Error** naming the child file: "data source is required" | `TestChildWithNoDataSourceProducesNamedError` | | C2 | 0 | * | * | — | **Error** naming the child file: "data source is required" | _covered by C1_ | | C3 | 1 | 0 | 0 | — | **Error** naming the child file: "No entities found" | `TestChildWithDataSourceAndNoEntitiesProducesNamedError` | | C4a | 1 | 0 | 1 | 0 | **Error** naming the child file: "No entities found" | `TestChildWithDataSourceAndAutoentitiesResolvingZeroProducesNamedError` | | C4b | 1 | 0 | 1 | >0 | **Valid** | _covered by C5 (resolved entities behave the same as manual entities)_ | | C5 | 1 | 1 | 0 | — | **Valid** | _implicitly via `TestRootWithDataSourceAndEntitiesIsValid` setup_ | | C6a | 1 | 1 | 1 | 0 | **Valid** + **Warn** naming the child file | `TestChildWithEntitiesAndAutoentitiesResolvingZeroLogsNamedWarning` | | C6b | 1 | 1 | 1 | >0 | **Valid** | _covered by C5_ | ### Other scenarios | Scenario | Expected | Test | |----------|----------|------| | Connection-string error gates entity validation (no entity error fires when DB unreachable) | `IsConfigValid == false` due to connection error only | `TestValidateNonRootZeroEntitiesWithInvalidConnectionString` | | Config with no entities parses cleanly (constructor no longer throws) and `IsConfigValid` returns false without throwing | parse OK, validate fails | `TestValidateConfigWithNoEntitiesProducesCleanError` _(modified)_ | | Root parses successfully without a data source | parse OK, `IsRootConfig == true` | `TestRootConfigWithNoDataSourceAndNoEntitiesParses` | | Non-root with DS and no entities parses successfully | parse OK, `IsRootConfig == false` | `TestNonRootConfigWithDataSourceAndNoEntitiesParses` | | Autoentities present but resolve to nothing — must not crash, must not double-message with "No entities found" | no crash; only "No entities found" if total = 0 | `ValidateAutoentitiesConfiguration` _(modified to `isValidateOnly: true`)_ | New tests: `TestRootConfigWithNoDataSourceAndNoEntitiesParses` Root config (has data-source-files) without datasource parses OK `TestNonRootConfigWithDataSourceAndNoEntitiesParses` Non-root config with datasource + no entities parses OK (validation catches it later) `TestNonRootWithDataSourceAndNoEntitiesProducesError` Calls ValidateDataSourceAndEntityPresence directly, error recorded `TestNonRootWithNoDataSourceProducesError` No datasource, error with "data source is required" `TestNonRootWithDataSourceAndEntitiesIsValid` Datasource + entities, no errors `TestRootWithNoDataSourceAndNoEntitiesIsValid` Root with child, no own datasource, valid `TestRootWithNoDataSourceButEntitiesProducesError` Root with entities but no datasource, error `TestRootWithDataSourceAndEntitiesIsValid` Root with own datasource + entities, valid `TestChildWithDataSourceAndNoEntitiesProducesNamedError` Child with no entities, error names the child file `TestChildWithNoDataSourceProducesNamedError` Child with no datasource, error names the child file `TestNonRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` Non-root with only autoentities that resolve to 0 `TestNonRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` Non-root with only autoentities resolving > 0 entities `TestNonRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` Non-root with manual entities + autoentities resolving 0 `TestRootWithNoDataSourceButAutoentitiesProducesError` Root with no datasource but autoentities defined `TestRootWithDataSourceAndNoEntitiesProducesError` Root with own datasource and zero entities/autoentities `TestRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` Root with own datasource and autoentities resolving 0 `TestRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` Root with own datasource and autoentities resolving > 0 `TestRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` Root with own datasource, manual entities, and autoentities resolving 0 `TestChildWithDataSourceAndAutoentitiesResolvingZeroProducesNamedError` Child with autoentities-only resolving 0 `TestChildWithEntitiesAndAutoentitiesResolvingZeroLogsNamedWarning` Child with manual entities + autoentities resolving 0 Modified tests: `TestValidateConfigWithNoEntitiesProducesCleanError` Replaced main's version (expected parse failure) with ours: parse succeeds, IsConfigValid returns false `ValidateAutoentitiesConfiguration` Changed to isValidateOnly: true, asserts no crashes instead of zero errors --------- Co-authored-by: Anusha Kolan <anushakolan10@gmail.com> --------- Co-authored-by: RubenCerna2079 <32799214+RubenCerna2079@users.noreply.github.com> Co-authored-by: Aniruddh Munde <anmunde@microsoft.com> Co-authored-by: aaronburtle <93220300+aaronburtle@users.noreply.github.com> Co-authored-by: Anusha Kolan <anushakolan10@gmail.com> Co-authored-by: Sayali Kudale <sayalikudale@microsoft.com>
* Fix logs still appearing even when LogLevel is set to `none` bug (Azure#3318) ## Why make this change? - Closes issue Azure#3262 The logger for the Startup class is not initialized properly, since this logger is special due to the nature of the Startup class it needs to be continuously updated as DAB initializes. This causes two problems: - Some logs appear even when LogLevel is set to some value that would impede those logs to appear. - Some logs don't appear at all, even when LogLevel is set to a value that should allow them to be logged. - Closes issue Azure#3256 & Azure#3255 The CLI logger still outputs some logs even when the LogLevel is set to `none`. It is expected that if the LogLevel set is `none` or some other level that shouldn't output the `information` level, the logs will not appear. ## What is this change? Important Note: These changes currently only allow us to change the LogLevel from the CLI with the `default` namespace in the config file. An task was created to solve this issue: Azure#3451 In order to solve issue Azure#3262: - We removed the LogBuffer from the services inside of `Startup.cs`, this is necessary since we wanted each class to have its own LogBuffer so that we are able to tell from which logger the logs are being outputted. - Then, we also correctly initialized the `Startup` logger by changing the method that it was using to initialize the logger, it now uses `CreateLoggerFactoryForHostedAndNonHostedScenario` which checks if there are any LogLevel namespaces from the config file that can be applicable for the specific logger. It is important to note that there are multiple places where the logs are flushed in order to cover for the cases in which an exception is found and causes DAB to end abruptly, and when we there is an IsLateConfigured scenario. - We also changed the logger for the LogBuffer in all the missing places where it creates logs before the logger is able to properly initialize to add those logs to the LogBuffer and only flush them after the loggers are initialized. In order to solve issue Azure#3256 & Azure#3255: - We changed the CLI so that we add all the logs go to a single global LogBuffer that is created inside the `StartOptions.cs` until it is able to deserialize the RuntimeConfig and find which level to set the `LogLevel` in order to flush all the logs. - This is something that we only want to happen when we use the `dab start` command, which is why we only make this change in the `StartOptions.cs` file, on the function `TryStartEngineWithOptions` inside of `ConfigGenerator.cs`, and a few functions from `Utils.cs` and `ConfigMerger.cs` that are used inside the `TryStartEngine` function. ## How was this tested? - [ ] Integration Tests - [x] Unit Tests ## Sample Request(s) - dab start --LogLevel none - dab start --LogLevel error --------- Co-authored-by: Aniruddh Munde <anmunde@microsoft.com> * Update config validation logic for entities (Azure#3306) ## Why make this change? Closes Azure#3267 ## What is this change? Alters the validation logic in the following way. Is top-level config with data-source-files? (we call this a `Root` config file) ├── YES │ ├── Has datasource? → ValidateEntityPresence (same rules as non-root) │ ├── No datasource but has entities/autoentities? → ERROR │ └── No datasource, no entities → VALID (children provide everything) │ └── For each child → ValidateNonRootConfig(child, filename) │ └── NO (standalone or child config) ├── No datasource? → ERROR: "data source is required" └── Has datasource → ValidateEntityPresence Note: A top-level config file without any children data-source files is NOT considered a root. And an intermediary config file, ie: is a child, that also has child configs is NOT a root. Only a top-level config with children configs is a Root. #### ValidateEntityPresence Count resolved autoentities from AutoentityResolutionCounts total = manual entities + resolved autoentities total == 0? → ERROR: "No entities found" total > 0 but autoentities discovered nothing? → WARN: "Autoentities configured but none discovered" No double messaging. If total is 0, only the error is recorded, not the warning. ## How was this tested? ### Truth table — top-level config Variables (`1` = present / non-empty, `0` = absent / empty): - **DSF** — `data-source-files` present - **DS** — `data-source` present - **E** — manual `entities` count > 0 - **AE** — `autoentities` count > 0 (presence, *not* resolved count) Path is determined by `IsRootConfig = (DSF == 1) && !IsChildConfig`. | # | DSF | DS | E | AE | AE resolved | Path | Expected | Test | |---|:---:|:--:|:-:|:--:|:-----------:|------|----------|------| | 1 | 0 | 0 | 0 | 0 | — | Non-root | **Error**: "data source is required" | `TestNonRootWithNoDataSourceProducesError` | | 2 | 0 | 0 | 0 | 1 | — | Non-root | **Error**: "data source is required" | _covered by #1 — DS check fires first_ | | 3 | 0 | 0 | 1 | 0 | — | Non-root | **Error**: "data source is required" | _covered by #1_ | | 4 | 0 | 0 | 1 | 1 | — | Non-root | **Error**: "data source is required" | _covered by #1_ | | 5 | 0 | 1 | 0 | 0 | — | Non-root | **Error**: "No entities found" | `TestNonRootWithDataSourceAndNoEntitiesProducesError` | | 6a | 0 | 1 | 0 | 1 | 0 | Non-root | **Error**: "No entities found" | `TestNonRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` | | 6b | 0 | 1 | 0 | 1 | >0 | Non-root | **Valid** | `TestNonRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` | | 7 | 0 | 1 | 1 | 0 | — | Non-root | **Valid** | `TestNonRootWithDataSourceAndEntitiesIsValid` | | 8a | 0 | 1 | 1 | 1 | 0 | Non-root | **Valid** + **Warn** | `TestNonRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` | | 8b | 0 | 1 | 1 | 1 | >0 | Non-root | **Valid** | _covered by #7 / #6b combined_ | | 9 | 1 | 0 | 0 | 0 | — | Root | **Valid** (children carry the load) | `TestRootWithNoDataSourceAndNoEntitiesIsValid`, `TestRootConfigWithNoDataSourceAndNoEntitiesParses` | | 10 | 1 | 0 | 0 | 1 | — | Root | **Error**: "must not define entities or autoentities" | `TestRootWithNoDataSourceButAutoentitiesProducesError` | | 11 | 1 | 0 | 1 | 0 | — | Root | **Error**: "must not define entities" | `TestRootWithNoDataSourceButEntitiesProducesError` | | 12 | 1 | 0 | 1 | 1 | — | Root | **Error** | _covered by #11_ | | 13 | 1 | 1 | 0 | 0 | — | Root (with own DS) | **Error**: "No entities found" | `TestRootWithDataSourceAndNoEntitiesProducesError` | | 14a | 1 | 1 | 0 | 1 | 0 | Root (with own DS) | **Error**: "No entities found" | `TestRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` | | 14b | 1 | 1 | 0 | 1 | >0 | Root (with own DS) | **Valid** | `TestRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` | | 15 | 1 | 1 | 1 | 0 | — | Root (with own DS) | **Valid** | `TestRootWithDataSourceAndEntitiesIsValid` | | 16a | 1 | 1 | 1 | 1 | 0 | Root (with own DS) | **Valid** + **Warn** | `TestRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` | | 16b | 1 | 1 | 1 | 1 | >0 | Root (with own DS) | **Valid** | _covered by Azure#15 / #14b combined_ | ### Truth table — child config (validated when iterating `root.ChildConfigs`) Children are always treated as non-root regardless of their own `data-source-files`. | # | DS | E | AE | AE resolved | Expected | Test | |---|:--:|:-:|:--:|:-----------:|----------|------| | C1 | 0 | 0 | 0 | — | **Error** naming the child file: "data source is required" | `TestChildWithNoDataSourceProducesNamedError` | | C2 | 0 | * | * | — | **Error** naming the child file: "data source is required" | _covered by C1_ | | C3 | 1 | 0 | 0 | — | **Error** naming the child file: "No entities found" | `TestChildWithDataSourceAndNoEntitiesProducesNamedError` | | C4a | 1 | 0 | 1 | 0 | **Error** naming the child file: "No entities found" | `TestChildWithDataSourceAndAutoentitiesResolvingZeroProducesNamedError` | | C4b | 1 | 0 | 1 | >0 | **Valid** | _covered by C5 (resolved entities behave the same as manual entities)_ | | C5 | 1 | 1 | 0 | — | **Valid** | _implicitly via `TestRootWithDataSourceAndEntitiesIsValid` setup_ | | C6a | 1 | 1 | 1 | 0 | **Valid** + **Warn** naming the child file | `TestChildWithEntitiesAndAutoentitiesResolvingZeroLogsNamedWarning` | | C6b | 1 | 1 | 1 | >0 | **Valid** | _covered by C5_ | ### Other scenarios | Scenario | Expected | Test | |----------|----------|------| | Connection-string error gates entity validation (no entity error fires when DB unreachable) | `IsConfigValid == false` due to connection error only | `TestValidateNonRootZeroEntitiesWithInvalidConnectionString` | | Config with no entities parses cleanly (constructor no longer throws) and `IsConfigValid` returns false without throwing | parse OK, validate fails | `TestValidateConfigWithNoEntitiesProducesCleanError` _(modified)_ | | Root parses successfully without a data source | parse OK, `IsRootConfig == true` | `TestRootConfigWithNoDataSourceAndNoEntitiesParses` | | Non-root with DS and no entities parses successfully | parse OK, `IsRootConfig == false` | `TestNonRootConfigWithDataSourceAndNoEntitiesParses` | | Autoentities present but resolve to nothing — must not crash, must not double-message with "No entities found" | no crash; only "No entities found" if total = 0 | `ValidateAutoentitiesConfiguration` _(modified to `isValidateOnly: true`)_ | New tests: `TestRootConfigWithNoDataSourceAndNoEntitiesParses` Root config (has data-source-files) without datasource parses OK `TestNonRootConfigWithDataSourceAndNoEntitiesParses` Non-root config with datasource + no entities parses OK (validation catches it later) `TestNonRootWithDataSourceAndNoEntitiesProducesError` Calls ValidateDataSourceAndEntityPresence directly, error recorded `TestNonRootWithNoDataSourceProducesError` No datasource, error with "data source is required" `TestNonRootWithDataSourceAndEntitiesIsValid` Datasource + entities, no errors `TestRootWithNoDataSourceAndNoEntitiesIsValid` Root with child, no own datasource, valid `TestRootWithNoDataSourceButEntitiesProducesError` Root with entities but no datasource, error `TestRootWithDataSourceAndEntitiesIsValid` Root with own datasource + entities, valid `TestChildWithDataSourceAndNoEntitiesProducesNamedError` Child with no entities, error names the child file `TestChildWithNoDataSourceProducesNamedError` Child with no datasource, error names the child file `TestNonRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` Non-root with only autoentities that resolve to 0 `TestNonRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` Non-root with only autoentities resolving > 0 entities `TestNonRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` Non-root with manual entities + autoentities resolving 0 `TestRootWithNoDataSourceButAutoentitiesProducesError` Root with no datasource but autoentities defined `TestRootWithDataSourceAndNoEntitiesProducesError` Root with own datasource and zero entities/autoentities `TestRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` Root with own datasource and autoentities resolving 0 `TestRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` Root with own datasource and autoentities resolving > 0 `TestRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` Root with own datasource, manual entities, and autoentities resolving 0 `TestChildWithDataSourceAndAutoentitiesResolvingZeroProducesNamedError` Child with autoentities-only resolving 0 `TestChildWithEntitiesAndAutoentitiesResolvingZeroLogsNamedWarning` Child with manual entities + autoentities resolving 0 Modified tests: `TestValidateConfigWithNoEntitiesProducesCleanError` Replaced main's version (expected parse failure) with ours: parse succeeds, IsConfigValid returns false `ValidateAutoentitiesConfiguration` Changed to isValidateOnly: true, asserts no crashes instead of zero errors --------- Co-authored-by: Anusha Kolan <anushakolan10@gmail.com> * Add MCP notifications/message for log streaming to clients (Azure#3484) ## Why make this change? Enables MCP clients (like MCP Inspector, Claude Desktop, VS Code Copilot) to receive real-time log output via MCP `notifications/message`. Related: Azure#3274 (depends on PR Azure#3419) ## What is this change? When `logging/setLevel` is called with a level other than "none", logs are sent to MCP clients as JSON-RPC notifications: ```json { "jsonrpc": "2.0", "method": "notifications/message", "params": { "level": "info", "logger": "Azure.DataApiBuilder.Service.Startup", "data": "Starting Data API builder..." } } ``` ### New files: - `McpLogNotificationWriter.cs` - Writes logs as MCP notifications to stdout - `McpLogger.cs` / `McpLoggerProvider.cs` - ILogger implementation for .NET logging pipeline - `McpLogNotificationTests.cs` - Unit tests (8 tests) ### Modified files: - `Program.cs` - Registers `McpNotificationWriter` and `McpLoggerProvider` for MCP mode - `McpStdioServer.cs` - Enables notifications when `logging/setLevel` is called ## How was this tested? - Unit tests: 6 tests covering level mapping, enable/disable, JSON format - Manual testing with MCP Inspector: verified notifications appear when `logging/setLevel` is sent ## Note This PR targets `dev/anushakolan/set-log-level` (PR Azure#3419) as it depends on the `logging/setLevel` implementation. --------- Co-authored-by: RubenCerna2079 <32799214+RubenCerna2079@users.noreply.github.com> Co-authored-by: Aniruddh Munde <anmunde@microsoft.com> Co-authored-by: aaronburtle <93220300+aaronburtle@users.noreply.github.com> Co-authored-by: Anusha Kolan <anushakolan10@gmail.com> Co-authored-by: Sayali Kudale <sayalikudale@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix logs still appearing even when LogLevel is set to `none` bug (Azure#3318) ## Why make this change? - Closes issue Azure#3262 The logger for the Startup class is not initialized properly, since this logger is special due to the nature of the Startup class it needs to be continuously updated as DAB initializes. This causes two problems: - Some logs appear even when LogLevel is set to some value that would impede those logs to appear. - Some logs don't appear at all, even when LogLevel is set to a value that should allow them to be logged. - Closes issue Azure#3256 & Azure#3255 The CLI logger still outputs some logs even when the LogLevel is set to `none`. It is expected that if the LogLevel set is `none` or some other level that shouldn't output the `information` level, the logs will not appear. ## What is this change? Important Note: These changes currently only allow us to change the LogLevel from the CLI with the `default` namespace in the config file. An task was created to solve this issue: Azure#3451 In order to solve issue Azure#3262: - We removed the LogBuffer from the services inside of `Startup.cs`, this is necessary since we wanted each class to have its own LogBuffer so that we are able to tell from which logger the logs are being outputted. - Then, we also correctly initialized the `Startup` logger by changing the method that it was using to initialize the logger, it now uses `CreateLoggerFactoryForHostedAndNonHostedScenario` which checks if there are any LogLevel namespaces from the config file that can be applicable for the specific logger. It is important to note that there are multiple places where the logs are flushed in order to cover for the cases in which an exception is found and causes DAB to end abruptly, and when we there is an IsLateConfigured scenario. - We also changed the logger for the LogBuffer in all the missing places where it creates logs before the logger is able to properly initialize to add those logs to the LogBuffer and only flush them after the loggers are initialized. In order to solve issue Azure#3256 & Azure#3255: - We changed the CLI so that we add all the logs go to a single global LogBuffer that is created inside the `StartOptions.cs` until it is able to deserialize the RuntimeConfig and find which level to set the `LogLevel` in order to flush all the logs. - This is something that we only want to happen when we use the `dab start` command, which is why we only make this change in the `StartOptions.cs` file, on the function `TryStartEngineWithOptions` inside of `ConfigGenerator.cs`, and a few functions from `Utils.cs` and `ConfigMerger.cs` that are used inside the `TryStartEngine` function. ## How was this tested? - [ ] Integration Tests - [x] Unit Tests ## Sample Request(s) - dab start --LogLevel none - dab start --LogLevel error --------- Co-authored-by: Aniruddh Munde <anmunde@microsoft.com> * Update config validation logic for entities (Azure#3306) ## Why make this change? Closes Azure#3267 ## What is this change? Alters the validation logic in the following way. Is top-level config with data-source-files? (we call this a `Root` config file) ├── YES │ ├── Has datasource? → ValidateEntityPresence (same rules as non-root) │ ├── No datasource but has entities/autoentities? → ERROR │ └── No datasource, no entities → VALID (children provide everything) │ └── For each child → ValidateNonRootConfig(child, filename) │ └── NO (standalone or child config) ├── No datasource? → ERROR: "data source is required" └── Has datasource → ValidateEntityPresence Note: A top-level config file without any children data-source files is NOT considered a root. And an intermediary config file, ie: is a child, that also has child configs is NOT a root. Only a top-level config with children configs is a Root. #### ValidateEntityPresence Count resolved autoentities from AutoentityResolutionCounts total = manual entities + resolved autoentities total == 0? → ERROR: "No entities found" total > 0 but autoentities discovered nothing? → WARN: "Autoentities configured but none discovered" No double messaging. If total is 0, only the error is recorded, not the warning. ## How was this tested? ### Truth table — top-level config Variables (`1` = present / non-empty, `0` = absent / empty): - **DSF** — `data-source-files` present - **DS** — `data-source` present - **E** — manual `entities` count > 0 - **AE** — `autoentities` count > 0 (presence, *not* resolved count) Path is determined by `IsRootConfig = (DSF == 1) && !IsChildConfig`. | # | DSF | DS | E | AE | AE resolved | Path | Expected | Test | |---|:---:|:--:|:-:|:--:|:-----------:|------|----------|------| | 1 | 0 | 0 | 0 | 0 | — | Non-root | **Error**: "data source is required" | `TestNonRootWithNoDataSourceProducesError` | | 2 | 0 | 0 | 0 | 1 | — | Non-root | **Error**: "data source is required" | _covered by #1 — DS check fires first_ | | 3 | 0 | 0 | 1 | 0 | — | Non-root | **Error**: "data source is required" | _covered by #1_ | | 4 | 0 | 0 | 1 | 1 | — | Non-root | **Error**: "data source is required" | _covered by #1_ | | 5 | 0 | 1 | 0 | 0 | — | Non-root | **Error**: "No entities found" | `TestNonRootWithDataSourceAndNoEntitiesProducesError` | | 6a | 0 | 1 | 0 | 1 | 0 | Non-root | **Error**: "No entities found" | `TestNonRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` | | 6b | 0 | 1 | 0 | 1 | >0 | Non-root | **Valid** | `TestNonRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` | | 7 | 0 | 1 | 1 | 0 | — | Non-root | **Valid** | `TestNonRootWithDataSourceAndEntitiesIsValid` | | 8a | 0 | 1 | 1 | 1 | 0 | Non-root | **Valid** + **Warn** | `TestNonRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` | | 8b | 0 | 1 | 1 | 1 | >0 | Non-root | **Valid** | _covered by #7 / #6b combined_ | | 9 | 1 | 0 | 0 | 0 | — | Root | **Valid** (children carry the load) | `TestRootWithNoDataSourceAndNoEntitiesIsValid`, `TestRootConfigWithNoDataSourceAndNoEntitiesParses` | | 10 | 1 | 0 | 0 | 1 | — | Root | **Error**: "must not define entities or autoentities" | `TestRootWithNoDataSourceButAutoentitiesProducesError` | | 11 | 1 | 0 | 1 | 0 | — | Root | **Error**: "must not define entities" | `TestRootWithNoDataSourceButEntitiesProducesError` | | 12 | 1 | 0 | 1 | 1 | — | Root | **Error** | _covered by #11_ | | 13 | 1 | 1 | 0 | 0 | — | Root (with own DS) | **Error**: "No entities found" | `TestRootWithDataSourceAndNoEntitiesProducesError` | | 14a | 1 | 1 | 0 | 1 | 0 | Root (with own DS) | **Error**: "No entities found" | `TestRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` | | 14b | 1 | 1 | 0 | 1 | >0 | Root (with own DS) | **Valid** | `TestRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` | | 15 | 1 | 1 | 1 | 0 | — | Root (with own DS) | **Valid** | `TestRootWithDataSourceAndEntitiesIsValid` | | 16a | 1 | 1 | 1 | 1 | 0 | Root (with own DS) | **Valid** + **Warn** | `TestRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` | | 16b | 1 | 1 | 1 | 1 | >0 | Root (with own DS) | **Valid** | _covered by Azure#15 / #14b combined_ | ### Truth table — child config (validated when iterating `root.ChildConfigs`) Children are always treated as non-root regardless of their own `data-source-files`. | # | DS | E | AE | AE resolved | Expected | Test | |---|:--:|:-:|:--:|:-----------:|----------|------| | C1 | 0 | 0 | 0 | — | **Error** naming the child file: "data source is required" | `TestChildWithNoDataSourceProducesNamedError` | | C2 | 0 | * | * | — | **Error** naming the child file: "data source is required" | _covered by C1_ | | C3 | 1 | 0 | 0 | — | **Error** naming the child file: "No entities found" | `TestChildWithDataSourceAndNoEntitiesProducesNamedError` | | C4a | 1 | 0 | 1 | 0 | **Error** naming the child file: "No entities found" | `TestChildWithDataSourceAndAutoentitiesResolvingZeroProducesNamedError` | | C4b | 1 | 0 | 1 | >0 | **Valid** | _covered by C5 (resolved entities behave the same as manual entities)_ | | C5 | 1 | 1 | 0 | — | **Valid** | _implicitly via `TestRootWithDataSourceAndEntitiesIsValid` setup_ | | C6a | 1 | 1 | 1 | 0 | **Valid** + **Warn** naming the child file | `TestChildWithEntitiesAndAutoentitiesResolvingZeroLogsNamedWarning` | | C6b | 1 | 1 | 1 | >0 | **Valid** | _covered by C5_ | ### Other scenarios | Scenario | Expected | Test | |----------|----------|------| | Connection-string error gates entity validation (no entity error fires when DB unreachable) | `IsConfigValid == false` due to connection error only | `TestValidateNonRootZeroEntitiesWithInvalidConnectionString` | | Config with no entities parses cleanly (constructor no longer throws) and `IsConfigValid` returns false without throwing | parse OK, validate fails | `TestValidateConfigWithNoEntitiesProducesCleanError` _(modified)_ | | Root parses successfully without a data source | parse OK, `IsRootConfig == true` | `TestRootConfigWithNoDataSourceAndNoEntitiesParses` | | Non-root with DS and no entities parses successfully | parse OK, `IsRootConfig == false` | `TestNonRootConfigWithDataSourceAndNoEntitiesParses` | | Autoentities present but resolve to nothing — must not crash, must not double-message with "No entities found" | no crash; only "No entities found" if total = 0 | `ValidateAutoentitiesConfiguration` _(modified to `isValidateOnly: true`)_ | New tests: `TestRootConfigWithNoDataSourceAndNoEntitiesParses` Root config (has data-source-files) without datasource parses OK `TestNonRootConfigWithDataSourceAndNoEntitiesParses` Non-root config with datasource + no entities parses OK (validation catches it later) `TestNonRootWithDataSourceAndNoEntitiesProducesError` Calls ValidateDataSourceAndEntityPresence directly, error recorded `TestNonRootWithNoDataSourceProducesError` No datasource, error with "data source is required" `TestNonRootWithDataSourceAndEntitiesIsValid` Datasource + entities, no errors `TestRootWithNoDataSourceAndNoEntitiesIsValid` Root with child, no own datasource, valid `TestRootWithNoDataSourceButEntitiesProducesError` Root with entities but no datasource, error `TestRootWithDataSourceAndEntitiesIsValid` Root with own datasource + entities, valid `TestChildWithDataSourceAndNoEntitiesProducesNamedError` Child with no entities, error names the child file `TestChildWithNoDataSourceProducesNamedError` Child with no datasource, error names the child file `TestNonRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` Non-root with only autoentities that resolve to 0 `TestNonRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` Non-root with only autoentities resolving > 0 entities `TestNonRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` Non-root with manual entities + autoentities resolving 0 `TestRootWithNoDataSourceButAutoentitiesProducesError` Root with no datasource but autoentities defined `TestRootWithDataSourceAndNoEntitiesProducesError` Root with own datasource and zero entities/autoentities `TestRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` Root with own datasource and autoentities resolving 0 `TestRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` Root with own datasource and autoentities resolving > 0 `TestRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` Root with own datasource, manual entities, and autoentities resolving 0 `TestChildWithDataSourceAndAutoentitiesResolvingZeroProducesNamedError` Child with autoentities-only resolving 0 `TestChildWithEntitiesAndAutoentitiesResolvingZeroLogsNamedWarning` Child with manual entities + autoentities resolving 0 Modified tests: `TestValidateConfigWithNoEntitiesProducesCleanError` Replaced main's version (expected parse failure) with ours: parse succeeds, IsConfigValid returns false `ValidateAutoentitiesConfiguration` Changed to isValidateOnly: true, asserts no crashes instead of zero errors --------- Co-authored-by: Anusha Kolan <anushakolan10@gmail.com> * Add MCP notifications/message for log streaming to clients (Azure#3484) ## Why make this change? Enables MCP clients (like MCP Inspector, Claude Desktop, VS Code Copilot) to receive real-time log output via MCP `notifications/message`. Related: Azure#3274 (depends on PR Azure#3419) ## What is this change? When `logging/setLevel` is called with a level other than "none", logs are sent to MCP clients as JSON-RPC notifications: ```json { "jsonrpc": "2.0", "method": "notifications/message", "params": { "level": "info", "logger": "Azure.DataApiBuilder.Service.Startup", "data": "Starting Data API builder..." } } ``` ### New files: - `McpLogNotificationWriter.cs` - Writes logs as MCP notifications to stdout - `McpLogger.cs` / `McpLoggerProvider.cs` - ILogger implementation for .NET logging pipeline - `McpLogNotificationTests.cs` - Unit tests (8 tests) ### Modified files: - `Program.cs` - Registers `McpNotificationWriter` and `McpLoggerProvider` for MCP mode - `McpStdioServer.cs` - Enables notifications when `logging/setLevel` is called ## How was this tested? - Unit tests: 6 tests covering level mapping, enable/disable, JSON format - Manual testing with MCP Inspector: verified notifications appear when `logging/setLevel` is sent ## Note This PR targets `dev/anushakolan/set-log-level` (PR Azure#3419) as it depends on the `logging/setLevel` implementation. * Fix OData filter format in JWT string claims (Azure#3510) ## Why make this change? Fixes the format of the OData filter in JWT string claims. ## What is this change? In `AuthorizationResolver` we now escape embedded single quotes in claim values by doubling them, before we wrap the value in single quotes for OData substitution. This conforms to the OData 4.01 ABNF rule for string literals (Section 7: Literal Data Values). Policy: `@item.col1 eq @claims.userId` Claim `userId` value: `alice' or 1 eq 1 or '` | | Resulting OData predicate | | --- | --- | | Before | `col1 eq 'alice' or 1 eq 1 or ''` <- injects `or 1 eq 1`, bypassing row-level auth | | After | `col1 eq 'alice'' or 1 eq 1 or '''` <- attacker payload contained inside a single string literal | ## How was this tested? New parameterized test `DbPolicy_StringClaim_SingleQuotesEscaped_PreventsODataInjection` in `src/Service.Tests/Authorization/AuthorizationResolverUnitTests.cs` covers: - Active OR-predicate injection attempt is neutralized. - Legitimate apostrophe-bearing value (e.g. `O'Brien`) is safely escaped. - Value composed solely of single quotes is fully escaped. - Value with no single quotes is unchanged aside from the enclosing quotes (no regression). ## Sample Request(s) ```json { "entities": { "Note": { "source": "dbo.Notes", "permissions": [ { "role": "authenticated", "actions": [ { "action": "read", "policy": { "database": "@item.ownerId eq @claims.userId" } } ] } ] } } } ``` Reproduction - `userId` claim value of `alice' or 1 eq 1 or '`: ```http GET /api/Note HTTP/1.1 Authorization: Bearer <jwt-with-crafted-userId-claim> X-MS-API-ROLE: authenticated ``` - Before fix: the engine emitted `WHERE ownerId = 'alice' or 1 eq 1 or ''`, returning rows owned by other users. - After fix: the engine emits `WHERE ownerId = 'alice'' or 1 eq 1 or '''`, which compares against the literal string `alice' or 1 eq 1 or '` and returns no unauthorized rows. Co-authored-by: Souvik Ghosh <souvikofficial04@gmail.com> Co-authored-by: Aniruddh Munde <anmunde@microsoft.com> --------- Co-authored-by: RubenCerna2079 <32799214+RubenCerna2079@users.noreply.github.com> Co-authored-by: Aniruddh Munde <anmunde@microsoft.com> Co-authored-by: aaronburtle <93220300+aaronburtle@users.noreply.github.com> Co-authored-by: Anusha Kolan <anushakolan10@gmail.com> Co-authored-by: Souvik Ghosh <souvikofficial04@gmail.com> Co-authored-by: Sayali Kudale <sayalikudale@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR adds configurable text chunking capabilities to the embeddings API, enabling automatic text segmentation before embedding generation. This feature supports both single-text and multi-document batch processing with runtime configuration and query parameter overrides.
Changes
Configuration
Added EmbeddingsChunkingOptions.cs - Configuration model for chunking behavior
Enabled (bool) - Enable/disable chunking
SizeChars (int) - Chunk size in characters (default: 800)
OverlapChars (int) - Overlap between chunks (default: 100)
EffectiveSizeChars property ensures minimum valid chunk size
Modified EmbeddingsOptions.cs - Added Chunking property and IsChunkingEnabled helper
Removed EmbeddingsCacheOptions.cs - Simplified configuration by removing unused cache feature
API Enhancements
Modified Controllers/EmbeddingController.cs
Auto-detects request type (single text vs. document array)
Implements overlapping text chunking algorithm
Supports query parameter overrides: $chunking.enabled, $chunking.size-chars, $chunking.overlap-chars
Returns multiple embeddings per document when chunking is enabled
Added Models/EmbedDocumentRequest.cs - Request model for document arrays
Added Models/EmbedDocumentResponse.cs - Response model with chunked embeddings
Schema (schemas)
Modified dab.draft.schema.json - Added chunking configuration schema with validation rules
Testing (UnitTests)
Added EmbeddingsChunkingOptionsTests.cs (13 tests) - Configuration validation
Added ChunkTextTests.cs (21 tests) - Chunking algorithm validation including edge cases
Modified EmbeddingControllerTests.cs (+18 tests) - API endpoint tests for chunking and document arrays
Total Test Coverage: 72 tests (48 existing + 24 new) - All passing## Why make this change?
Testing
All 72 unit tests passing
Edge cases covered: empty text, very small chunks, overlap larger than chunk size, Unicode text
Query parameter parsing validated
Backward compatibility verified
Breaking Changes
None - This is a backward-compatible addition. Existing single-text requests continue to work without modification.