[pip] PIP-455: Support Namespace Bundle Lookup and Topic Preloading #25242

---

# PIP-455: Support Namespace Bundle Lookup and Topic Preloading

---

## Background Knowledge

Apache Pulsar uses **namespace bundles** as the unit of ownership and load balancing.

Key concepts:

- **Namespace Bundle**: A subdivision of a namespace's hash space, representing a set of topics whose names hash into that range.
- **Bundle Ownership**: At any given time, each bundle is owned by exactly one broker, which is responsible for serving all topics within that bundle.
- **Lazy Topic Loading**: By default, topics are not loaded into memory until the first producer/consumer request arrives. This reduces startup overhead but increases first-call latency.
- **PulsarAdmin & pulsar-admin CLI**: The administrative interface for managing Pulsar clusters, including operations on namespaces, topics, bundles, etc.

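To make the bundle concept concrete, here is a minimal sketch of how a topic name maps to a bundle. It assumes the 32-bit hash space `[0x00000000, 0xffffffff]` is split into `n` equal bundles named `<lower>_<upper>` (the format used by the CLI examples in this proposal), and it uses `String.hashCode()` only as a stand-in hash; the broker's actual hash function and bundle-splitting logic are not reproduced here. `BundleMapper`, `splitEvenly`, and `bundleFor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: equal-width bundles over a 32-bit hash space.
final class BundleMapper {
    static final long MAX_HASH = 0xffffffffL;

    // A bundle covers [lower, upper); the last bundle also includes MAX_HASH.
    record Bundle(long lower, long upper) {
        String name() {
            return String.format("0x%08x_0x%08x", lower, upper);
        }

        boolean contains(long hash) {
            return hash >= lower && (hash < upper || (upper == MAX_HASH && hash == MAX_HASH));
        }
    }

    // Splits the full hash range into n bundles of (nearly) equal width.
    static List<Bundle> splitEvenly(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<Bundle> bundles = new ArrayList<>();
        long width = (MAX_HASH + 1) / n;
        for (int i = 0; i < n; i++) {
            long lower = i * width;
            long upper = (i == n - 1) ? MAX_HASH : (i + 1) * width;
            bundles.add(new Bundle(lower, upper));
        }
        return bundles;
    }

    // Stand-in hash (assumption): String.hashCode() read as an unsigned 32-bit value.
    static long hash(String topic) {
        return Integer.toUnsignedLong(topic.hashCode());
    }

    // Returns the bundle whose range contains the topic's hash.
    static Bundle bundleFor(String topic, List<Bundle> bundles) {
        long h = hash(topic);
        for (Bundle b : bundles) {
            if (b.contains(h)) {
                return b;
            }
        }
        throw new IllegalStateException("hash not covered by any bundle: " + h);
    }

    public static void main(String[] args) {
        List<Bundle> bundles = splitEvenly(4);
        bundles.forEach(b -> System.out.println(b.name()));
        String topic = "persistent://my-tenant/my-ns/orders";
        System.out.println(topic + " -> " + bundleFor(topic, bundles).name());
    }
}
```

With `n = 2`, this yields `0x00000000_0x80000000` and `0x80000000_0xffffffff`; the second one is the bundle used in the CLI examples.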
Currently, there is **no API to proactively look up a namespace bundle or load all topics within a bundle**. Users must instead trigger topic loading through producer/consumer requests, which is not suitable for:

- Warm-up scenarios (preloading topics before traffic arrives)
- Disaster recovery (forcing bundle ownership transfer and topic loading)
- Observability (checking which broker actually owns a bundle)

---

## Motivation

The current implementation of namespace and bundle management lacks support for **explicit lookup and preloading**. This leads to several pain points:

1. **No way to warm up topics**
   In production, after a broker restart or bundle unload, topics are loaded lazily. The first request experiences high latency due to topic metadata loading, cursor recovery, and ownership establishment. There is no API to proactively load the topics in a bundle to avoid this cold-start penalty, and some use cases (e.g., migration validation, pre-warming for large-scale events) require loading all topics in a namespace.

2. **Difficult to verify bundle ownership**
   While internal lookup mechanisms exist, there is no admin-facing API to query the owner of a specific bundle and force-load it onto the current broker. This makes operational debugging and manual intervention cumbersome.

3. **Client-Admin API inconsistency**
   The `pulsar-admin` CLI provides `unload`, `split`, and `clear-backlog` for bundles, but no `load` or `lookup` counterpart. This asymmetry complicates operational tooling.

4. **Dependency cycle in the codebase**
   The `LookupData` class resides in `pulsar-common`, but `pulsar-client-admin-api` cannot depend on it directly without creating a cyclic dependency. Returning lookup results from the admin API therefore requires a workaround: a new interface.

This proposal introduces **bundle-level and namespace-level lookup + load** APIs, enabling operators to proactively control bundle ownership and topic lifecycle.

---

## Goals

### In Scope

- Provide a new admin API to **look up a namespace bundle**, returning the broker serving it (same as topic lookup, but at bundle granularity).
- Provide a new admin API to **load all topics in a namespace bundle** onto the owning broker.
- Provide a new admin API to **load all topics in a namespace** (by iterating over its bundles).
- Extend the `pulsar-admin namespaces` CLI with `lookup` and `lookup-bundle` commands.
- Introduce `LookupDataInterface` to break the cyclic dependency between `pulsar-common` and `pulsar-client-admin-api`.

## High-Level Design

The core idea is to extend the existing `Namespaces` admin resource to support **lookup operations at both namespace and bundle granularity**, with an optional flag to trigger topic loading.

### 1. New REST Endpoints

**V2:**

```
PUT /admin/v2/namespaces/{tenant}/{namespace}/lookup
PUT /admin/v2/namespaces/{tenant}/{namespace}/{bundle}/lookup
```

**V1:**

```
PUT /admin/namespaces/{property}/{cluster}/{namespace}/lookup
PUT /admin/namespaces/{property}/{cluster}/{namespace}/{bundle}/lookup
```

> **Review comment (Contributor):** We don't need to support V1 API since we are working on removing the V1 endpoints from the repo.

**Query Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `loadTopicInBundle` | boolean | `false` | If `true`, all topics in the bundle are loaded on the owning broker. |
| `authoritative` | boolean | `false` | If `true`, skip redirects and force the current broker to serve as the owner (if no other broker currently owns the bundle). |

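As a rough illustration of the bundle-level endpoint, the following sketch builds (but does not send) the `PUT` request with both query parameters, using the JDK's `java.net.http.HttpRequest`. The broker base URL `http://localhost:8080`, the tenant/namespace values, and the helper name `bundleLookupRequest` are assumptions for illustration; authentication headers are omitted, and path segments are assumed to be URL-safe. Only the path and the parameter names come from the proposal above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class BundleLookupRequest {
    // Builds the proposed V2 bundle-level lookup request; nothing is sent here.
    static HttpRequest bundleLookupRequest(String baseUrl, String tenant, String namespace,
                                           String bundle, boolean loadTopicInBundle,
                                           boolean authoritative) {
        String path = String.format("/admin/v2/namespaces/%s/%s/%s/lookup", tenant, namespace, bundle);
        String query = "loadTopicInBundle=" + loadTopicInBundle + "&authoritative=" + authoritative;
        return HttpRequest.newBuilder(URI.create(baseUrl + path + "?" + query))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = bundleLookupRequest("http://localhost:8080", "my-tenant", "my-ns",
                "0x80000000_0xffffffff", true, false);
        System.out.println(request.method() + " " + request.uri());
        // Sending it needs a broker implementing this proposal, e.g. via
        // java.net.http.HttpClient.newHttpClient().send(request, BodyHandlers.ofString()).
    }
}
```

Per the response description, a successful bundle-level call would carry a `LookupData` JSON body, while the namespace-level variant (same request without the `{bundle}` segment) would return `204 No Content`.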
**Response:**

- For namespace-level lookup: `204 No Content` on success.
- For bundle-level lookup: a `LookupData` object (JSON) containing the broker URLs, identical to the response of a topic lookup.

### 2. Admin Client API Extensions

**New methods in the `Namespaces` interface:**

```java
// Namespace-level lookup: covers every bundle in the namespace
void lookupNamespace(String namespace, boolean loadTopicInBundle, boolean authoritative);
CompletableFuture<Void> lookupNamespaceAsync(...);

// Bundle-level lookup: returns the broker URLs serving the given bundle
LookupDataInterface lookupNamespaceBundle(String namespace, String bundle, boolean loadTopicInBundle, boolean authoritative);
CompletableFuture<LookupDataInterface> lookupNamespaceBundleAsync(...);
```

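The goals describe namespace-level loading as iterating over the namespace's bundles. A minimal sketch of that fan-out, shaped after the proposed `lookupNamespaceBundleAsync`, is shown below. The `BundleLookup` functional interface, the `LookupResult` record (a stand-in for `LookupDataInterface`, reduced to a single broker URL), and the returned bundle-to-broker map are all assumptions; collecting such a map is one possible way to expose the per-bundle result, not something this proposal specifies.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

final class NamespacePreload {
    // Stand-in for LookupDataInterface (assumption): only the broker URL is modeled.
    record LookupResult(String brokerUrl) {}

    // Hypothetical async bundle lookup, shaped like the proposed lookupNamespaceBundleAsync.
    @FunctionalInterface
    interface BundleLookup {
        CompletableFuture<LookupResult> lookup(String namespace, String bundle,
                                               boolean loadTopicInBundle, boolean authoritative);
    }

    // Issues one lookup per bundle without waiting, then collects bundle -> broker URL.
    static CompletableFuture<Map<String, String>> lookupAll(BundleLookup client, String namespace,
                                                            List<String> bundles, boolean load) {
        Map<String, CompletableFuture<LookupResult>> pending = new TreeMap<>();
        for (String bundle : bundles) {
            pending.put(bundle, client.lookup(namespace, bundle, load, false));
        }
        return CompletableFuture.allOf(pending.values().toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> {
                    Map<String, String> owners = new TreeMap<>();
                    pending.forEach((bundle, f) -> owners.put(bundle, f.join().brokerUrl()));
                    return owners;
                });
    }

    public static void main(String[] args) {
        // Fake client for illustration: assigns the upper half of the hash range to broker-2.
        BundleLookup fake = (ns, bundle, load, auth) -> CompletableFuture.completedFuture(
                new LookupResult(bundle.startsWith("0x8") ? "pulsar://broker-2:6650"
                                                          : "pulsar://broker-1:6650"));
        Map<String, String> owners = lookupAll(fake, "my-tenant/my-ns",
                List.of("0x00000000_0x80000000", "0x80000000_0xffffffff"), true).join();
        owners.forEach((b, url) -> System.out.println(b + " -> " + url));
    }
}
```

Returning the map rather than `Void` lets an operator check after a warm-up which broker ended up owning each bundle.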
### 3. CLI Extensions

**New `lookup` command in `pulsar-admin namespaces`:**

```bash
# Load all bundles in a namespace
pulsar-admin namespaces lookup my-tenant/my-ns

# Load all topics in a specific bundle
pulsar-admin namespaces lookup my-tenant/my-ns -b 0x80000000_0xffffffff -l

# Look up the bundle owner without loading topics
pulsar-admin namespaces lookup my-tenant/my-ns -b 0x80000000_0xffffffff
```

> **Review comment (Member):** One concern I have with the namespace-level lookup / preloading flow is bundle placement fairness before traffic actually arrives. For `pulsar-admin namespaces lookup my-tenant/my-ns`, if the implementation just iterates bundles and triggers lookup/ownership acquisition one by one, I’m not sure this will produce an even bundle distribution across brokers in the cold-start case. At preload time, these bundles may have little or no traffic, so the load manager may not have enough signal to place them in a way that reflects the future workload. In the worst case, a single broker could end up owning many preloaded bundles, and then when traffic arrives later we would need another rebalance cycle, which brings us back to the same lazy-loading / cold-start problem that this PIP is trying to avoid.
>
> Could the PIP clarify this part of the design a bit more? How should namespace-level preload avoid concentrating too many bundles on one broker? Should the API return the final bundle -> broker mapping so operators can verify that the warm-up actually achieved a balanced placement?

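The `-b` argument in these examples is a bundle range of the form `<lower>_<upper>` in hexadecimal. A minimal sketch of client-side validation that a CLI could perform before calling the admin API follows; the `BundleRange` class and its rules (a `0x` prefix, both bounds within 32 bits, lower strictly below upper) are assumptions, not behavior taken from `pulsar-admin`.

```java
final class BundleRange {
    final long lower;
    final long upper;

    private BundleRange(long lower, long upper) {
        this.lower = lower;
        this.upper = upper;
    }

    // Parses "0x80000000_0xffffffff" into numeric bounds, rejecting malformed input.
    static BundleRange parse(String arg) {
        String[] parts = arg.split("_");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <lower>_<upper>: " + arg);
        }
        long lower = parseHex(parts[0]);
        long upper = parseHex(parts[1]);
        if (lower >= upper) {
            throw new IllegalArgumentException("lower bound must be below upper bound: " + arg);
        }
        return new BundleRange(lower, upper);
    }

    // Parses one "0x..." bound; NumberFormatException (an IllegalArgumentException) on bad hex.
    private static long parseHex(String s) {
        if (!s.startsWith("0x")) {
            throw new IllegalArgumentException("bound must start with 0x: " + s);
        }
        long value = Long.parseLong(s.substring(2), 16);
        if (value < 0 || value > 0xffffffffL) {
            throw new IllegalArgumentException("bound outside the 32-bit hash space: " + s);
        }
        return value;
    }

    public static void main(String[] args) {
        BundleRange r = parse("0x80000000_0xffffffff");
        System.out.printf("lower=%d upper=%d%n", r.lower, r.upper);
    }
}
```

Rejecting a malformed range locally gives the operator an immediate error instead of a round trip to the broker.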
### 4. Dependency Resolution

To avoid cyclic dependencies, a new interface `LookupDataInterface` is introduced in `pulsar-client-admin-api`. The existing `LookupData` class in `pulsar-common` implements this interface. This allows the admin client to depend only on the interface, while the broker returns the concrete type.

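The dependency arrangement can be sketched in plain Java: the interface belongs to the admin-API module, the concrete class in the common module implements it, and admin-client code only sees the interface. Module boundaries are marked by comments because everything sits in one file here. The getter names are assumed to mirror the URL fields that `LookupData` carries (`brokerUrl`, `brokerUrlTls`, `httpUrl`, `httpUrlTls`); the exact method set of `LookupDataInterface` is not defined by this proposal, and `LookupDataSketch`/`describe` are hypothetical names.

```java
final class LookupDataSketch {
    // Would live in pulsar-client-admin-api (method set is an assumption).
    interface LookupDataInterface {
        String getBrokerUrl();
        String getBrokerUrlTls();
        String getHttpUrl();
        String getHttpUrlTls();
    }

    // Would live in pulsar-common: the concrete type the broker returns.
    static final class LookupData implements LookupDataInterface {
        private final String brokerUrl;
        private final String brokerUrlTls;
        private final String httpUrl;
        private final String httpUrlTls;

        LookupData(String brokerUrl, String brokerUrlTls, String httpUrl, String httpUrlTls) {
            this.brokerUrl = brokerUrl;
            this.brokerUrlTls = brokerUrlTls;
            this.httpUrl = httpUrl;
            this.httpUrlTls = httpUrlTls;
        }

        @Override public String getBrokerUrl() { return brokerUrl; }
        @Override public String getBrokerUrlTls() { return brokerUrlTls; }
        @Override public String getHttpUrl() { return httpUrl; }
        @Override public String getHttpUrlTls() { return httpUrlTls; }
    }

    // Admin-client side: compiled against the interface only, never the concrete class.
    static String describe(LookupDataInterface data) {
        return "broker=" + data.getBrokerUrl() + ", http=" + data.getHttpUrl();
    }

    public static void main(String[] args) {
        LookupDataInterface data = new LookupData("pulsar://broker-1:6650", null,
                "http://broker-1:8080", null);
        System.out.println(describe(data));
    }
}
```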
---

> **Review comment:** Could you please also define the response format of this API? Will it be a map from each bundle to its service URL?