
Add an RFC For Job Execution Plugins to Enable Online Custom Scorers#2

Merged
B-Step62 merged 7 commits into mlflow:main from mprahl:online-scoring-plugin
Apr 1, 2026

Conversation


@mprahl mprahl commented Mar 20, 2026

This is the core design. The follow-up for remote scorers to support custom scoring securely is in #3.


mprahl commented Mar 20, 2026

@B-Step62 @etirelli @TomeHirata could you please review this?

@mprahl mprahl force-pushed the online-scoring-plugin branch from 8a013d6 to 511c662 Compare March 20, 2026 18:07
mprahl added a commit to mprahl/mlflow-rfcs that referenced this pull request Mar 20, 2026
This depends on mlflow#2 and adds safe online scoring for custom scorers.

Signed-off-by: mprahl <mprahl@users.noreply.github.com>
@mprahl mprahl force-pushed the online-scoring-plugin branch from 511c662 to 3524afe Compare March 20, 2026 18:11
mprahl added a commit to mprahl/mlflow-rfcs that referenced this pull request Mar 20, 2026
This depends on mlflow#2 and adds safe online scoring for custom scorers.

Signed-off-by: mprahl <mprahl@users.noreply.github.com>
def recover_jobs(self, unfinished_job_ids: list[str]) -> list[JobRecoveryResult]: ...

@property
def scorer_capabilities(self) -> ScorerCapability: ... # defaults to NONE and participates in backend routing
@TomeHirata TomeHirata Mar 23, 2026

A job is a higher abstraction than a scorer-execution job, and it's a bit odd that a job executor has a property for which scorer type is supported. If the intention is to tell whether UDFs are supported by the backend, can we have a boolean flag like is_udf_supported, or, more generally, a capabilities property that returns ["UDF"]? Also, I wonder if we need this property from the beginning. Any job executor should be able to execute any Python function; it just has a different resource isolation level. For local development, users are free to use SubprocessJobExecutor, and for the remote tracking server, they can just switch to DockerJobExecutor/K8sJobExecutor, and this property is not used.

Contributor Author

Good point. We don't want to restrict to just "scorer" jobs, so we could make this a generic capabilities property.

The reason why I had this was mostly to be able to automatically block custom scorer code if the job executor did not have isolation capabilities. On second thought, we can just let the admin opt in to custom scorers explicitly with an environment variable and/or mlflow server CLI flag.

I'll make that change but let me know if you have a different idea.
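The environment-variable opt-in mentioned above could be as small as a single server-side check. A hedged sketch follows; the variable name `MLFLOW_ALLOW_CUSTOM_SCORER_CODE` is hypothetical, not something MLflow defines today.

```python
# Hypothetical admin opt-in for custom scorer code, replacing executor
# capabilities as the gate. The env var name is an assumption for illustration.
import os


def custom_scorer_code_allowed() -> bool:
    # The admin explicitly opts in; the default is to block custom scorer code.
    return os.environ.get("MLFLOW_ALLOW_CUSTOM_SCORER_CODE", "false").lower() in (
        "true",
        "1",
    )
```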


On second thought, we can just let the admin opt in to custom scorers explicitly with an environment variable and/or mlflow server CLI flag.

Good point, I prefer to remove capabilities from the job executor interface altogether and support routing or executor<>job type mapping later if explicitly requested.

- `remote_execution` answers whether the job runs through the local direct-store path or through the remote executor contract

This distinction matters for `optimize_prompts_job`. It is not arbitrary custom Python in the same way that a custom

I'd refactor optimize_prompts_job rather than defining the job executor interface based on how optimize_prompts_job works. Btw, this is also an issue for online scorers that don't use MLflow gateway.

Contributor Author

@TomeHirata, my original thought was to avoid a breaking change for non-gateway users. I also don't think you can do online scoring without the MLflow AI Gateway today, but you can do one-off evaluations through the UI using the direct provider. I may be wrong about that.

So maybe, if the plugin has remote_execution() return True, we can disallow non-gateway usage. Then for existing users, it's not a breaking change, because they would still use the default subprocess executor backend which does support local.


I also don't think you can do online scoring without the MLflow AI Gateway today

This is true for the UI path, but I believe users can register judges via the Python API.

So maybe, if the plugin has remote_execution() return True, we can disallow non-gateway usage

Yeah, I can try adding this validation if that's not difficult. Otherwise, we can document the limitation and raise an error at runtime if the direct provider API is used and the secret is missing.


- **Job row claim**: the worker's conditional `PENDING -> RUNNING` transition that gives one MLflow instance ownership
of a queued job row
- **Exclusivity lock**: the higher-level lock stored in `job_locks`, typically for a key such as an experiment ID
@TomeHirata TomeHirata Mar 23, 2026

Do we have any concrete use cases for this higher level locking (e.g., experiment id)?

Contributor Author

This is to support the exclusive argument in the job decorator for run_online_trace_scorer_job and run_online_session_scorer_job. They don't allow running multiple jobs per experiment ID. By bringing this to a database-level lock, we can replicate the same locking that exists in Huey today, except it would now support multiple MLflow replicas.
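A minimal sketch of the two primitives discussed here (the job row claim and the database-level exclusivity lock) might look like this. SQLite stands in for the tracking database, and the table/column names (`jobs`, `job_locks`, `status`) are assumptions, not the RFC's final schema.

```python
# Sketch: job row claim vs. exclusivity lock, both enforced by the database.
# Table and column names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE job_locks (lock_key TEXT PRIMARY KEY, holder_job_id TEXT)")
conn.execute("INSERT INTO jobs VALUES ('job-1', 'PENDING')")


def claim_job(job_id: str) -> bool:
    # Job row claim: a conditional UPDATE that exactly one worker can win,
    # because only one UPDATE observes status='PENDING'.
    cur = conn.execute(
        "UPDATE jobs SET status = 'RUNNING' WHERE id = ? AND status = 'PENDING'",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def acquire_exclusivity_lock(lock_key: str, job_id: str) -> bool:
    # Exclusivity lock: an INSERT keyed by e.g. an experiment ID; the
    # primary-key constraint makes a second acquisition fail.
    try:
        conn.execute("INSERT INTO job_locks VALUES (?, ?)", (lock_key, job_id))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        return False


claimed = claim_job("job-1")        # True: the row was PENDING
claimed_again = claim_job("job-1")  # False: already RUNNING
locked = acquire_exclusivity_lock("exp-7", "job-1")        # True
locked_again = acquire_exclusivity_lock("exp-7", "job-2")  # False: lock busy
```

Because both operations resolve inside the database, they behave the same whether one MLflow replica is running or many.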


Got it. @dbczumar, what was the main motivation to bring the resource-based exclusion to the online scorer?


Concurrent job executions could result in duplicate logs (e.g. MLflow assessments) for jobs that process traces from a particular experiment, such as the online scoring job.

- **Job lease**: the short-lived `RUNNING`-job lease tracked by `lease_expires_at`, used to detect stale monitored work
- **Scheduler lease**: the single-leader discovery lease stored in `scheduler_leases`

`JobLockManager` replaces Huey's lock helper and keeps the existing lock key computation model. Lock acquisition is an
@TomeHirata TomeHirata Mar 23, 2026

How do we plan to implement the job queue based on the job table and the job_locks table in a multi-replica setting? Implementing a high-performing multi-process queue is non-trivial, and that's part of why huey was selected instead of a database-based job queue implementation.

@HumairAK HumairAK Mar 23, 2026

I was wondering about possibly offloading this capability to a third party. However, Huey's current support for distributed task-based queuing seems limited mostly to Redis, which adds another major dependency. We would like to give users the option to just leverage their existing MLflow DB to reduce deployment overhead, but Huey doesn't seem to have proper support for SQL-based DBs.

We will evaluate the feasibility of leveraging this or something similar for task-based queuing and locking and follow up.

Contributor

So we looked into other alternatives. We were unable to find a strong existing alternative that cleanly fits all of our requirements: no additional infrastructure beyond the existing MLflow DB, multi-replica execution, SQL-backed durability/coordination, portability across PostgreSQL/MySQL/MSSQL, and preserving a built-in OSS/local experience.

The alternatives we discussed each seem to miss at least one of those constraints. Huey has SQL-backed storage, but not the broader distributed coordination model we need here. Celery would introduce an external broker. Other alternatives are specific to particular databases like PostgreSQL.

So given the current constraints, I think the proposal is the right direction. I do agree this means we are taking on distributed queue / locking / recovery correctness as MLflow product scope.

@TomeHirata, what specific implementation detail would be most helpful to spell out next so we can move forward with implementation?

Contributor Author

I'll add:

  1. Huey supports SQLite but I didn't see support for other database types.
  2. Celery does support a SQLAlchemy plugin, but it would not provide the exclusive lock mechanism MLflow jobs already use on the experiment.

@TomeHirata TomeHirata Mar 24, 2026

It seems Huey supports other types of SQL-backed durability (https://github.com/coleifer/huey/blob/master/huey/contrib/sql_huey.py), wasn't this enough?

Contributor Author

I added a section for a hybrid approach that leverages Huey.

@B-Step62 B-Step62 Mar 27, 2026

I'm on the fence between the original recommended option and the hybrid approach with a custom implementation of BaseStorage, but I'm leaning toward the former. The dispatch logic is definitely non-trivial, but my worry with the latter approach is that we would be in a weird spot where we build an abstraction framework over Huey with extra requirements (a jobs table for lease/token management), while also customizing the low-level primitive (base storage) within Huey. This indicates the use case is out of the tool's main scope and we may not get great support.

`JobExecutionContext.workspace`, while executors themselves remain workspace-unaware.

Multi-replica coordination assumes a transactional tracking database such as PostgreSQL, MySQL, or MSSQL. SQLite is
acceptable for single-process local use, but it is not a safe foundation for multi-replica lease and lock coordination.
@TomeHirata TomeHirata Mar 23, 2026

nit: I overall agree with the statement, but note that by default, mlflow server spins up multiple uvicorn workers.


Each job token is granted only the permissions needed for the job that owns it:

- `EDIT` on the target experiment

I wonder if we create an attack vector that allows attackers to access a resource they are not permitted to access through job execution. Also, the current design requires us to list the required permissions for each job type, but identifying all required permissions for complex jobs like prompt optimization is not trivial.
So I wonder if we should just carry the caller's permission. Concretely:

  1. When a job is submitted, we authenticate the user and generate a short-term token (job token)
  2. The user ID and the job token are included in the HTTP header when the job executor calls the tracking server
  3. The tracking server verifies the token and authorizes the request based on the user's permissions
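The three steps above could be sketched as a short-lived, HMAC-signed job token that carries the submitting user's identity, which the tracking server verifies before authorizing each request. The payload shape, TTL, and signing scheme here are illustrative assumptions, not the RFC's wire format.

```python
# Illustrative job-token issue/verify flow. The claims layout and signing key
# handling are assumptions for the sketch, not a final design.
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"server-side-secret"  # held only by the tracking server


def issue_job_token(user_id: str, job_id: str, ttl_s: int = 3600) -> str:
    # Step 1: authenticate the user (elsewhere), then mint a short-term token.
    payload = json.dumps(
        {"user": user_id, "job": job_id, "exp": int(time.time()) + ttl_s}
    ).encode()
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(sig).decode()
    )


def verify_job_token(token: str) -> dict | None:
    # Step 3: the tracking server checks the signature and expiry, then
    # authorizes against the user's permissions.
    payload_b64, sig_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
        return None  # signature mismatch: reject
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        return None  # expired token: reject
    return claims


claims = verify_job_token(issue_job_token("alice", "job-42"))
# claims carries {"user": "alice", ...} for step 2's HTTP header round-trip
```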

Contributor Author

Thanks for calling this out. I agree that identifying the required permissions is not trivial with the current shape of the job, especially for optimize_prompts_job, because some of that dependency resolution still happens inside the job at execution time.

That said, it seems like we can make this workable with a refactor rather than by carrying the caller's full permissions at runtime. In particular, if we move more of the dependency/resource resolution for optimize_prompts_job to the server side at submission time instead of inside the job body, we should be able to determine the required resources up front without causing breaking changes for existing users. For remote job executors, we can also require gateway-backed model usage so the set of required permissions stays explicit and bounded.

I still think we should keep the remote execution path least-privileged. We also should add a check to ensure that the user creating an online scorer already holds the permissions required for the job to run, which should help prevent privilege escalation.

There is still some residual risk here because MLflow permissions are not scoped at the run level today. So a token scoped to an experiment could theoretically read or modify other runs in that experiment that it does not strictly need. That is not ideal, but I think it is an acceptable and explicit limitation for now, and still safer than giving the job the caller's full live permissions.

I'll work on a section in the doc that proposes refactoring optimize_prompts_job.


Thanks. Yeah, it's ideal to verify all required permissions at submission time based on the job's business logic and the caller's permission. I just called out that carrying the user's permission is probably the most efficient way to avoid exceeding the caller's permission. If we think we can check the required permissions for all job types at submission time, let's document the decision and what types of refactoring are necessary.

This is one of the core security benefits of the remote model. The remote backend gets a scoped token, not broad
provider credentials.

`optimize_prompts_job` is excluded from this path by design. It still participates in the common framework, but remains

ditto, we should be able to use gateway:/... in optimize_prompts_job too.


## Drawbacks

1. This proposal moves more logic into the core MLflow job framework. Huey previously hid some of that complexity.

This is a bit concerning. cc: @WeichenXu123 who made the decision for huey


# Open questions

1. Should `python_env` remain part of the `@job` decorator contract? It is currently unused in practice, and keeping it

Iirc, python_env is for installing extra packages required for the job. Don't we still need this if we want to allow users to use extra packages in the remote executor?

Contributor Author

@TomeHirata the main reason for the open question is to simplify things by no longer allowing the Python version to be specified. We'd still want the extra packages though.


Got it. How do we handle an edge case where the Python version of the job executor cannot install the required packages for the job?


def start_executor(self) -> None: ...

def stop_executor(self) -> None: ...

q: when is stop_executor called?

Contributor Author

The intent was to let daemons/processes shut down gracefully on server shutdown. I'll add a note in the doc. The main motivation is that each plugin implementation doesn't have to track the server process state to determine when to clean up.
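One way the framework (rather than each plugin) could guarantee that cleanup, under the intent described above, is to register `stop_executor()` once at startup. The class and field names below are illustrative, not the RFC's contract.

```python
# Sketch: framework-owned shutdown hook so plugins never watch process state.
import atexit


class SubprocessJobExecutor:
    def __init__(self) -> None:
        self.running = False
        self._workers = []  # would hold subprocess.Popen handles in practice

    def start_executor(self) -> None:
        self.running = True

    def stop_executor(self) -> None:
        # Gracefully terminate any daemon/worker processes on server shutdown.
        for proc in self._workers:
            proc.terminate()
        self.running = False


executor = SubprocessJobExecutor()
executor.start_executor()
atexit.register(executor.stop_executor)  # framework registers this once
```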

gateway_uri: str | None = None # optional MLflow AI Gateway base URI reachable from the job runtime
token: str | None = None # used by remote executors
workspace: str | None = None
pip_config: PipConfig | None = None # pip install settings for local or remote runtimes
@TomeHirata TomeHirata Mar 23, 2026

Can we reuse the existing _PythonEnv data model?

Contributor Author

Is your preference to expand _PythonEnv and reuse it here, even though it doesn't have the PyPI index configuration that the proposed PipConfig does?


These fields in JobExecutionContext are immutable, so why not just treat them as part of the job params?


If JobExecutionContext is used by JobExecutor, can we pass JobExecutionContext to JobExecutor's constructor instead?

Contributor Author

@WeichenXu123 good questions. My intent with JobExecutionContext is to keep framework-owned runtime metadata separate from the job's logical params.

Even if fields like tracking_uri, workspace, and the remote token are immutable for a given run, they are not part of the job function's business input. Some are deployment-derived and some are framework-generated at execution time, so putting them into params would blur the boundary between user/job inputs and runtime/executor metadata.

For the same reason, I don't think they belong on the executor constructor either, since the executor instance is deployment-scoped while this context is per job run. I think submit_job(..., context=...) is still the right shape.

That said, I'll go ahead and update the proposal to extend _PythonEnv rather than introduce a separate PipConfig type.
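One hypothetical shape for extending the existing `_PythonEnv` model with the pip index settings from the proposed `PipConfig`, per the comment above. The field names (`index_url`, `extra_index_urls`) are assumptions, not the final schema.

```python
# Sketch: folding PipConfig's index settings into a _PythonEnv-style model.
# All field names here are illustrative assumptions.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PythonEnv:
    python: str | None = None  # interpreter version; may be dropped per the open question
    dependencies: list[str] = field(default_factory=list)
    # Folded in from the proposed PipConfig:
    index_url: str | None = None
    extra_index_urls: list[str] = field(default_factory=list)


env = PythonEnv(dependencies=["pandas>=2.0"], index_url="https://pypi.org/simple")
```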

@mprahl mprahl requested a review from TomeHirata March 23, 2026 21:21

mprahl commented Mar 23, 2026

@TomeHirata thanks for the review! I addressed your comments or replied to them. Could you please take another look?

fn_fullname: str,
params: dict[str, Any],
context: JobExecutionContext,
python_env: Any | None = None,

We can already configure python_env for a given job function; do we need to support configuring python_env per individual job run?

Contributor Author

This is the python_env on the job decorator passed down to the backend executor plugin.

Comment on lines +934 to +935
4. Land the Remote Executors RFC after the core abstractions are approved, or
review it in parallel if it helps make the core contract clearer.

Feel free to file the follow-up Remote Executors RFC. I want to review them together and see if anything in the current RFC needs to improve.

Contributor Author

@WeichenXu123 thanks for offering to review! It's at #3.

mprahl added a commit to mprahl/mlflow-rfcs that referenced this pull request Mar 24, 2026
This depends on mlflow#2 and adds safe online scoring for custom scorers.

Signed-off-by: mprahl <mprahl@users.noreply.github.com>
@mprahl mprahl force-pushed the online-scoring-plugin branch from 6bc13da to c749254 Compare March 24, 2026 18:37
@mprahl mprahl requested review from WeichenXu123 and dbczumar March 24, 2026 19:20
@mprahl mprahl force-pushed the online-scoring-plugin branch from c749254 to 88059cf Compare March 25, 2026 17:36
it is set to the same backend name as the default, the deployment still has only
one distinct active backend.

The selected backend name is persisted on the job row as `executor_backend`.
Contributor

Is the name here the same concept as the backend type specified in the env var (e.g. "subprocess")? I'm wondering if we need a unique identifier for the backend instance. For example, some users may want to isolate the task scheduling of scorers and prompt optimization while using Docker as the backend type for both.

Contributor Author

By executor_backend, I mean the configured backend identifier selected by the router for that job. This is mostly for job recovery purposes. I intentionally kept the routing model small: one default backend plus one optional custom-scorer override.

Supporting multiple instances of the same backend type with distinct configs or scheduling domains seems useful, but I think that should be a follow-up once the core contract is approved. If you feel strongly about it, I'm happy to add this to the RFC though!
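The minimal routing model described above (one default backend plus one optional custom-scorer override, with the chosen name persisted as `executor_backend`) could be sketched as follows. The function and job-type names are illustrative assumptions.

```python
# Sketch: minimal backend routing with one optional custom-scorer override.
# Names are assumptions; the persisted field on the job row is executor_backend.
from __future__ import annotations


def select_executor_backend(
    job_type: str,
    default_backend: str = "subprocess",
    custom_scorer_backend: str | None = None,
) -> str:
    # Route custom-scorer jobs to the override when one is configured;
    # everything else uses the single default backend.
    if job_type == "custom_scorer" and custom_scorer_backend:
        return custom_scorer_backend
    return default_backend


# Persisted on the job row at submission time for recovery purposes.
backend = select_executor_backend("custom_scorer", custom_scorer_backend="docker")
# backend == "docker"
```

Typing the persisted value as a backend identifier (rather than a backend type) keeps the field extensible to multiple instances of the same type later, as requested below.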

Contributor

@mprahl Makes sense! Yeah, I agree with starting minimal; the only request is to make this field extensible to that use case, i.e., type it as something more granular than the backend type.


1. This RFC does not fully specify Docker or Kubernetes backend behavior. Those
backends are covered in a follow-up Remote Executors RFC.
2. This RFC does not define a guardrail execution framework for the MLflow AI
Contributor

👍

exclusive -->|yes| lock["Acquire exclusivity lock"]
exclusive -->|no| submit["Submit to executor"]
lock -->|lock acquired| submit
lock -->|lock busy| cancel["Mark CANCELED"]
Contributor

How should retry of a canceled job be handled? The CANCELED state can also represent a user-cancelled job that should not be retried.

Contributor Author

The intent is that the CANCELED state is not retried when it's a scheduled job, because the next scheduler run will create a new job. This is similar to the current design.

For user submitted jobs, should we have a queue concept of waiting for the lock or should we reject the request at submission time? The latter is certainly simpler but not the best UX.

Contributor

Ah I see. I think we can start with the latter, which is the same as the current exclusive handling.


The per-job-type extraction rules are:

- `run_online_trace_scorer_job` and `run_online_session_scorer_job`: read
Contributor

It is possible that a custom code scorer invokes a gateway endpoint inside it (e.g. running an LLM judge with custom input-parsing logic). Using static parsing to extract the gateway endpoint name is doable but fragile. We could introduce a configuration knob for users to define an explicit permission boundary for each scorer. Also, inheriting the user's permission, as @TomeHirata brought up, can be an option that users can choose.

Contributor Author

@B-Step62 are you thinking a user sets the permissions similar to how you assign permissions to a GitHub token, or are these global permissions per job type that the admin sets?

I really want to avoid the option of inheriting user permissions if possible. This is to follow the least-privilege model and avoid the ambiguity of whether the job impersonates the user or somehow extracts the user's permissions.

Contributor Author

Also, is the concern only around AI Gateway permissions or MLflow permissions more broadly?

Contributor

My concern is particularly around the parsing logic for the custom code scorer here. For LLM judges we can reliably determine which resource (gateway endpoint) they use, but when a custom code scorer requires some resource, there is a limit to how accurately we can inspect the permissions required to execute it.

It is kind of an edge case, but a scorer like this can exist. Some customers actually do a simple version of this.

@scorer
def tool_grounded(outputs, trace):
    client = OpenAI(base_url="<gateway-url>")
    tool_spans = trace.search_spans(span_type="TOOL")
    judge_prompt = mlflow.genai.load_prompt("prompts://tool-grounded/1")
    text = judge_prompt.format(output=outputs, tools=tool_spans)
    response = client.responses.create(model="<judge-model>", input=text)
    return parse_result(response)

I'm totally open to starting with not supporting this case. However, I believe we eventually need a way for users to configure what resources the scorer can access. The most intuitive way is RBAC, but MLflow does not support it today, so one alternative was inheriting from the user's permissions.


+1 to Yuki's point, I like the idea of minimizing the permission for the job token, but I'm a bit concerned about the feasibility to reliably infer the permission for each job at submission.

Contributor Author

@B-Step62 @TomeHirata thanks for the context! How about asking the user to declare the permissions needed in the @scorer decorator for this special case? So the gateway permissions are a union of the inferred ones and the explicitly declared ones. This would be restricted to gateway endpoints and prompts but could be expanded in the future to other resources.

from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class RequiredResource:
    type: Literal["gateway_endpoint", "prompt"]
    identifier: str


@scorer(
    required_resources=(
        RequiredResource(type="gateway_endpoint", identifier="endpoint-1"),
        RequiredResource(type="gateway_endpoint", identifier="endpoint-2"),
        RequiredResource(type="prompt", identifier="prompts://tool-grounded/1"),
    ),
)
def tool_grounded(outputs, trace):
    client = OpenAI(base_url="<gateway-url>")
    tool_spans = trace.search_spans(span_type="TOOL")
    judge_prompt = mlflow.genai.load_prompt("prompts://tool-grounded/1")
    text = judge_prompt.format(output=outputs, tools=tool_spans)
    response = client.responses.create(model="<judge-model>", input=text)
    return parse_result(response)

@B-Step62 B-Step62 Mar 31, 2026

@mprahl Yes, I agree explicit permissions are a good start! We may promote the Resource primitive that was added for managing custom model permissions. It has been used exclusively for Databricks resources, but the base class is pretty generic.

Contributor Author

Thanks! I updated the PR with a new commit to address this.

I looked at the Resource primitive already there, and I think it's different enough that I'd prefer not to reuse it, but let me know if you disagree.

Contributor

I still prefer not to introduce two permission resource objects in the platform, but we can follow up on the actual PR.

@B-Step62 B-Step62 left a comment

Looks great to me overall. The only remaining point seems to be this, but I also feel we can discuss the concrete implementation of the no-Huey approach to understand the complexity better. @TomeHirata @WeichenXu123 what do you think?

@TomeHirata TomeHirata left a comment

LGTM, thank you for iterating on the proposal. I think we can start with having full job management logic in MLflow and revisit if the scope is larger than expected.

mprahl and others added 6 commits March 31, 2026 13:10
Co-authored-by: Humair Khan <HumairAK@users.noreply.github.com>
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
@mprahl mprahl force-pushed the online-scoring-plugin branch from 8244fc2 to d95f091 Compare March 31, 2026 17:17
@mprahl mprahl requested a review from B-Step62 March 31, 2026 17:18
mprahl added a commit to mprahl/mlflow-rfcs that referenced this pull request Mar 31, 2026
This depends on mlflow#2 and adds safe online scoring for custom scorers.

Signed-off-by: mprahl <mprahl@users.noreply.github.com>
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
@B-Step62 B-Step62 merged commit 92e5212 into mlflow:main Apr 1, 2026