Add Interactive task type and keeper by pxsit · Pull Request #2 · pxsit/cms

pxsit · 2026-04-08T09:00:59Z

Introduce interactive task support and related plumbing.

Add Interactive TaskType and a separate interactive_keeper to run controller and solution sandboxes, manage pipes and aggregate results (cms/grading/tasktypes/Interactive.py, cms/grading/tasktypes/interactive_keeper.py).
Update sandbox to accept integer file descriptors for stdin/stdout/stderr, add close_fds flag and propagate inherit/open-fds options to isolate, silence isolate/init output, and adjust Popen usage (cms/grading/Sandbox.py).
Extend evaluation step API to accept wall-clock limits, integer redirects and close_fds control, and wire these into Sandbox configuration (cms/grading/steps/evaluation.py).
Add ParameterTypeBool and ParameterTypeFloat to support new task parameters (cms/grading/ParameterTypes.py).
Extend YAML loader to detect controllers and stubs and configure Interactive task parameters; add validation preventing both manager and controller (cmscontrib/loaders/italy_yaml.py).
Improve ambiguous-file error message to include submission_format (cms/server/contest/submission/file_matching.py).
Add interactive tests, example managers, and sample codes to the test suite (cmstestsuite/* additions) and register tests in Tests.py.

These changes enable running interactive/communication-style problems with controllers that spawn solution instances, allow passing raw fds between processes, and provide new parameter types for configuration.

Introduce interactive task support and related plumbing. - Add Interactive TaskType and a separate interactive_keeper to run controller and solution sandboxes, manage pipes and aggregate results (cms/grading/tasktypes/Interactive.py, cms/grading/tasktypes/interactive_keeper.py). - Update sandbox to accept integer file descriptors for stdin/stdout/stderr, add close_fds flag and propagate inherit/open-fds options to isolate, silence isolate/init output, and adjust Popen usage (cms/grading/Sandbox.py). - Extend evaluation step API to accept wall-clock limits, integer redirects and close_fds control, and wire these into Sandbox configuration (cms/grading/steps/evaluation.py). - Add ParameterTypeBool and ParameterTypeFloat to support new task parameters (cms/grading/ParameterTypes.py). - Extend YAML loader to detect controllers and stubs and configure Interactive task parameters; add validation preventing both manager and controller (cmscontrib/loaders/italy_yaml.py). - Improve ambiguous-file error message to include submission_format (cms/server/contest/submission/file_matching.py). - Add interactive tests, example managers, and sample codes to the test suite (cmstestsuite/* additions) and register tests in Tests.py. These changes enable running interactive/communication-style problems with controllers that spawn solution instances, allow passing raw fds between processes, and provide new parameter types for configuration.

Copilot

Pull request overview

This PR introduces first-class support for Interactive tasks in CMS, including a dedicated keeper process to run controller/solution sandboxes and new sandbox/evaluation plumbing to support passing raw file descriptors for stdio redirection.

Changes:

Add Interactive TaskType and interactive_keeper.py to orchestrate controller-driven solution spawning and aggregate results/stats.
Extend sandbox + evaluation-step APIs to support integer stdin/stdout/stderr redirects, close_fds control, and wall-clock limits.
Add YAML loader detection/validation for interactive controllers, add new parameter types (bool/float), and expand the test suite with interactive tasks.

Reviewed changes

Copilot reviewed 32 out of 36 changed files in this pull request and generated 12 comments.

Show a summary per file

File	Description
`setup.py`	Registers the new Interactive task type entry point and adds interactive test tasks to package data.
`cmstestsuite/Tests.py`	Adds functional tests covering interactive correctness, WA, TLE, wall-timeout, and crashes (single/many instances).
`cmstestsuite/tasks/interactive/**`	Adds an example interactive task with controller + stubs and inputs.
`cmstestsuite/tasks/interactive_many/**`	Adds an example interactive task that spawns many solution instances.
`cmstestsuite/code/interactive-*.{cpp,py}`	Adds sample contestant solutions for interactive tests (correct/WA/protocol/TLE/etc).
`cmscontrib/loaders/italy_yaml.py`	Detects controllers and configures Interactive task parameters; adds validation against controller+manager.
`cms/server/contest/submission/file_matching.py`	Improves ambiguous-file error message by including `submission_format`.
`cms/grading/tasktypes/Interactive.py`	Implements compilation/evaluation for Interactive tasks via an external keeper process.
`cms/grading/tasktypes/interactive_keeper.py`	Runs controller + multiple solution sandboxes, wires pipes, merges stats, and returns JSON result.
`cms/grading/steps/evaluation.py`	Extends evaluation-step API with wall limits, integer redirects, and `close_fds`.
`cms/grading/Sandbox.py`	Adds fd-based stdio wiring, `close_fds` behavior, and adjusts isolate invocation/output suppression.
`cms/grading/ParameterTypes.py`	Adds `ParameterTypeBool` and `ParameterTypeFloat` for task parameters.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-08T09:09:58Z

cms/grading/tasktypes/interactive_keeper.py

+                assert admin_text is None
+                admin_text = line[len("ADMIN_MESSAGE: ") :].strip()
+            else:
+                raise ValueError(f"Unknown controller feedback command {f}")


In get_controller_text(), the ValueError message interpolates f (the file handle) instead of the unexpected line content, so the exception becomes unhelpful and hides the offending controller output. Use the parsed line (or the raw line) in the error message instead.

Suggested change

raise ValueError(f"Unknown controller feedback command {f}")

raise ValueError(f"Unknown controller feedback command {line}")

Copilot · 2026-04-08T09:09:59Z

cms/grading/tasktypes/interactive_keeper.py

+                stdin_redirect=p["c_to_u"][0],
+                stdout_redirect=p["u_to_c"][1],
+                multiprocess=False,
+                close_fds=False,


Solutions are started with close_fds=False. This makes each solution inherit all pipe FDs created for every potential solution (and any other open FDs in the keeper), which can keep pipe ends unintentionally open and lead to hangs (controller never seeing EOF) or cross-talk between solution instances. Use close_fds=True for solution sandboxes and rely on stdin_redirect/stdout_redirect to pass only the intended pipe ends.

Suggested change

close_fds=False,

close_fds=True,

Copilot · 2026-04-08T09:09:59Z

cms/grading/tasktypes/interactive_keeper.py

+        else:
+            result = {
+                "success": False,
+                "outcome": 0.0,
+                "text": "Invalid command from controller: " + line,
+                "admin_text": None,
+                "stats": None,
+            }
+            print(json.dumps(result), flush=True)
+            sys.exit(0)


The early-return JSON for an invalid controller command sets text to a string, but the rest of the grading pipeline expects text to be a list of strings (e.g., other branches build lists and EvaluationJob.text is typed as list[str]). Return text as a one-element list for consistency.

Copilot · 2026-04-08T09:10:00Z

cms/grading/tasktypes/interactive_keeper.py

+    except Exception as e:
+        success = False
+        text = ["Internal error"]
+        admin_text = [f"Internal error: {e}"]


On exception while parsing controller feedback, admin_text is set to a list, but Job.admin_text is a single string (or None) elsewhere in the grading code. Set admin_text to a string (e.g., the formatted error) to avoid type inconsistencies downstream.

Suggested change

admin_text = [f"Internal error: {e}"]

admin_text = f"Internal error: {e}"

Copilot · 2026-04-08T09:10:00Z

cms/grading/steps/evaluation.py

    for command in commands:
        success = evaluation_step_before_run(
            sandbox, command, time_limit, memory_limit,
-            dirs_map, writable_files, stdin_redirect, stdout_redirect,
-            multiprocess, wait=True)
+            None, dirs_map, writable_files, stdin_redirect,
+            stdout_redirect, multiprocess, wait=True)
        if not success:


evaluation_step() now passes stdout_redirect=None into evaluation_step_before_run(), but evaluation_step_before_run no longer normalizes None to "stdout.txt". This changes the default behavior from "redirect to stdout.txt" to "no redirect", contradicting evaluation_step's docstring and breaking callers like Batch that rely on default stdout redirection to produce output files. Consider restoring the None→"stdout.txt" normalization in evaluation_step() or in evaluation_step_before_run().

Copilot · 2026-04-08T09:10:01Z

cms/grading/tasktypes/Interactive.py

+        # Note: isolate wants the memory limit in *bytes*!
+        self.controller_memory_limit = self.parameters[3] * 2**20


init multiplies controller_memory_limit by 2**20 unconditionally. If the parameter is omitted/None (e.g., from a loader default) this will raise at runtime, and even for valid values it couples validation to arithmetic. Handle the optional case explicitly (keep None) and only convert MB→bytes when a numeric limit is provided.

Suggested change

# Note: isolate wants the memory limit in *bytes*!

self.controller_memory_limit = self.parameters[3] * 2**20

controller_memory_limit = self.parameters[3]

# Note: isolate wants the memory limit in *bytes*!

self.controller_memory_limit = (

None

if controller_memory_limit is None

else controller_memory_limit * 2**20

)

Copilot · 2026-04-08T09:10:01Z

cms/grading/tasktypes/Interactive.py

+        codenames_to_compile = []
+        if self._uses_stub():
+            codenames_to_compile.append(self.STUB_BASENAME + ".%l")
+        codenames_to_compile.extend([x for x in submission_format if x.endswith(".%l")])
+        res = dict()
+        for language in LANGUAGES:
+            source_ext = language.source_extension
+            executable_filename = self._executable_filename(submission_format, language)
+            res[language.name] = language.get_compilation_commands(
+                [
+                    codename.replace(".%l", source_ext)
+                    for codename in codenames_to_compile
+                ],


get_compilation_commands() only includes submission_format entries ending with ".%l". This skips any fixed-name source files in the submission format, so they will never be compiled/linked (unlike Communication, which compiles the entire submission_format). Include all relevant submitted sources (not just wildcard ones), or validate that Interactive submission formats cannot contain fixed filenames.

Suggested change

codenames_to_compile = []

if self._uses_stub():

codenames_to_compile.append(self.STUB_BASENAME + ".%l")

codenames_to_compile.extend([x for x in submission_format if x.endswith(".%l")])

res = dict()

for language in LANGUAGES:

source_ext = language.source_extension

executable_filename = self._executable_filename(submission_format, language)

res[language.name] = language.get_compilation_commands(

[

codename.replace(".%l", source_ext)

for codename in codenames_to_compile

],

res = dict()

for language in LANGUAGES:

source_ext = language.source_extension

codenames_to_compile = []

if self._uses_stub():

codenames_to_compile.append(self.STUB_BASENAME + source_ext)

codenames_to_compile.extend(

codename.replace(".%l", source_ext)

for codename in submission_format

if codename.endswith(".%l")

)

codenames_to_compile.extend(

codename

for codename in submission_format

if not codename.endswith(".%l") and codename.endswith(source_ext)

)

executable_filename = self._executable_filename(submission_format, language)

res[language.name] = language.get_compilation_commands(

codenames_to_compile,

Copilot · 2026-04-08T09:10:01Z

cms/grading/tasktypes/Interactive.py

+            p = subprocess.Popen(
+                [sys.executable, keeper_path, json.dumps(keeper_config)],
+                stdin=subprocess.PIPE,
+                stdout=subprocess.PIPE,
+            )
+            stdout, _ = p.communicate(timeout=self.controller_wall_limit * 2)
+
+            KEEPER_ERROR_MESSAGE = "Internal error in interactive keeper"
+


Interactive.evaluate() calls keeper_process.communicate(timeout=self.controller_wall_limit * 2) without handling controller_wall_limit being None and without catching TimeoutExpired. Either case will crash the worker instead of returning a controlled "internal error" evaluation result. Ensure a fallback timeout (or no timeout) when the limit is unset, and catch TimeoutExpired to terminate the keeper and set job.success/text appropriately.

Suggested change

p = subprocess.Popen(

[sys.executable, keeper_path, json.dumps(keeper_config)],

stdin=subprocess.PIPE,

stdout=subprocess.PIPE,

)

stdout, _ = p.communicate(timeout=self.controller_wall_limit * 2)

KEEPER_ERROR_MESSAGE = "Internal error in interactive keeper"

KEEPER_ERROR_MESSAGE = "Internal error in interactive keeper"

keeper_timeout = (

self.controller_wall_limit * 2

if self.controller_wall_limit is not None

else None

)

p = subprocess.Popen(

[sys.executable, keeper_path, json.dumps(keeper_config)],

stdin=subprocess.PIPE,

stdout=subprocess.PIPE,

)

try:

stdout, _ = p.communicate(timeout=keeper_timeout)

except subprocess.TimeoutExpired:

logger.error(

"Keeper timed out after %s seconds",

keeper_timeout,

)

p.terminate()

try:

stdout, _ = p.communicate(timeout=5)

except subprocess.TimeoutExpired:

p.kill()

stdout, _ = p.communicate()

job.success = False

job.text = [KEEPER_ERROR_MESSAGE]

return

Copilot · 2026-04-08T09:10:02Z

cmscontrib/loaders/italy_yaml.py

+                break

+        if controller_path is not None and manager_path is not None:
+            logger.fatal("Cannot have both a manager and a controller")


When both controller and manager are present, the loader logs logger.fatal but continues building the task; this can silently produce an invalid/misclassified dataset. Fail fast here (raise an exception or return None) so the task import clearly errors out.

Suggested change

logger.fatal("Cannot have both a manager and a controller")

logger.fatal("Cannot have both a manager and a controller")

raise ValueError("Cannot have both a manager and a controller")

Copilot · 2026-04-08T09:10:02Z

cmscontrib/loaders/italy_yaml.py

+            process_limit = conf.get("controller_process_limit", 200)
+            concurrent = conf.get("interactive_concurrent", True)
+            controller_memory_limit_mb = conf.get("controller_memory_limit", None)
+            controller_time_limit = conf.get("controller_time_limit", None)
+            controller_wall_limit = conf.get("controller_wall_time_limit", None)
+
+            args["task_type_parameters"] = \
+                [process_limit, compilation_param, concurrent,
+                 controller_memory_limit_mb, controller_time_limit, controller_wall_limit]


Interactive task_type_parameters are built with defaults of None for controller_memory_limit/time_limit/wall_limit. However, Interactive's ParameterTypeFloat validation (and Interactive.init arithmetic) do not accept None, so tasks that omit these YAML keys will fail later when instantiating the TaskType. Provide concrete defaults here (or enforce required keys) and keep key names consistent with the Interactive parameter short_names (e.g., use process_limit/concurrent/controller_wall_limit instead of controller_process_limit/controller_wall_time_limit if those are not documented).

Copilot AI review requested due to automatic review settings April 8, 2026 09:01

Copilot started reviewing on behalf of pxsit April 8, 2026 09:01 View session

Copilot AI reviewed Apr 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Interactive task type and keeper#2

Add Interactive task type and keeper#2
pxsit wants to merge 1 commit intomainfrom
interactive

pxsit commented Apr 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Copilot AI Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	raise ValueError(f"Unknown controller feedback command {f}")
	raise ValueError(f"Unknown controller feedback command {line}")

	admin_text = [f"Internal error: {e}"]
	admin_text = f"Internal error: {e}"

		# Note: isolate wants the memory limit in bytes!
		self.controller_memory_limit = self.parameters[3] * 2**20

-            p = subprocess.Popen(
-                [sys.executable, keeper_path, json.dumps(keeper_config)],
-                stdin=subprocess.PIPE,
-                stdout=subprocess.PIPE,
-            )
-            stdout, _ = p.communicate(timeout=self.controller_wall_limit * 2)
-            KEEPER_ERROR_MESSAGE = "Internal error in interactive keeper"
+            KEEPER_ERROR_MESSAGE = "Internal error in interactive keeper"
+            keeper_timeout = (
+                self.controller_wall_limit * 2
+                if self.controller_wall_limit is not None
+                else None
+            )
+            p = subprocess.Popen(
+                [sys.executable, keeper_path, json.dumps(keeper_config)],
+                stdin=subprocess.PIPE,
+                stdout=subprocess.PIPE,
+            )
+            try:
+                stdout, _ = p.communicate(timeout=keeper_timeout)
+            except subprocess.TimeoutExpired:
+                logger.error(
+                    "Keeper timed out after %s seconds",
+                    keeper_timeout,
+                )
+                p.terminate()
+                try:
+                    stdout, _ = p.communicate(timeout=5)
+                except subprocess.TimeoutExpired:
+                    p.kill()
+                    stdout, _ = p.communicate()
+                job.success = False
+                job.text = [KEEPER_ERROR_MESSAGE]
+                return

	logger.fatal("Cannot have both a manager and a controller")
	logger.fatal("Cannot have both a manager and a controller")
	raise ValueError("Cannot have both a manager and a controller")

Conversation

pxsit commented Apr 8, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants