Skip to content

Add Interactive task type and keeper#2

Open
pxsit wants to merge 1 commit intomainfrom
interactive
Open

Add Interactive task type and keeper#2
pxsit wants to merge 1 commit intomainfrom
interactive

Conversation

@pxsit
Copy link
Copy Markdown
Owner

@pxsit pxsit commented Apr 8, 2026

Introduce interactive task support and related plumbing.

  • Add Interactive TaskType and a separate interactive_keeper to run controller and solution sandboxes, manage pipes and aggregate results (cms/grading/tasktypes/Interactive.py, cms/grading/tasktypes/interactive_keeper.py).
  • Update sandbox to accept integer file descriptors for stdin/stdout/stderr, add close_fds flag and propagate inherit/open-fds options to isolate, silence isolate/init output, and adjust Popen usage (cms/grading/Sandbox.py).
  • Extend evaluation step API to accept wall-clock limits, integer redirects and close_fds control, and wire these into Sandbox configuration (cms/grading/steps/evaluation.py).
  • Add ParameterTypeBool and ParameterTypeFloat to support new task parameters (cms/grading/ParameterTypes.py).
  • Extend YAML loader to detect controllers and stubs and configure Interactive task parameters; add validation preventing both manager and controller (cmscontrib/loaders/italy_yaml.py).
  • Improve ambiguous-file error message to include submission_format (cms/server/contest/submission/file_matching.py).
  • Add interactive tests, example managers, and sample codes to the test suite (cmstestsuite/* additions) and register tests in Tests.py.

These changes enable running interactive/communication-style problems with controllers that spawn solution instances, allow passing raw fds between processes, and provide new parameter types for configuration.

Introduce interactive task support and related plumbing.

- Add Interactive TaskType and a separate interactive_keeper to run controller and solution sandboxes, manage pipes and aggregate results (cms/grading/tasktypes/Interactive.py, cms/grading/tasktypes/interactive_keeper.py).
- Update sandbox to accept integer file descriptors for stdin/stdout/stderr, add close_fds flag and propagate inherit/open-fds options to isolate, silence isolate/init output, and adjust Popen usage (cms/grading/Sandbox.py).
- Extend evaluation step API to accept wall-clock limits, integer redirects and close_fds control, and wire these into Sandbox configuration (cms/grading/steps/evaluation.py).
- Add ParameterTypeBool and ParameterTypeFloat to support new task parameters (cms/grading/ParameterTypes.py).
- Extend YAML loader to detect controllers and stubs and configure Interactive task parameters; add validation preventing both manager and controller (cmscontrib/loaders/italy_yaml.py).
- Improve ambiguous-file error message to include submission_format (cms/server/contest/submission/file_matching.py).
- Add interactive tests, example managers, and sample codes to the test suite (cmstestsuite/* additions) and register tests in Tests.py.

These changes enable running interactive/communication-style problems with controllers that spawn solution instances, allow passing raw fds between processes, and provide new parameter types for configuration.
Copilot AI review requested due to automatic review settings April 8, 2026 09:01
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces first-class support for Interactive tasks in CMS, including a dedicated keeper process to run controller/solution sandboxes and new sandbox/evaluation plumbing to support passing raw file descriptors for stdio redirection.

Changes:

  • Add Interactive TaskType and interactive_keeper.py to orchestrate controller-driven solution spawning and aggregate results/stats.
  • Extend sandbox + evaluation-step APIs to support integer stdin/stdout/stderr redirects, close_fds control, and wall-clock limits.
  • Add YAML loader detection/validation for interactive controllers, add new parameter types (bool/float), and expand the test suite with interactive tasks.

Reviewed changes

Copilot reviewed 32 out of 36 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
setup.py Registers the new Interactive task type entry point and adds interactive test tasks to package data.
cmstestsuite/Tests.py Adds functional tests covering interactive correctness, WA, TLE, wall-timeout, and crashes (single/many instances).
cmstestsuite/tasks/interactive/** Adds an example interactive task with controller + stubs and inputs.
cmstestsuite/tasks/interactive_many/** Adds an example interactive task that spawns many solution instances.
cmstestsuite/code/interactive-*.{cpp,py} Adds sample contestant solutions for interactive tests (correct/WA/protocol/TLE/etc).
cmscontrib/loaders/italy_yaml.py Detects controllers and configures Interactive task parameters; adds validation against controller+manager.
cms/server/contest/submission/file_matching.py Improves ambiguous-file error message by including submission_format.
cms/grading/tasktypes/Interactive.py Implements compilation/evaluation for Interactive tasks via an external keeper process.
cms/grading/tasktypes/interactive_keeper.py Runs controller + multiple solution sandboxes, wires pipes, merges stats, and returns JSON result.
cms/grading/steps/evaluation.py Extends evaluation-step API with wall limits, integer redirects, and close_fds.
cms/grading/Sandbox.py Adds fd-based stdio wiring, close_fds behavior, and adjusts isolate invocation/output suppression.
cms/grading/ParameterTypes.py Adds ParameterTypeBool and ParameterTypeFloat for task parameters.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

assert admin_text is None
admin_text = line[len("ADMIN_MESSAGE: ") :].strip()
else:
raise ValueError(f"Unknown controller feedback command {f}")
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In get_controller_text(), the ValueError message interpolates f (the file handle) instead of the unexpected line content, so the exception becomes unhelpful and hides the offending controller output. Use the parsed line (or the raw line) in the error message instead.

Suggested change
raise ValueError(f"Unknown controller feedback command {f}")
raise ValueError(f"Unknown controller feedback command {line}")

Copilot uses AI. Check for mistakes.
stdin_redirect=p["c_to_u"][0],
stdout_redirect=p["u_to_c"][1],
multiprocess=False,
close_fds=False,
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solutions are started with close_fds=False. This makes each solution inherit all pipe FDs created for every potential solution (and any other open FDs in the keeper), which can keep pipe ends unintentionally open and lead to hangs (controller never seeing EOF) or cross-talk between solution instances. Use close_fds=True for solution sandboxes and rely on stdin_redirect/stdout_redirect to pass only the intended pipe ends.

Suggested change
close_fds=False,
close_fds=True,

Copilot uses AI. Check for mistakes.
Comment on lines +176 to +185
else:
result = {
"success": False,
"outcome": 0.0,
"text": "Invalid command from controller: " + line,
"admin_text": None,
"stats": None,
}
print(json.dumps(result), flush=True)
sys.exit(0)
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The early-return JSON for an invalid controller command sets text to a string, but the rest of the grading pipeline expects text to be a list of strings (e.g., other branches build lists and EvaluationJob.text is typed as list[str]). Return text as a one-element list for consistency.

Copilot uses AI. Check for mistakes.
except Exception as e:
success = False
text = ["Internal error"]
admin_text = [f"Internal error: {e}"]
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On exception while parsing controller feedback, admin_text is set to a list, but Job.admin_text is a single string (or None) elsewhere in the grading code. Set admin_text to a string (e.g., the formatted error) to avoid type inconsistencies downstream.

Suggested change
admin_text = [f"Internal error: {e}"]
admin_text = f"Internal error: {e}"

Copilot uses AI. Check for mistakes.
Comment on lines 136 to 141
for command in commands:
success = evaluation_step_before_run(
sandbox, command, time_limit, memory_limit,
dirs_map, writable_files, stdin_redirect, stdout_redirect,
multiprocess, wait=True)
None, dirs_map, writable_files, stdin_redirect,
stdout_redirect, multiprocess, wait=True)
if not success:
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evaluation_step() now passes stdout_redirect=None into evaluation_step_before_run(), but evaluation_step_before_run no longer normalizes None to "stdout.txt". This changes the default behavior from "redirect to stdout.txt" to "no redirect", contradicting evaluation_step's docstring and breaking callers like Batch that rely on default stdout redirection to produce output files. Consider restoring the None→"stdout.txt" normalization in evaluation_step() or in evaluation_step_before_run().

Copilot uses AI. Check for mistakes.
Comment on lines +96 to +97
# Note: isolate wants the memory limit in *bytes*!
self.controller_memory_limit = self.parameters[3] * 2**20
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

init multiplies controller_memory_limit by 2**20 unconditionally. If the parameter is omitted/None (e.g., from a loader default) this will raise at runtime, and even for valid values it couples validation to arithmetic. Handle the optional case explicitly (keep None) and only convert MB→bytes when a numeric limit is provided.

Suggested change
# Note: isolate wants the memory limit in *bytes*!
self.controller_memory_limit = self.parameters[3] * 2**20
controller_memory_limit = self.parameters[3]
# Note: isolate wants the memory limit in *bytes*!
self.controller_memory_limit = (
None
if controller_memory_limit is None
else controller_memory_limit * 2**20
)

Copilot uses AI. Check for mistakes.
Comment on lines +103 to +115
codenames_to_compile = []
if self._uses_stub():
codenames_to_compile.append(self.STUB_BASENAME + ".%l")
codenames_to_compile.extend([x for x in submission_format if x.endswith(".%l")])
res = dict()
for language in LANGUAGES:
source_ext = language.source_extension
executable_filename = self._executable_filename(submission_format, language)
res[language.name] = language.get_compilation_commands(
[
codename.replace(".%l", source_ext)
for codename in codenames_to_compile
],
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_compilation_commands() only includes submission_format entries ending with ".%l". This skips any fixed-name source files in the submission format, so they will never be compiled/linked (unlike Communication, which compiles the entire submission_format). Include all relevant submitted sources (not just wildcard ones), or validate that Interactive submission formats cannot contain fixed filenames.

Suggested change
codenames_to_compile = []
if self._uses_stub():
codenames_to_compile.append(self.STUB_BASENAME + ".%l")
codenames_to_compile.extend([x for x in submission_format if x.endswith(".%l")])
res = dict()
for language in LANGUAGES:
source_ext = language.source_extension
executable_filename = self._executable_filename(submission_format, language)
res[language.name] = language.get_compilation_commands(
[
codename.replace(".%l", source_ext)
for codename in codenames_to_compile
],
res = dict()
for language in LANGUAGES:
source_ext = language.source_extension
codenames_to_compile = []
if self._uses_stub():
codenames_to_compile.append(self.STUB_BASENAME + source_ext)
codenames_to_compile.extend(
codename.replace(".%l", source_ext)
for codename in submission_format
if codename.endswith(".%l")
)
codenames_to_compile.extend(
codename
for codename in submission_format
if not codename.endswith(".%l") and codename.endswith(source_ext)
)
executable_filename = self._executable_filename(submission_format, language)
res[language.name] = language.get_compilation_commands(
codenames_to_compile,

Copilot uses AI. Check for mistakes.
Comment on lines +249 to +257
p = subprocess.Popen(
[sys.executable, keeper_path, json.dumps(keeper_config)],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
)
stdout, _ = p.communicate(timeout=self.controller_wall_limit * 2)

KEEPER_ERROR_MESSAGE = "Internal error in interactive keeper"

Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interactive.evaluate() calls keeper_process.communicate(timeout=self.controller_wall_limit * 2) without handling controller_wall_limit being None and without catching TimeoutExpired. Either case will crash the worker instead of returning a controlled "internal error" evaluation result. Ensure a fallback timeout (or no timeout) when the limit is unset, and catch TimeoutExpired to terminate the keeper and set job.success/text appropriately.

Suggested change
p = subprocess.Popen(
[sys.executable, keeper_path, json.dumps(keeper_config)],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
)
stdout, _ = p.communicate(timeout=self.controller_wall_limit * 2)
KEEPER_ERROR_MESSAGE = "Internal error in interactive keeper"
KEEPER_ERROR_MESSAGE = "Internal error in interactive keeper"
keeper_timeout = (
self.controller_wall_limit * 2
if self.controller_wall_limit is not None
else None
)
p = subprocess.Popen(
[sys.executable, keeper_path, json.dumps(keeper_config)],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
)
try:
stdout, _ = p.communicate(timeout=keeper_timeout)
except subprocess.TimeoutExpired:
logger.error(
"Keeper timed out after %s seconds",
keeper_timeout,
)
p.terminate()
try:
stdout, _ = p.communicate(timeout=5)
except subprocess.TimeoutExpired:
p.kill()
stdout, _ = p.communicate()
job.success = False
job.text = [KEEPER_ERROR_MESSAGE]
return

Copilot uses AI. Check for mistakes.
break

if controller_path is not None and manager_path is not None:
logger.fatal("Cannot have both a manager and a controller")
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When both controller and manager are present, the loader logs logger.fatal but continues building the task; this can silently produce an invalid/misclassified dataset. Fail fast here (raise an exception or return None) so the task import clearly errors out.

Suggested change
logger.fatal("Cannot have both a manager and a controller")
logger.fatal("Cannot have both a manager and a controller")
raise ValueError("Cannot have both a manager and a controller")

Copilot uses AI. Check for mistakes.
Comment on lines +797 to +805
process_limit = conf.get("controller_process_limit", 200)
concurrent = conf.get("interactive_concurrent", True)
controller_memory_limit_mb = conf.get("controller_memory_limit", None)
controller_time_limit = conf.get("controller_time_limit", None)
controller_wall_limit = conf.get("controller_wall_time_limit", None)

args["task_type_parameters"] = \
[process_limit, compilation_param, concurrent,
controller_memory_limit_mb, controller_time_limit, controller_wall_limit]
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interactive task_type_parameters are built with defaults of None for controller_memory_limit/time_limit/wall_limit. However, Interactive's ParameterTypeFloat validation (and Interactive.init arithmetic) do not accept None, so tasks that omit these YAML keys will fail later when instantiating the TaskType. Provide concrete defaults here (or enforce required keys) and keep key names consistent with the Interactive parameter short_names (e.g., use process_limit/concurrent/controller_wall_limit instead of controller_process_limit/controller_wall_time_limit if those are not documented).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants