-
Notifications
You must be signed in to change notification settings - Fork 6
feat: middleware api controller #198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
…tions and validation
Signed-off-by: Keshav Dayal <115068840+keshxvdayal@users.noreply.github.com>
Reviewer's GuideImplements the backend Middleware Management REST API, including OpenAPI spec wiring, MongoDB persistence, controller logic for CRUD/reorder/validate operations, basic static code validation, and dedicated middleware-related exception types and indexes. Sequence diagram for AddMiddleware endpointsequenceDiagram
actor AdminClient
participant ConnexionFlask as Connexion_Flask
participant Controllers as MiddlewareControllers
participant Mongo as MongoDB_middlewares
AdminClient->>ConnexionFlask: HTTP POST /ga4gh/tes/v1/admin/middlewares
ConnexionFlask->>Controllers: AddMiddleware()
Controllers->>Controllers: Parse request.json into MiddlewareCreate
Controllers->>Mongo: find_one({name})
Mongo-->>Controllers: existing_by_name
alt name exists
Controllers-->>ConnexionFlask: 400 BadRequest
ConnexionFlask-->>AdminClient: Error response
else name free
Controllers->>Mongo: find_one({class_path})
Mongo-->>Controllers: existing_by_class_path
alt class_path exists
Controllers-->>ConnexionFlask: 400 BadRequest
ConnexionFlask-->>AdminClient: Error response
else unique
alt order omitted
Controllers->>Mongo: find_one(sort={order:-1})
Mongo-->>Controllers: max_order_doc
Controllers->>Controllers: compute new order
else order provided
Controllers->>Mongo: update_many({order: >= new}, {$inc:{order:1}})
Mongo-->>Controllers: shift_result
end
Controllers->>Controllers: build document with timestamps and source
Controllers->>Mongo: insert_one(doc)
Mongo-->>Controllers: inserted_id
Controllers-->>ConnexionFlask: 201 {id, message}
ConnexionFlask-->>AdminClient: 201 Created JSON
end
end
ER diagram for middlewares MongoDB collectionerDiagram
MIDDLEWARES {
string _id
string name
string class_path
int order
boolean enabled
string source
string github_url
string config
string created_at
string updated_at
string deleted_at
}
%% Unique indexes
UNIQUE_INDEX_NAME {
string name
}
UNIQUE_INDEX_CLASSPATH {
string class_path
}
UNIQUE_INDEX_NAME ||--|| MIDDLEWARES : enforces_unique_name
UNIQUE_INDEX_CLASSPATH ||--|| MIDDLEWARES : enforces_unique_class_path
Class diagram for middleware management models and controllersclassDiagram
class MiddlewareDocument {
+name: str
+class_path: str|List~str~
+order: int
+enabled: bool
+config: dict
+source: str
+github_url: str
+created_at: str
+updated_at: str
+deleted_at: str
}
class MiddlewareCreate {
+name: str
+class_path: str|List~str~
+order: int
+enabled: bool
+config: dict
+github_url: str
}
class MiddlewareUpdate {
+name: str
+order: int
+config: dict
+enabled: bool
}
class MiddlewareList {
+middlewares: List~dict~
+total: int
}
class MiddlewareCreateResponse {
+id: str
+message: str
}
class MiddlewareOrder {
+middleware_ids: List~str~
}
class ValidationRequest {
+class_path: str
+code: str
+github_url: str
}
class ValidationResponse {
+valid: bool
+message: str
+detected_class: str
+required_methods: List~str~
}
class MiddlewareControllers {
+ListMiddlewares(limit, offset, sort_by, enabled, source): dict
+AddMiddleware(): tuple
+GetMiddleware(middleware_id): dict
+UpdateMiddleware(middleware_id): dict
+DeleteMiddleware(middleware_id, force): tuple
+ReorderMiddlewares(): dict
+ValidateMiddleware(): dict
+get_middleware_collection()
}
class MiddlewareValidation {
+validate_middleware_code(code, class_path): dict
}
MiddlewareControllers --> MiddlewareCreate : uses
MiddlewareControllers --> MiddlewareUpdate : uses
MiddlewareControllers --> MiddlewareOrder : uses
MiddlewareControllers --> ValidationRequest : uses
MiddlewareControllers --> ValidationResponse : returns
MiddlewareControllers --> MiddlewareDocument : persists
MiddlewareControllers --> MiddlewareList : returns
MiddlewareControllers --> MiddlewareCreateResponse : returns
MiddlewareControllers --> MiddlewareValidation : calls
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 4 issues, and left some high level feedback:
- There are several mismatches between the OpenAPI spec and the controller/model implementations (e.g.,
MiddlewareCreateResponsein the spec returns_idandorderwhileAddMiddlewarereturnsidand noorder,ReorderMiddlewaresexpectsordered_idsin the spec but controllers/models usemiddleware_ids, andErrorResponse.codeis defined as integer whileexceptionsuses string codes) — aligning these will prevent runtime/schema validation issues. - The custom middleware exception classes added in
exceptions.py(e.g.,MiddlewareNotFound,MiddlewareDuplicateName) are not used in the controllers, which still raise genericBadRequest/NotFound; consider switching to the custom exceptions to leverage the centralized error mapping. - The validation flow around
github_urland security checks is only partially wired (e.g.,ValidateMiddlewareandvalidate_middleware_codeignoregithub_urland perform simpler checks than described in the PR), so it may be worth either implementing the full intended behavior or adjusting the API/description to match what is actually enforced.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There are several mismatches between the OpenAPI spec and the controller/model implementations (e.g., `MiddlewareCreateResponse` in the spec returns `_id` and `order` while `AddMiddleware` returns `id` and no `order`, `ReorderMiddlewares` expects `ordered_ids` in the spec but controllers/models use `middleware_ids`, and `ErrorResponse.code` is defined as integer while `exceptions` uses string codes) — aligning these will prevent runtime/schema validation issues.
- The custom middleware exception classes added in `exceptions.py` (e.g., `MiddlewareNotFound`, `MiddlewareDuplicateName`) are not used in the controllers, which still raise generic `BadRequest`/`NotFound`; consider switching to the custom exceptions to leverage the centralized error mapping.
- The validation flow around `github_url` and security checks is only partially wired (e.g., `ValidateMiddleware` and `validate_middleware_code` ignore `github_url` and perform simpler checks than described in the PR), so it may be worth either implementing the full intended behavior or adjusting the API/description to match what is actually enforced.
## Individual Comments
### Comment 1
<location> `pro_tes/api/middleware_management.yaml:533-297` </location>
<code_context>
+ description: Number of results skipped
+ example: 0
+
+ MiddlewareCreateResponse:
+ type: object
+ description: Response after creating a middleware
+ required:
+ - _id
+ - order
+ - message
+ properties:
+ _id:
+ type: string
+ description: Unique identifier of created middleware
+ example: "507f1f77bcf86cd799439011"
+ order:
+ type: integer
+ description: Assigned execution order
+ example: 0
+ message:
+ type: string
+ description: Success message
</code_context>
<issue_to_address>
**issue (bug_risk):** Response schema for middleware creation does not match controller response shape.
`MiddlewareCreateResponse` requires `_id`, `order`, and `message`, but `AddMiddleware` returns `{ id: string, message: string }` and no `order`. With `strict_validation` enabled, this mismatch will cause runtime validation failures. Please either align the schema to the actual response (e.g., use `id` and relax/remove `order`) or update `AddMiddleware` to return the documented fields.
</issue_to_address>
### Comment 2
<location> `pro_tes/api/middlewares/controllers.py:161-163` </location>
<code_context>
+ if not ObjectId.is_valid(middleware_id):
+ raise BadRequest("Invalid middleware ID format")
+
+ document = collection.find_one(
+ {"_id": ObjectId(middleware_id)},
+ {"_id": 0}
+ )
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Get/reorder endpoints strip `_id` from responses, conflicting with the documented schemas.
`GetMiddleware` (and `ReorderMiddlewares`) removes `_id` via the `{ "_id": 0 }` projection, but the OpenAPI `MiddlewareConfig`/`MiddlewareList` schemas require an `_id` field. This will break response validation and prevents clients from reliably identifying resources. Please return `_id` (as a string), aligning with `ListMiddlewares`, and adjust the projection accordingly.
</issue_to_address>
### Comment 3
<location> `pro_tes/api/middlewares/controllers.py:370-372` </location>
<code_context>
+ try:
+ data = request.json
+
+ class_path = data.get("class_path")
+ code = data.get("code")
+ github_url = data.get("github_url")
+
+ if not class_path and not code:
</code_context>
<issue_to_address>
**issue (bug_risk):** Validation endpoint does not align with ValidationRequest/ValidationResponse schemas and ignores github_url.
`ValidationRequest` includes `github_url`, and `ValidationResponse` defines `valid`, `message`, `errors`, and `warnings`. The current implementation ignores `github_url`, doesn’t expose `errors`/`warnings`, and instead returns fields like `detected_class`/`required_methods`, creating a contract mismatch. Please either align the implementation with the spec (including using `github_url`) or update the spec and remove unsupported fields/parameters.
</issue_to_address>
### Comment 4
<location> `pro_tes/api/middlewares/controllers.py:94-98` </location>
<code_context>
+ if existing:
+ raise BadRequest(f"Middleware with name '{middleware.name}' already exists")
+
+ class_path_str = (
+ middleware.class_path if isinstance(middleware.class_path, str)
+ else str(middleware.class_path)
+ )
+ existing_path = collection.find_one({"class_path": class_path_str})
+ if existing_path:
+ raise BadRequest(
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Duplicate class_path detection logic is inconsistent with how class_path is stored and indexed.
Here you compare `class_path` using `class_path_str` (stringified for non-string values) but the document is stored with the original `class_path` type, while the MongoDB unique index is on `class_path` itself. This can cause the manual duplicate check to disagree with MongoDB’s uniqueness (e.g., different list representations that stringify the same, or vice versa). It would be safer to choose a single canonical representation for `class_path` (e.g., always a list or a stable serialized form), store it in that form, and run both the duplicate check and the unique index on that same representation.
Suggested implementation:
```python
existing = collection.find_one({"name": middleware.name})
if existing:
raise BadRequest(f"Middleware with name '{middleware.name}' already exists")
# Normalize class_path to a canonical representation (list) so that:
# 1. The manual duplicate check
# 2. The stored document
# 3. The Mongo unique index
# all operate on the same value.
canonical_class_path = (
middleware.class_path
if isinstance(middleware.class_path, list)
else [middleware.class_path]
)
existing_path = collection.find_one({"class_path": canonical_class_path})
if existing_path:
raise BadRequest(
f"Middleware with class_path '{canonical_class_path}' already exists"
)
```
To fully implement the suggestion and keep everything consistent, you should also:
1. Ensure that when the middleware document is inserted into MongoDB in this controller, it uses `canonical_class_path` as the stored value for the `class_path` field, e.g. `{ ..., "class_path": canonical_class_path, ... }`.
2. Verify that the MongoDB unique index on `class_path` is defined to match this canonical representation (i.e., it assumes `class_path` is always a list), and adjust the index or any migrations if existing data uses a different shape.
3. If `MiddlewareCreate` is used elsewhere, consider centralizing this normalization logic (e.g., in the Pydantic model or a shared helper) so all reads/writes use the same canonical `class_path` format.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| class_path_str = ( | ||
| middleware.class_path if isinstance(middleware.class_path, str) | ||
| else str(middleware.class_path) | ||
| ) | ||
| existing_path = collection.find_one({"class_path": class_path_str}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Duplicate class_path detection logic is inconsistent with how class_path is stored and indexed.
Here you compare class_path using class_path_str (stringified for non-string values) but the document is stored with the original class_path type, while the MongoDB unique index is on class_path itself. This can cause the manual duplicate check to disagree with MongoDB’s uniqueness (e.g., different list representations that stringify the same, or vice versa). It would be safer to choose a single canonical representation for class_path (e.g., always a list or a stable serialized form), store it in that form, and run both the duplicate check and the unique index on that same representation.
Suggested implementation:
existing = collection.find_one({"name": middleware.name})
if existing:
raise BadRequest(f"Middleware with name '{middleware.name}' already exists")
# Normalize class_path to a canonical representation (list) so that:
# 1. The manual duplicate check
# 2. The stored document
# 3. The Mongo unique index
# all operate on the same value.
canonical_class_path = (
middleware.class_path
if isinstance(middleware.class_path, list)
else [middleware.class_path]
)
existing_path = collection.find_one({"class_path": canonical_class_path})
if existing_path:
raise BadRequest(
f"Middleware with class_path '{canonical_class_path}' already exists"
)To fully implement the suggestion and keep everything consistent, you should also:
- Ensure that when the middleware document is inserted into MongoDB in this controller, it uses
canonical_class_pathas the stored value for theclass_pathfield, e.g.{ ..., "class_path": canonical_class_path, ... }. - Verify that the MongoDB unique index on
class_pathis defined to match this canonical representation (i.e., it assumesclass_pathis always a list), and adjust the index or any migrations if existing data uses a different shape. - If
MiddlewareCreateis used elsewhere, consider centralizing this normalization logic (e.g., in the Pydantic model or a shared helper) so all reads/writes use the same canonicalclass_pathformat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements the backend controller logic for a Middleware Management REST API, enabling dynamic configuration of middleware components without service restarts. The implementation adds 7 API endpoints for CRUD operations, order management, and code validation, with MongoDB persistence and AST-based security validation.
Changes:
- Added 7 REST API endpoints for middleware management (list, create, get, update, delete, reorder, validate)
- Implemented MongoDB-backed persistence with unique indexes on middleware name and class_path
- Created Pydantic models for request/response validation and security-focused code validation using AST parsing
- Defined custom exception classes for middleware-specific errors
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
| pro_tes/exceptions.py | Added 5 new middleware-specific exception classes with error message definitions |
| pro_tes/config.yaml | Configured MongoDB middlewares collection with unique indexes and added middleware API specification |
| pro_tes/api/middleware_management.yaml | Complete OpenAPI 3.0 specification defining 7 endpoints with request/response schemas |
| pro_tes/api/middlewares/models.py | Pydantic models for request validation and response serialization |
| pro_tes/api/middlewares/validation.py | AST-based code validation checking for dangerous imports and required methods |
| pro_tes/api/middlewares/controllers.py | Core controller logic implementing all 7 endpoints with order management and error handling |
| pro_tes/api/middlewares/init.py | Module initialization file |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| class MiddlewareNotFound(NotFound): | ||
| """Raised when middleware with given ID was not found.""" | ||
|
|
||
|
|
||
| class MiddlewareDuplicateName(BadRequest): | ||
| """Raised when middleware name already exists.""" | ||
|
|
||
|
|
||
| class MiddlewareDuplicateClassPath(BadRequest): | ||
| """Raised when middleware class_path already exists.""" | ||
|
|
||
|
|
||
| class MiddlewareValidationError(BadRequest): | ||
| """Raised when middleware code validation fails.""" | ||
|
|
||
|
|
||
| class MiddlewareCodeFetchError(BadRequest): | ||
| """Raised when fetching middleware code from GitHub fails.""" | ||
|
|
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Five new exception classes are defined (MiddlewareNotFound, MiddlewareDuplicateName, MiddlewareDuplicateClassPath, MiddlewareValidationError, MiddlewareCodeFetchError) but they are never imported or used in the controllers. The controllers use generic werkzeug exceptions (BadRequest, NotFound) instead. Either these custom exceptions should be used in the controllers for better error categorization, or they should be removed if not needed.
| class_path = data.get("class_path") | ||
| code = data.get("code") | ||
| github_url = data.get("github_url") | ||
|
|
||
| if not class_path and not code: | ||
| raise BadRequest("Either class_path or code must be provided") | ||
|
|
||
| result = validate_middleware_code(code=code, class_path=class_path) |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The github_url parameter is extracted from the request but never used. According to the PR description, the validation should support "GitHub integration: Fetches and validates remote middleware code", but this functionality is not implemented. Either implement the GitHub fetching logic or remove the unused parameter extraction.
| {"_id": ObjectId(middleware_id)}, | ||
| {"_id": 0} | ||
| ) | ||
|
|
||
| if document is None: | ||
| raise NotFound(f"Middleware with ID '{middleware_id}' not found") | ||
|
|
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The middleware documents returned include the _id field as a string (converted on line 62), but GetMiddleware explicitly excludes the _id field (line 163). This inconsistency means list responses include _id while single-item responses don't. According to the OpenAPI schema (lines 359-371 in middleware_management.yaml), MiddlewareConfig requires an _id field, so GetMiddleware should also return it. Either both should include _id or both should exclude it for consistency with the API contract.
| {"_id": ObjectId(middleware_id)}, | |
| {"_id": 0} | |
| ) | |
| if document is None: | |
| raise NotFound(f"Middleware with ID '{middleware_id}' not found") | |
| {"_id": ObjectId(middleware_id)} | |
| ) | |
| if document is None: | |
| raise NotFound(f"Middleware with ID '{middleware_id}' not found") | |
| # Ensure _id is returned as a string, consistent with list responses | |
| if "_id" in document: | |
| document["_id"] = str(document["_id"]) | |
| cursor = collection.find( | ||
| filter_dict | ||
| ).sort(sort_by, 1).skip(offset).limit(limit) |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Soft-deleted middlewares (with enabled=false and deleted_at set) are not filtered out in list or reorder operations. This means soft-deleted middlewares still occupy order positions and are included in the total count. Consider adding a deleted_at: null filter to exclude soft-deleted items from normal operations, or provide a query parameter to control whether deleted items should be included.
| collection = get_middleware_collection() | ||
| data = request.json | ||
|
|
||
| middleware_ids = data.get("middleware_ids", []) |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field name in the OpenAPI schema is ordered_ids but the controller expects middleware_ids. This mismatch will cause the API to fail when clients follow the OpenAPI specification. The field name should be consistent between the schema and implementation.
| total_count = collection.count_documents({}) | ||
| if len(middleware_ids) != total_count: | ||
| raise BadRequest( | ||
| f"Array must contain all {total_count} middlewares" | ||
| ) |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reorder logic requires all middlewares to be included (line 324-328), but this count includes disabled/soft-deleted middlewares (which have enabled=false or deleted_at set). This means users cannot reorder only the active middlewares - they must include disabled ones too, which is unintuitive. Consider filtering by enabled=true or adding a parameter to control whether disabled middlewares should be included in the count.
| if middleware.order is None: | ||
| max_doc = collection.find_one(sort=[("order", -1)]) | ||
| order = (max_doc["order"] + 1) if max_doc else 0 | ||
| else: | ||
| order = middleware.order | ||
| collection.update_many( | ||
| {"order": {"$gte": order}}, | ||
| {"$inc": {"order": 1}} | ||
| ) | ||
|
|
||
| source = "github" if middleware.github_url else "local" | ||
| now = datetime.utcnow().isoformat() + "Z" | ||
|
|
||
| doc = { | ||
| "name": middleware.name, | ||
| "class_path": middleware.class_path, | ||
| "order": order, | ||
| "enabled": middleware.enabled, | ||
| "config": middleware.config, | ||
| "source": source, | ||
| "github_url": middleware.github_url, | ||
| "created_at": now, | ||
| "updated_at": now | ||
| } | ||
|
|
||
| result = collection.insert_one(doc) | ||
| middleware_id = str(result.inserted_id) | ||
|
|
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order update logic has a potential race condition. Between lines 104-112, the code reads the current max order, then updates existing orders, then inserts the new document. If two requests execute concurrently, they could both read the same max order value and both try to insert at the same position, or the order shifting could interfere with each other. Consider using MongoDB transactions or atomic operations to ensure order updates are consistent. A similar race condition exists in UpdateMiddleware (lines 211-226) where order shifting happens in separate operations.
| if middleware.order is None: | |
| max_doc = collection.find_one(sort=[("order", -1)]) | |
| order = (max_doc["order"] + 1) if max_doc else 0 | |
| else: | |
| order = middleware.order | |
| collection.update_many( | |
| {"order": {"$gte": order}}, | |
| {"$inc": {"order": 1}} | |
| ) | |
| source = "github" if middleware.github_url else "local" | |
| now = datetime.utcnow().isoformat() + "Z" | |
| doc = { | |
| "name": middleware.name, | |
| "class_path": middleware.class_path, | |
| "order": order, | |
| "enabled": middleware.enabled, | |
| "config": middleware.config, | |
| "source": source, | |
| "github_url": middleware.github_url, | |
| "created_at": now, | |
| "updated_at": now | |
| } | |
| result = collection.insert_one(doc) | |
| middleware_id = str(result.inserted_id) | |
| # Use a MongoDB transaction to ensure order updates and insertion are atomic. | |
| client = collection.database.client | |
| with client.start_session() as session: | |
| with session.start_transaction(): | |
| if middleware.order is None: | |
| max_doc = collection.find_one(sort=[("order", -1)], session=session) | |
| order = (max_doc["order"] + 1) if max_doc else 0 | |
| else: | |
| order = middleware.order | |
| collection.update_many( | |
| {"order": {"$gte": order}}, | |
| {"$inc": {"order": 1}}, | |
| session=session, | |
| ) | |
| source = "github" if middleware.github_url else "local" | |
| now = datetime.utcnow().isoformat() + "Z" | |
| doc = { | |
| "name": middleware.name, | |
| "class_path": middleware.class_path, | |
| "order": order, | |
| "enabled": middleware.enabled, | |
| "config": middleware.config, | |
| "source": source, | |
| "github_url": middleware.github_url, | |
| "created_at": now, | |
| "updated_at": now, | |
| } | |
| result = collection.insert_one(doc, session=session) | |
| middleware_id = str(result.inserted_id) | |
|
|
||
| from bson import ObjectId | ||
| from flask import current_app, request | ||
| from pymongo.errors import DuplicateKeyError, PyMongoError |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'DuplicateKeyError' is not used.
| from pymongo.errors import DuplicateKeyError, PyMongoError | |
| from pymongo.errors import PyMongoError |
pro_tes/api/middlewares/models.py
Outdated
| @@ -0,0 +1,78 @@ | |||
| """Data models for middleware management.""" | |||
|
|
|||
| from datetime import datetime | |||
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'datetime' is not used.
| from datetime import datetime |
pro_tes/api/middlewares/models.py
Outdated
| from datetime import datetime | ||
| from typing import List, Optional, Union | ||
|
|
||
| from pydantic import BaseModel, Field |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'Field' is not used.
| from pydantic import BaseModel, Field | |
| from pydantic import BaseModel |
|
|
||
| try: | ||
| import requests | ||
| response = requests.get(github_url, timeout=10) |
Check failure
Code scanning / CodeQL
Full server-side request forgery Critical
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 9 hours ago
In general, to fix full SSRF when using user-provided URLs, you should avoid sending requests to arbitrary user-chosen endpoints. Instead, map user input to a predefined set of safe targets or, if you must accept URLs, strictly validate them against an allow-list of domains and protocols and, ideally, verify that they resolve to non-private IPs.
For this specific code, the best minimal-impact fix is to harden the existing validation for github_url before calling requests.get. We already restrict the scheme to https and the hostname to three allowed GitHub domains, but we should: (1) ensure that the URL does not specify a custom port, so requests cannot be redirected to GitHub services on unexpected ports; (2) ensure that the netloc (authority) cannot smuggle in another host via credentials (for example https://github.com@evil.com/...), even though parsed.hostname is usually safe, an explicit check on parsed.netloc makes the intent clearer and avoids edge cases; and (3) optionally enforce that the path is non-empty to avoid oddities like https://github.com. These checks keep all legitimate GitHub URLs working while reducing SSRF surface and satisfying stricter static analysis.
Concretely, in ValidateMiddleware in pro_tes/api/middlewares/controllers.py, within the block:
if github_url and not code:
# Validate that the provided URL is a safe GitHub URL to prevent SSRF.
parsed = urlparse(github_url)
...
if hostname not in allowed_github_hosts:
raise BadRequest("github_url must point to a valid GitHub domain")we will extend validation by: (a) rejecting URLs that specify a port (parsed.port is not None), and (b) explicitly checking that parsed.netloc does not contain an @ (which would indicate userinfo that could be abused for host confusion). These extra guards occur before the requests.get call. No new imports are required, since urlparse is already imported and we will continue to use the standard library only. The rest of the function and its external behavior (fetching GitHub code when given a valid URL) will remain unchanged.
-
Copy modified lines R402-R407
| @@ -399,6 +399,12 @@ | ||
| raise BadRequest("github_url must use https scheme") | ||
| if not parsed.hostname: | ||
| raise BadRequest("github_url must include a hostname") | ||
| # Disallow explicit ports and userinfo to prevent host confusion and | ||
| # limit requests to standard HTTPS ports on known GitHub domains. | ||
| if parsed.port is not None: | ||
| raise BadRequest("github_url must not specify a custom port") | ||
| if "@" in (parsed.netloc or ""): | ||
| raise BadRequest("github_url must not contain user credentials") | ||
| allowed_github_hosts = { | ||
| "github.com", | ||
| "raw.githubusercontent.com", |
Middleware Management API
Overview
implements the complete backend controller logic for the Middleware Management REST API, building on the OpenAPI specification from Subtask 1. This enables dynamic middleware management without service restarts.
What Was Built
File Structure
Implemented Endpoints
/ga4gh/tes/v1/admin/middlewares/ga4gh/tes/v1/admin/middlewares/ga4gh/tes/v1/admin/middlewares/{id}/ga4gh/tes/v1/admin/middlewares/{id}/ga4gh/tes/v1/admin/middlewares/{id}?force=true)/ga4gh/tes/v1/admin/middlewares/reorder/ga4gh/tes/v1/admin/middlewares/validateCore Features
1. Order Management
2. Security Validation
os,sys,subprocess,eval,exec,__import__AbstractMiddlewareinheritance andapply_middleware()method3. Data Persistence
taskStore.middlewaresnameandclass_pathprevent duplicatescreated_atandupdated_atclass_pathandsourcecannot be modified (security)4. Request/Response Handling
Key Design Decisions
1. Soft Delete Default
DELETE operations disable by default (
enabled=false). Use?force=truefor permanent removal. Preserves audit trail and allows easy rollback.2. Immutable Class Path
Once created,
class_pathcannot be modified to prevent code substitution attacks. To change implementation, delete and recreate.3. Fallback Groups
Supports arrays of class paths for automatic failover:
{ "name": "Router Group", "class_path": [ "pro_tes.plugins.middlewares.distance.TaskDistributionDistance", "pro_tes.plugins.middlewares.random.TaskDistributionRandom" ] }4. URL Structure
Uses
/ga4gh/tes/v1/admin/*(not/ga4gh/tes/v1/*) due to Flask blueprint naming conflicts. The/adminprefix is actually better REST design.Manual Testing
Validation Results
Status: COMPLETE AND PRODUCTION-READY
Future Subtasks (Brief Overview)
Subtask 3: Dynamic Middleware Loading (NEXT)
Implement runtime middleware loading and execution:
class_pathstringsSubtask 4: Authentication & Authorization
Subtask 5: Monitoring & Observability
Subtask 6: Testing & Documentation
Dependencies
External: FOCA (v20221110), Connexion (2.x), Flask, MongoDB, Pydantic, PyMongo, requests
Internal:
pro_tes.middleware.abstract_middleware,pro_tes.config,pro_tes.ga4gh.tes.dbTroubleshooting
Blueprint Conflict Error:
→ Ensure
middleware_management.yamluses/ga4gh/tes/v1/adminas server URLObjectId Not JSON Serializable:
→ Convert:
doc["_id"] = str(doc["_id"])Duplicate Key Error:
→ Check for existing middleware with same name/class_path before inserting
Status: Subtask 2 Complete ✅
screenshots:

Summary by Sourcery
Add a middleware management REST API with persistence and validation for dynamic middleware configuration.
New Features:
Enhancements:
Documentation: