feat: URL resolving support for validate & metaschema#605
feat: URL resolving support for validate & metaschema#605robbat2 wants to merge 1 commit intosourcemeta:mainfrom
Conversation
050993e to
eca24c0
Compare
|
Interesting heisenbug failure - the URL is fine, but it had a 404 when the test ran? |
|
Interesting feature! Thanks for sending it. I'll book some time today to properly review it 🙏🏻 |
jviotti
left a comment
There was a problem hiding this comment.
I love the idea. I think the way Codex/you are implementing it is fighting the current machinery for iterating on schema arguments. I think we need for make for_each_json transparently work with both URLs or file paths. The src/input.h file abstracts a way a lot of the logic, including line number gathering, reading, etc so that the actual commands never have to worry about it.
for_each_json will loop and pass specific positional arguments to handle_json_entry. It feels to me that such handle_json_entry needs to take either a path or a URI (maybe just a string?) and then internally check: if it is a path and it exists and it is a file, do X, if it is a directory, do Y, if it is an HTTP/S URL, do Z.
I think if we do all of that there, we very elegantly gain URL support in every applicable command.
|
On the transient CI failure, yeah, we've been getting that quite a bit recently. I'm on it right now! |
|
If you merge main back in, the transient CI issue should be gone! |
6bf416d to
3d93e71
Compare
Closes: sourcemeta#234 Co-authored-by: Codex <codex@openai.com> Generated-with: OpenAI Codex CLI (partial) Signed-off-by: Robin H. Johnson <rjohnson@coreweave.com>
3d93e71 to
7ced116
Compare
|
couple more CI bumps from the code but ready again; nothing transient this time |
jviotti
left a comment
There was a problem hiding this comment.
I left some comments. I also wonder if there is a way to split up this PR so it's easy to merge the core logic first and avoid git conflicts, but not sure what that would be 🤔
| -> const std::filesystem::path & { | ||
| if (!this->path.has_value()) { | ||
| std::ostringstream error; | ||
| error << "The `" << command << "` command does not support HTTP inputs"; |
There was a problem hiding this comment.
Note we never dynamically construct strings for exceptions. Instead, use one of the existing error classes that would be handled in src/error.h or create a new exception class there
| } | ||
|
|
||
| [[nodiscard]] auto base_uri() const -> std::string { | ||
| if (sourcemeta::jsonschema::is_http_url(this->first)) { |
There was a problem hiding this comment.
| if (sourcemeta::jsonschema::is_http_url(this->first)) { | |
| if (is_http_url(this->first)) { |
You are already in the sourcemeta::jsonschema:: namespace, so you can omit it in every case.
| return this->path.value(); | ||
| } | ||
|
|
||
| [[nodiscard]] auto base_uri() const -> std::string { |
There was a problem hiding this comment.
This method is called base_uri but it doesn't return a base URI, but the whole file URI given the path?
|
|
||
| struct InputJSON { | ||
| std::filesystem::path first; | ||
| std::string first; |
There was a problem hiding this comment.
If the thing here is that the first might be either a path or a URI, maybe a std::variant expresses this better
| const auto path_without_query_or_fragment = | ||
| sourcemeta::jsonschema::uri_path_without_query_or_fragment( | ||
| entry_identifier); | ||
| if (path_without_query_or_fragment.ends_with(".jsonl")) { |
There was a problem hiding this comment.
Beyond just the extension, I think we need to look at the content type mime header?
| { "name": "foo", "kind": "example" } | ||
| EOF | ||
|
|
||
| "$1" validate "http://127.0.0.1:$PORT/schema.json" "$TMP/instance.json" --http \ |
There was a problem hiding this comment.
I'm now wondering if a command invocation to an HTTP URL should assume --http. I think it makes sense?
| "$1" validate "http://127.0.0.1:$PORT/schema.json?ignored=1" "$TMP/instance.json" \ | ||
| --http 2>"$TMP/stderr.txt" | ||
|
|
||
| cat << EOF > "$TMP/expected.txt" |
There was a problem hiding this comment.
Can we run all of these that present no output with --verbose and actually capture BOTH stdout and stderr? i.e. if for whatever reason this command printed non-sense to stdout, we would never know
| SCHEMA_PATH="$(cd "$TMP" && pwd)/schema.json" | ||
| SCHEMA_URI="file://$SCHEMA_PATH" | ||
|
|
||
| "$1" lint "$SCHEMA_URI" --json >"$TMP/output.json" 2>&1 |
There was a problem hiding this comment.
Maybe add a _json suffix to the test file as this is only covering the JSON part?
There was a problem hiding this comment.
I get this test, but I'm wondering now if it is a bit confusing. I think if any input is an HTTP URL, we should assume --http, as that's clearly the user's intent?
| EOF | ||
|
|
||
| diff "$TMP/output.txt" "$TMP/expected.txt" | ||
|
|
There was a problem hiding this comment.
Looks like various test files have this empty line here?
Closes: #234
Co-authored-by: Codex codex@openai.com
Generated-with: OpenAI Codex CLI (partial)
Signed-off-by: Robin H. Johnson rjohnson@coreweave.com