Copilot AI commented Oct 20, 2025

Problem

When registering DataID versions with content variants containing special characters (such as URLs with colons, slashes, or hash symbols), the system generated invalid IRIs that caused JSON-LD deserialization errors:

{
  "isSuccess": false,
  "message": "{\"message\":\"[line: 0, col: 0 ] Wrong IRI 'https://databus.dev.dbpedia.link/jj-test/ex-group/exotic-filenames/2025-10-16#exotic-filenames_test=http://url.value/foo#bar'. com.apicatalog.jsonld.deseralization.JsonLdToRdf build\"}"
}

The root cause was that content variant values were concatenated directly into IRIs without URL encoding. The resulting IRIs contained unescaped special characters that violate RFC 3986 URI syntax rules (in the example above, most notably a second # inside the fragment).

Solution

This PR adds proper URL encoding using encodeURIComponent() to all user-provided values before they are used in IRI construction. The fix is applied in the autofillFileIdentifiers function in dataid-autocomplete.js.
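The idea can be illustrated with a minimal sketch. The helper below is hypothetical and is not the code from dataid-autocomplete.js: the name buildPartIri, its parameters, and the separator conventions ('_' between segments, '=' between facet and value, '.' before extensions, "none" meaning no suffix) are assumptions inferred from the example IRI in the issue.

```javascript
// Hypothetical helper, NOT the PR's actual implementation. The separator
// conventions below are assumptions inferred from the example IRI.
function buildPartIri(versionIri, artifact, variants, formatExtension, compression) {
  // Encode every user-provided piece so that ':', '/', '#' and similar
  // characters cannot end up unescaped in the fragment.
  let fragment = encodeURIComponent(artifact);

  for (const [facet, value] of Object.entries(variants)) {
    fragment += `_${encodeURIComponent(facet)}=${encodeURIComponent(value)}`;
  }

  // Assumption: the literal value "none" means "append no suffix".
  if (formatExtension && formatExtension !== 'none') {
    fragment += `.${encodeURIComponent(formatExtension)}`;
  }
  if (compression && compression !== 'none') {
    fragment += `.${encodeURIComponent(compression)}`;
  }

  return `${versionIri}#${fragment}`;
}

const iri = buildPartIri(
  'https://databus.dev.dbpedia.link/jj-test/ex-group/exotic-filenames/2025-10-16',
  'exotic-filenames',
  { test: 'http://url.value/foo#bar' },
  'none',
  'none'
);
console.log(iri);
```

Because only the copy used for the IRI is encoded, the original "dcv:test" value in the submitted document stays untouched, and decodeURIComponent() recovers it from the fragment if needed.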

Example Transformation

Before (invalid):

https://databus.dev.dbpedia.link/.../2025-10-16#exotic-filenames_test=http://url.value/foo#bar
                                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
                                                                        Contains unescaped : / #

After (valid):

https://databus.dev.dbpedia.link/.../2025-10-16#exotic-filenames_test=http%3A%2F%2Furl.value%2Ffoo%23bar
                                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                                        Special characters properly encoded

Changes

  • Modified server/app/api/lib/dataid-autocomplete.js: Added encodeURIComponent() to encode:

    • Artifact names
    • Content variant facets and values
    • Format extensions
    • Compression values
  • Added server/app/tests/test.autocomplete.js: Test suite with three test cases:

    1. Verifies encoding of special characters (:, /, #) in content variant values
    2. Verifies basic IRI generation with format extensions
    3. Verifies handling of multiple custom content variants

Testing

All new tests pass successfully:

cd server
NODE_PATH=../public/node_modules:node_modules npx uvu app/tests test.autocomplete.js

Impact

  • Minimal changes: Only 4 lines modified in the core logic
  • Backward compatible: Normal values without special characters continue to work unchanged
  • Standards compliant: Generated IRIs now conform to RFC 3986
  • Well-tested: New test suite ensures correctness and prevents regressions

Fixes the issue where posting DataIDs with content variants containing URLs or other special characters would fail with IRI validation errors.

Original prompt

This section details the original issue you should resolve

<issue_title>invalid databus file id / IRI and distribution resource IRI generation (probably caused by autocompletion) due to missing url-escaping of content variant values in IRI generation</issue_title>
<issue_description>I get the following error message:

{
  "isSuccess": false,
  "message": "{\"message\":\"[line: 0, col: 0 ] Wrong IRI 'https://databus.dev.dbpedia.link/jj-test/ex-group/exotic-filenames/2025-10-16#exotic-filenames_test=http://url.value/foo#bar'. com.apicatalog.jsonld.deseralization.JsonLdToRdf build\"}"
}

when posting the DataID version below:

curl -X POST https://databus.dev.dbpedia.link/api/register \
  -H "X-API-KEY: !!!missing!!!" \
  -H "Content-Type: application/json" \
  -d '{
  "@context": "https://databus.dev.dbpedia.link/res/context.jsonld",
  "@graph": [
    {
      "@id": "https://databus.dev.dbpedia.link/jj-test/ex-group/exotic-filenames/2025-10-16",
      "@type": "Version",
      "title": "initial-exotic-content-vars-and-values-test",
      "abstract": "initial test",
      "description": "try out different content variants",
      "license": "http://foo.lic",
      "distribution": [
        {
          "@type": "Part",
          "formatExtension": "none",
          "compression": "none",
          "downloadURL": "https://httpbin.org/anything?test=url-value",
          "dcv:test": "http://url.value/foo#bar"
        }
      ]
    }
  ]
}'

So when the IRIs are generated, appropriate encoding/escaping should be applied to the content variant values before they are appended to or inserted into the IRIs. The original value (in this example, "dcv:test": "http://url.value/foo#bar") must of course be left untouched.</issue_description>

<agent_instructions>Analyze where the mentioned IRIs are generated, or where their templates are instantiated, using the content variants (more specifically, their values). You can start by looking at what happens after the DataID JSON-LD has been submitted to the "register" API POST call.
Make sure you apply an encoding to the content variants that escapes characters not allowed in that part of the URI/IRI (fragment part vs. path part).
Please also add tests to the existing test system to verify your changes and that you did not break anything.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes #226



Co-authored-by: JJ-Author <2630603+JJ-Author@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix invalid IRI generation for databus file" to "Fix invalid IRI generation for content variants with special characters" Oct 20, 2025
Copilot AI requested a review from JJ-Author October 20, 2025 11:45
JJ-Author requested review from holycrab13 and removed request for JJ-Author October 20, 2025 11:52
JJ-Author removed their assignment Oct 20, 2025