I have a script that extracts attributes such as the author and product name from a DLL and uses them to build a PURL that can later be decoded back to a UTF-8 string. I have identified a case where these attributes end up percent-encoded twice within the PackageURL.
Here are some values that reproduce the issue, using the file attributes of DotNetNuke.dll as an example.
The 'dll' object contains the following methods:
import urllib.parse
def get_product_name():
    product_name = "https://dnncommunity.org"  # This is the unencoded value
    return urllib.parse.quote(product_name, safe='')  # This is the encoded value "https%3A%2F%2Fdnncommunity.org"
def get_author():
    author = ".NET Foundation"  # This is the unencoded value
    return urllib.parse.quote(author, safe='')  # This is the encoded value ".NET%20Foundation"
I need to combine the values in a forward-slash ('/') separated format so that Nexus can understand them,
e.g. <author>/<product_name>
from packageurl import PackageURL
purlattrs = f'{dll.get_author()}%2F{dll.get_product_name()}'
print(purlattrs)  # output = '.NET%20Foundation%2Fhttps%3A%2F%2Fdnncommunity.org'
# This is the correctly encoded URL-safe string
_qualifiers = {'Attr1': purlattrs, 'Attr2': 'Foo'}
purl = PackageURL(type='generic', name="DotNetNuke.dll", version="9.11.0.46", qualifiers=_qualifiers)
print(purl)
The purl that is printed is:
"pkg:generic/DotNetNuke.dll@9.11.0.46?Attr1=.NET%2520Foundation%252Fhttps%253A%252F%252Fdnncommunity.org&Attr2=Foo"
- As you can see, the space characters are now encoded as %2520 instead of %20.
- The forward slash is now %252F instead of %2F.
- The colon is now %253A instead of %3A.
- In each case, the % character of the already-encoded value is itself being encoded to %25.
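A minimal sketch of the effect, using only the standard library. The second quote() call stands in for whatever encoding PackageURL applies to qualifier values; that is an assumption made for illustration, not the library's confirmed implementation.

```python
from urllib.parse import quote, unquote

raw = ".NET Foundation/https://dnncommunity.org"

# First pass: the encoding done in my own script.
once = quote(raw, safe="")

# Second pass: encoding the already-encoded string again.
# Every '%' from the first pass becomes '%25'.
twice = quote(once, safe="")

print(once)   # .NET%20Foundation%2Fhttps%3A%2F%2Fdnncommunity.org
print(twice)  # .NET%2520Foundation%252Fhttps%253A%252F%252Fdnncommunity.org

# A single unquote() only recovers the once-encoded string, not the raw value.
print(unquote(twice) == once)  # True
```

The doubly encoded string matches the Attr1 value in the printed purl, which is why a consumer that decodes only once never gets back the original attributes.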
If I pass the raw string value to PackageURL instead, as below:
purlattrs = ".NET Foundation/https://dnncommunity.org"
print(purlattrs)  # output = ".NET Foundation/https://dnncommunity.org"
I get the following output from print(purl):
"pkg:generic/DotNetNuke.dll@9.11.0.46?Attr1=.NET%20Foundation/https://dnncommunity.org&Attr2=Foo"
In this scenario the space is encoded, but the slashes and the colon are not.
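The output above is consistent with an encoder that treats '/' and ':' as safe characters. The sketch below reproduces that result with urllib.parse.quote and safe="/:"; this is a guess at the behavior based on the observed output, not a statement about how PackageURL is actually implemented.

```python
from urllib.parse import quote

raw = ".NET Foundation/https://dnncommunity.org"

# Assumed behavior: '/' and ':' are left alone, everything else reserved is encoded.
lenient = quote(raw, safe="/:")

# What I expected: nothing is treated as safe, so '/' and ':' are encoded too.
strict = quote(raw, safe="")

print(lenient)  # .NET%20Foundation/https://dnncommunity.org
print(strict)   # .NET%20Foundation%2Fhttps%3A%2F%2Fdnncommunity.org
```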
I recommend either changing the PURL encoding to use urllib.parse.quote and urllib.parse.unquote, or removing the encoding step and leaving the encoding/decoding to the PackageURL user.
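As a rough sketch of the symmetric behavior I have in mind, the two helpers below encode and decode a qualifier value so that one decode always undoes one encode. The names encode_qualifier and decode_qualifier are hypothetical and are not part of the PackageURL API.

```python
from urllib.parse import quote, unquote


def encode_qualifier(value: str) -> str:
    """Percent-encode a raw qualifier value, including '/' and ':' (hypothetical helper)."""
    return quote(value, safe="")


def decode_qualifier(value: str) -> str:
    """Reverse encode_qualifier with a single decoding pass (hypothetical helper)."""
    return unquote(value)


raw = ".NET Foundation/https://dnncommunity.org"
encoded = encode_qualifier(raw)
decoded = decode_qualifier(encoded)

print(encoded)          # .NET%20Foundation%2Fhttps%3A%2F%2Fdnncommunity.org
print(decoded == raw)   # True
```

With a pair like this, a value is encoded exactly once on the way in and decoded exactly once on the way out, regardless of which side (the library or the caller) owns the step.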