-
Notifications
You must be signed in to change notification settings - Fork 39
Add more source provenance attribtues #2095
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add more source provenance attribtues #2095
Conversation
c54c50d to
db78fe2
Compare
src/buildstream/source.py
Outdated
|
|
||
| self.attribution_text: Optional[str] = attribution_text | ||
| """ | ||
| Required acknowledgements for the package |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of all of these added attributes... this is the only one which doesn't clearly mean anything to me.
An example of this could be like "This package is based on s code from " ?
That's what I think "attribution" could mean... on the other hand: required acknowledgements probably means something like "We are aware that usage of this package will harm baby seals when deployed in proximity to a hockey game"
Maybe something a bit more specific for the docs ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SPDX docs for this are a little wordy. I've perhaps more appropriately condensed this to "Used to communicate specific clauses from licenses that must be acknowledged".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh shucks, I liked the idea of software intentionally being harmful to cute baby seals when deployed near hockey games :)
In the unfortunate case that seriousness needs to be deployed in this comment, let's look at that doc:
This field provides a place for the SPDX document creator to record, at the package level, acknowledgements that might be required to be communicated in some contexts. This is not meant to include the package's actual complete license text (see PackageLicenseConcluded, PackageLicenseDeclared and PackageLicenseInfoFromFiles), and might or might not include copyright notices (see also PackageCopyrightText). The SPDX document creator might use this field to record other acknowledgements, such as particular clauses from license texts, which might be necessary or desirable to reproduce. The metadata for the package attribution text field is shown in Table 35.
This text is a load of nonsense, and looks like it describes a field that should be called "random notes", it is not at all specific to the concept of attribution. For instance in apache projects there is the NOTICE file which is reserved for acknowledging copyright from code copied in from third parties under licenses which allow relicencing under ASF, this is really attributing those files with the acknowledgement that the file was borrowed.
The SPDX definition above on the other hand is just a load of meaningless trash.
Note also it says "at the package level". Is this related to packaging ? I.e. is this completely unrelated to BuildStream source input, and more relevant to, for example, debian packages or RPMS ? Do we have traceability (maybe a git commit and issue thread in the SPDX specifications) leading to the root cause for it's inclusion ?
My inclination is to just not include this field in BuildStream, until such a time that an argument can be made for it's meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that an argument can be made for supporting a general field named something like "legal_notes", intentionally not being conformant to SPDX, which is "take it or leave it", if the underlying motivation for including this is that users need a place to write some "extra legal stuff".
The downside of this would be that we couldn't justifiably be able to map it to an SPDX value if we want to use this stuff to generate SBoMs in SPDX format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SPDX has it's own idea of what a package is, from the docs:
"A package refers to any unit of content that can be associated with a distribution of software. Typically, a package is composed of one or more files. An SPDX document may, but is not required to, provide details about the individual files comprising a package".
Given, I believe, any information that would be stored in this field would also need to be in the license or copyright anyway, there is no harm in omitting this field.
We could use a different field for such information but, as you said, it can't be mapped to SPDX so I'm not sure on the overall value.
db78fe2 to
63c58e3
Compare
src/buildstream/source.py
Outdated
|
|
||
| self.supplier: Optional[str] = supplier | ||
| """ | ||
| The name of the project suppliers/owners |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also add originator? If we're trying to follow SPDX, it would be nice to have this differentiation.
See https://spdx.github.io/spdx-spec/v2.3/package-information/#75-package-supplier-field vs https://spdx.github.io/spdx-spec/v2.3/package-information/#76-package-originator-field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point: I would argue that we drop supplier entirely.
The originator text again makes references to packages (in context of the supplier), which we don't usually consume:
For example, the SPDX document identifies the package as glibc and the Package Supplier as Red Hat, but the Free Software Foundation is the Package Originator.
The above is biased towards redhat of course... glibc's supplier is only RedHat if you received the package as an RPM, but it could have been supplied in many other ways.
With BuildStream, we normally only care about the originator, and the BuildStream user is generally the supplier themselves, if I've understood the text correctly.
In this light, I'd agrue that we drop supplier from the initially proposed fields, and only care about the originator (yes we have some oddball plugins which support debian packages as sources... but they are unused outliers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point: I would argue that we drop supplier entirely.
...
With BuildStream, we normally only care about the originator, and the BuildStream user is generally the supplier themselves, if I've understood the text correctly.
Agree with this for the vast majority of elements.
However, there is one case where the supplier is not the BuildStream user: when integrating prebuilt binaries. I have a project where we have a set of binaries provided by the supplier (a big an inflexible company that insist on binary distribution) that we have to integrate in to the system. For the SBOM entries corresponding to these prebuilt binaries we want the Supplier to be the supplier (and not the user because the BuildStream user has no control over the binaries they are integrating).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing that I feel would be useful to have is the external reference: https://spdx.github.io/spdx-spec/v2.3/package-information/#721-external-reference-field
10cfa11 to
2e6c3af
Compare
|
Ok so here we are trying to interpret SPDX's misguided naming conventions making assumptions about things being "packages"... which only makes sense for "package" based "distributions", I feel we should be preferring the other non-package related fields, as we are neither a package based system (although we can be used for such), nor are we a distribution (although we can be used for such). I have a suggestion to improve clarity here... last year post-fosdem I attended an openembedded conference and Ross Burton gave a talk about generating SBoMs with yocto/poky, I believe it was spdx stuff - if this is the case, how about we at least look at what other people decided to put in those weird SPDX fields ? |
I'm not entirely sure I'm following. As I understand it, a "package" is an SPDX term for basically any inputs you are describing in the document, whether it's tar balls, directories, etc. A "package" is just the name of a thing in SPDX-land. Therefore there is no intent of specific fields being specifically for "packages" (in the traditional software distribution sense of the word). (see https://spdx.github.io/spdx-spec/v2.3/composition-of-an-SPDX-document/#522-package-information-section for this too but obviously these docs are what we're questioning the understanding of to begin with) Pending further discussion/alignment on the above, I believe the fields we need to support from purely a build tool perspective would be:
Note: "source" being the specific item being fetched, whether it be a traditional "package", source code, tar ball. |
Yeah, in SPDX there are only "packages" and "files". In buildstream-sbom, we're representing both elements and sources as "packages", though some sources (e.g. |
From some looking around. I haven't found much on real world uses, yocto generates this data itself so there's no written examples in the yocto world, but there have been plenty of talks on the subject which describe the included fields; all of which align with what we're aiming for. As seen in the likes of
We can see that
Is what we should be targetting, all of which are basically self explanatory fields. There are also the official exmaples use case from the spec we can reference where needed https://github.com/spdx/spdx-spec/blob/support/2.3.1/examples/SPDXYAMLExample-2.3.spdx.yaml |
Expand the selection of source provennce attributes according to apache#2069 This adds: - concluded-license - copyright-text - declared-license - description - name - originator - supplier
2e6c3af to
1fbae37
Compare
|
From a brief chat with @juergbi, this can and should be done in a more generic way; skipping the hardcoded definitions of specific SPDX attributes. The buildstream core does not need to care about these attributes so instead this should be implemented such that:
This removes the blocker of us only supporting the attributes we think users would need as well as maintaining them ourselves in both cases. Thinking a few steps ahead on this too though, using only the |
As per #2069, this PR adds in seven new attributes for users to provide as source info.
The new attributes being: