feat: ai contribution policy #4

Open
gadomski wants to merge 1 commit into main from ai-contribution-policy

Conversation

@gadomski
Member

@gadomski gadomski commented Mar 23, 2026

Proposed, needs to be discussed at the next STAC PSC. Adapted directly from GDAL's: https://gdal.org/en/stable/community/ai_tool_policy.html

If/when this PR is merged, we'll need to add a pull request template with the "AI checkbox" to https://github.com/stac-utils/.github and notify maintainers that they can update their repos that have a customized template.

Member

@jsignell jsignell left a comment


I like this a lot ❤️

- Code, usually in the form of a pull request
- RFCs or design proposals
- Issue or security vulnerability reporting
- Comments and feedback on pull requests
Member


I would support prohibiting the use of AI in comments.

This comment was marked as off-topic.


@jonhealy1 jonhealy1 Mar 24, 2026


What about docstrings? I'm not sure how they should be viewed.

Member Author


Documentation is mentioned in the section below:

As another example, using an LLM to generate documentation, which a contributor manually reviews for correctness and relevance, edits, and then posts as a PR, is an approved use of tools under this policy.

Comment on lines +39 to +48
Contributors **must be transparent and label contributions that contain
substantial amounts of tool-generated content**, and always mention it. The pull
request and issue templates contain a checkbox for that purpose. Failure to do
so, or lying when asked by a reviewer, will be considered a violation. Our
policy on labeling is intended to facilitate reviews, and not to track which
parts of **stac-utils** repos are generated. Contributors should note tool usage
in their pull request description, commit message, or wherever authorship is
normally indicated for the work. For instance, use a commit message trailer like
Assisted-by: \<name of code assistant\>. This transparency helps the community
develop best practices and understand the role of these new tools.

@jonhealy1 jonhealy1 Mar 24, 2026


There could be better guidance here. We should have a specific approach to citing AI use. The advice here is a little vague:

"Contributors should note tool usage in their pull request description, commit message, or wherever authorship is normally indicated for the work. For instance, use a commit message trailer like
Assisted-by: <name of code assistant>."

Also, regarding AI use, this mentions that "The pull request and issue templates contain a checkbox". Will we enforce that across stac-utils? It's not a bad idea, but if we aren't going to do it, we should remove the text referencing the approach.

Member Author


This feels pretty specific to me:

"Contributors should note tool usage in their pull request description, commit message, or wherever authorship is normally indicated for the work. For instance, use a commit message trailer like Assisted-by: <name of code assistant>."

@jonhealy1 Can you provide an example of language that you feel would be less vague?

"The pull request and issue templates contain a checkbox ". Will we enforce that across stac-utils? It's not a bad idea, but if we aren't doing it, we should remove the text referencing the approach.

We can create an org-wide pull request template that will be the default for repositories that don't have one. Repositories that do have a template will need to update it to include the checkbox ... I can update the repositories I maintain if/when this PR is merged, and I can post in the CNG Slack channel and Github Discussions to notify other maintainers of the change. Thanks for calling this out, I've updated the PR description to include the additional task.


How about something like:

Checklist

  • I have tested these changes locally.
  • I have included / updated tests for these changes.
  • Edge Cases: I have manually verified "unhappy paths" and edge cases beyond the basic success criteria.
  • I can explain the implementation logic for every line of code submitted.

AI Disclosure

  • AI-Assisted: Some or all of this contribution was generated or refined using AI tools. I have reviewed, tested, and take full responsibility for the quality and correctness of the output.


For me a lot of the danger with AI relates to testing. An AI can cleverly write a dozen tests for you and make sure that they all pass, but it may only write tests that it knows will pass. Someone thinks: there are a dozen new tests here, this must cover everything! Having a human who makes sure that everything important, i.e. all of the edge cases, is being tested is key.

Member Author


The checklist looks pretty good...could you open a PR on https://github.com/stac-utils/.github to add it? The only bullet I'd add would be one about documentation. And can you use the language from the GDAL PR template for the tool usage disclosure?

## AI tool usage
 - [ ] AI (Copilot or something similar) supported my development of this PR. See our [policy about AI tool use](https://stac-utils.github.io/ai-contribution-policy). Use of AI tools *must* be indicated.


I would honestly prefer something like this - do we really need to know if someone used AI or not? I think it's enough that they understand the "human-in-the-loop" concept and have read through our AI policy.

[ ] AI Disclosure: I have read the AI Contribution Policy and confirm a "human-in-the-loop" approach. I have reviewed all tool-generated content and take full accountability for this contribution.


Maybe we can have 2 versions available - I would definitely want to use the second version.

Member Author


Maybe we can have 2 versions available - I would definitely want to use the second version.

I think if you're maintaining a package, it's up to you how you'd like to accept contributions. My intent behind creating this stac-utils policy was to provide a useful default, not to prescribe how all stac-utils projects have to be maintained.


Good to know - I guess I was thinking more along the lines of having 2 defaults. We definitely want to point everyone towards at least being aware of the AI policy and recommendations.


If a maintainer judges that a contribution doesn't comply with this policy,
they should paste the following response to request changes:


Maybe add guidance on how maintainers can judge whether AI-generated code has violated these policies.

Member Author


There's some in the sections above, e.g.:

it is strongly recommended that contributors write PR descriptions themselves (if needed, using tools for translation or copy-editing), in particular to avoid over-verbose descriptions that LLMs are prone to generate

And

Contributors must be transparent and label contributions that contain substantial amounts of tool-generated content, and always mention it.

In my experience, LLM-generated code tends to have patterns that a reviewer can detect, and per these guidelines, submitting substantial amounts of tool-generated content without mentioning that it was tool-generated would be a violation.

If the provided language is not enough, @jonhealy1 could you maybe provide examples of the language you'd like to see?


You are a very experienced programmer - I would be interested to know more about the patterns you see, as guidance for helping improve the projects I work on.

Member Author


Some common ones that I've observed:

  • Re-implementing instead of importing, aka copying code from GitHub rather than adding a new dependency to a project
  • Little "explainer" comments inside functions
  • Over-refactoring into small (1-4 line) functions that are only used once or twice
  • Calling functions in an unconventional way. This happens a lot in geospatial, where there's a "common" way of doing something (e.g. using core libraries like rasterio or shapely a certain way) but the code does it some other way. This isn't always bad, but it can sometimes lead to strange breakages (there's some nomenclature in geospatial software that I've found LLMs don't understand correctly, e.g. stuff like buffer)
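To make the "explainer comments" and "over-refactoring" patterns concrete, here is a contrived sketch; all of the function names are hypothetical and invented for illustration, not taken from any real PR:

```python
# Contrived illustration of two patterns noted above (hypothetical names).

# LLM-style: over-refactored into tiny single-use helpers, with
# "explainer" comments that restate what each line already says.
def _sum_values(values):
    # Add up all the values
    return sum(values)

def _count_values(values):
    # Count how many values there are
    return len(values)

def mean_verbose(values):
    # Divide the sum by the count to get the mean
    return _sum_values(values) / _count_values(values)

# Idiomatic: one small function, no redundant narration.
def mean(values):
    return sum(values) / len(values)

assert mean_verbose([1, 2, 3]) == mean([1, 2, 3]) == 2.0
```

Both versions behave identically; the point is that the first buries a one-line computation under indirection and comments a reviewer has to wade through.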


To be fair, I have written code I could have imported, long before I started using AI, because I didn't understand that another library I wasn't familiar with already provided the same functionality. Explainer comments and over-refactoring are easy things to tell an AI not to do - add in a few other things to watch out for, and then maybe we have no way to tell.


Anyway, we should publish a document of things to watch out for / things we don't like to see.

Member Author


add in a few other things to watch out for, and then maybe we have no way to tell.

At a certain point, we're relying on the good faith of contributors — open source is a collaborative environment where we're working together to build cool stuff. The intent of this contribution policy (to me) is to help people contribute in a positive way, not to discourage contributions.

The biggest problems I've seen with LLM-generated code w.r.t. open source libraries have been:

  • Trying to do too much at once. Small, focused PRs are so much easier to review
  • The contributor not understanding the code they're proposing
  • Expecting a quick review — even if you generated the code quickly, it can take time to review (especially on an un-funded project)


@jonhealy1 jonhealy1 Mar 24, 2026


Small, focused PRs are also a sign of experience - I always get carried away on PRs - oh, I'll just fix this while I'm at it. Expecting a quick review can be similar - someone may think they've done something really valuable, but then their PR can sit there forever. An experienced programmer can be more patient; they can fork the project if they need to, or install a branch in their codebase - an inexperienced programmer may not know that there are other options.

I don't know if you use AI, but I think that no matter how good AI gets, it's only going to work as well as the developer using it.

@jonhealy1

How about a shorter, more concise version that's easier for people to read:

stac-utils AI & LLM Contribution Guidelines

Our Stance

The stac-utils community values high-quality, maintainable code. We recognize that AI and Large Language Models (LLMs) are part of the modern developer’s toolkit. We do not ban these tools; however, we require that every contribution remains human-centric.

The "Human in the Loop" principle means that you, the contributor, are the sole author. You are responsible for every line of code, every docstring, and every claim made in your Pull Request.


Core Expectations

1. Ownership & Accountability

If you use an AI tool to generate code or text:

  • Deep Understanding: You must be able to explain the logic, architecture, and trade-offs of the code during review.
  • Manual Verification: It is your responsibility to verify that the output follows project idioms and security best practices.
  • Direct Interaction: Use maintainer feedback as a learning opportunity. Do not "proxy" feedback by simply asking an AI to fix it—you must implement and verify the fix yourself.

2. Rigorous Testing & Edge Cases

AI tools excel at "happy path" logic but often fail at complexity. We require a high standard for testing:

  • Edge Case Intentionality: You must demonstrate that you have thought through and tested edge cases, "unhappy paths," and boundary conditions (e.g., empty inputs, network timeouts, malformed STAC items).
  • Independent Test Logic: Do not allow an AI to generate both the implementation and the tests simultaneously. You must independently verify that the tests are actually challenging the code, not just confirming a hallucination.
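As a sketch of what "edge case intentionality" might look like in practice, consider this hypothetical `bbox_area` helper (the function and its validation rules are invented for illustration, not part of any stac-utils library): the tests deliberately exercise degenerate and malformed inputs, not just the happy path.

```python
# Hypothetical helper: area of a [xmin, ymin, xmax, ymax] bounding box.
def bbox_area(bbox):
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly 4 values")
    xmin, ymin, xmax, ymax = bbox
    if xmax < xmin or ymax < ymin:
        raise ValueError("bbox max values must not be less than min values")
    return (xmax - xmin) * (ymax - ymin)

# Happy path
assert bbox_area([0, 0, 2, 3]) == 6

# Edge cases: a degenerate (zero-area) box, then malformed inputs
assert bbox_area([1, 1, 1, 1]) == 0
for bad in ([], [0, 0, 1], [2, 0, 1, 1]):
    try:
        bbox_area(bad)
    except ValueError:
        pass  # expected: malformed input is rejected
    else:
        raise AssertionError(f"expected ValueError for {bad}")
```

A tool-generated test suite often stops at the first assertion; the reviewer's job is to check that the rest of the cases exist and actually challenge the implementation.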

3. Uniform Disclosure

To keep our review process transparent without being intrusive, we use a standard checkbox in our Pull Request templates.

  • How to disclose: Simply check the AI-Assisted box in your PR description. You are not required to name the specific tool or provider used.

Evaluating Contributions

Maintainers evaluate work based on impact and quality. A contribution may be flagged or declined if:

  • The contributor cannot explain the implementation logic.
  • The tests only cover the "happy path" and ignore obvious edge cases.
  • The PR adds more "review debt" (time spent fixing AI errors) than it provides value to the project.

The Golden Rule: A contribution should be worth more to the project than the time it takes to review it.


Copyright & Licensing

You are responsible for ensuring your contribution adheres to our project's license. By submitting a PR, you affirm that you have the right to contribute this work and that it does not violate third-party copyrights.

@gadomski
Member Author

gadomski commented Mar 24, 2026

How about a shorter, more concise version that's easier for people to read

Personally I'd prefer to stay as aligned as possible to GDAL — they've got a lot of smart, experienced folks over there and I trust their thinking. I'd be interested in others' opinions on this one.

@jonhealy1

How about a shorter, more concise version that's easier for people to read

Personally I'd prefer to stay as aligned as possible to GDAL — they've got a lot of smart, experienced folks over there and I trust their thinking. I'd be interested in others' opinions on this one.

I don't mean to present this as some sort of a final version. I think a simplified and more concise version could be beneficial. We could keep the spirit of what they are saying but present it in a more understandable way.

@jsignell
Member

It makes sense to me to keep this close to GDAL rather than going fully down the rabbit hole of considering all the possible options. If you go that route it can be really hard to get something passable merged, and I do feel that there is some urgency here.

I guess my preference would be to start with something close to GDAL like Pete suggested and then if we run into a specific part that is chafing we can update the policy once we have more experience of what it feels like in practice.

@jonhealy1

Pete said this was open for discussion, but maybe it's not. The overarching theme of the GDAL document is to ask contributors to avoid wasting a maintainer's valuable time. I'm not saying my time is so valuable that I will regret adding my thoughts here for no reason, but it is worth thinking about.

@gadomski
Member Author

Pete said this was open for discussion, but maybe it's not.

This is a discussion. @jonhealy1 you presented alternative wording for this PR, @jsignell had a different opinion on the wording. Presenting your opinion does not mean that your suggestions will be incorporated; we will work together as a community to form a consensus.

@jonhealy1

@gadomski I think you were the one who dismissed my idea towards keeping things short and simple. If you are going to say, "let's just use the GDAL document", then you are not really having much of a discussion.

@jonhealy1

I would ask: how is the shortened version not aligned with GDAL, and how do we improve it so that it is?

@jsignell
Member

My perspective comes from watching issues like this sit open on other projects (specifically on xarray), so I am super thankful for a concrete proposal. I didn't find the document Pete suggested hard to read or understand (and I have a ridiculously short attention span), so I don't really see the need for any changes other than the one I suggested.

@gadomski
Member Author

I would ask: how is the shortened version not aligned with GDAL, and how do we improve it so that it is?

My personal opinion is that your proposed shortened version doesn't improve on the wording enough to be worth diverging from the GDAL upstream. I'm thinking both about readability and maintainability — as GDAL's guidance evolves, I'd like to take up any of their changes, and if we diverge, that's harder to do.

@jonhealy1

An argument could be made for having 2 versions.

We would like contributors to review our policies surrounding AI, but asking them to wade through the GDAL version is maybe a little unreasonable. If something is too wordy, people usually don't take the time to read it.

The shorter version could reference the full version for people who want more context.

Also, we shouldn't give GDAL credit for the document as it seems to have come from here:

https://github.com/llvm/llvm-project/blob/main/llvm/docs/AIToolPolicy.md
