Merged
45 changes: 38 additions & 7 deletions README.md
@@ -3,8 +3,11 @@
This repo provides utilities for managing copyright headers and license files
across many repos at scale.

You can use it to add or validate copyright headers on source code files, add a
LICENSE file to a repo, report on what licenses repos are using, and more.
Features:
- Add or validate copyright headers on source code files
- Add and/or manage LICENSE files with git-aware copyright year detection
- Report on licenses used across multiple repositories
- Automate compliance checks in CI/CD pipelines

## Getting Started

@@ -33,7 +36,7 @@ Usage:
copywrite [command]

Common Commands:
headers Adds missing copyright headers to all source code files
headers Adds missing copyright headers and updates existing headers' year information.
Member


This change should probably be applied to the builtin help page too:

Short: "Adds missing copyright headers to all source code files",

init Generates a .copywrite.hcl config for a new project
license Validates that a LICENSE file is present and remediates any issues if found

@@ -62,8 +65,18 @@ scan all files in your repo and copyright headers to any that are missing:
copywrite headers --spdx "MPL-2.0"
```

You may omit the `--spdx` flag if you add a `.copywrite.hcl` config, as outlined
[here](#config-structure).
The `copywrite license` command validates and manages LICENSE files with git-aware copyright years:

```sh
copywrite license --spdx "MPL-2.0"
```

**Copyright Year Behavior:**
- **Start Year**: Taken from the config file; if not set, defaults to the year of the repository's first commit
- **End Year**: Set to the current year when an update is triggered (git history only determines whether an update is needed)
- **Update Trigger**: Git history shows whether a source code file was modified after its copyright end year

You may omit the `--spdx` flag if you add a `.copywrite.hcl` config, as outlined [here](#config-structure).

### `--plan` Flag

@@ -72,6 +85,24 @@ performs a dry-run and will outline what changes would be made. This flag also
returns a non-zero exit code if any changes are needed. As such, it can be used
to validate if a repo is in compliance or not.

## Technical Details

### Copyright Year Logic

**Source File Headers:**
- End year: Set to current year when file's source code is modified
- Git history determines if update is needed (compares file's last commit year to copyright end year)
- When triggered, end year updates to current year
- Ignores commits that only change a file's copyright header, since those are not source code changes

**LICENSE Files:**
- End year: Set to current year when any project file is modified
- Git history determines if update is needed (compares repo's last commit year to copyright end year)
- When triggered, end year updates to current year
- Preserves historical accuracy for archived projects (no forced updates)

**Key Distinction:** Git history is used as a trigger to determine *whether* an update is needed, but the actual end year value is always set to the current year when an update occurs.
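
The trigger-versus-value distinction above can be sketched as a small pure function. This is a minimal illustration, not the tool's actual implementation: the name `nextEndYear` is hypothetical, and the sketch assumes the caller has already read the last commit year from git and that "modified since the end year" means the last commit year is strictly greater than the recorded end year.

```go
package main

import "fmt"

// nextEndYear is a hypothetical helper illustrating the rule described above.
//
//	lastCommitYear: year of the most recent source change (assumed to come from git)
//	endYear:        end year currently written in the copyright statement
//	currentYear:    year in which the tool runs
//
// It returns the end year to write and whether an update was triggered.
func nextEndYear(lastCommitYear, endYear, currentYear int) (int, bool) {
	// Git history is only the trigger: was there a change after the recorded end year?
	if lastCommitYear > endYear {
		// The value written is the current year, not the commit year.
		return currentYear, true
	}
	// No newer change: keep the existing end year untouched.
	return endYear, false
}

func main() {
	year, updated := nextEndYear(2024, 2022, 2025)
	fmt.Println(year, updated) // 2025 true

	year, updated = nextEndYear(2022, 2022, 2025)
	fmt.Println(year, updated) // 2022 false
}
```

Keeping the decision in a pure function like this makes the rule easy to unit test without a git repository.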

## Config Structure

> :bulb: You can automatically generate a new `.copywrite.hcl` config with the
@@ -99,8 +130,8 @@ project {

# (OPTIONAL) Represents the year that the project initially began
# This is used as the starting year in copyright statements
# If set and different from current year, headers will show: "copyright_year, current_year"
# If set and same as current year, headers will show: "current_year"
# If set and different from current year, headers will show: "copyright_year, year-2"
# If set and same as year-2, headers will show: "copyright_year"
# If not set (0), the tool will auto-detect from git history (first commit year)
# If auto-detection fails, it will fallback to current year only
Comment on lines -102 to 136
Member


I think this is now (still) incorrect. I don't know if we want/need either of the two years to be configurable during updating (I think it would be nice for consistency). Currently they are always inferred during updating - i.e. there is no way of overriding them via configuration.

For new additions, the end year is not configurable either.

const tmplSPDX = `Copyright{{ if .Holder }} {{.Holder}}{{ end }}{{ if .Year }} {{.Year}}{{ end }}
{{ if .SPDXID }}SPDX-License-Identifier: {{.SPDXID}}{{ end }}`
const tmplCopyrightOnly = `Copyright{{ if .Holder }} {{.Holder}}{{ end }}{{ if .Year }} {{.Year}}{{ end }}`

Collaborator Author


As discussed privately, year-1 can still be overridden via the config file.
Year-2 always defaults to the current year if any source code change is detected in the file during the current year, so no override option is provided.

# Default: 0 (auto-detect)
6 changes: 3 additions & 3 deletions addlicense/main.go
@@ -280,7 +280,7 @@ func walk(ch chan<- *file, start string, logger *log.Logger) error {
if fi.IsDir() {
return nil
}
if fileMatches(path, ignorePatterns) {
if FileMatches(path, ignorePatterns) {
// The [DEBUG] level is inferred by go-hclog as a debug statement
logger.Printf("[DEBUG] skipping: %s", path)
return nil
@@ -290,9 +290,9 @@ func walk(ch chan<- *file, start string, logger *log.Logger) error {
})
}

// fileMatches determines if path matches one of the provided file patterns.
// FileMatches determines if path matches one of the provided file patterns.
// Patterns are assumed to be valid.
func fileMatches(path string, patterns []string) bool {
func FileMatches(path string, patterns []string) bool {
for _, p := range patterns {

if runtime.GOOS == "windows" {
2 changes: 1 addition & 1 deletion addlicense/main_test.go
@@ -471,7 +471,7 @@ func TestFileMatches(t *testing.T) {

for _, tt := range tests {
patterns := []string{tt.pattern}
if got := fileMatches(tt.path, patterns); got != tt.wantMatch {
if got := FileMatches(tt.path, patterns); got != tt.wantMatch {
t.Errorf("FileMatches(%q, %q) returned %v, want %v", tt.path, patterns, got, tt.wantMatch)
}
}
178 changes: 173 additions & 5 deletions cmd/headers.go
@@ -6,8 +6,14 @@ package cmd
import (
"fmt"
"os"
"path/filepath"
"runtime"
"strings"
"sync"
"sync/atomic"

"github.com/hashicorp/copywrite/addlicense"
"github.com/hashicorp/copywrite/licensecheck"
"github.com/hashicorp/go-hclog"
"github.com/jedib0t/go-pretty/v6/text"
"github.com/samber/lo"
@@ -21,9 +27,13 @@ var (

var headersCmd = &cobra.Command{
Use: "headers",
Short: "Adds missing copyright headers to all source code files",
Short: "Adds missing copyright headers and updates existing headers' year information in all source code files",
Long: `Recursively checks for all files in the given directory and subdirectories,
adding copyright statements and license headers to any that are missing them.
adding copyright statements and license headers to any that are missing them and
updating the year information in existing headers based on git history.

By default, the command will modify files in place. To perform a dry-run without
modifying any files, use the --plan flag.

Autogenerated files and common file types that don't support headers (e.g., prose)
will automatically be exempted. Any other files or folders should be added to the
@@ -87,10 +97,23 @@ config, see the "copywrite init" command.`,
".github/workflows/**",
".github/dependabot.yml",
"**/node_modules/**",
".copywrite.hcl",
}
ignoredPatterns := lo.Union(conf.Project.HeaderIgnore, autoSkippedPatterns)

// Construct the configuration addLicense needs to properly format headers
// STEP 1: Update existing copyright headers
gha.StartGroup("Updating existing copyright headers:")
updatedCount, anyFileUpdated, licensePath := updateExistingHeaders(cmd, ignoredPatterns, plan)
gha.EndGroup()
if updatedCount > 0 {
if plan {
cmd.Printf("\n%s\n\n", text.FgYellow.Sprintf("[DRY RUN] Would update %d file(s) with new copyright years", updatedCount))
} else {
cmd.Printf("\n%s\n\n", text.FgGreen.Sprintf("Successfully updated %d file(s) with new copyright years", updatedCount))
}
}

// STEP 2: Construct the configuration addLicense needs to properly format headers
licenseData := addlicense.LicenseData{
Year: conf.FormatCopyrightYears(), // Format year(s) for copyright statements
Holder: conf.Project.CopyrightHolder,
@@ -112,10 +135,33 @@ config, see the "copywrite init" command.`,
// cobra.CheckErr on the return, which will indeed output to stderr and
// return a non-zero error code.

gha.StartGroup("The following files are missing headers:")
err := addlicense.Run(ignoredPatterns, "only", licenseData, "", verbose, plan, []string{"."}, stdcliLogger)
// STEP 3: Add missing headers
gha.StartGroup("Adding missing copyright headers:")
var err error
// In dry-run mode, if updateExistingHeaders found files that would be
// updated (year bumps), treat that as an error so the command exits
// non-zero to indicate work would be performed.
if plan && updatedCount > 0 {
err = fmt.Errorf("[DRY RUN] %d file(s) would be updated with new copyright years", updatedCount)
}
runErr := addlicense.Run(ignoredPatterns, "only", licenseData, "", verbose, plan, []string{"."}, stdcliLogger)
if err != nil && runErr != nil {
err = fmt.Errorf("%v; %v", err, runErr)
} else if err == nil {
err = runErr
}
gha.EndGroup()

// STEP 4: Update LICENSE file if any files were modified (either updated or added headers)
// In plan mode: if addlicense found missing headers (returns error), assume files would be modified
// In normal mode: if addlicense succeeded, assume files were modified
if runErr != nil || !plan {
anyFileUpdated = true
}

updateLicenseFile(cmd, licensePath, anyFileUpdated, plan)

// Check for errors after LICENSE file update so we still show what would happen
cobra.CheckErr(err)
},
}
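
The error handling in the hunk above merges the dry-run error with the error returned by `addlicense.Run` using `fmt.Errorf("%v; %v", ...)`. As a hedged alternative, not what the PR does, the standard library's `errors.Join` (Go 1.20+) gives the same nil handling while keeping both errors inspectable with `errors.Is`. The names `mergeErrors`, `errPlan`, and `errRun` are hypothetical and exist only for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the two failure sources.
var (
	errPlan = errors.New("dry run: 2 file(s) would be updated")
	errRun  = errors.New("missing headers found")
)

// mergeErrors returns nil when both inputs are nil; otherwise it returns an
// error wrapping every non-nil input. errors.Join discards nil arguments.
func mergeErrors(planErr, runErr error) error {
	return errors.Join(planErr, runErr)
}

func main() {
	err := mergeErrors(errPlan, errRun)
	fmt.Println(err)                          // both messages, one per line
	fmt.Println(errors.Is(err, errRun))       // true
	fmt.Println(mergeErrors(nil, nil) == nil) // true
}
```

Unlike formatting both errors with `%v`, the joined error still matches each original under `errors.Is`, which can help callers that branch on the cause.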
@@ -131,3 +177,125 @@ func init() {
headersCmd.Flags().StringP("spdx", "s", "", "SPDX-compliant license identifier (e.g., 'MPL-2.0')")
headersCmd.Flags().StringP("copyright-holder", "c", "", "Copyright holder (default \"IBM Corp.\")")
}

// updateExistingHeaders walks through files and updates copyright headers based on config and git history
// Returns the count of updated files, a boolean indicating if any file was updated, and the LICENSE file path (if found)
func updateExistingHeaders(cmd *cobra.Command, ignoredPatterns []string, dryRun bool) (int, bool, string) {
targetHolder := conf.Project.CopyrightHolder
if targetHolder == "" {
targetHolder = "IBM Corp."
}

configYear := conf.Project.CopyrightYear
updatedCount := 0
anyFileUpdated := false
var licensePath string

// Producer/consumer: walk files (producer) and process them with a bounded
// worker pool (consumers). This preserves existing semantics while
// bounding concurrency and allowing the walk to run ahead of processors.
ch := make(chan string, 1000)

var wg sync.WaitGroup
var updatedCount64 int64
var anyFileUpdatedFlag int32
var mu sync.Mutex

workers := runtime.NumCPU() * 4
if workers < 2 {
workers = 2
}

// Start worker pool
wg.Add(workers)
for i := 0; i < workers; i++ {
go func() {
defer wg.Done()
for path := range ch {
// capture base and skip LICENSE files here as well
base := filepath.Base(path)
if strings.EqualFold(base, "LICENSE") || strings.EqualFold(base, "LICENSE.TXT") || strings.EqualFold(base, "LICENSE.MD") {
mu.Lock()


Why was a lock needed here? Wouldn't all the workers update licensePath to the same value?
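
One possible answer, offered as a sketch rather than the author's stated rationale: the walk enqueues every non-ignored file, so several workers can each see a different matching path (for example a root `LICENSE` and one in a subdirectory), and unsynchronized concurrent writes to a shared string are a data race in Go even when the values happen to agree. The first-writer-wins pattern can be isolated as below; `firstPath` and `pickFirst` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// firstPath records the first non-empty path offered by any goroutine.
// Hypothetical type, for illustration only.
type firstPath struct {
	mu   sync.Mutex
	path string
}

// offer stores p only if nothing has been stored yet.
func (f *firstPath) offer(p string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.path == "" {
		f.path = p
	}
}

// get returns the stored path under the same lock.
func (f *firstPath) get() string {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.path
}

// pickFirst offers every candidate from its own goroutine and returns
// whichever one was recorded first.
func pickFirst(candidates []string) string {
	var fp firstPath
	var wg sync.WaitGroup
	for _, c := range candidates {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			fp.offer(p)
		}(c)
	}
	wg.Wait()
	return fp.get()
}

func main() {
	got := pickFirst([]string{"LICENSE", "vendor/lib/LICENSE", "docs/LICENSE.md"})
	// Which candidate wins depends on goroutine scheduling.
	fmt.Println("recorded:", got)
}
```

Because the winner depends on scheduling, a caller that needs the repository-root LICENSE specifically would have to select it deterministically rather than rely on arrival order.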

if licensePath == "" {
licensePath = path
}
mu.Unlock()
continue
}

if !dryRun {
updated, err := licensecheck.UpdateCopyrightHeader(path, targetHolder, configYear, false)
if err == nil && updated {
cmd.Printf(" %s\n", path)
atomic.AddInt64(&updatedCount64, 1)
atomic.StoreInt32(&anyFileUpdatedFlag, 1)
}
} else {
needsUpdate, err := licensecheck.NeedsUpdate(path, targetHolder, configYear, false)
if err == nil && needsUpdate {
cmd.Printf(" %s\n", path)
atomic.AddInt64(&updatedCount64, 1)
atomic.StoreInt32(&anyFileUpdatedFlag, 1)
}
}
}
}()
}

// Producer: walk the tree and push files onto the channel
go func() {
_ = filepath.Walk(".", func(path string, info os.FileInfo, err error) error {
if err != nil || info.IsDir() {
return nil
}

// Check if file should be ignored
if addlicense.FileMatches(path, ignoredPatterns) {
return nil
}

// Non-ignored file -> enqueue for processing. If channel is full,
// this will block until a worker consumes entries, which is fine.
ch <- path
return nil
})
close(ch)
}()

// wait for workers to finish
wg.Wait()

// finalize counts
updatedCount = int(atomic.LoadInt64(&updatedCount64))
anyFileUpdated = atomic.LoadInt32(&anyFileUpdatedFlag) != 0

return updatedCount, anyFileUpdated, licensePath
}
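
The producer/consumer structure of `updateExistingHeaders` can be sketched in isolation. This is a simplified illustration under stated assumptions: `countMatching` is a hypothetical function, the producer iterates a slice instead of walking the file tree, and a caller-supplied predicate stands in for the `licensecheck` calls.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"sync/atomic"
)

// countMatching feeds items through a bounded channel to a fixed pool of
// workers and returns how many items satisfied match.
func countMatching(items []string, workers int, match func(string) bool) int {
	if workers < 1 {
		workers = 1
	}
	ch := make(chan string, 16) // bounded buffer: the producer blocks when it is full
	var count int64
	var wg sync.WaitGroup

	// Consumers: each worker drains the channel until it is closed.
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done()
			for item := range ch {
				if match(item) {
					atomic.AddInt64(&count, 1) // lock-free shared counter
				}
			}
		}()
	}

	// Producer: enqueue every item, then close so the workers can exit.
	go func() {
		for _, item := range items {
			ch <- item
		}
		close(ch)
	}()

	wg.Wait()
	return int(atomic.LoadInt64(&count))
}

func main() {
	files := []string{"main.go", "README.md", "cmd/headers.go", "LICENSE"}
	n := countMatching(files, 4, func(p string) bool {
		return strings.HasSuffix(p, ".go")
	})
	fmt.Println(n) // 2
}
```

Closing the channel from the producer is what lets each worker's `range` loop terminate, so `wg.Wait()` returns only after every queued item has been processed.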

// updateLicenseFile updates the LICENSE file with current year if any files were modified
func updateLicenseFile(cmd *cobra.Command, licensePath string, anyFileUpdated bool, dryRun bool) {
// If no LICENSE file was found during the walk, nothing to do
if licensePath == "" {
return
}

targetHolder := conf.Project.CopyrightHolder
if targetHolder == "" {
targetHolder = "IBM Corp."
}

configYear := conf.Project.CopyrightYear

// Update LICENSE file, forcing current year if any file was updated
if !dryRun {
updated, err := licensecheck.UpdateCopyrightHeader(licensePath, targetHolder, configYear, anyFileUpdated)
if err == nil && updated {
cmd.Printf("\nUpdated LICENSE file: %s\n", licensePath)
}
} else {
needsUpdate, err := licensecheck.NeedsUpdate(licensePath, targetHolder, configYear, anyFileUpdated)
if err == nil && needsUpdate {
cmd.Printf("\n[DRY RUN] Would update LICENSE file: %s\n", licensePath)
}
}
Comment on lines +289 to +300
Member


We discussed this privately already, but as a reminder: this will likely need updating to recognise our BUSL LICENSE files, e.g. https://github.com/hashicorp/terraform/blob/main/LICENSE

Collaborator Author


This has been added to our backlog. We will initiate the conversation with legal and make necessary modifications to accommodate BUSL License updates for copyrights.

C.C. - @CreatorHead @mallikabandaru

}
41 changes: 39 additions & 2 deletions cmd/license.go
@@ -7,6 +7,8 @@ import (
"errors"
"fmt"
"path/filepath"
"strconv"
"time"

"github.com/hashicorp/copywrite/github"
"github.com/hashicorp/copywrite/licensecheck"
@@ -63,10 +65,14 @@ var licenseCmd = &cobra.Command{
Run: func(cmd *cobra.Command, args []string) {

cmd.Printf("Licensing under the following terms: %s\n", conf.Project.License)
cmd.Printf("Using copyright years: %v\n", conf.FormatCopyrightYears())

// Determine appropriate copyright years for LICENSE file
licenseYears := determineLicenseCopyrightYears(dirPath)

cmd.Printf("Using copyright years: %v\n", licenseYears)
cmd.Printf("Using copyright holder: %v\n\n", conf.Project.CopyrightHolder)

copyright := "Copyright " + conf.FormatCopyrightYears() + " " + conf.Project.CopyrightHolder
copyright := "Copyright " + conf.Project.CopyrightHolder + " " + licenseYears

licenseFiles, err := licensecheck.FindLicenseFiles(dirPath)
if err != nil {
@@ -175,3 +181,34 @@ func init() {
licenseCmd.Flags().StringP("spdx", "s", "", "SPDX License Identifier indicating what the LICENSE file should represent")
licenseCmd.Flags().StringP("copyright-holder", "c", "", "Copyright holder (default \"IBM Corp.\")")
}

// determineLicenseCopyrightYears determines the appropriate copyright year range for LICENSE file
// Uses git history to get the start year (first commit) and end year (last commit)
func determineLicenseCopyrightYears(dirPath string) string {
currentYear := time.Now().Year()
startYear := conf.Project.CopyrightYear

// If no start year configured, try to auto-detect from git
if startYear == 0 {
if detectedYear, err := licensecheck.GetRepoFirstCommitYear(dirPath); err == nil && detectedYear > 0 {
startYear = detectedYear
} else {
// Fallback to current year
return strconv.Itoa(currentYear)
}
}

// Determine end year from repository's last commit year
endYear := currentYear // Default fallback
if lastRepoCommitYear, err := licensecheck.GetRepoLastCommitYear(dirPath); err == nil && lastRepoCommitYear > 0 && lastRepoCommitYear <= currentYear {
endYear = lastRepoCommitYear
}

// If start year equals end year, return single year
if startYear == endYear {
return strconv.Itoa(endYear)
}

// Return year range: "startYear, endYear"
return fmt.Sprintf("%d, %d", startYear, endYear)
}
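
The branching in `determineLicenseCopyrightYears` is easier to check when the git lookups are replaced by plain arguments. The sketch below is a hypothetical pure variant, not code from the PR: `formatYearRange` takes an already-resolved start year (0 meaning neither the config nor git produced one) and the repository's last commit year (0 meaning the lookup failed).

```go
package main

import (
	"fmt"
	"strconv"
)

// formatYearRange mirrors the decision logic above with injected inputs.
func formatYearRange(startYear, lastCommitYear, currentYear int) string {
	// No start year available: fall back to the current year only.
	if startYear == 0 {
		return strconv.Itoa(currentYear)
	}
	// Default the end year to the current year, and prefer the last commit
	// year only when it is known and not in the future.
	endYear := currentYear
	if lastCommitYear > 0 && lastCommitYear <= currentYear {
		endYear = lastCommitYear
	}
	// Collapse a one-year range to a single year.
	if startYear == endYear {
		return strconv.Itoa(endYear)
	}
	return fmt.Sprintf("%d, %d", startYear, endYear)
}

func main() {
	fmt.Println(formatYearRange(2020, 2023, 2025)) // 2020, 2023
	fmt.Println(formatYearRange(2025, 2025, 2025)) // 2025
	fmt.Println(formatYearRange(0, 2023, 2025))    // 2025
	fmt.Println(formatYearRange(2020, 0, 2025))    // 2020, 2025
}
```

Separating the formatting from the git calls in this way would let the edge cases (missing start year, failed or future-dated last commit) be covered by table-driven tests.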