Open‑source projects thrive on collaboration, but managing license compliance across thousands of contributors can be daunting. In 2026, the industry is pushing for tighter governance, and integrating SPDX into continuous integration and delivery pipelines has become the gold standard. By automating license checks with GitHub Actions, you can detect violations before a pull request (PR) lands, protect your codebase, and maintain trust with users and partners.
Why SPDX Matters for Modern OSS Projects
SPDX (Software Package Data Exchange) is a standardized format for communicating license information. It provides:
- A machine‑readable representation of licenses, ensuring consistency across tools.
- A registry of standardized SPDX license identifiers that reduces ambiguity (e.g., Apache-2.0 instead of “Apache License, Version 2.0”).
- Cross‑platform compatibility, from npm and Maven to Docker images.
When you embed SPDX data in your source tree, any CI/CD runner can automatically validate the licenses of added or modified files. This greatly reduces the risk of inadvertently merging code that violates your project’s policy or corporate compliance requirements.
Setting Up Your GitHub Repository for SPDX
1. Add SPDX License Files
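At the root of your repository, create a LICENSES folder that holds the full text of every license you use, one file per license, named after its SPDX identifier (for example, LICENSES/Apache-2.0.txt). This is the layout the REUSE specification expects. You can also keep an SPDX document that describes the project as a whole. In the tag-value format such a file ends with .spdx and starts with a document-creation header, for example: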
```text
SPDXVersion: SPDX-2.2
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: example-project-license
DocumentNamespace: http://spdx.org/spdxdocs/example-project-1.0
Creator: Person: Jane Doe <jane@example.com>
Creator: Organization: Open Source Inc
Created: 2026-04-01T12:00:00Z
LicenseListVersion: 3.24
```
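The REUSE helper tool can generate such a document from your headers (reuse spdx), and the spdx-tools Python package (pyspdxtools) can validate it or convert it to other formats such as JSON.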
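Before reaching for a full validator, it helps to see what a minimal header check looks like. The sketch below uses hypothetical helpers (parse_tag_value and missing_header_fields are illustrative names, not part of spdx-tools). It only verifies that the mandatory SPDX 2.x document-creation tags are present and that DataLicense is CC0-1.0; it does not validate the rest of the document.

```python
# Minimal sketch, not a full SPDX validator: checks only the mandatory
# document-creation tags of an SPDX 2.x tag-value document.
REQUIRED_TAGS = (
    "SPDXVersion",
    "DataLicense",
    "SPDXID",
    "DocumentName",
    "DocumentNamespace",
    "Creator",
    "Created",
)


def parse_tag_value(text: str) -> dict[str, list[str]]:
    """Collect 'Tag: value' lines; tags such as Creator may repeat."""
    tags: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so URLs and "Person: ..." values survive.
        tag, _, value = line.partition(":")
        tags.setdefault(tag.strip(), []).append(value.strip())
    return tags


def missing_header_fields(text: str) -> list[str]:
    """Return the mandatory tags that are absent, plus a DataLicense check."""
    tags = parse_tag_value(text)
    problems = [tag for tag in REQUIRED_TAGS if tag not in tags]
    if tags.get("DataLicense", ["CC0-1.0"])[0] != "CC0-1.0":
        problems.append("DataLicense must be CC0-1.0")
    return problems


if __name__ == "__main__":
    header = """\
SPDXVersion: SPDX-2.2
DataLicense: CC0-1.0
DocumentName: example-project-license
DocumentNamespace: http://spdx.org/spdxdocs/example-project-1.0
Creator: Person: Jane Doe <jane@example.com>
Created: 2026-04-01T12:00:00Z
"""
    # This sample header deliberately omits the SPDXID line.
    print(missing_header_fields(header))
```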
2. Annotate Source Files with SPDX Identifiers
Include SPDX headers in every source file you add. For a Python file, the header looks like:
```python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: 2026 Open Source Inc
```
Tools such as the REUSE helper tool (reuse lint) can parse these headers and check that every license they reference has a matching text in your LICENSES folder.
Creating the GitHub Action Workflow
Below is a step‑by‑step guide to create a workflow that runs SPDX scans on every PR. The workflow ensures that no file with an incompatible license slips through.
1. Define the Workflow File
Create a YAML file at .github/workflows/spdx-check.yml:
```yaml
name: SPDX License Compliance

on:
  pull_request:
    branches: [main, develop]
  push:
    branches: [main, develop]

jobs:
  license-check:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install SPDX tools
        run: pip install reuse spdx-tools pyyaml

      - name: Check SPDX Headers
        # Fails if a file lacks license/copyright info or a license text is missing from LICENSES/
        run: reuse lint

      - name: Generate SPDX Document
        # reuse writes a tag-value document; pyspdxtools converts it to JSON for the policy script
        # (header checks were done by reuse lint, so spdx-tools' own validation is skipped here)
        run: |
          reuse spdx --output spdx_report.spdx
          pyspdxtools --infile spdx_report.spdx --outfile spdx_report.json --novalidation

      - name: Validate Against License Policy
        run: python spdx_validator.py

      - name: Report License Violations
        if: failure()
        run: echo "::error::License violations detected. Please review the SPDX report."

      - name: Upload SPDX Report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: spdx-report
          path: spdx_report.json
```
This workflow does the following:
- Checks out the PR’s code.
- Generates an SPDX document that lists each file’s license.
- Runs a validation step that checks each license against your project’s policy.
- Fails the job if any violations are found; when the check is marked as required in a branch protection rule, the PR cannot be merged until the violations are corrected.
- Uploads a report for audit purposes.
2. Define Your License Policy
Create a license-policy.yaml file at the repo root. List allowed licenses and any exclusions. Example:
```yaml
allowed_licenses:
  - Apache-2.0
  - MIT
  - BSD-3-Clause
disallowed_licenses:
  - GPL-3.0-only
  - AGPL-3.0-only
```
Point the validation step at this policy. A simple Python script can read the SPDX JSON report, cross‑check each file’s licenses against the allowed and disallowed lists, and exit with an error if it finds a violation.
3. Enhance the Validation Script
Below is a concise validator you can store as spdx_validator.py:
```python
import json
import sys

import yaml  # provided by the pyyaml package


def load_policy(path="license-policy.yaml"):
    """Load the allowed and disallowed license lists."""
    with open(path) as f:
        return yaml.safe_load(f)


def validate(report, policy):
    """Return a list of human-readable policy violations."""
    allowed = set(policy.get("allowed_licenses", []))
    disallowed = set(policy.get("disallowed_licenses", []))
    violations = []
    for entry in report.get("files", []):
        file_name = entry["fileName"]
        # SPDX 2.x JSON lists the licenses found in a file under "licenseInfoInFiles"
        for license_id in entry.get("licenseInfoInFiles", []):
            if license_id in disallowed:
                violations.append(f"Disallowed license {license_id} in {file_name}")
            elif license_id not in allowed:
                violations.append(f"Unrecognized license {license_id} in {file_name}")
    return violations


if __name__ == "__main__":
    with open("spdx_report.json") as f:
        report = json.load(f)
    violations = validate(report, load_policy())
    if violations:
        print("\n".join(violations))
        sys.exit(1)
    print("No license violations detected.")
```
The script imports yaml, so install pyyaml in your workflow and run the script (python spdx_validator.py) as a step after the SPDX report has been generated.
Integrating with Other CI/CD Steps
While SPDX checks provide license compliance, you can chain them with other quality gates:
- Static Analysis: Run CodeQL or SonarCloud after SPDX validation to catch security issues.
- Unit Tests: Ensure new code doesn’t break existing tests.
- Dependency Scanning: Combine SPDX license checks with Dependabot to monitor open‑source dependencies.
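Running the SPDX check at the start of the pipeline, either as an early step in the job or as a job that later jobs depend on through needs:, means subsequent steps only run on code that has passed license validation, which reduces downstream risk.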
Handling Common Edge Cases
1. Mixed Licenses in a Single File
Occasionally, a file carries more than one license. If recipients may choose between the licenses (dual licensing), annotate the file with SPDX-License-Identifier: MIT OR Apache-2.0; the validator can treat an OR expression as compliant if any operand is allowed by the policy. If both licenses apply at once, for example a proprietary wrapper around an open‑source module, combine the identifiers with AND instead, and every operand must then be allowed.
2. Binary Assets and Third‑Party Libraries
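The same job can also cover binary assets and third‑party dependencies. Binary files cannot carry a comment header, so the REUSE specification lets you record their license in an adjacent .license file (for example, logo.png.license). For packaged dependencies, an SBOM generator such as Syft or ScanCode Toolkit can read manifests like package.json, go.mod, or pom.xml and emit SPDX data that you check alongside your own report.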
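Once a tool has produced an SPDX JSON SBOM for your dependencies (Syft’s spdx-json output format is one option), the packages can be checked against the same policy. package_violations is a hypothetical helper. It assumes the SPDX 2.x JSON field names packages, name, versionInfo, licenseConcluded, and licenseDeclared, and it compares compound expressions as whole strings.

```python
def package_violations(sbom: dict, allowed: set[str], disallowed: set[str]) -> list[str]:
    """Flag packages whose license is disallowed or missing from the allow list.

    Reads the SPDX 2.x JSON "packages" array. Uses licenseConcluded and falls
    back to licenseDeclared when the concluded value is NOASSERTION.
    """
    problems = []
    for pkg in sbom.get("packages", []):
        name = f"{pkg.get('name', '?')}@{pkg.get('versionInfo', '?')}"
        license_id = pkg.get("licenseConcluded", "NOASSERTION")
        if license_id == "NOASSERTION":
            license_id = pkg.get("licenseDeclared", "NOASSERTION")
        if license_id in disallowed:
            problems.append(f"Disallowed license {license_id} in package {name}")
        elif license_id not in allowed:
            # Packages with no license information (NOASSERTION) also land here.
            problems.append(f"Unrecognized license {license_id} in package {name}")
    return problems
```

In a workflow you would load the two lists from license-policy.yaml, pass them in as sets, and exit non‑zero when the returned list is not empty.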
3. Handling License Changes in Existing Code
If a contributor changes the license of an existing file, the SPDX header should reflect the new license. The validator will flag this as a violation if the new license is disallowed. Prompt the contributor to provide justification, which can be documented in the PR description.
Auditing and Transparency
Maintainers often need to prove compliance to regulators or corporate governance bodies. Uploading the SPDX report as an artifact records what was checked on each run, but GitHub deletes artifacts after the repository’s retention period (90 days by default), so artifacts alone are not a permanent audit trail. For long‑term records, copy the report to durable storage such as an Amazon S3 bucket and tag it with the PR number and commit hash.
Performance Optimizations
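- Cache SPDX Dependencies: Use actions/cache to keep pip’s download cache for spdx-tools and the other Python dependencies across runs.
- Selective Scanning: Limit the scan to changed files by diffing against the PR’s base branch, for example git diff --name-only origin/$GITHUB_BASE_REF...HEAD. GITHUB_BASE_REF and GITHUB_HEAD_REF are only set for pull_request events, and the three-dot diff needs the full history that fetch-depth: 0 provides.
- Parallel Jobs: Split the validation into separate jobs for linting, testing, and license checks, allowing them to run concurrently.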
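Selective scanning can be sketched as follows, assuming the checkout used fetch-depth: 0 so the base branch is available as origin/<base>. files_to_scan and changed_files are hypothetical helpers. The first parses git diff --name-status output, dropping deleted files and keeping the new path of renamed or copied files.

```python
import subprocess


def files_to_scan(name_status: str) -> list[str]:
    """Parse `git diff --name-status` output into the paths that still exist."""
    paths = []
    for line in name_status.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if fields[0].startswith("D"):
            continue  # deleted files have nothing left to scan
        # Renames (R) and copies (C) list old and new paths; scan the new one.
        paths.append(fields[-1])
    return paths


def changed_files(base_ref: str) -> list[str]:
    """Diff HEAD against the PR base branch and return the files to scan."""
    output = subprocess.run(
        ["git", "diff", "--name-status", f"origin/{base_ref}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return files_to_scan(output)
```

Inside a pull_request run you might call changed_files(os.environ["GITHUB_BASE_REF"]) and pass the resulting paths to your header check; push events need a different base, since GITHUB_BASE_REF is empty there.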
Extending Beyond GitHub Actions
Projects hosted on GitLab or Bitbucket can run the same checks. The key is to translate the workflow steps into the platform’s native syntax (e.g., .gitlab-ci.yml or bitbucket-pipelines.yml). Most SPDX tools are command-line programs that do not depend on a particular CI platform, so porting is straightforward.
Conclusion
By integrating SPDX license checks into your CI/CD pipeline with GitHub Actions, you transform a manual compliance hurdle into an automated safety net. This approach not only safeguards your project against legal risk but also streamlines contributor onboarding and fosters a culture of transparency. In the fast‑evolving open‑source landscape of 2026, such automation isn’t just best practice—it’s essential for sustainable, trustworthy software development.
