21 changes: 21 additions & 0 deletions ai/gen-ai-agents/code-quality-agent/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Luigi Saetta

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
177 changes: 177 additions & 0 deletions ai/gen-ai-agents/code-quality-agent/README.md
@@ -0,0 +1,177 @@
# Code Quality Agent

A lightweight **LangGraph-based** agent that scans a local codebase (read-only) to:

- ✅ **Check file headers** against a simple template policy
- ✅ **Scan for secrets** (heuristic patterns + suspicious assignments)
- ✅ **Check for an approved LICENSE file**
- ✅ **Check dependency licenses**
- ✅ **Generate header fixes**
- ✅ **Generate per-file documentation** (optional) in Markdown via an LLM (OCI GenAI via LangChain)

It produces artifacts in a separate output folder (no in-place edits).


---

## Features

### Header policy checks
For each discovered source file, the agent validates that a header block contains:

- `File name:`
- `Author:`
- `Date last modified:`
- `Python Version:`
- `Description:`
- `License:`

It also performs a **date alignment check** (header date vs. file `mtime` in UTC) when the file path is available.
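
As a rough illustration of the two checks above, here is a minimal sketch. It is not the code in `header_rules.py`: the function names, the 40-line search window, and the ISO `YYYY-MM-DD` date format are assumptions made for the example.

```python
from datetime import datetime, timezone
import re

# Field labels taken from the list above.
REQUIRED_FIELDS = [
    "File name:",
    "Author:",
    "Date last modified:",
    "Python Version:",
    "Description:",
    "License:",
]

# Assumption: the header date is written as YYYY-MM-DD.
DATE_RE = re.compile(r"Date last modified:\s*(\d{4}-\d{2}-\d{2})")


def missing_header_fields(source: str, max_lines: int = 40) -> list[str]:
    """Return the required fields not found in the first `max_lines` lines."""
    head = "\n".join(source.splitlines()[:max_lines])
    return [field for field in REQUIRED_FIELDS if field not in head]


def header_date_matches_mtime(source: str, mtime_epoch: float) -> bool:
    """Compare the header date with the file mtime, both as UTC calendar dates."""
    match = DATE_RE.search(source)
    if match is None:
        return False
    header_date = datetime.strptime(match.group(1), "%Y-%m-%d").date()
    mtime_date = datetime.fromtimestamp(mtime_epoch, tz=timezone.utc).date()
    return header_date == mtime_date
```

In practice `mtime_epoch` would come from something like `os.stat(path).st_mtime`; comparing calendar dates in UTC avoids false mismatches caused by the local time zone.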

### Secrets scanning (heuristic)
The agent searches each file for:
- known patterns (AWS keys, GitHub tokens, OCI OCIDs, private key blocks, bearer headers, etc.)
- suspicious string assignments / dict values with sensitive names (password, token, secret, api_key, …)

Findings are reported with:
- kind
- line number
- a redacted excerpt
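
The reporting shape described above can be sketched as follows. This is a simplified illustration, not the contents of `secrets_scan.py`: the three patterns, the `Finding` class, and the `redact` helper are assumptions chosen for the example, and a real scanner would carry many more patterns.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only (assumed for this sketch).
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # A sensitive-looking name assigned a quoted literal (variable or dict key).
    "suspicious_assignment": re.compile(
        r"(?i)\b\w*(password|token|secret|api_key)\w*['\"]?\s*[=:]\s*['\"][^'\"]{4,}['\"]"
    ),
}


@dataclass
class Finding:
    kind: str
    line: int
    excerpt: str


def redact(text: str, keep: int = 4) -> str:
    """Keep the first `keep` characters and mask everything after them."""
    return text[:keep] + "*" * max(len(text) - keep, 0)


def scan_text(source: str) -> list[Finding]:
    """Report at most one finding per pattern per line, with a redacted excerpt."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                findings.append(Finding(kind, lineno, redact(match.group(0))))
    return findings
```

Redacting the excerpt before it reaches the report keeps the scanner's own output from leaking the secret it found.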

### License check
The agent checks that the scanned project provides an approved LICENSE file.

### Header fix generation
For each file that fails the header check, the agent provides a suggested header snippet. Before using it:
- modify the `Author` field
- review the remaining fields.

### Per-file doc generation (LLM)
For each Python file, the agent can generate Markdown documentation with sections such as:
- overview
- public API
- behaviors/edge cases
- side effects
- usage examples
- risks/TODOs

### Report generation
A final summary report is also generated, in Markdown.

### Languages supported
So far, the agent has been tested only with:
- Python

---

## Repository layout

```text
.
├── agent/
│ ├── graph_agent.py # LangGraph pipeline (discover → check → scan → docgen → report)
│ ├── fs_ro.py # Read-only sandboxed filesystem access
│ ├── header_rules.py # Header policy checker
│ ├── secrets_scan.py # Heuristic secrets scanner
│ ├── docgen.py # Per-file documentation generation
│ ├── docgen_prompt.py # Prompts for doc generation + final report
│ ├── docgen_utils.py # LLM invocation + output normalization
│ ├── oci_models.py # OCI GenAI / OCI OpenAI LangChain adapters
│ └── utils.py # Logging helpers, etc.
├── out/ # Default output folder (generated artifacts)
├── run_agent.py # CLI entry point
├── run_agent.sh # Convenience runner
├── requirements.txt
└── LICENSE
```

## Setup
1. Create a Python 3.11+ environment

For example:
```
conda create -n code_quality_agent python=3.11
```

Then activate the environment. If you're using conda:
```
conda activate code_quality_agent
```

2. Install the following Python libraries:
```
pip install oci -U
pip install langchain -U
pip install langchain-oci -U
pip install langgraph -U
```

3. Clone this repository
```
git clone https://github.com/luigisaetta/code_quality_agent.git
```

4. Create a config_private.py file in the agent directory.

Start from the template provided in the repository and create a **config_private.py** file.
Set your compartment's OCID in the file.


5. Set up your local OCI configuration

The configuration lives under `$HOME/.oci`.
See: https://docs.oracle.com/en-us/iaas/Content/API/Concepts/sdkconfig.htm

6. Set policies to use Generative AI

See: https://docs.oracle.com/en-us/iaas/Content/generative-ai/iam-policies.htm

Ask your tenancy admin for help.

## How to use it
Modify the [run_agent.sh](./run_agent.sh) file.

Change the parameters:
- `root`: the root directory of the files to be scanned
- `out`: the full path to the output directory

Then run:
```
./run_agent.sh
```

## Dependency License Checks – Execution Requirements

This agent checks license compliance for direct Python dependencies listed in `requirements.txt`.

### Recommended (deterministic & fast)
Run the agent in an environment where:
- All dependencies listed in the scanned project's `requirements.txt` are installed
- Agent runtime dependencies (see Setup above) are installed

This allows the agent to read license data from installed package metadata, which gives:
- Offline execution
- Faster and reproducible results

**Recommended for CI and release validation.**

### Fallback (best-effort)
If some dependencies are not installed:
- Network access is required (the agent will do a PyPI JSON lookup)
- Execution may be slower
- License data may be incomplete or ambiguous

## Important Note on Results and Human Review

This agent is intended to **assist** with code quality, security, and license compliance checks, **not to replace human judgment entirely**.

While the agent applies deterministic rules and best-effort analysis, it may produce:
- **False positives** (e.g. ambiguous licenses, heuristic PII detection, conservative policy checks)
- **Incomplete results** depending on the execution environment (installed dependencies, network access, metadata quality)

For this reason:
- **All findings must be reviewed and validated by a human**
- The agent’s output should be treated as an **input to review**, not a final decision
- Final responsibility for compliance, security, and legal interpretation always remains with the user

This is especially important for compliance-critical areas such as **licenses, personal data (PII), and security findings**.

100 changes: 100 additions & 0 deletions ai/gen-ai-agents/code-quality-agent/agent/config.py
@@ -0,0 +1,100 @@
"""
File name: config.py
Author: Luigi Saetta
Date last modified: 2025-07-02
Python Version: 3.11

Description:
This module provides general configurations


Usage:
Import this module into other scripts to use its settings.
Example:
import config

License:
This code is released under the MIT License.

Notes:
This is a part of a demo showing how to implement a code quality agent.

Warnings:
This module is in development and may change in future versions.
"""

DEBUG = False
STREAMING = False

# OCI general

# type of OCI auth
AUTH = "API_KEY"
REGION = "eu-frankfurt-1"
SERVICE_ENDPOINT = f"https://inference.generativeai.{REGION}.oci.oraclecloud.com"

# LLM
# this is the default model
LLM_MODEL_ID = "openai.gpt-oss-120b"

TEMPERATURE = 0.0
TOP_P = 1
MAX_TOKENS = 4000

#
# specific configs for the Code Quality Agent
#
# for now, only Python files
FILES_PATTERN = "*.py"

# ---- File exclusions (repo-relative glob patterns) ----
EXCLUDED_PATHS = [
".git/**",
".venv/**",
"venv/**",
"__pycache__/**",
"*.pyc",
"build/**",
"dist/**",
"node_modules/**",
]

# Accepted license identifiers (you decide the vocabulary)
ACCEPTED_LICENSE_TYPES = [
"MIT",
"Apache-2.0",
"UPL-1.0",
"BSD-3-Clause",
"BSD-2-Clause",
]

# set this flag to True if you want to create local docs in md format.
# Not needed to check code quality.
ENABLE_DOC_GENERATION = False

# used for header generation.
# It is the minimum version accepted.
PYTHON_VERSION = "3.11"

# Licenses you allow for dependencies (use SPDX-ish IDs where possible)
# see docs here:
# https://confluence.oraclecorp.com/confluence/display/CORPARCH/Licenses+Eligible+for+Pre-Approval+-+Distribution
ACCEPTED_DEP_LICENSES = {
"MIT",
"Apache-2.0",
"BSD-3-Clause",
"BSD-2-Clause",
"BSD",
"ISC",
# Mozilla Public License
"MPL-2.0",
# Python Software Foundation License
"PSF-2.0",
"UPL-1.0",
# Eclipse Public License
"EPL-2.0",
}

# Policy knobs
FAIL_ON_UNKNOWN_DEP_LICENSE = False # usually False at first
FAIL_ON_NOT_INSTALLED_DEP = False
@@ -0,0 +1,5 @@
"""
Private config
"""

COMPARTMENT_ID = "YOUR_COMPARTMENT_OCID"