
Commit bef71b3

committed
Updated quickstart with line highlights and annotations.
Signed-off-by: Rahul Krishna <i.m.ralk@gmail.com>
1 parent 4dcb641 commit bef71b3

File tree: 2 files changed (+204 −1 lines)


docs/quickstart.md

Lines changed: 201 additions & 0 deletions
@@ -3,3 +3,204 @@ icon: cldk/flame-20
hide:
  - toc
---

# :cldk-flame-20: Quickstart

Build code analysis pipelines with LLMs in minutes.

In this quickstart guide, we will use the [Apache Commons CLI](https://commons.apache.org/proper/commons-cli/) example codebase to demonstrate how to build a code analysis pipeline with CLDK, combining local LLM inference with automated code processing.

??? note "Installing CLDK and Ollama"

    This quickstart guide requires CLDK and Ollama. Follow these instructions to set up your environment:

    First, install CLDK and the Ollama Python SDK:

    === "`pip`"

        ```shell
        pip install cldk ollama
        ```

    === "`poetry`"

        ```shell
        poetry add cldk ollama
        ```

    === "`uv`"

        ```shell
        uv add cldk ollama
        ```

    Then, install Ollama:

    === "Linux/WSL"

        Run the following command:

        ```shell
        curl -fsSL https://ollama.com/install.sh | sh
        ```

    === "macOS"

        Run the following command:

        ```shell
        curl -fsSL https://ollama.com/install.sh | sh
        ```

        Or, download the installer from [here](https://ollama.com/download/Ollama-darwin.zip).

## Step 1: Set Up Ollama Server

Model inference with CLDK starts with a local LLM server. We'll use Ollama to host and run the models.

=== "Linux/WSL"

    * Check if the Ollama server is running:

      ```shell
      sudo systemctl status ollama
      ```

    * If not running, start it:

      ```shell
      sudo systemctl start ollama
      ```

=== "macOS"

    On macOS, Ollama runs automatically after installation. You can verify it's running by opening Activity Monitor and searching for "ollama".
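
On either platform, you can also confirm from the terminal that the server is responding. This is an optional sanity check, assuming Ollama is serving its local HTTP endpoint on the default port 11434:

```shell
# Should print "Ollama is running" if the server is up (default port 11434).
curl http://localhost:11434
```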

## Step 2: Pull the Code LLM

* Let's use the Granite Code 8B-Instruct model for this tutorial:

  ```shell
  ollama pull granite-code:8b-instruct
  ```

* Verify the installation:

  ```shell
  ollama run granite-code:8b-instruct 'Write a function to print hello world in python'
  ```

## Step 3: Download Sample Codebase

We'll use Apache Commons CLI as our example Java project:

```shell
wget https://github.com/apache/commons-cli/archive/refs/tags/rel/commons-cli-1.7.0.zip -O commons-cli-1.7.0.zip && unzip commons-cli-1.7.0.zip
```

Let's set the project path for future reference:

```shell
export JAVA_APP_PATH=/path/to/commons-cli-1.7.0
```

??? note "About the Sample Project"

    Apache Commons CLI provides an API for parsing command-line options. It's a well-structured Java project that demonstrates various object-oriented patterns, making it ideal for code analysis experiments.

## Step 4: Create Analysis Pipeline

??? tip "What should I expect?"

    In about 40 lines of code, we will use CLDK to build a code summarization pipeline that uses an LLM to generate summaries for a real-world Java project! Without CLDK, this would require multiple tools and a much more complex setup.

Let's build a pipeline that analyzes Java methods using LLMs. Create a new file `code_summarization.py`:

```python title="code_summarization.py" linenums="1" hl_lines="7 10 12-17 21-22 24-25 34-37"
import ollama
from cldk import CLDK
from pathlib import Path
import os

# Create CLDK object, specify language as Java.
cldk = CLDK(language="java")  # (1)!

# Create analysis object
analysis = cldk.analysis(project_path=os.getenv("JAVA_APP_PATH"))  # (2)!

# Iterate over files
for file_path, class_file in analysis.get_symbol_table().items():
    # Iterate over classes
    for type_name, type_declaration in class_file.type_declarations.items():
        # Iterate over methods
        for method in type_declaration.callable_declarations.values():  # (3)!
            # Get code body
            code_body = Path(file_path).absolute().resolve().read_text()

            # Initialize treesitter
            tree_sitter_utils = cldk.tree_sitter_utils(source_code=code_body)  # (4)!

            # Sanitize class
            sanitized_class = tree_sitter_utils.sanitize_focal_class(method.declaration)  # (5)!

            # Format instruction
            instruction = (
                f"Question: Can you write a brief summary for the method "
                f"`{method.declaration}` in the class `{type_name}` below?\n\n"
                f"```java\n{sanitized_class}```\n"
            )

            # Prompt Ollama
            summary = ollama.generate(
                model="granite-code:8b-instruct",  # (6)!
                prompt=instruction).get("response")  # (7)!

            # Print output
            print(f"\nMethod: {method.declaration}")
            print(f"Summary: {summary}")
```

1. Create a new instance of the CLDK class.
2. Create an `analysis` instance for the Java project. This object will be used to obtain all the analysis artifacts from the Java project.
3. In a nested loop, we can quickly iterate over the methods in the project and extract the code body.
4. CLDK comes with a number of tree-sitter-based utilities that can be used to extract and manipulate code snippets.
5. We use the `sanitize_focal_class()` method to extract the focal class for the method and sanitize any unwanted code in just one line of code.
6. Try your favorite model for code summarization. We use the `granite-code:8b-instruct` model in this example.
7. We prompt Ollama with the sanitized class and method declaration to generate a summary for the method.

---

### Running `code_summarization.py`

Save the file as `code_summarization.py` and run it:

```shell
python code_summarization.py
```

You'll see output like:

```
Method: parse
Summary: This method parses command line arguments using the specified Options object...

Method: validateOption
Summary: Validates if an option meets the required criteria including checking...

...
```

## Step 5: Customize Analysis

The pipeline can be customized in several ways:

=== "Change Model"

    Try different Granite model sizes:

    ```python
    summary = ollama.generate(
        model="granite-code:34b-instruct",  # Larger model!
        prompt=instruction).get("response")
    ```

=== "Modify Prompt"

    Adjust the task to generate a unit test:

    ```python
    def format_inst(code, focal_method, focal_class):
        return (f"Generate a complete unit test case using junit4 for the method `{focal_method}`...\n\n"
                f"```java\n{code}```\n")
    ```
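
To wire the modified prompt into the pipeline, the prompt-construction step of `code_summarization.py` could be swapped out as sketched below. This is a rough sketch, not part of the original script: it assumes the `format_inst` helper from the "Modify Prompt" tab and the loop variables (`sanitized_class`, `method`, `type_name`) from the main script.

```python
# Sketch: replace the summarization prompt with a test-generation prompt.
# Assumes format_inst() from the "Modify Prompt" tab and the loop variables
# (sanitized_class, method, type_name) from code_summarization.py.
instruction = format_inst(
    code=sanitized_class,
    focal_method=method.declaration,
    focal_class=type_name,
)

# Prompt the same local model with the new instruction.
test_case = ollama.generate(
    model="granite-code:8b-instruct",
    prompt=instruction).get("response")

print(f"\nMethod: {method.declaration}")
print(f"Generated test:\n{test_case}")
```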
## Next Steps

- Explore different analysis tasks like code repair, translation, test generation, and more.
- Create richer prompts with the additional analysis artifacts that CLDK provides; a small sketch follows this list.
- Implement batch processing for larger projects, or use the CLDK SDK to build custom analysis pipelines.
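
As one example of a richer prompt, the snippet below is a minimal sketch that adds extra context to the instruction using only artifacts already shown in this guide (the other method signatures declared in the focal class). It assumes the loop variables from `code_summarization.py`:

```python
# Minimal sketch of a richer prompt, assuming the loop variables
# (type_name, type_declaration, method, sanitized_class) from
# code_summarization.py. The extra context is simply the other
# method signatures declared in the same class.
sibling_methods = [
    m.declaration
    for m in type_declaration.callable_declarations.values()
    if m.declaration != method.declaration
]

instruction = (
    f"Question: Can you write a brief summary for the method "
    f"`{method.declaration}` in the class `{type_name}` below?\n"
    f"Other methods declared in this class: {', '.join(sibling_methods)}\n\n"
    f"```java\n{sanitized_class}```\n"
)
```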

mkdocs.yml

Lines changed: 3 additions & 1 deletion
@@ -7,7 +7,7 @@ copyright: Copyright &copy; 2024-2025 IBM

nav:
  - Home:
-   - Introduction: index.md
+   - Home: index.md
    - Quick Start: quickstart.md
    - Installation: installing.md
    - Core Concepts: core-concepts/index.md
@@ -40,6 +40,8 @@ theme:
  features:
    - announce.dismiss
    - content.code.copy
+   - content.code.prettify
+   - content.code.annotate
    - content.tabs.link
    - navigation.indexes
    - navigation.footer
