---
icon: cldk/flame-20
hide:
  - toc
---

# :cldk-flame-20: Quickstart

Build code analysis pipelines with LLMs in minutes.

In this quickstart guide, we will use the [Apache Commons CLI](https://commons.apache.org/proper/commons-cli/) example
codebase to demonstrate code analysis pipeline creation using CLDK, with both local LLM inference and automated code processing.

??? note "Installing CLDK and Ollama"

    This quickstart guide requires CLDK and Ollama. Follow these instructions to set up your environment:

    First, install CLDK and the Ollama Python SDK:

    === "`pip`"

        ```shell
        pip install cldk ollama
        ```

    === "`poetry`"

        ```shell
        poetry add cldk ollama
        ```

    === "`uv`"

        ```shell
        uv add cldk ollama
        ```

    Then, install Ollama itself:

    === "Linux/WSL"

        Run the following command:

        ```shell
        curl -fsSL https://ollama.com/install.sh | sh
        ```

    === "macOS"

        Install with Homebrew:

        ```shell
        brew install ollama
        ```

        Or, download the installer from [here](https://ollama.com/download/Ollama-darwin.zip).


## Step 1: Set Up Ollama Server

Model inference with CLDK starts with a local LLM server. We'll use Ollama to host and run the models.

=== "Linux/WSL"

    * Check if the Ollama server is running:

        ```shell
        sudo systemctl status ollama
        ```

    * If it is not running, start it:

        ```shell
        sudo systemctl start ollama
        ```

=== "macOS"

    On macOS, Ollama runs automatically after installation. You can verify it's running by opening Activity Monitor and searching for "ollama".

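On either platform, you can also confirm that the server is reachable from Python before going further. A minimal sketch, assuming Ollama is listening on its default port (11434); the `ollama_running` helper is just an illustration:

```python
import urllib.error
import urllib.request


def ollama_running(url: str = "http://localhost:11434") -> bool:
    """Return True if an HTTP server responds at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False


print("Ollama server reachable:", ollama_running())
```

If this prints `False`, revisit the platform-specific steps above before moving on.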
## Step 2: Pull the Code LLM

* Let's use the Granite 8b-instruct model for this tutorial:

    ```shell
    ollama pull granite-code:8b-instruct
    ```

* Verify the installation:

    ```shell
    ollama run granite-code:8b-instruct 'Write a function to print hello world in python'
    ```

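You can also verify the pull from Python via Ollama's model-listing REST endpoint (`/api/tags`); a small sketch, assuming the default local port, with `local_models` as an illustrative helper:

```python
import json
import urllib.request


def local_models(url: str = "http://localhost:11434/api/tags") -> list:
    """List model names reported by a local Ollama server ([] if unreachable)."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except OSError:
        return []


print("granite-code pulled:", any(n.startswith("granite-code") for n in local_models()))
```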
## Step 3: Download Sample Codebase

We'll use Apache Commons CLI as our example Java project:

```shell
wget https://github.com/apache/commons-cli/archive/refs/tags/rel/commons-cli-1.7.0.zip -O commons-cli-1.7.0.zip && unzip commons-cli-1.7.0.zip
```

Let's set the project path for future reference:

```shell
export JAVA_APP_PATH=/path/to/commons-cli-1.7.0
```

??? note "About the Sample Project"

    Apache Commons CLI provides an API for parsing command line options. It's a well-structured Java project that demonstrates various object-oriented patterns, making it ideal for code analysis experiments.

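With `JAVA_APP_PATH` set, a quick sanity check that the Java sources are where we expect can be done from Python. The `count_java_files` helper here is purely illustrative:

```python
import os
from pathlib import Path


def count_java_files(project_path: str) -> int:
    """Count .java source files anywhere under a project directory."""
    return sum(1 for _ in Path(project_path).rglob("*.java"))


path = os.getenv("JAVA_APP_PATH", ".")
print(f"Found {count_java_files(path)} Java files under {path}")
```

A count of zero usually means `JAVA_APP_PATH` points at the wrong directory (for example, the zip's parent rather than the extracted project root).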
## Step 4: Create Analysis Pipeline

??? tip "What should I expect?"

    In about 40 lines of code, we will use CLDK to build a code summarization pipeline that leverages LLMs to generate summaries for a real-world Java project! Without CLDK, this would require multiple tools and a much more complex setup.

Let's build a pipeline that analyzes Java methods using LLMs. Create a new file `code_summarization.py`:

```python title="code_summarization.py" linenums="1" hl_lines="7 10 12-17 21-22 24-25 34-37"
import ollama
from cldk import CLDK
from pathlib import Path
import os

# Create CLDK object, specify language as Java.
cldk = CLDK(language="java")  # (1)!

# Create analysis object
analysis = cldk.analysis(project_path=os.getenv("JAVA_APP_PATH"))  # (2)!

# Iterate over files
for file_path, class_file in analysis.get_symbol_table().items():
    # Iterate over classes
    for type_name, type_declaration in class_file.type_declarations.items():
        # Iterate over methods
        for method in type_declaration.callable_declarations.values():  # (3)!
            # Get code body
            code_body = Path(file_path).absolute().resolve().read_text()

            # Initialize treesitter
            tree_sitter_utils = cldk.tree_sitter_utils(source_code=code_body)  # (4)!

            # Sanitize class
            sanitized_class = tree_sitter_utils.sanitize_focal_class(method.declaration)  # (5)!

            # Format instruction
            instruction = (
                f"Question: Can you write a brief summary for the method "
                f"`{method.declaration}` in the class `{type_name}` below?\n\n"
                f"```java\n{sanitized_class}```\n"
            )

            # Prompt Ollama
            summary = ollama.generate(
                model="granite-code:8b-instruct",  # (6)!
                prompt=instruction).get("response")  # (7)!

            # Print output
            print(f"\nMethod: {method.declaration}")
            print(f"Summary: {summary}")
```

1. Create a new instance of the CLDK class.
2. Create an `analysis` instance for the Java project. This object will be used to obtain all the analysis artifacts from the Java project.
3. In a nested loop, we can quickly iterate over the methods in the project and extract the code body.
4. CLDK comes with a number of tree-sitter-based utilities that can be used to extract and manipulate code snippets.
5. We use the `sanitize_focal_class()` method to extract the focal class for the method and sanitize any unwanted code in just one line of code.
6. Try your favorite model for code summarization. We use the `granite-code:8b-instruct` model in this example.
7. We prompt Ollama with the sanitized class and method declaration to generate a summary for the method.

---

### Running `code_summarization.py`

Save the file as `code_summarization.py` and run it:

```shell
python code_summarization.py
```

You'll see output like:

```
Method: parse
Summary: This method parses command line arguments using the specified Options object...

Method: validateOption
Summary: Validates if an option meets the required criteria including checking...

...
```

## Step 5: Customize Analysis

The pipeline can be customized in several ways:

=== "Change Model"

    Try different Granite model sizes:

    ```python
    summary = ollama.generate(
        model="granite-code:34b-instruct",  # Larger model!
        prompt=instruction).get("response")
    ```

=== "Modify Prompt"

    Adjust the task to generate a unit test:

    ```python
    def format_inst(code, focal_method, focal_class):
        return (f"Generate a complete unit test case using junit4 for the method `{focal_method}`...\n\n"
                f"```java\n{code}```\n")
    ```

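As a standalone sketch of the prompt-swapping idea: a builder along these lines (the exact wording and signature are illustrative, not CLDK API) returns a string that can simply replace `instruction` in the pipeline above:

```python
def format_inst(code, focal_method, focal_class):
    # Illustrative prompt builder; the phrasing is an assumption, adjust to taste.
    return (f"Generate a complete unit test case using junit4 for the method "
            f"`{focal_method}` in the class `{focal_class}`.\n\n"
            f"```java\n{code}\n```\n")


prompt = format_inst("public class Foo { void bar() {} }", "bar()", "Foo")
print(prompt)
```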
## Next Steps

- Explore different analysis tasks like code repair, translation, test generation, and more.
- Create richer prompts with more of the analysis artifacts that CLDK provides.
- Implement batch processing for larger projects, or use the CLDK SDK to build custom analysis pipelines.
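
As a starting point for batch processing, the summaries printed by the pipeline could instead be collected and written to disk. A minimal sketch; the record layout and `save_summaries` helper are assumptions, not part of CLDK:

```python
import json
from pathlib import Path


def save_summaries(records, out_path):
    """Write a list of method/summary records to a JSON file."""
    Path(out_path).write_text(json.dumps(records, indent=2))


# Inside the pipeline, append {"method": ..., "summary": ...} dicts to a list,
# then persist them once at the end instead of printing each one.
save_summaries(
    [{"method": "parse", "summary": "Parses command line arguments."}],
    "summaries.json",
)
```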