CHAPI (Common Hierarchical Abstract Parser and Information Converter) streamlines code analysis by converting source code from different languages into a unified abstract model, making cross-language analysis and tooling easier.
Chapi => Cha Pi => Tea Pi => Tea π => 茶 π. Reference: Tea if by sea, cha if by land.
Chapi (pronounced /tʃɑpi/) can also be read as “XP” in Chinese if you pronounce “X” as “叉”.
| Feature | Java | Python | Go | Kotlin | TS/JS | C | C# | Scala | C++ | Rust | Swift |
|---|---|---|---|---|---|---|---|---|---|---|---|
| HTTP API decl | ✅ | 🆕 | ✅ | ✅ | ✅ | 🆕 | ✅ | ✅ | 🆕 | ✅ | |
| Syntax parsing | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Function calls | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | |
| Arch/package | ✅ | ✅ | ✅ | ✅ | 🆕 | ✅ | ✅ | ✅ | ✅ | ✅ | |
| Real-world | ✅ | ✅ | ✅ | | | | | | | | |

| Feature | Protobuf | Thrift |
|---|---|---|
| Syntax parsing | ✅ | ✅ |
| HTTP API decl | ✅ | ✅ |
| Arch/package | ✅ | |
| Real-world | ✅ | |
- ArchGuard — An architecture workbench for architecture governance. It can analyze architecture at container/component/code levels, create architecture fitness functions, and inspect system dependencies.
- UnitGen — A fine-tuning data framework that generates datasets from your existing codebase.
- ChocoBuilder — An LLM toolkit for building custom AI assistants.
PS: PRs are welcome — feel free to add your project here.
Tested language versions:
- Java: 8, 11, 17
- TypeScript/JavaScript
- Kotlin
- Rust: v1.60.0
- Python: 2, 3
- Swift: 5, 6 (with typed throws, async/await, actors, ownership modifiers)
Gradle modules (by tier):
// tier 1 languages
":chapi-ast-java",
":chapi-ast-typescript",
// tier 1 model language
":chapi-ast-protobuf",
":chapi-ast-thrift",
// tier 2 languages
":chapi-ast-kotlin",
":chapi-ast-go",
":chapi-ast-python",
":chapi-ast-scala",
// tier 3 languages
":chapi-ast-rust",
":chapi-ast-csharp",
":chapi-ast-c",
":chapi-ast-cpp",
":chapi-ast-swift",
// others
":chapi-parser-toml",
":chapi-parser-cmake",
Language families (refs):
- First-class function: https://en.wikipedia.org/wiki/First-class_function
- Algol family: https://wiki.c2.com/?AlgolFamily
| Category | Languages | Planned support |
|---|---|---|
| C family | C#, Java, Go, C, C++, Objective-C, Rust, ... | C++, C, Java, C#, Rust? |
| Functional | Scheme, Lisp, Clojure, Scala, ... | Scala |
| Scripting | Lua, PHP, JavaScript, Python, Perl, Ruby, ... | Python, JavaScript |
| Other | Fortran, Swift, Matlab, ... | Swift?, Fortran? |
Chapi scans twice to improve cross-file resolution.
- It helps find data structures in the same package/module.
- `PackageName` uses the resolved path. For example, `src/grammar/blbla.ts` becomes `@.grammar`.
- Top-level functions in a file use `default` as `DataStructure.Name`.
- `export default Object` uses `default` as `FunctionName` and belongs to the `default` data structure.
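The path-to-package rule can be illustrated with a small sketch. `resolvePackageName` and its `sourceRoot` parameter are hypothetical names introduced here, and the logic is inferred only from the single `src/grammar/blbla.ts` example, so Chapi's real resolver may treat other paths differently.

```kotlin
// Hypothetical helper, not Chapi's actual implementation: derives a package
// name from a file path, following "src/grammar/blbla.ts becomes @.grammar".
fun resolvePackageName(filePath: String, sourceRoot: String = "src"): String {
    // Normalise Windows separators and split the path into segments.
    val segments = filePath.replace('\\', '/').split('/').filter { it.isNotEmpty() }
    // Drop the file name; only directories contribute to the package.
    val dirs = segments.dropLast(1)
    // Strip the leading source root if present; "@" stands in for it.
    val relative = if (dirs.firstOrNull() == sourceRoot) dirs.drop(1) else dirs
    return (listOf("@") + relative).joinToString(".")
}
```

Under these assumptions, `resolvePackageName("src/grammar/blbla.ts")` returns `@.grammar`.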
- Interpolated string parsing
  - Official grammar: https://github.com/dotnet/roslyn/blob/main/src/Compilers/CSharp/Portable/Generated/CSharp.Generated.g4
  - Related Antlr issue: antlr/grammars-v4#1146
- Import analysis
  - In C#, importing a `namespace` allows resolving calls inside that namespace.
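One way to picture the C# import rule is a lookup that tries each imported namespace in turn. This is only a sketch: `NamespaceIndex` and `resolveByImports` are hypothetical names, and the index is assumed to have been built elsewhere from the parsed sources.

```kotlin
// Hypothetical, simplified model: maps a namespace to the type names it declares.
typealias NamespaceIndex = Map<String, Set<String>>

// Resolves a simple type name to a qualified name by searching the imported
// namespaces in order; returns null when no imported namespace declares it.
fun resolveByImports(typeName: String, imports: List<String>, index: NamespaceIndex): String? =
    imports.firstOrNull { ns -> typeName in index[ns].orEmpty() }
        ?.let { ns -> "$ns.$typeName" }
```

A call target such as `UserService` would resolve to `Demo.Services.UserService` only if `Demo.Services` is among the imports and is known to declare that type.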
We use https://github.com/shevek/jcpp to preprocess C code.
`warpTargetFullType` is required to resolve classes in the same package.
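Resolving classes in the same package is also what the two-pass scan is for. The sketch below shows the general idea with a hypothetical `ParsedStruct` type: the first pass indexes declared names per package, and the second qualifies references that match a sibling. It is an illustration of the technique, not Chapi's internal code.

```kotlin
// Hypothetical, simplified stand-in for a parsed class: its package, its own
// name, and the unqualified type names it refers to.
data class ParsedStruct(val pkg: String, val name: String, val referenced: List<String>)

// Pass 1: index every declared structure name by package.
fun indexByPackage(structs: List<ParsedStruct>): Map<String, Set<String>> =
    structs.groupBy({ it.pkg }, { it.name }).mapValues { it.value.toSet() }

// Pass 2: qualify references that match a structure declared in the same
// package; anything else is left unqualified.
fun resolveSamePackage(structs: List<ParsedStruct>): Map<String, List<String>> {
    val index = indexByPackage(structs)
    return structs.associate { s ->
        val siblings = index[s.pkg].orEmpty()
        "${s.pkg}.${s.name}" to s.referenced.map { ref ->
            if (ref in siblings) "${s.pkg}.$ref" else ref
        }
    }
}
```

A single pass cannot do this reliably, because a file may refer to a sibling class that has not been parsed yet.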
Add dependencies:
dependencies {
    implementation "com.phodal.chapi:chapi-ast-java:2.5.2"
    implementation "com.phodal.chapi:chapi-domain:2.5.2"
}
Example (Kotlin):
import chapi.domain.core.CodeDataStruct
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking
import org.archguard.scanner.core.sourcecode.LanguageSourceCodeAnalyser
import org.archguard.scanner.core.sourcecode.SourceCodeContext
import java.io.File

// Assumption: the class implements ArchGuard's LanguageSourceCodeAnalyser, which
// declares `context` and supplies `getFilesByPath` and `File.readContent()`.
class CSharpAnalyser(override val context: SourceCodeContext) : LanguageSourceCodeAnalyser {
    private val client = context.client
    private val impl = chapi.ast.csharpast.CSharpAnalyser()

    // Parse every .cs file concurrently, then persist the merged result.
    override fun analyse(): List<CodeDataStruct> = runBlocking {
        getFilesByPath(context.path) {
            it.absolutePath.endsWith(".cs")
        }
            .map { async { analysisByFile(it) } }.awaitAll()
            .flatten()
            .also { client.saveDataStructure(it) }
    }

    // Parse one file and stamp each data structure with its imports and path.
    fun analysisByFile(file: File): List<CodeDataStruct> {
        val codeContainer = impl.analysis(file.readContent(), file.name)
        return codeContainer.Containers.flatMap { container ->
            container.DataStructures.map {
                it.apply {
                    it.Imports = codeContainer.Imports
                    it.FilePath = file.absolutePath
                }
            }
        }
    }
}
Java source:
package adapters.outbound.persistence.blog;
public class BlogPO implements PersistenceObject<Blog> {
    @Override
    public Blog toDomainModel() {
    }
}
Output:
{
"Imports": [],
"Implements": [
"PersistenceObject<Blog>"
],
"NodeName": "BlogPO",
"Extend": "",
"Type": "CLASS",
"FilePath": "",
"InOutProperties": [],
"Functions": [
{
"IsConstructor": false,
"InnerFunctions": [],
"Position": {
"StartLine": 6,
"StartLinePosition": 133,
"StopLine": 8,
"StopLinePosition": 145
},
"Package": "",
"Name": "toDomainModel",
"MultipleReturns": [],
"Annotations": [
{
"Name": "Override",
"KeyValues": []
}
],
"Extension": {},
"Override": false,
"extensionMap": {},
"Parameters": [],
"InnerStructures": [],
"ReturnType": "Blog",
"Modifiers": [],
"FunctionCalls": []
}
],
"Annotations": [],
"Extension": {},
"Parameters": [],
"Fields": [],
"MultipleExtend": [],
"InnerStructures": [],
"Package": "adapters.outbound.persistence.blog",
"FunctionCalls": []
}
Syntax parsing identification rules:
- Package name
- Import name
- Class / data structure
  - Structure name
  - Structure parameters
  - Function names
  - Return types
  - Function parameters
- Function
  - Function name
  - Return types
  - Function parameters
- Method call
  - New instance call
  - Parameter call
  - Field call
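The elements identified by these rules (package name, structure name, function names, return types) surface as fields in output like the `BlogPO` example. As a rough sketch of how such a result might be consumed, the snippet below uses `StructSummary` and `FunctionSummary`, which are hypothetical cut-down mirrors of a few output fields; the real classes in `chapi-domain` carry many more properties.

```kotlin
// Simplified, hypothetical mirrors of a few fields from the JSON output;
// property names follow the capitalised keys used there.
data class FunctionSummary(
    val Name: String,
    val ReturnType: String,
    val Annotations: List<String> = emptyList()
)

data class StructSummary(
    val Package: String,
    val NodeName: String,
    val Implements: List<String> = emptyList(),
    val Functions: List<FunctionSummary> = emptyList()
)

// Fully qualified name: package and node name joined by a dot.
fun StructSummary.fullName(): String =
    if (Package.isEmpty()) NodeName else "$Package.$NodeName"

// One line per function: annotations, then name and return type.
fun StructSummary.describeFunctions(): List<String> = Functions.map { f ->
    val annotations = f.Annotations.joinToString("") { "@$it " }
    "$annotations${f.Name}(): ${f.ReturnType}"
}
```

For the `BlogPO` output, `fullName()` would give `adapters.outbound.persistence.blog.BlogPO` and `describeFunctions()` a single entry, `@Override toDomainModel(): Blog`.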
- Install Antlr: `brew install antlr`
- Compile grammars: `./scripts/compile-antlr.sh`
classDiagram
direction TB
%% project/module/package
CodeProject "1" o-- "*" CodeModule : Modules
CodeModule "1" o-- "*" CodePackage : Packages
CodeModule "1" o-- "1" CodePackageInfo : packageInfo
CodePackageInfo "1" o-- "*" CodeDependency : Dependencies
%% package/container
CodePackage "1" o-- "*" CodeContainer : codeContainers
CodePackage "1" o-- "*" CodePackage : Packages
CodeContainer "1" o-- "*" CodeImport : Imports
CodeContainer "1" o-- "*" CodeMember : Members
CodeContainer "1" o-- "*" CodeDataStruct : DataStructures
CodeContainer "1" o-- "*" CodeField : Fields
CodeContainer "1" o-- "*" CodeContainer : Containers
CodeContainer "0..1" o-- "1" TopLevelScope : TopLevel
%% core data structures
CodeDataStruct "1" o-- "*" CodeField : Fields
CodeDataStruct "1" o-- "*" CodeFunction : Functions
CodeDataStruct "1" o-- "*" CodeDataStruct : InnerStructures
CodeDataStruct "1" o-- "*" CodeAnnotation : Annotations
CodeDataStruct "1" o-- "*" CodeCall : FunctionCalls
CodeDataStruct "1" o-- "*" CodeImport : Imports
CodeDataStruct "1" o-- "1" CodePosition : Position
CodeFunction "1" o-- "*" CodeProperty : Parameters
CodeFunction "1" o-- "*" CodeProperty : MultipleReturns
CodeFunction "1" o-- "*" CodeCall : FunctionCalls
CodeFunction "1" o-- "*" CodeAnnotation : Annotations
CodeFunction "1" o-- "*" CodeDataStruct : InnerStructures
CodeFunction "1" o-- "*" CodeFunction : InnerFunctions
CodeFunction "1" o-- "1" CodePosition : Position
CodeField "1" o-- "*" CodeAnnotation : Annotations
CodeField "1" o-- "*" CodeCall : Calls
CodeField "1" o-- "*" CodeField : ArrayValue
CodeCall "1" o-- "*" CodeProperty : Parameters
CodeCall "1" o-- "1" CodePosition : Position
CodeMember "1" o-- "*" CodeDataStruct : StructureNodes
CodeMember "1" o-- "*" CodeFunction : FunctionNodes
CodeMember "1" o-- "1" CodePosition : Position
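To show how the hierarchy in the diagram can be walked, here is a minimal sketch that collects call edges from containers down to function calls. The `ContainerNode`, `StructNode`, `FunctionNode`, and `CallNode` types are hypothetical, trimmed-down stand-ins for `CodeContainer`, `CodeDataStruct`, `CodeFunction`, and `CodeCall`, keeping only the relations the traversal needs.

```kotlin
// Hypothetical, trimmed-down versions of four node types from the diagram.
data class CallNode(val NodeName: String, val FunctionName: String)

data class FunctionNode(val Name: String, val FunctionCalls: List<CallNode> = emptyList())

data class StructNode(
    val NodeName: String,
    val Functions: List<FunctionNode> = emptyList(),
    val InnerStructures: List<StructNode> = emptyList()
)

data class ContainerNode(
    val DataStructures: List<StructNode> = emptyList(),
    val Containers: List<ContainerNode> = emptyList()
)

// Depth-first walk over a container: its own structures first, then nested containers.
fun ContainerNode.callEdges(): List<String> =
    DataStructures.flatMap { it.callEdges() } + Containers.flatMap { it.callEdges() }

// Emits "Caller.function -> Callee.function" for each call, then recurses
// into inner structures.
fun StructNode.callEdges(): List<String> =
    Functions.flatMap { fn ->
        fn.FunctionCalls.map { call -> "$NodeName.${fn.Name} -> ${call.NodeName}.${call.FunctionName}" }
    } + InnerStructures.flatMap { it.callEdges() }
```

Because every language is converted into the same model, a traversal like this would not need language-specific branches.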
@2020 A Phodal Huang's Idea. This code is distributed under
the MPL license. See LICENSE in this directory.