CHAPI (Common Hierarchical Abstract Parser and Information Converter) streamlines code analysis by converting diverse language source code into a unified abstract model, simplifying cross-language development.

phodal/chapi

Chapi

CHAPI (Common Hierarchical Abstract Parser and Information Converter) streamlines code analysis by converting source code from different languages into a unified abstract model, making cross-language analysis and tooling easier.

Chapi => Cha Pi => Tea Pi => Tea π => 茶 π. Reference: Tea if by sea, cha if by land.

Chapi (pronounced /tʃɑpi/) can also be read as “XP” in Chinese if you pronounce “X” as “叉”.

Status & language coverage

Language stages

Languages tracked: Java, Python, Go, Kotlin, TS/JS, C, C#, Scala, C++, Rust, Swift.

  • HTTP API decl (🆕 for three languages)
  • Syntax parsing
  • Function calls
  • Arch/package (🆕 for one language)
  • Real-world

IDL stages

IDLs tracked: Protobuf, Thrift.

  • Syntax parsing
  • HTTP API decl
  • Arch/package
  • Real-world

Projects using Chapi

  • ArchGuard — An architecture workbench for architecture governance. It can analyze architecture at container/component/code levels, create architecture fitness functions, and inspect system dependencies.
  • UnitGen — A fine-tuning data framework that generates datasets from your existing codebase.
  • ChocoBuilder — An LLM toolkit for building custom AI assistants.

PS: PRs are welcome — feel free to add your project here.

Language information

Tested language versions:

  • Java: 8, 11, 17
  • TypeScript/JavaScript
  • Kotlin
  • Rust: v1.60.0
  • Python: 2, 3
  • Swift: 5, 6 (with typed throws, async/await, actors, ownership modifiers)

Gradle modules (by tier):

// tier 1 languages
":chapi-ast-java",
":chapi-ast-typescript",

// tier 1 model language
":chapi-ast-protobuf",
":chapi-ast-thrift",

// tier 2 languages
":chapi-ast-kotlin",
":chapi-ast-go",
":chapi-ast-python",
":chapi-ast-scala",

// tier 3 languages
":chapi-ast-rust",
":chapi-ast-csharp",
":chapi-ast-c",
":chapi-ast-cpp",
":chapi-ast-swift",

// others
":chapi-parser-toml",
":chapi-parser-cmake",

Language families (refs):

| Category | Languages | Planned support |
| --- | --- | --- |
| C family | C#, Java, Go, C, C++, Objective-C, Rust, ... | C++, C, Java, C#, Rust? |
| Functional | Scheme, Lisp, Clojure, Scala, ... | Scala |
| Scripting | Lua, PHP, JavaScript, Python, Perl, Ruby, ... | Python, JavaScript |
| Other | Fortran, Swift, Matlab, ... | Swift?, Fortran? |

Parsing / analysis rules

Chapi scans twice to improve cross-file resolution.

  • It helps find data structures in the same package/module.
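
To make the two-scan idea concrete, here is a minimal sketch of one way such a scheme can work. It is not Chapi's actual implementation: `SourceFile`, `collectDeclarations`, and `resolveReferences` are hypothetical names, and the sketch assumes the first scan only indexes declared types while the second scan qualifies references against that index.

```kotlin
// Hypothetical, simplified stand-in for a parsed file; not one of Chapi's classes.
data class SourceFile(
    val packageName: String,
    val declaredTypes: List<String>,
    val referencedTypes: List<String>,
)

// Scan 1: index every declared type by package so lookups can cross file boundaries.
fun collectDeclarations(files: List<SourceFile>): Map<String, Set<String>> =
    files.groupBy { it.packageName }
        .mapValues { (_, group) -> group.flatMap { it.declaredTypes }.toSet() }

// Scan 2: qualify a referenced simple name when a type with that name
// was declared in the same package; leave unknown names untouched.
fun resolveReferences(
    files: List<SourceFile>,
    index: Map<String, Set<String>>,
): Map<String, List<String>> =
    files.associate { file ->
        val samePackage = index[file.packageName].orEmpty()
        val key = file.declaredTypes.firstOrNull() ?: file.packageName
        key to file.referencedTypes.map { name ->
            if (name in samePackage) "${file.packageName}.$name" else name
        }
    }

fun main() {
    val files = listOf(
        SourceFile("blog", listOf("Blog"), emptyList()),
        SourceFile("blog", listOf("BlogPO"), listOf("Blog", "PersistenceObject")),
    )
    val resolved = resolveReferences(files, collectDeclarations(files))
    println(resolved["BlogPO"]) // [blog.Blog, PersistenceObject]
}
```

A single scan could not qualify `Blog` if `BlogPO` happened to be parsed first, which is the kind of ordering problem a second scan avoids.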

TypeScript

  1. PackageName uses the resolved path. For example, src/grammar/blbla.ts becomes @.grammar.
  2. Top-level functions in a file use default as DataStructure.Name.
  3. export default Object uses default as FunctionName and belongs to the default data structure.
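
Rule 1 can be illustrated with a small sketch. The helper below is hypothetical and is not Chapi's code; it assumes that a leading `src` directory maps to `@`, that the remaining directories are joined with dots, and that the file name is dropped, which is all the single example above confirms.

```kotlin
// Hypothetical helper, not Chapi's implementation: derives a package name
// from a TypeScript file path under the assumptions stated above.
fun packageNameOf(path: String): String {
    // Normalise separators, split into segments, and drop the file name.
    val dirs = path.replace('\\', '/')
        .split('/')
        .filter { it.isNotEmpty() }
        .dropLast(1)
    // Assumption: only a leading "src" segment is rewritten to "@".
    val mapped = dirs.mapIndexed { i, dir -> if (i == 0 && dir == "src") "@" else dir }
    return mapped.joinToString(".")
}

fun main() {
    println(packageNameOf("src/grammar/blbla.ts")) // @.grammar
}
```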

C# notes

C

We use https://github.com/shevek/jcpp to preprocess C code.

Kotlin

  • warpTargetFullType is required to resolve classes in the same package.

Usage

Add dependencies:

dependencies {
    implementation "com.phodal.chapi:chapi-ast-java:2.5.2"
    implementation "com.phodal.chapi:chapi-domain:2.5.2"
}

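Example (Kotlin), written as an ArchGuard language analyser. Besides the dependencies above, it needs chapi-ast-csharp and ArchGuard's scanner core on the classpath:

import chapi.domain.core.CodeDataStruct
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking
import org.archguard.scanner.core.sourcecode.LanguageSourceCodeAnalyser
import org.archguard.scanner.core.sourcecode.SourceCodeContext
import java.io.File

// Assumption: getFilesByPath and File.readContent are helpers inherited from
// ArchGuard's LanguageSourceCodeAnalyser interface.
class CSharpAnalyser(override val context: SourceCodeContext) : LanguageSourceCodeAnalyser {
    private val client = context.client
    private val impl = chapi.ast.csharpast.CSharpAnalyser()

    // Parse every .cs file under the context path concurrently, then persist the results.
    override fun analyse(): List<CodeDataStruct> = runBlocking {
        getFilesByPath(context.path) {
            it.absolutePath.endsWith(".cs")
        }
            .map { async { analysisByFile(it) } }.awaitAll()
            .flatten()
            .also { client.saveDataStructure(it) }
    }

    // Parse one file and attach its imports and absolute path to each data structure.
    fun analysisByFile(file: File): List<CodeDataStruct> {
        val codeContainer = impl.analysis(file.readContent(), file.name)
        return codeContainer.Containers.flatMap { container ->
            container.DataStructures.map {
                it.apply {
                    it.Imports = codeContainer.Imports
                    it.FilePath = file.absolutePath
                }
            }
        }
    }
}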

Example input and output

Java source:

package adapters.outbound.persistence.blog;

public class BlogPO implements PersistenceObject<Blog> {
    @Override
    public Blog toDomainModel() {

    }
}

Output:

{
    "Imports": [],
    "Implements": [
        "PersistenceObject<Blog>"
    ],
    "NodeName": "BlogPO",
    "Extend": "",
    "Type": "CLASS",
    "FilePath": "",
    "InOutProperties": [],
    "Functions": [
        {
            "IsConstructor": false,
            "InnerFunctions": [],
            "Position": {
                "StartLine": 6,
                "StartLinePosition": 133,
                "StopLine": 8,
                "StopLinePosition": 145
            },
            "Package": "",
            "Name": "toDomainModel",
            "MultipleReturns": [],
            "Annotations": [
                {
                    "Name": "Override",
                    "KeyValues": []
                }
            ],
            "Extension": {},
            "Override": false,
            "extensionMap": {},
            "Parameters": [],
            "InnerStructures": [],
            "ReturnType": "Blog",
            "Modifiers": [],
            "FunctionCalls": []
        }
    ],
    "Annotations": [],
    "Extension": {},
    "Parameters": [],
    "Fields": [],
    "MultipleExtend": [],
    "InnerStructures": [],
    "Package": "adapters.outbound.persistence.blog",
    "FunctionCalls": []
}

Development

Syntax parsing identification rules:

  1. Package name
  2. Import name
  3. Class / data structure
    1. Structure name
    2. Structure parameters
    3. Function names
    4. Return types
    5. Function parameters
  4. Function
    1. Function name
    2. Return types
    3. Function parameters
  5. Method call
    1. New instance call
    2. Parameter call
    3. Field call

Build Antlr grammar

  1. Install Antlr: brew install antlr
  2. Compile grammars: ./scripts/compile-antlr.sh

Data structures

classDiagram
direction TB

%% project/module/package
CodeProject "1" o-- "*" CodeModule : Modules
CodeModule "1" o-- "*" CodePackage : Packages
CodeModule "1" o-- "1" CodePackageInfo : packageInfo
CodePackageInfo "1" o-- "*" CodeDependency : Dependencies

%% package/container
CodePackage "1" o-- "*" CodeContainer : codeContainers
CodePackage "1" o-- "*" CodePackage : Packages
CodeContainer "1" o-- "*" CodeImport : Imports
CodeContainer "1" o-- "*" CodeMember : Members
CodeContainer "1" o-- "*" CodeDataStruct : DataStructures
CodeContainer "1" o-- "*" CodeField : Fields
CodeContainer "1" o-- "*" CodeContainer : Containers
CodeContainer "0..1" o-- "1" TopLevelScope : TopLevel

%% core data structures
CodeDataStruct "1" o-- "*" CodeField : Fields
CodeDataStruct "1" o-- "*" CodeFunction : Functions
CodeDataStruct "1" o-- "*" CodeDataStruct : InnerStructures
CodeDataStruct "1" o-- "*" CodeAnnotation : Annotations
CodeDataStruct "1" o-- "*" CodeCall : FunctionCalls
CodeDataStruct "1" o-- "*" CodeImport : Imports
CodeDataStruct "1" o-- "1" CodePosition : Position

CodeFunction "1" o-- "*" CodeProperty : Parameters
CodeFunction "1" o-- "*" CodeProperty : MultipleReturns
CodeFunction "1" o-- "*" CodeCall : FunctionCalls
CodeFunction "1" o-- "*" CodeAnnotation : Annotations
CodeFunction "1" o-- "*" CodeDataStruct : InnerStructures
CodeFunction "1" o-- "*" CodeFunction : InnerFunctions
CodeFunction "1" o-- "1" CodePosition : Position

CodeField "1" o-- "*" CodeAnnotation : Annotations
CodeField "1" o-- "*" CodeCall : Calls
CodeField "1" o-- "*" CodeField : ArrayValue

CodeCall "1" o-- "*" CodeProperty : Parameters
CodeCall "1" o-- "1" CodePosition : Position

CodeMember "1" o-- "*" CodeDataStruct : StructureNodes
CodeMember "1" o-- "*" CodeFunction : FunctionNodes
CodeMember "1" o-- "1" CodePosition : Position
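
As a rough illustration of how the containment edges in the diagram can be traversed, the sketch below uses simplified stand-in classes. They are assumptions for illustration only: they borrow the names `CodeContainer`, `CodeDataStruct`, and `CodeFunction` but model just a handful of the edges, whereas the real classes in chapi-domain carry many more fields.

```kotlin
// Simplified stand-ins mirroring a few edges of the diagram; not the chapi-domain classes.
data class CodeFunction(
    val Name: String,
    val InnerFunctions: List<CodeFunction> = emptyList(),
)

data class CodeDataStruct(
    val NodeName: String,
    val Functions: List<CodeFunction> = emptyList(),
    val InnerStructures: List<CodeDataStruct> = emptyList(),
)

data class CodeContainer(
    val DataStructures: List<CodeDataStruct> = emptyList(),
    val Containers: List<CodeContainer> = emptyList(),
)

// A function contributes its own name plus the names of its nested functions.
fun namesInFunction(fn: CodeFunction): List<String> =
    listOf(fn.Name) + fn.InnerFunctions.flatMap { namesInFunction(it) }

// A data structure contributes its functions and those of its inner structures.
fun namesInStruct(ds: CodeDataStruct): List<String> =
    ds.Functions.flatMap { namesInFunction(it) } +
        ds.InnerStructures.flatMap { namesInStruct(it) }

// A container contributes its data structures and its nested containers.
fun namesInContainer(c: CodeContainer): List<String> =
    c.DataStructures.flatMap { namesInStruct(it) } +
        c.Containers.flatMap { namesInContainer(it) }

fun main() {
    val blogPO = CodeDataStruct("BlogPO", listOf(CodeFunction("toDomainModel")))
    val container = CodeContainer(Containers = listOf(CodeContainer(listOf(blogPO))))
    println(namesInContainer(container)) // [toDomainModel]
}
```

Because containers, data structures, and functions can all nest, each level needs its own recursive step; a flat loop over `DataStructures` alone would miss inner structures and inner functions.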

License

@2020 A Phodal Huang's Idea. This code is distributed under the MPL license. See LICENSE in this directory.
