Arsngrobg/SourceDiff

SourceDiff

SourceDiff is an advanced code analysis tool designed to measure and minimize structural distance between codebases. It helps developers, educators, and code reviewers by analysing how similar codebases are.

By leveraging static analysis, parse trees (PTs), and pattern recognition techniques, SourceDiff provides refactoring suggestions that align codebases more closely or highlight significant divergences. This makes it particularly useful for detecting:

  • Plagiarism or collusion in academic environments
  • Unoriginal code, whether copy-pasted or generated by an AI agent
  • Redundant code that may produce unused compilation artifacts
  • Duplicate logic that can be extracted into methods/functions
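To make the idea of structural distance concrete, here is a minimal sketch that compares two Python snippets by the shape of their syntax trees. It is not SourceDiff's algorithm: it uses Python's built-in `ast` module as a stand-in for Tree Sitter, and the helper names `node_type_bigrams` and `structural_similarity` are hypothetical. The metric (a multiset Jaccard index over parent/child node-type pairs) is just one simple choice among many.

```python
import ast
from collections import Counter


def node_type_bigrams(source: str) -> Counter:
    """Count (parent type, child type) pairs in the syntax tree of `source`."""
    tree = ast.parse(source)
    pairs = Counter()
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            pairs[(type(parent).__name__, type(child).__name__)] += 1
    return pairs


def structural_similarity(a: str, b: str) -> float:
    """Multiset Jaccard similarity of the two trees' pairs, from 0.0 to 1.0."""
    pa, pb = node_type_bigrams(a), node_type_bigrams(b)
    shared = sum((pa & pb).values())  # per-pair minimum counts
    total = sum((pa | pb).values())   # per-pair maximum counts
    return shared / total if total else 1.0


original = "def area(w, h):\n    return w * h\n"
renamed = "def size(x, y):\n    return x * y\n"
different = "for i in range(3):\n    print(i)\n"

# Renaming identifiers does not change the tree shape, so this prints 1.0.
print(structural_similarity(original, renamed))
# Structurally unrelated code scores lower.
print(structural_similarity(original, different))
```

Because identifiers are ignored, a copy with renamed variables still scores as identical, which is the kind of signal a plagiarism check would look for.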

Use Cases

  • Academia
    • Detecting plagiarism, whether through copy-pasting or AI-generated code
    • Evaluating code quality: sudden differences in code quality may hint at cheating
  • Professional
    • Identifying AI-generated code in code reviews
    • Maintaining consistent programming styles across the codebase

Plug-n-Play

SourceDiff uses the Tree-sitter API to parse source code into parse trees. The Tree-sitter project provides a large collection of officially supported, ready-made incremental parsers. SourceDiff can dynamically compile these parsers for you, or you can do so manually.

Extensibility

If your programming language does not have an officially recognised Tree-sitter parser, you can write your own by following the Tree-sitter documentation.

See Building From Source below for how to register a new grammar with SourceDiff.

Building From Source

To build SourceDiff from source, you need make installed on your system. Compiling SourceDiff then takes a single command:

~ > make

This produces a build/pkg directory containing all the files necessary to run SourceDiff. You can then either add build/pkg to your PATH variable, if you intend to use the software regularly, or navigate to the build/pkg directory. Afterwards, invoke the executable from the command line:

~ > ./srcdiff diff <file> <file>

By default, SourceDiff is packaged without any precompiled grammars. As stated above, you can find pre-made parsers in the Tree-sitter docs. To register a new language parser:

~ > ./srcdiff register <name> <dir>

Here, <name> is the name of the resulting shared library that SourceDiff references, and <dir> is the source directory containing the parser's .c files.
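If you prefer to compile a parser manually, the sketch below shows one way to assemble the compiler invocation. It is an assumption about what such a build step could look like, not a description of what `srcdiff register` does internally: the function `build_command` is hypothetical, the `.so` suffix and `-fPIC` flag assume a Linux-style toolchain, and the only Tree Sitter convention relied on is that a grammar's generated C sources (such as `parser.c` and an optional `scanner.c`) live together in one source directory.

```python
from pathlib import Path
from typing import List


def build_command(name: str, src_dir: str, compiler: str = "cc") -> List[str]:
    """Return an argv list that compiles every .c file in src_dir into <name>.so.

    Hypothetical helper: the flags assume a Linux-style C toolchain.
    """
    sources = sorted(str(p) for p in Path(src_dir).glob("*.c"))
    if not sources:
        raise FileNotFoundError(f"no .c files found in {src_dir}")
    # -I lets the sources find headers shipped alongside them in src_dir.
    return [compiler, "-shared", "-fPIC", "-I", src_dir, *sources, "-o", f"{name}.so"]
```

The function only builds the argument list, so you can inspect it before passing it to `subprocess.run`.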

One additional layer of configuration is the lookup table, which you configure using a syntax very similar to Python's dictionary literals. Each entry consists of a key identifying the language, followed by a list of file extensions that SourceDiff uses to associate files with that language's parser.

~ > ./srcdiff lut set {python:[py,pyw,pyi]}

This only works in practice if you have already registered a Python grammar.
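To illustrate how a lookup table like the one above maps files to parsers, here is a minimal sketch that reads the same syntax and resolves a filename to a language. It is not SourceDiff's implementation: `parse_lut` and `language_for` are hypothetical names, and the assumption that several entries would be separated by commas inside the braces goes beyond the single-entry example shown above.

```python
import re
from typing import Dict, List, Optional

# One entry looks like  name:[ext,ext,...]
ENTRY = re.compile(r"(\w+)\s*:\s*\[([^\]]*)\]")


def parse_lut(text: str) -> Dict[str, List[str]]:
    """Parse a string such as '{python:[py,pyw,pyi]}' into a dict."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("lookup table must be wrapped in braces")
    table: Dict[str, List[str]] = {}
    for name, exts in ENTRY.findall(text[1:-1]):
        table[name] = [e.strip() for e in exts.split(",") if e.strip()]
    return table


def language_for(filename: str, table: Dict[str, List[str]]) -> Optional[str]:
    """Return the language whose extension list contains the file's extension."""
    ext = filename.rsplit(".", 1)[-1] if "." in filename else ""
    for name, exts in table.items():
        if ext in exts:
            return name
    return None


table = parse_lut("{python:[py,pyw,pyi]}")
print(table)                             # {'python': ['py', 'pyw', 'pyi']}
print(language_for("stubs.pyi", table))  # python
print(language_for("main.c", table))     # None
```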

For more information on how to use this software, use the following command:

~ > ./srcdiff --help

About

A tool for analysing codebases using parse trees
