@keshav-space
Member

Signed-off-by: Keshav Priyadarshi <git@keshav.space>
@keshav-space keshav-space self-assigned this Feb 5, 2025
@JonoYang JonoYang left a comment

@keshav-space The code looks alright, but I need to play around with this to better understand code stemming.

@JonoYang
Member

@keshav-space I am using this branch of matchcode-toolkit in aboutcode-org/matchcode-tests#2 and it is failing when stemming some C source files.

location = '/tmp/scancode-tk-tests -7j8ig1k0/rfxpd8b_/Dataset.zip/Dataset/Control/Human1/itemToString_Human1.c'

    def get_parser(location):
        """
        Get the appropriate tree-sitter parser and grammar config for
        file at location.
        """
        file_type = Type(location)
        language = file_type.programming_language
    
        if not language or language not in TS_LANGUAGE_CONF:
            return
    
        language_info = TS_LANGUAGE_CONF[language]
        wheel = language_info["wheel"]
    
        try:
            grammar = importlib.import_module(wheel)
        except ModuleNotFoundError:
            raise TreeSitterWheelNotInstalled(f"{wheel} package is not installed")
    
>       parser = Parser(language=Language(grammar.language()))
E       ValueError: Incompatible Language version 15. Must be between 13 and 14

venv/lib/python3.10/site-packages/matchcode_toolkit/stemming.py:79: ValueError

https://dev.azure.com/nexB/matchcode-tests/_build/results?buildId=15446&view=logs&j=41fca3e8-fcfe-5670-e26e-f33ade403b7f&t=cc4dfe40-db93-5fe0-c2b6-39217bc3c5fe&l=123
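The traceback shows `Language(grammar.language())` rejecting a grammar wheel built for tree-sitter language version 15, while the installed tree-sitter binding only accepts versions 13 to 14. As a minimal sketch, a pre-check could turn this into a more actionable error. `check_abi_compatible`, `IncompatibleGrammarABI`, and the `tree_sitter_c` wheel name are hypothetical and are not part of matchcode-toolkit. The version bounds are passed in explicitly rather than read from the binding.

```python
class IncompatibleGrammarABI(Exception):
    """Hypothetical error for a grammar built for an unsupported language version."""


def check_abi_compatible(wheel, grammar_version, min_version, max_version):
    """
    Raise IncompatibleGrammarABI unless `grammar_version` lies in the inclusive
    range [min_version, max_version] accepted by the installed tree-sitter binding.
    """
    if not min_version <= grammar_version <= max_version:
        raise IncompatibleGrammarABI(
            f"{wheel} targets tree-sitter language version {grammar_version}, "
            f"but the installed tree-sitter supports {min_version} to {max_version}. "
            "Pin tree-sitter and its grammar wheels to compatible versions."
        )


# Values taken from the traceback: grammar at version 15, binding accepts 13 to 14.
try:
    check_abi_compatible("tree_sitter_c", 15, min_version=13, max_version=14)
except IncompatibleGrammarABI as error:
    print(error)
```

Putting the pinning hint in the message points directly at version drift between projects, which is the root cause identified later in this thread.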

@keshav-space
Member Author

> I am using this branch of matchcode-toolkit in aboutcode-org/matchcode-tests#2 and it is failing when stemming some C source files.

@JonoYang Looking into it.

@JonoYang
Member

@keshav-space The issue was that matchcode-tests was not using the same version of tree-sitter as matchcode-toolkit. Once I created requirements files and pinned the dependencies to the same versions used in matchcode-toolkit, it worked.
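Pinning resolved the mismatch here. As a minimal sketch, assuming you already know the versions a sibling project pins, a hypothetical helper like `find_pin_mismatches` could flag drift between installed packages and those pins using the standard library's `importlib.metadata`. The distribution names and the `<pinned version>` placeholders below are illustrative assumptions, not matchcode-toolkit's actual pins.

```python
from importlib.metadata import PackageNotFoundError, version


def find_pin_mismatches(pins, get_version=version):
    """
    Compare installed distributions against pinned versions.

    `pins` maps a distribution name to the exact version expected. Returns a
    dict of name -> (expected, installed) for every mismatch; `installed` is
    None when the distribution is not installed. Versions are compared as
    plain strings, without PEP 440 normalization.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Hypothetical pins: replace the placeholders with the versions from the
# other project's requirements file.
pins = {"tree-sitter": "<pinned version>", "tree-sitter-c": "<pinned version>"}
for name, (expected, installed) in find_pin_mismatches(pins).items():
    print(f"{name}: expected {expected}, installed {installed}")
```

The `get_version` parameter exists only so the lookup can be swapped out, for example in tests.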

@JonoYang
Member

Tests are passing in matchcode-toolkit. Thanks!

@JonoYang JonoYang merged commit ca21376 into main Feb 19, 2025
6 checks passed
@JonoYang JonoYang deleted the code-stemming branch February 19, 2025 19:56
Development

Successfully merging this pull request may close these issues.

AI-GCS: Design and implement "Code Stemming", e.g., token replacement and abstraction
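The linked issue describes code stemming as token replacement and abstraction. The following is a rough illustration only. It is not matchcode-toolkit's implementation, which per the traceback above parses files with tree-sitter. Instead, it uses Python's standard `tokenize` module to abstract Python source so that code differing only in identifiers, literals, or comments produces the same stem. The function name `stem_python_source` is hypothetical.

```python
import io
import keyword
import tokenize


def stem_python_source(source):
    """
    Return a token-abstracted ("stemmed") form of Python `source`:
    identifiers become ID, string literals STR, number literals NUM,
    and comments are dropped. Keywords and operators are kept as-is.
    Note: f-strings are not handled specially (Python 3.12+ tokenizes
    them as FSTRING_* tokens, which this sketch skips).
    """
    stems = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            stems.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.STRING:
            stems.append("STR")
        elif tok.type == tokenize.NUMBER:
            stems.append("NUM")
        elif tok.type == tokenize.OP:
            stems.append(tok.string)
        # COMMENT, NEWLINE, NL, INDENT, DEDENT and ENDMARKER tokens are skipped.
    return " ".join(stems)


print(stem_python_source("def add(x, y):\n    return x + y  # sum\n"))
# def ID ( ID , ID ) : return ID + ID
```

Renaming `add`, `x`, and `y` or editing the comment leaves the stem unchanged. This property is what makes stemmed code useful for matching.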

2 participants