Call graph construction is the foundation of inter-procedural static analysis. PyCG is the state-of-the-art approach for constructing call graphs for Python programs. Unfortunately, PyCG does not scale to large programs when adapted to whole-program analysis where application and dependent libraries are both analyzed. Moreover, PyCG is flow-insensitive and does not fully support Python’s features, hindering its accuracy.
To overcome these drawbacks, we propose a scalable and precise approach for constructing application-centered call graphs for Python programs, and implement it in a prototype tool, JARVIS. JARVIS maintains a type graph (i.e., the type relations of program identifiers) for each function in a program to enable type inference. Taking one function as input, JARVIS generates the call graph on the fly, alternating between flow-sensitive intra-procedural analysis and inter-procedural analysis and applying strong updates. Our evaluation on a micro-benchmark of 135 small Python programs and a macro-benchmark of 6 real-world Python applications demonstrates that JARVIS significantly improves on PyCG: it is at least 67% faster, achieves 84% higher precision, and achieves at least 20% higher recall.
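To make "flow-sensitive analysis with strong updates" concrete, here is a toy sketch of a per-function type graph over straight-line code. It is not Jarvis's implementation: the statement encoding and the `analyze` function are hypothetical, and a real type graph would also track attributes, calls, and inter-procedural facts.

```python
from typing import Dict, List, Set, Tuple

# Toy per-function type graph: each local identifier maps to the set of
# class names it may refer to at the current program point.
TypeGraph = Dict[str, Set[str]]
Stmt = Tuple[str, str, str]  # (kind, target, source)


def analyze(stmts: List[Stmt], strong: bool = True) -> TypeGraph:
    """Process straight-line statements in program order (flow-sensitive).

    ("new", x, "C")  models  x = C()
    ("copy", x, y)   models  x = y
    strong=True replaces the target's types on assignment (strong update);
    strong=False only adds to them (weak update), roughly what a
    flow-insensitive analysis computes for this code.
    """
    graph: TypeGraph = {}
    for kind, target, source in stmts:
        if kind == "new":
            incoming = {source}
        elif kind == "copy":
            incoming = set(graph.get(source, set()))
        else:
            raise ValueError(f"unknown statement kind: {kind}")
        if strong:
            graph[target] = incoming
        else:
            graph.setdefault(target, set()).update(incoming)
    return graph


# x = A(); x = B(); x.run()  -- which run() can the last call reach?
program = [("new", "x", "A"), ("new", "x", "B")]
print(sorted(analyze(program, strong=True)["x"]))   # ['B']
print(sorted(analyze(program, strong=False)["x"]))  # ['A', 'B']
```

With the strong update only `B.run` becomes a call-graph edge; the weak update also produces a spurious edge to `A.run`, which is the kind of imprecision flow sensitivity is meant to remove.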
The paper has been submitted to ICSE 2025. The Jarvis artifact is provided here.
The micro-benchmark and macro-benchmark are provided in the dataset and ground_truth directories.
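Precision and recall against the ground truth can be computed at the level of call edges. The sketch below assumes both the generated and the ground-truth call graphs are stored in PyCG's JSON adjacency-list format, {"caller": ["callee", ...]}; whether Jarvis's output and the ground-truth files use exactly this format, and whether the paper counts edges this way, are assumptions. The helper names are hypothetical.

```python
import json
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def load_call_graph(path: str) -> Dict[str, List[str]]:
    """Load a call graph stored as {"caller": ["callee", ...]} (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def to_edges(cg: Dict[str, Iterable[str]]) -> Set[Edge]:
    """Flatten an adjacency list into a set of (caller, callee) edges."""
    return {(caller, callee) for caller, callees in cg.items() for callee in callees}


def precision_recall(generated: Dict[str, Iterable[str]],
                     ground_truth: Dict[str, Iterable[str]]) -> Tuple[float, float]:
    """Edge-level precision and recall of a generated call graph."""
    gen, truth = to_edges(generated), to_edges(ground_truth)
    true_pos = len(gen & truth)
    precision = true_pos / len(gen) if gen else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Toy example: one correct edge, one spurious edge, two missed edges.
truth = {"main.main": ["main.A.run", "main.B.run"], "main.B.run": ["main.helper"]}
generated = {"main.main": ["main.A.run", "main.C.run"]}
p, r = precision_recall(generated, truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.33
```

For real files, pass `load_call_graph("jarvis.json")` and the loaded ground-truth file for the same program to `precision_recall`.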
Prerequisites:
- Python 3.8
- PyCG: tool/PyCG
- Jarvis: tool/Jarvis
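A quick sanity check of these prerequisites can be run from the artifact root. This is a hypothetical helper, not part of the artifact; it assumes that "Python 3.8" means the 3.8 minor version, that the tool paths are relative to the artifact root, and that dependencies needed for A.W. mode (psutil in the bpytop example) should be importable from the interpreter that runs Jarvis.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Iterable, List


def check_prerequisites(root: str = ".", modules: Iterable[str] = ()) -> List[str]:
    """Return a list of problems found; an empty list means the setup looks fine.

    `root` is the artifact's root directory; `modules` are top-level
    third-party modules that should be importable (e.g. for --decy runs).
    """
    problems = []
    if sys.version_info[:2] != (3, 8):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"expected Python 3.8, found {found}")
    for rel in ("tool/PyCG", "tool/Jarvis/jarvis_cli.py"):
        if not (Path(root) / rel).exists():
            problems.append(f"missing {rel}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    return problems


for problem in check_prerequisites(".", modules=["psutil"]):
    print("WARNING:", problem)
```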
Jarvis is run through tool/Jarvis/jarvis_cli.py.
Jarvis usage:
$ python3 tool/Jarvis/jarvis_cli.py [module_path1 module_path2 module_path3...] [--package] [--decy] [-o output_path]
Jarvis help:
$ python3 tool/Jarvis/jarvis_cli.py -h
usage: jarvis_cli.py [-h] [--package PACKAGE] [--decy] [--precision]
[--moduleEntry [MODULEENTRY ...]]
[--operation {call-graph,key-error}] [-o OUTPUT]
[module ...]
positional arguments:
module modules to be processed, which are also application entries in A.W. mode
options:
-h, --help show this help message and exit
--package PACKAGE Package containing the code to be analyzed
--decy whether analyze the dependencies
--precision whether flow-sensitive
--entry-point [MODULEENTRY ...]
Entry functions to be processed
-o OUTPUT, --output OUTPUT
Output call graph path
Example 1: analyze bpytop.py in E.A. mode.
$ python3 tool/Jarvis/jarvis_cli.py dataset/macro_benchmark/pj/bpytop/bpytop.py --package dataset/macro_benchmark/pj/bpytop -o jarvis.json
Example 2: analyze bpytop.py in A.W. mode. Note that all dependencies must first be installed in the virtual environment.
# create virtualenv environment
$ virtualenv venv --python=python3.8
# activate the environment
$ source venv/bin/activate
# install dependencies in the virtualenv environment
$ python3 -m pip install psutil
# run jarvis
$ python3 tool/Jarvis/jarvis_cli.py dataset/macro_benchmark/pj/bpytop/bpytop.py --package dataset/macro_benchmark/pj/bpytop --decy -o jarvis.json
To reproduce the results, first cd to the root directory of the unzipped files, then run:
# 1. run micro_benchmark
$ ./reproducing_RQ12_setup/micro_benchmark/test_All.sh
# 2. run macro_benchmark
$ ./reproducing_RQ12_setup/macro_benchmark/pycg_EA.sh
# PyCG iterates once
$ ./reproducing_RQ12_setup/macro_benchmark/pycg_EW.sh 1
# PyCG iterates twice
$ ./reproducing_RQ12_setup/macro_benchmark/pycg_EW.sh 2
# PyCG iterates to convergence
$ ./reproducing_RQ12_setup/macro_benchmark/pycg_EW.sh
$ ./reproducing_RQ12_setup/macro_benchmark/jarvis_AA.sh
$ ./reproducing_RQ12_setup/macro_benchmark/jarvis_EA.sh
$ ./reproducing_RQ12_setup/macro_benchmark/jarvis_AW.sh
Run
$ python3 ./reproducing_RQ1/gen_table.py
The results are shown below:
Run
$ pip3 install matplotlib
$ pip3 install numpy
$ python3 ./reproducing_RQ1/FTG/plot.py
The generated graphs are pycg-ag.pdf, pycg-change-ag.pdf, and jarvis-ftg.pdf, which correspond to Fig. 9a, Fig. 9b, and Fig. 10, respectively.
Run
$ python3 ./reproducing_RQ2/gen_table.py
The generated results:
Scalability results (RQ1), where AE denotes AssertionError:
Accuracy results (RQ2):
The 43 Python projects out of the top 200 highly starred projects are listed in file
FastAPI, HTTPie, Scrapy, Lightning, Airflow, Sherlock, Wagtail
- html: CVE-2018-17142 (Golang)
- cryptography: CVE-2016-9243, CVE-2020-36242, CVE-2018-10903
- urllib3: CVE-2021-33503, CVE-2019-11324, CVE-2019-11236, CVE-2020-7212
- requests: CVE-2014-1830, CVE-2015-2296, CVE-2018-18074
- psutil: CVE-2019-18874 (C)
- numpy: CVE-2021-33430, CVE-2014-1858, CVE-2014-1859, CVE-2017-12852 (cpp)
- lxml: CVE-2021-28957, CVE-2018-19787, CVE-2020-27783, CVE-2014-3146 (js)
- jinja2: CVE-2020-28493, CVE-2014-0012, CVE-2014-1402
- sqlalchemy: CVE-2019-7164, CVE-2019-7548
- httpx: CVE-2021-41945
The CVEs of html, numpy, lxml, and psutil do not concern Python code, so we exclude them.
- sherlock.sherlock
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- sherlock.sites
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- airflow.kubernetes.kube_client
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- airflow.providers.cncf.kubernetes.operators.pod
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- airflow.providers.cncf.kubernetes.utils.pod_manager
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- airflow.executors.kubernetes_executor
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
......
- wagtail.contrib.frontend_cache.backends
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- httpie.client
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- httpie.ssl_
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- httpie.models
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- scrapy.downloadermiddlewares.cookies
  - tldextract(v3.4.4)
    - requests(v2.28.0)
      - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- lightning.app.utilities.network
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
...
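Each entry above is a chain in a dependency graph from an application module to a vulnerable package. A minimal sketch of finding such a chain with breadth-first search, assuming the dependencies are available as a plain dict from each module or package to what it depends on; `dependency_chain` and this graph encoding are hypothetical, and the edges are taken from the scrapy entry above.

```python
from collections import deque
from typing import Dict, List, Optional


def dependency_chain(deps: Dict[str, List[str]], start: str,
                     target: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain start -> ... -> target."""
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to `start`, then reverse.
            chain: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                chain.append(cur)
                cur = parent[cur]
            return chain[::-1]
        for nxt in deps.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # target is not reachable from start


deps = {
    "scrapy.downloadermiddlewares.cookies": ["tldextract"],
    "tldextract": ["requests"],
    "requests": ["urllib3"],
}
vulnerable = {"urllib3": ["CVE-2021-33503", "CVE-2019-11324",
                          "CVE-2019-11236", "CVE-2020-7212"]}
for package, cves in vulnerable.items():
    chain = dependency_chain(deps, "scrapy.downloadermiddlewares.cookies", package)
    if chain:
        print(" -> ".join(chain), cves)
```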
According to the patch commit, the vulnerable code of CVE-2021-33503 is in the urllib3.util.url module.
Below are the method-level invocation paths:
- httpie.adapters.<main>
  - requests.adapters.<main>
    - urllib3.contrib.socks.<main>
      - urllib3.util.url.<main> ---- CVE-2021-33503
- scrapy.downloadermiddlewares.cookies.<main>
  - tldextract.__init__.<main>
    - tldextract.tldextract.<main>
      - tldextract.suffix_list.<main>
        - requests_file.<main>
          - requests.adapters.<main>
            - urllib3.util.url.<main> ---- CVE-2021-33503
- lightning.app.utilities.network.<main>
  - requests.adapters.<main>
    - urllib3.contrib.socks.<main>
      - urllib3.util.url.<main> ---- CVE-2021-33503
- airflow.providers.amazon.aws.hooks.base_aws.BaseSessionFactory._get_idp_response
  - requests.adapters.<main>
    - urllib3.contrib.socks.<main>
      - urllib3.util.url.<main> ---- CVE-2021-33503
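Paths like these can be recovered from a call graph by searching backwards from the vulnerable node. The sketch below assumes the call graph is a dict from caller to callees, with `<main>` naming a module's top-level code block as in this artifact; `paths_to_sink` and the application-prefix filter are hypothetical, and the edges are taken from the httpie and lightning paths above.

```python
from collections import deque
from typing import Dict, List, Set


def paths_to_sink(cg: Dict[str, List[str]], sink: str,
                  app_prefixes: List[str]) -> Dict[str, List[str]]:
    """For every application node that can reach `sink`, return one shortest call path."""
    # Reverse the edges so the search can walk from the sink back to its callers.
    callers: Dict[str, Set[str]] = {}
    for caller, callees in cg.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Backward BFS; next_hop[n] is the callee of n one step closer to the sink.
    next_hop: Dict[str, str] = {}
    seen = {sink}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for caller in sorted(callers.get(node, set())):
            if caller not in seen:
                seen.add(caller)
                next_hop[caller] = node
                queue.append(caller)

    # Rebuild a forward path for each reachable application node.
    paths: Dict[str, List[str]] = {}
    for node in sorted(seen - {sink}):
        if any(node.startswith(prefix) for prefix in app_prefixes):
            path = [node]
            while path[-1] != sink:
                path.append(next_hop[path[-1]])
            paths[node] = path
    return paths


call_graph = {
    "httpie.adapters.<main>": ["requests.adapters.<main>"],
    "lightning.app.utilities.network.<main>": ["requests.adapters.<main>"],
    "requests.adapters.<main>": ["urllib3.contrib.socks.<main>"],
    "urllib3.contrib.socks.<main>": ["urllib3.util.url.<main>"],
}
found = paths_to_sink(call_graph, "urllib3.util.url.<main>", ["httpie.", "lightning."])
for entry, path in found.items():
    print(" -> ".join(path), "---- CVE-2021-33503")
```

Searching backwards from the single sink visits each node once, so all application entries are handled in one pass instead of one forward search per entry.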
PS:
<main> represents the top-level code block of a Python file (Python does not require an entry function).
Our artifact reuses part of the functionality of a third-party library, PyCG:
Vitalis Salis et al. PyCG: Practical Call Graph Generation in Python. In 43rd International Conference on Software Engineering (ICSE), 25–28 May 2021.



