[Build and test](https://github.com/LogIntelligence/UNLEASH/actions/workflows/build-and-test.yml)
__UNLEASH__ is a semantic-based log parsing framework. This repository includes artifacts for reuse and reproduction of experimental results presented in our ICSE'25 paper titled _"Unleashing the True Potential of Semantic-based Log Parsing with Pre-trained Language Models"_.
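For context, a log parser converts free-text log messages into structured templates by replacing dynamic fields with placeholders such as `<*>`. The toy sketch below only illustrates the input/output format of this task; it is not UNLEASH's method, which leverages pre-trained language models for semantic parsing:

```python
import re

def naive_template(message: str) -> str:
    """Toy parser: mask obviously dynamic tokens with the <*> placeholder."""
    message = re.sub(r"\b\d+\.\d+\.\d+\.\d+\b", "<*>", message)  # IPv4 addresses
    message = re.sub(r"\b\d+\b", "<*>", message)                 # bare integers
    return message

print(naive_template("Accepted password for root from 192.168.0.7 port 22"))
# → Accepted password for root from <*> port <*>
```

Real parsers must also handle dynamic fields that regexes cannot anticipate, which is where semantic, model-based approaches such as UNLEASH come in.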
- [Clone UNLEASH from GitHub](#clone-unleash-from-github)
- [Create and activate a virtual environment](#create-and-activate-a-virtual-environment)
- [Install UNLEASH from PyPI or Build from source](#install-unleash-from-pypi-or-build-from-source)
- [Usage](#usage)
  - [Test the installation](#test-the-installation)
  - [Basic usage](#basic-usage)
    - [Run sampling for a specific dataset](#1-run-sampling-for-a-specific-dataset)
    - [Run UNLEASH on a specific dataset](#2-run-unleash-on-a-specific-dataset)
    - [Evaluate Unleash on a specific dataset](#3-evaluate-unleash-on-a-specific-dataset)
  - [Reproducibility](#reproducibility)
    - [Parsing Performance](#parsing-performance)
    - [Scalability and Generalization](#scalability-and-generalization)
    - [Other Settings](#other-settings)
- [Download Paper](#download-paper)
- [Citation](#citation)
- [Contact](#contact)

## Purpose

The artifacts in this repository provide the UNLEASH tool along with the necessary benchmarks and scripts, facilitating its reuse and enabling the replication of the associated study. There are three main components in the repository:

1. `datasets`: Contains the log datasets used in the experiments.
2. `examples`: Contains the scripts to run the experiments.
3. `unleash`: Contains the implementation of UNLEASH.
## Provenance

Our artifacts are available via public archival repositories:

- A copy of the paper is available at [docs/paper/ICSE_25__Unleash.pdf](docs/paper/ICSE_25___Unleash.pdf).
- The archival repository is available at https://archive.softwareheritage.org/browse/origin/https://github.com/LogIntelligence/UNLEASH/.
- The datasets are adopted from existing works and are publicly available at https://zenodo.org/record/8275861.

## Data

The datasets used in the study are publicly available at https://zenodo.org/record/8275861. They require approximately 966 MB of storage compressed and 13 GB uncompressed across the 14 datasets.

During the operation of UNLEASH, the datasets are automatically downloaded and extracted to the `datasets` folder by default. You can also download the datasets manually and extract them into the `datasets` folder. The datasets should be organized as follows:
```
📦 UNLEASH
├─ datasets
│  └─ loghub-2.0
│     ├─ Apache
│     │  ├─ ...
│     │  ├─ Apache_full.log_templates.csv
│     │  └─ Apache_full.log_templates_corrected.csv
│     ├─ ...
```
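Scripts can locate a dataset folder under this layout by joining path components. A minimal hypothetical helper (not part of the UNLEASH API; the function name is invented for illustration) might look like:

```python
from pathlib import Path

def dataset_dir(root: str, dataset: str) -> Path:
    """Hypothetical helper: path of one dataset under the layout shown above."""
    return Path(root) / "datasets" / "loghub-2.0" / dataset

# e.g. the corrected-template file for Apache
full_log_templates = dataset_dir(".", "Apache") / "Apache_full.log_templates.csv"
print(full_log_templates)
```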

## Setup

The code is implemented in Python 3.9. We recommend using machines equipped with at least a 4-core CPU, an 8 GB **GPU**, 16 GB RAM, and ~50 GB of available disk space, running **Ubuntu 20.04** or **Ubuntu 22.04**, to stably reproduce the experimental results in our paper. The full requirements to run the code can be found in [REQUIREMENTS.md](REQUIREMENTS.md).

### Install Python 3.9

We recommend using Python 3.9+ to run the code.
### Install UNLEASH from PyPI or Build from source

Install the released package from PyPI:

```bash
pip install icse-unleash
```

Alternatively, build and install from source:

```bash
pip install -e .
```

## Usage
### Test the installation

```bash
pytest tests/test.py
```

<details>
<summary>Expected output</summary>

```
tests/test.py ...                                                        [100%]
```

</details>

### Basic usage

To perform log parsing on a specific dataset, set the `dataset` parameter and set the working directory to the `examples` folder.
### Reproducibility

#### Parsing Performance

To reproduce the parsing performance, you can run the following command:

```bash
bash benchmark.sh
```

The parsing accuracy (`parsing_accuracy.csv`) and parsing time (`time_cost.json`) will be saved in the corresponding folders in the `../results` directory (e.g., `../results/iteration_01/logs`).
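For reference, the grouping accuracy (GA) metric commonly reported by log parsing benchmarks counts a log message as correctly parsed when its predicted template groups exactly the same set of messages as its ground-truth template. A simplified sketch of that computation (not the repository's `GA_calculator.py` implementation) is:

```python
from collections import defaultdict

def grouping_accuracy(ground_truth, predicted):
    """Fraction of messages whose predicted group exactly matches
    the corresponding ground-truth group (simplified GA)."""
    def groups(labels):
        # Map each template label to the set of message indices it covers
        g = defaultdict(set)
        for idx, label in enumerate(labels):
            g[label].add(idx)
        return {frozenset(v) for v in g.values()}

    shared = groups(ground_truth) & groups(predicted)
    correct = sum(len(group) for group in shared)
    return correct / len(ground_truth)

# Messages 0-1 are grouped correctly; messages 2-3 are split apart
print(grouping_accuracy(["A", "A", "B", "B"], ["t1", "t1", "t2", "t3"]))
# → 0.5
```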
#### Scalability and Generalization

- Scalability: The scalability of UNLEASH is reflected in the parsing time and accuracy achieved with different numbers of parsing processes. To run UNLEASH with different numbers of parsing processes, set the `parsing_num_processes` parameter in the `02_run_unleash.py` script and run [Step 2](#2-run-unleash-on-a-specific-dataset) again.
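The idea behind a `parsing_num_processes`-style setting can be sketched with a toy example: split the log lines across worker processes and parse each chunk in parallel. This illustrates the concept only; it is not UNLEASH's implementation, and the masking rule and function names are invented:

```python
import re
from multiprocessing import Pool

def to_template(line: str) -> str:
    # Toy parsing step: mask integer tokens with the <*> placeholder
    return re.sub(r"\b\d+\b", "<*>", line)

def parse_parallel(lines, num_processes=2):
    # Distribute parsing across worker processes, analogous to
    # raising the parsing_num_processes parameter
    with Pool(processes=num_processes) as pool:
        return pool.map(to_template, lines)

if __name__ == "__main__":
    logs = ["Connection from host 7 closed", "Request took 15 ms"]
    print(parse_parallel(logs))
```

With a CPU-bound parser, wall-clock parsing time typically drops as the process count grows, until coordination overhead dominates.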

## Hardware

We recommend using machines equipped with at least a 4-core CPU, an 8 GB **GPU**, 16 GB RAM, and ~50 GB of available disk space, running **Ubuntu 20.04** or **Ubuntu 22.04**, to stably reproduce the experimental results in our paper.

## Software

- We develop UNLEASH using Python 3.9 (UNLEASH is compatible with Python 3.9+).
- We encourage users to create a virtual environment for UNLEASH (e.g., `python3.9 -m venv env`).
- UNLEASH can be installed using pip (e.g., `pip install icse-unleash`).
- UNLEASH requires the following packages: `ipython`, `natsort`, `nltk`, `numpy`, `pandas`, `regex`, `scikit_learn`, `scipy`, `textdistance`, `torch`, `tqdm`, `transformers`, `datasets`, and `pytest`. All of these packages are installed automatically when you install UNLEASH.