Commit ac7d10d: update
1 parent d256fac

2 files changed: 44 additions & 85 deletions

README.md
Lines changed: 41 additions & 82 deletions
@@ -2,41 +2,51 @@
 [![pypi package](https://img.shields.io/pypi/v/icse-unleash.svg)](https://pypi.org/project/icse-unleash/)
 [![Build and test](https://github.com/LogIntelligence/UNLEASH/actions/workflows/build-and-test.yml/badge.svg)](https://github.com/LogIntelligence/UNLEASH/actions/workflows/build-and-test.yml)
 [![Upload Python Package](https://github.com/LogIntelligence/UNLEASH/actions/workflows/python-publish.yml/badge.svg)](https://github.com/LogIntelligence/UNLEASH/actions/workflows/python-publish.yml)
-<!-- [![Downloads](https://static.pepy.tech/badge/icse-unleash)](https://pepy.tech/projects/icse-unleash) -->
+[![Archived](https://archive.softwareheritage.org/badge/origin/https://github.com/LogIntelligence/UNLEASH)](https://archive.softwareheritage.org/browse/origin/https://github.com/LogIntelligence/UNLEASH/)

 __UNLEASH__ is a semantic-based log parsing framework. This repository includes artifacts for reuse and reproduction of experimental results presented in our ICSE'25 paper titled _"Unleashing the True Potential of Semantic-based Log Parsing with Pre-trained Language Models"_.

 __Table of Contents__
-- [Repository Structure](#repository-structure)
-- [Installation Instruction](#installation-instruction)
+- [Purpose](#purpose)
+- [Provenance](#provenance)
+- [Data](#data)
+- [Setup](#setup)
   - [Install Python 3.9](#install-python-39)
   - [Clone UNLEASH from GitHub](#clone-unleash-from-github)
   - [Create and activate a virtual environment](#create-and-activate-a-virtual-environment)
   - [Install UNLEASH from PyPI or Build from source](#install-unleash-from-pypi-or-build-from-source)
+- [Usage](#usage)
   - [Test the installation](#test-the-installation)
-- [To run the code](#to-run-the-code)
-- [Reproducibility](#reproducibility)
-  - [Parsing Performance](#parsing-performance)
-  - [Scalability and Generalization](#scalability-and-generalization)
-  - [Other Settings](#other-settings)
+  - [Basic usage](#basic-usage)
+    - [Run sampling for a specific dataset](#1-run-sampling-for-a-specific-dataset)
+    - [Run UNLEASH on a specific dataset](#2-run-unleash-on-a-specific-dataset)
+    - [Evaluate Unleash on a specific dataset](#3-evaluate-unleash-on-a-specific-dataset)
+  - [Reproducibility](#reproducibility)
+    - [Parsing Performance](#parsing-performance)
+    - [Scalability and Generalization](#scalability-and-generalization)
+    - [Other Settings](#other-settings)
 - [Download Paper](#download-paper)
 - [Citation](#citation)
 - [Contact](#contact)

-## Repository Structure
+## Purpose

-There are three main components in the repository:
-1. `datasets`: Contains the log datasets used in the experiments.
-2. `examples`: Contains the scripts to run the experiments.
-3. `unleash`: Contains the implementation of UNLEASH.
+The artifacts in this repository provide the UNLEASH tool along with the necessary benchmarks and scripts, facilitating its reuse and enabling the replication of the associated study.

-<details>
-<Summary>The main structure of the repository would look like this</Summary>
+## Provenance
+
+Our artifacts are available via public archival repositories:
+- A copy of the paper: [docs/paper/ICSE_25__Unleash.pdf](docs/paper/ICSE_25___Unleash.pdf).
+- The archival repository: https://archive.softwareheritage.org/browse/origin/https://github.com/LogIntelligence/UNLEASH/.
+- The datasets, adopted from existing work, are publicly available at: https://zenodo.org/record/8275861.
+
+## Data
+
+The datasets used in the study are publicly available at: https://zenodo.org/record/8275861. The storage requirements are approximately 966 MB (compressed) and 13 GB (uncompressed) for the 14 datasets.

+During the operation of UNLEASH, the datasets are automatically downloaded and extracted to the `datasets` folder by default. You can also download the datasets manually and extract them into the `datasets` folder. The datasets should be organized as follows:
 ```
 📦 UNLEASH
-├─ LICENSE
-├─ README.md
 ├─ datasets
 │  └─ loghub-2.0
 │     ├─ Apache
@@ -46,66 +56,11 @@ There are three main components in the repository:
 │     │  ├─ Apache_full.log_templates.csv
 │     │  └─ Apache_full.log_templates_corrected.csv
 │     ├─ ...
-├─ docs
-│  ├─ CL.png
-│  ├─ Ob2_res.png
-│  ├─ Ob3_res.png
-│  ├─ RESULTS.md
-│  └─ S_test_1.png
-├─ environment.yml
-├─ examples
-│  ├─ 01_sampling.py
-│  ├─ 02_run_unleash.py
-│  ├─ 03_evaluation.py
-│  ├─ benchmark.py
-│  └─ config.py
-├─ requirements.txt
-├─ setup.py
-├─ tests
-│  └─ test.py
-└─ unleash
-   ├─ __init__.py
-   ├─ arguments.py
-   ├─ data
-   │  ├─ __init__.py
-   │  ├─ data_loader.py
-   │  └─ utils.py
-   ├─ evaluation
-   │  ├─ settings.py
-   │  └─ utils
-   │     ├─ GA_calculator.py
-   │     ├─ PA_calculator.py
-   │     ├─ common.py
-   │     ├─ evaluator_main.py
-   │     ├─ oracle_template_correction.py
-   │     ├─ post_process.py
-   │     ├─ postprocess.py
-   │     └─ template_level_analysis.py
-   ├─ models
-   │  ├─ __init__.py
-   │  ├─ base.py
-   │  ├─ deberta.py
-   │  └─ roberta.py
-   ├─ parsing_base.py
-   ├─ parsing_cache.py
-   ├─ postprocess.py
-   ├─ sampling
-   │  ├─ __init__.py
-   │  ├─ entropy_sampling.py
-   │  ├─ lilac_sampling.py
-   │  ├─ logppt_sampling.py
-   │  └─ utils.py
-   └─ tuning
-      ├─ __init__.py
-      ├─ early_stopping.py
-      ├─ trainer.py
-      └─ utils.py
 ```
-</details>
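As a sanity check after a manual download, the layout above can be verified programmatically. The sketch below follows the tree and the file names used by the example commands; the helper names themselves are illustrative, not part of UNLEASH:

```python
from pathlib import Path

def expected_dataset_files(root, dataset):
    """Per-dataset files expected under datasets/loghub-2.0/<dataset>/,
    following the tree shown above."""
    base = Path(root) / "datasets" / "loghub-2.0" / dataset
    return [
        base / f"{dataset}_full.log_structured.csv",
        base / f"{dataset}_full.log_templates.csv",
        base / f"{dataset}_full.log_templates_corrected.csv",
    ]

def missing_files(root, dataset):
    """List expected files that are absent, e.g. after extracting manually."""
    return [p for p in expected_dataset_files(root, dataset) if not p.exists()]
```

Running `missing_files(".", "Apache")` from the repository root should return an empty list once the Apache dataset is in place.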

-## Installation Instruction
-The code is implemented in Python 3.9.
+## Setup
+The code is implemented in Python 3.9. We recommend using machines equipped with at least a 4-core CPU, an 8GB **GPU**, 16GB RAM, and ~50GB of available disk space, running **Ubuntu 20.04** or **Ubuntu 22.04**, to stably reproduce the experimental results in our paper. The full requirements to run the code can be found in [REQUIREMENTS.md](REQUIREMENTS.md).

 ### Install Python 3.9
 We recommend using Python 3.9+ to run the code.
@@ -138,6 +93,8 @@ pip install icse-unleash
 pip install -e .
 ```

+## Usage
+
 ### Test the installation
 ```bash
 pytest tests/test.py
@@ -158,14 +115,16 @@ tests/test.py ... [100%
 ```
 </details>

-## To run the code
+### Basic usage
+
 To perform log parsing on a specific dataset, set the `dataset` parameter and set the working directory to the `examples` folder.
 ```bash
 export dataset=Apache
 cd examples
 ```

-### 1. Run sampling for a specific dataset
+#### 1. Run sampling for a specific dataset
 ```bash
 python 01_sampling.py --dataset $dataset --sampling_method unleash
 ```
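The sampling step produces few-shot files such as `samples/unleash_32.json`, which Step 2 consumes as `--train_file`. A minimal sketch of loading such a file, assuming each entry pairs a raw log with its labeled template (the `log`/`template` field names and the demo contents are assumptions for illustration, not confirmed from the source):

```python
import json
import tempfile

def load_samples(path):
    """Load a few-shot sample file; assumed schema: a list of
    {"log": ..., "template": ...} records."""
    with open(path) as f:
        data = json.load(f)
    return [(ex["log"], ex["template"]) for ex in data]

# tiny illustrative file; the log lines and templates are made up
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump([
        {"log": "Deleting block blk_12 file /data/blk_12",
         "template": "Deleting block <*> file <*>"},
        {"log": "Received block blk_34 of size 512",
         "template": "Received block <*> of size <*>"},
    ], f)

pairs = load_samples(f.name)  # list of (log, template) tuples
```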
@@ -213,7 +172,7 @@ Shot: 256 Coarse size: 25
 </details>

-### 2. Run UNLEASH on a specific dataset
+#### 2. Run UNLEASH on a specific dataset
 ```bash
 python 02_run_unleash.py --log_file ../datasets/loghub-2.0/$dataset/${dataset}_full.log_structured.csv --model_name_or_path roberta-base --train_file ../datasets/loghub-2.0/$dataset/samples/unleash_32.json --validation_file ../datasets/loghub-2.0/$dataset/validation.json --dataset_name $dataset --parsing_num_processes 1 --output_dir ../results --max_train_steps 1000
 ```
@@ -254,7 +213,7 @@ Parsing: 100%|██████████████████████
 ```
 </details>

-### 3. Evaluate Unleash on a specific dataset
+#### 3. Evaluate Unleash on a specific dataset
 ```bash
 python 03_evaluation.py --output_dir ../results --dataset $dataset
 ```
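The evaluation reports metrics such as grouping accuracy (GA, computed by `GA_calculator.py`). As a reference point, here is a toy implementation of the standard GA definition from the log parsing literature; it is a sketch of the metric, not the repository's implementation:

```python
from collections import defaultdict

def grouping_accuracy(predicted, truth):
    """A message is correctly grouped when the set of messages sharing its
    predicted template equals the set sharing its ground-truth template."""
    def groups(labels):
        g = defaultdict(set)
        for i, label in enumerate(labels):
            g[label].add(i)
        return g

    pred_groups, true_groups = groups(predicted), groups(truth)
    correct = sum(
        1 for i, label in enumerate(predicted)
        if pred_groups[label] == true_groups[truth[i]]
    )
    return correct / len(truth)
```

For example, if four messages are predicted as `["A", "A", "B", "B"]` but the ground truth is `["X", "X", "Y", "Z"]`, only the first two messages are correctly grouped, giving GA = 0.5.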
@@ -280,9 +239,9 @@ Template-level accuracy calculation done. [Time taken: 0.010]
 ```
 </details>

-## Reproducibility
+### Reproducibility

-### Parsing Performance
+#### Parsing Performance

 To reproduce the parsing performance, you can run the following command:
 ```bash
@@ -292,7 +251,7 @@ bash benchmark.sh

 The parsing accuracy (`parsing_accuracy.csv`) and parsing time (`time_cost.json`) will be saved in the corresponding folders in the `../results` directory (e.g., `../results/iteration_01/logs`).
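Once `time_cost.json` is produced, per-dataset timings can be aggregated in a few lines. The exact schema of the file is an assumption here (a flat `{dataset: seconds}` mapping); adjust the keys to match the actual output:

```python
import json
import tempfile

def total_parsing_time(path):
    """Sum per-dataset parsing times from a time_cost.json-style file;
    the {dataset: seconds} schema is assumed for illustration."""
    with open(path) as f:
        times = json.load(f)
    return sum(times.values())

# illustrative file with made-up numbers
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"Apache": 12.5, "Hadoop": 30.0}, f)

total = total_parsing_time(f.name)
```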

-### Scalability and Generalization
+#### Scalability and Generalization

 - Scalability: The scalability of UNLEASH is reflected in the parsing time and accuracy with different numbers of parsing processes. To run UNLEASH with different numbers of parsing processes, set the `parsing_num_processes` parameter in the `02_run_unleash.py` script and run [Step 2](#2-run-unleash-on-a-specific-dataset) again:
 ```bash
@@ -317,7 +276,7 @@ python 02_run_unleash.py --log_file ../datasets/loghub-2.0/$dataset/${dataset}_f
 python 02_run_unleash.py --log_file ../datasets/loghub-2.0/$dataset/${dataset}_full.log_structured.csv --model_name_or_path roberta-base --train_file ../datasets/loghub-2.0/$dataset/samples/unleash_$shot.json --validation_file ../datasets/loghub-2.0/$dataset/validation.json --dataset_name $dataset --parsing_num_processes 1 --output_dir ../results --max_train_steps 1000
 ```

-### Other Settings
+#### Other Settings

 UNLEASH provides various settings to customize the parsing process. You can set the following **main parameters**:
 - For sampling (Step 1 - `01_sampling.py`):
REQUIREMENTS.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
 ## Operating System & Hardware
-We recommend using machines equipped with at least 4 cores, 16GB RAM, and ~50GB available disk space with **Ubuntu 20.04** or **Ubuntu 22.04** to stably reproduce the experimental results in our paper.
+We recommend using machines equipped with at least a 4-core CPU, an 8GB **GPU**, 16GB RAM, and ~50GB of available disk space, running **Ubuntu 20.04** or **Ubuntu 22.04**, to stably reproduce the experimental results in our paper.

 ## Software

 - We develop UNLEASH using Python 3.9 (UNLEASH is compatible with Python 3.9+).
 - We encourage users to create a virtual environment to use UNLEASH (e.g., `python3.9 -m venv env`).
-- UNLEASH could be installed using pip. (e.g., `pip install unleash`).
-- UNLEASH requires the following packages: `pytorch`, `transformers`, `numpy`, `pandas`, `scikit-learn`, `pytest`, `tqdm`, `requests`, and `matplotlib`.
+- UNLEASH can be installed using pip (e.g., `pip install icse-unleash`).
+- UNLEASH requires the following packages: `ipython`, `natsort`, `nltk`, `numpy`, `pandas`, `regex`, `scikit_learn`, `scipy`, `textdistance`, `torch`, `tqdm`, `transformers`, `datasets`, and `pytest`. All of these packages are installed automatically when you install UNLEASH.
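A quick way to confirm that the dependencies listed above are importable in the current environment (note that some pip names differ from import names, e.g. `scikit_learn` imports as `sklearn`; the helper below is illustrative, not part of UNLEASH):

```python
import importlib.util

# import names for the dependencies above; scikit_learn imports as sklearn
REQUIRED = ["numpy", "pandas", "torch", "transformers", "tqdm",
            "sklearn", "nltk", "regex", "scipy", "textdistance"]

def missing(packages):
    """Return the packages that cannot be imported in the current environment."""
    return [p for p in packages if importlib.util.find_spec(p) is None]
```

Running `missing(REQUIRED)` after `pip install icse-unleash` should return an empty list.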
