This is my repository for the M2 coursework. The code for this project is in the `src` directory of this repository:
- `src/preprocessor.py` contains the code for preprocessing the data according to the LLMTIME scheme.
- `src/postprocessor.py` contains the code for postprocessing the model predictions back into time series.
- `src/qwen.py` loads the Qwen model and tokenizer from Hugging Face.
- `src/FLOPS.py` contains the code for calculating the FLOPS of the model. The FLOPS reported for this project were computed using the `total_training_flops` function in this file.
- `src/eval_baseline.py` was used to generate predictions from the baseline model.
- `src/eval_lora.py` was used to generate predictions from the LoRA model. It was modified accordingly for each of the fine-tuned models used in this coursework.
The outputs of `src/eval_baseline.py` and `src/eval_lora.py` are saved in the `data` directory, along with the original time series data. The output files follow these naming conventions:
- `*_losses.csv` contains the model's loss values during training.
- `*_predictions.csv` contains the model's predictions.
- `*_vallosses.csv` contains the model's validation loss values after training, computed batchwise over the validation set.
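The LLMTIME-style round trip performed by `src/preprocessor.py` and `src/postprocessor.py` can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names `encode_llmtime` and `decode_llmtime`, the scaling factor `alpha` and the default of two decimal places are hypothetical, and the sketch assumes the common LLMTIME convention of separating variables within a time step with commas and time steps with semicolons. The repository's actual scaling rule and precision may differ.

```python
from typing import List, Sequence


def encode_llmtime(series: Sequence[Sequence[float]], alpha: float, decimals: int = 2) -> str:
    """Encode a multivariate series as text in an LLMTIME-style format.

    Each value is divided by the (hypothetical) scaling factor `alpha` and
    rounded to `decimals` places; variables within a time step are joined
    with commas and time steps are joined with semicolons.
    """
    steps = []
    for step in series:
        steps.append(",".join(f"{value / alpha:.{decimals}f}" for value in step))
    return ";".join(steps)


def decode_llmtime(text: str, alpha: float) -> List[List[float]]:
    """Invert encode_llmtime: parse each time step and undo the scaling.

    Empty or unparsable time steps (for example a trailing step that ends
    in a comma) are skipped rather than raising an error.
    """
    series = []
    for chunk in text.strip().split(";"):
        if not chunk:
            continue
        try:
            series.append([float(token) * alpha for token in chunk.split(",")])
        except ValueError:
            continue
    return series


# Usage example: a two-variable series with three time steps.
history = [[1.0, 2.5], [3.0, 4.0], [0.5, 1.5]]
encoded = encode_llmtime(history, alpha=2.0)
print(encoded)                             # 0.50,1.25;1.50,2.00;0.25,0.75
print(decode_llmtime(encoded, alpha=2.0))  # [[1.0, 2.5], [3.0, 4.0], [0.5, 1.5]]
```

Rounding to a fixed number of decimal places bounds the number of digits per value, which keeps the length of each encoded time step predictable; the trade-off is a small quantisation error that decoding cannot recover.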
Additionally, I have included some Jupyter notebooks in the root directory. I used these notebooks to test the code and to produce some of the graphs and tables in the report.
- `notebook1.ipynb` contains code for Questions 2(a) and 2(b) of the coursework.
- `notebook2.ipynb` contains code for Question 3 of the coursework.
The report for this project is a PDF located in the `report` directory.
Clone this GitLab repository to your local machine:
```
git clone https://gitlab.developers.cam.ac.uk/phy/data-intensive-science-mphil/assessments/m2_coursework/fm565.git
```
Create a conda environment by running the following command in the root directory of this repository:
```
conda env create -f environment.yml
```
This creates a new conda environment called `M2Coursework` and installs the packages required for this project, which are listed in the `requirements.txt` file.
Activate the environment by running:
```
conda activate M2Coursework
```
Depending on your Jupyter setup, a kernel for the new environment may be available automatically. If not, you can register one manually, e.g.:
```
python -m ipykernel install --user --name M2Coursework --display-name "M2Coursework"
```
You should now be able to run the notebooks in this repository.
To deactivate the conda environment, run `conda deactivate`.
Microsoft Copilot was used in the code as follows:
- When generating predictions, I used Copilot to explain how to move the data to the GPU and how to use the `torch` library to generate predictions. It recommended using `model.generate()` to generate predictions and `tokenizer.batch_decode()` to decode them. I modified this code to suit my needs.
- Some of the docstrings in the code were generated using Copilot where writing them by hand became tedious. I modified these to suit my needs.
A declaration of the use of generative tools in writing the report is given in the report itself.