Repository: https://github.com/chirindaopensource/quantifying_semantic_shift_financial_nlp
Owner: 2025 Craig Chirinda (Open Source Projects)
This repository contains an independent, professional-grade Python implementation of the research methodology from the 2025 paper entitled "Quantifying Semantic Shift in Financial NLP: Robust Metrics for Market Prediction Stability" by:
- Zhongtian Sun
- Chenghao Xiao
- Anoushka Harit
- Jongmin Yu
The project provides a complete, end-to-end computational framework for replicating the paper's novel evaluation suite for financial NLP models. It delivers a modular, auditable, and extensible pipeline that executes the entire research workflow: from rigorous data validation and regime-based partitioning, through multi-architecture model training and feature engineering, to the computation of four novel diagnostic metrics and a comprehensive suite of analytical studies.
- Introduction
- Theoretical Background
- Features
- Methodology Implemented
- Core Components (Notebook Structure)
- Key Callables
- Prerequisites
- Installation
- Input Data Structure
- Usage
- Output Structure
- Project Structure
- Customization
- Contributing
- Recommended Extensions
- License
- Citation
- Acknowledgments
This project provides a Python implementation of the methodologies presented in the 2025 paper "Quantifying Semantic Shift in Financial NLP: Robust Metrics for Market Prediction Stability." The core of this repository is the IPython notebook quantifying_semantic_shift_financial_nlp_draft.ipynb, which contains a comprehensive suite of functions to replicate the paper's findings, from initial data validation to the final generation of all analytical tables and figures.
The paper introduces a structured evaluation framework to quantify the robustness of financial NLP models under the stress of macroeconomic regime shifts. It argues that standard metrics like MSE are insufficient and proposes four complementary diagnostic metrics to provide a multi-faceted view of model stability. This codebase operationalizes this advanced evaluation suite, allowing users to:
- Rigorously validate and cleanse time-series financial news and market data.
- Systematically partition data into distinct macroeconomic regimes (e.g., Pre-COVID, COVID).
- Perform chronological train-validation-test splits to prevent lookahead bias.
- Train multiple model architectures (LSTM, Text Transformer, Feature-Enhanced MLP) on a per-regime basis.
- Compute the four novel diagnostic metrics: FCAS, PCS, TSV, and NLICS.
- Quantify semantic drift between regimes using Jensen-Shannon Divergence.
- Conduct a full suite of analyses, including case studies, ablation studies, and cross-sector generalization tests.
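As a rough illustration of the regime partitioning and chronological splitting steps listed above, the following sketch uses only the standard library. The regime boundaries, the split fractions, and the function names `assign_regime` and `chronological_split` are assumptions made for this example; the actual values live in `config.yaml` and the notebook's own functions may differ.

```python
from datetime import date

# Hypothetical regime boundaries, chosen only for illustration.
REGIMES = {
    "pre_covid": (date(2018, 1, 1), date(2020, 2, 29)),
    "covid": (date(2020, 3, 1), date(2021, 12, 31)),
}


def assign_regime(day):
    """Return the regime whose inclusive date range contains `day`, else None."""
    for name, (start, end) in REGIMES.items():
        if start <= day <= end:
            return name
    return None


def chronological_split(days, train_frac=0.7, val_frac=0.15):
    """Split dates into train/val/test blocks in time order, without shuffling."""
    ordered = sorted(days)
    n_train = int(len(ordered) * train_frac)
    n_val = int(len(ordered) * val_frac)
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test


# 20 consecutive days that all fall inside the toy "covid" regime.
days = [date(2020, 3, d) for d in range(1, 21)]
train, val, test = chronological_split(days)
print(assign_regime(days[0]), len(train), len(val), len(test))
```

Because the split never shuffles, every training date precedes every validation date, which in turn precedes every test date. That ordering is what prevents lookahead bias.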
The implemented methods are grounded in time-series econometrics, natural language processing, and deep learning.
1. Regime-Based Evaluation: The framework's foundation is the acknowledgment that financial markets are non-stationary. The data-generating process changes over time, particularly during major economic events. The methodology explicitly partitions the data into distinct macroeconomic regimes and trains and evaluates models within each one.
2. The Four Diagnostic Metrics: The paper introduces four metrics to create a "Robustness Profile" beyond simple prediction error:
- Financial Causal Attribution Score (FCAS): Measures if a model's prediction direction aligns with simple causal keywords in the source text. $$ \text{FCAS} = \mathbb{E}[\mathbb{I}(\text{sign}(\text{prediction}) = \text{sign}(\text{causal\_cue}))] $$
- Patent Cliff Sensitivity (PCS): Measures the magnitude of change in a model's prediction when the input text is subjected to a controlled semantic perturbation (e.g., "growth" -> "decline"). $$ \text{PCS} = \mathbb{E}[|f_\theta(\mathbf{x}) - f_\theta(\tilde{\mathbf{x}})|] $$
- Temporal Semantic Volatility (TSV): Measures the drift in the underlying meaning of the text corpus over time, calculated as the average Euclidean distance between embeddings of consecutive news articles. $$ \text{TSV} = \frac{1}{N-1} \sum_{i=1}^{N-1} \|\phi(\mathbf{x}_{i+1}) - \phi(\mathbf{x}_i)\|_2 $$
- NLI-based Logical Consistency Score (NLICS): Uses a large language model (LLM) to perform Natural Language Inference, assessing whether the model's prediction is a logical entailment of the source news text. $$ \text{NLICS} = \mathbb{E}[\text{EntailmentScore}(\text{text}, \text{Hypothesis}(\text{prediction}))] $$
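The first three formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's implementation: `toy_model` is a hypothetical keyword counter standing in for the trained model $f_\theta$, `swap_growth` is a hypothetical perturbation, and the embeddings are hand-written 2-D vectors rather than sentence-transformer outputs. NLICS is omitted because it depends on an external LLM call.

```python
import math


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def fcas(predictions, causal_cues):
    """Fraction of samples whose prediction sign matches the causal cue sign."""
    matches = [sign(p) == sign(c) for p, c in zip(predictions, causal_cues)]
    return sum(matches) / len(matches)


def pcs(model, texts, perturb):
    """Mean absolute change in prediction after perturbing each text."""
    deltas = [abs(model(t) - model(perturb(t))) for t in texts]
    return sum(deltas) / len(deltas)


def tsv(embeddings):
    """Mean Euclidean distance between consecutive embedding vectors."""
    dists = [math.dist(a, b) for a, b in zip(embeddings, embeddings[1:])]
    return sum(dists) / len(dists)


def toy_model(text):
    # Hypothetical stand-in for f_theta: +1 per "growth", -1 per "decline".
    words = text.lower().split()
    return float(words.count("growth") - words.count("decline"))


def swap_growth(text):
    # Hypothetical controlled perturbation: "growth" -> "decline".
    return text.replace("growth", "decline")


texts = ["strong growth expected", "revenue decline continues"]
print(fcas([0.5, -0.2, 0.1], [1, -1, -1]))         # 2 of 3 signs agree
print(pcs(toy_model, texts, swap_growth))           # 1.0
print(tsv([(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)]))   # 2.5
```

In the PCS example the first text flips from +1 to -1 (a change of 2) while the second is unaffected by the perturbation, so the mean change is 1.0.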
3. Semantic Drift Quantification: The linguistic shift between any two regimes is quantified using the Jensen-Shannon (J-S) Divergence between their respective vocabulary probability distributions. This provides a formal measure of how much the language used in financial news has changed. $$ D_{JS}(P, Q) = \frac{1}{2}D_{KL}(P \| M) + \frac{1}{2}D_{KL}(Q \| M), \quad M = \frac{1}{2}(P+Q) $$
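The J-S divergence formula translates directly into code. The sketch below assumes whitespace tokenization, a shared vocabulary built from both corpora, and base-2 logarithms (which bound the result to the interval [0, 1]); the repository's tokenization, smoothing, and log base may differ. The example sentences are invented.

```python
import math
from collections import Counter


def word_distribution(texts, vocab):
    """Relative frequency of each vocabulary word across a list of texts."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total for w in vocab]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence using the mixture M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Invented toy corpora for two regimes.
pre = ["earnings growth strong", "growth outlook stable"]
covid = ["lockdown hits earnings", "pandemic lockdown outlook"]
vocab = sorted({w for t in pre + covid for w in t.lower().split()})

p = word_distribution(pre, vocab)
q = word_distribution(covid, vocab)
print(js_divergence(p, q))
```

Since the mixture $M$ is positive wherever $P$ or $Q$ is, the divergence stays finite even when the two regimes share no words, in which case it reaches its maximum of 1 bit.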
The provided IPython notebook (quantifying_semantic_shift_financial_nlp_draft.ipynb) implements the full research pipeline, including:
- Modular, Multi-Phase Architecture: The entire pipeline is broken down into 35 distinct, modular tasks, each with its own orchestrator function, covering validation, partitioning, feature engineering, training, inference, and a full suite of analyses.
- Configuration-Driven Design: All experimental parameters are managed in an external `config.yaml` file, allowing for easy customization and replication without code changes.
- Multi-Architecture Support: Complete training and evaluation pipelines for three distinct model types: a baseline LSTM, a fine-tuned Text Transformer (DistilBERT), and a hybrid Feature-Enhanced MLP.
- Idempotent and Resumable Pipelines: Computationally expensive steps, such as model training and LLM-based evaluations, are idempotent and resumable. They save checkpoints and cache results, so an interrupted run loses no progress and a rerun repeats no completed computation.
- Production-Grade Metric Implementation: Includes a highly performant, asynchronous, and cached implementation for the NLICS metric and a full-pipeline replication for the computationally intensive PCS metric.
- Comprehensive Analysis Suite: Implements all analyses from the paper, including J-S divergence, t-SNE visualization, stock-specific case studies, control experiments, and a full N x N cross-sector generalization matrix.
- Automated Reporting: Programmatic generation of all key tables and figures from the paper, as well as a final, synthesized analytical report.
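One common way to make an expensive step idempotent is to key its output on the step name and parameters and reuse the stored result on later runs. The sketch below illustrates that pattern with the standard library only; `cached_step`, the JSON file layout, and the hashing scheme are assumptions for this example and are not taken from the notebook.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_step(cache_dir, name, params, compute):
    """Run compute(params) once per unique (name, params); reuse the saved result afterwards."""
    payload = json.dumps([name, params], sort_keys=True).encode()
    key = hashlib.sha256(payload).hexdigest()[:16]
    path = Path(cache_dir) / f"{name}_{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = compute(params)
    # Write to a temporary file first, then rename it into place,
    # so an interrupted run does not leave a half-written cache entry.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(result))
    tmp.replace(path)
    return result


calls = []


def expensive(params):
    # Stand-in for model training or an LLM call; records each invocation.
    calls.append(params)
    return {"score": params["x"] * 2}


with tempfile.TemporaryDirectory() as d:
    first = cached_step(d, "demo", {"x": 21}, expensive)
    second = cached_step(d, "demo", {"x": 21}, expensive)

print(first, second, len(calls))
```

The second call returns the cached value without invoking `expensive` again, which is the behaviour that lets a rerun of the pipeline skip work it has already completed.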
The core analytical steps directly implement the methodology from the paper:
- Validation & Cleansing (Tasks 1-3): Ingests and rigorously validates the raw data and `config.yaml`, performs a deep data quality audit, and standardizes all data.
- Data Partitioning (Tasks 4-6): Partitions the data by macroeconomic regime and performs chronological train/val/test splits.
- Feature Engineering (Tasks 7-9): Generates TF-IDF, sentence embedding, and combined feature sets.
- Model Training (Tasks 10-15): Orchestrates the training of all 12 model-regime pairs with early stopping and checkpointing.
- Inference & Evaluation (Tasks 16-24): Generates predictions on all test sets and computes the full suite of five performance and diagnostic metrics.
- Analysis & Ablation (Tasks 25-35): Executes all higher-level analyses, including semantic drift calculation, visualizations, case studies, and ablation studies.
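The early stopping mentioned in the training tasks can be illustrated with a patience rule applied to a sequence of validation losses. This is a generic sketch: the function name, the patience value, and the loss values are hypothetical, and the notebook's training loops may use a different stopping criterion.

```python
def train_with_early_stopping(val_losses, patience=2):
    """Return (best_epoch, best_loss, stopped_epoch) for per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # A real training loop would save a model checkpoint here.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop and keep the best checkpoint.
            return best_epoch, best_loss, epoch
    return best_epoch, best_loss, len(val_losses) - 1


# Invented validation losses: the best value occurs at epoch 1.
result = train_with_early_stopping([0.9, 0.7, 0.8, 0.75, 0.6], patience=2)
print(result)  # (1, 0.7, 3)
```

With a patience of 2, training halts at epoch 3 and never sees the lower loss at epoch 4. A larger patience trades extra compute for a better chance of escaping such plateaus.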
The quantifying_semantic_shift_financial_nlp_draft.ipynb notebook is structured as a logical pipeline with modular orchestrator functions for each of the major tasks. All functions are self-contained, fully documented with type hints and docstrings, and designed for professional-grade execution.
The project is designed around a single, top-level user-facing interface function:
execute_quantifying_semantic_shift_study: This master orchestrator function runs the entire automated research pipeline end to end. It handles all data processing, model training, and analysis. It also generates the necessary files for the optional, human-in-the-loop entailment model comparison. A single call to this function reproduces the entire computational portion of the project.
- Python 3.9+
- An OpenAI API key set as an environment variable (`OPENAI_API_KEY`) for the NLICS metric.
- Core dependencies: `pandas`, `numpy`, `scipy`, `scikit-learn`, `pyyaml`, `torch`, `transformers`, `sentence-transformers`, `openai`, `matplotlib`, `seaborn`, `tqdm`, `ipython`.
- Clone the repository:

      git clone https://github.com/chirindaopensource/quantifying_semantic_shift_financial_nlp.git
      cd quantifying_semantic_shift_financial_nlp

- Create and activate a virtual environment (recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

- Install Python dependencies:

      pip install -r requirements.txt

- Set the environment variable:

      export OPENAI_API_KEY="your_api_key_here"
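Before starting a long run, it can help to confirm that the key is visible to the Python process. The helper below is a small sketch; the name `has_openai_key` is hypothetical, and it checks only that the variable is set and non-blank, not that the key is valid.

```python
import os


def has_openai_key(env=None):
    """Return True if OPENAI_API_KEY is present and not blank in the given environment."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not has_openai_key():
    print("OPENAI_API_KEY is not set; the NLICS metric will not be able to call the API.")
```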
The pipeline requires a single pandas.DataFrame and a config.yaml file. The script includes a helper function to generate a synthetic, structurally correct DataFrame for testing purposes. The required schema is:
- Index: A `pandas.MultiIndex` with three levels:
  - `date` (DatetimeIndex): The trading date.
  - `ticker` (object): The stock ticker.
  - `sector` (object): The GICS sector.
- Columns:
  - `Open`, `High`, `Low`, `Close`, `Adj Close` (float64): Standard market data.
  - `Volume` (int64): Daily trading volume.
  - `aggregated_text` (object/str): Concatenated daily news text. An empty string is a valid value.
  - `target_return` (float64): The forward-looking, next-day adjusted close return.
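To make the schema concrete, the sketch below builds a small random DataFrame with the required index levels and columns and then checks it. Both `make_synthetic_frame` and `check_schema` are hypothetical names written for this example; they are not the helper shipped in the notebook, and the tickers, sectors, and values are invented.

```python
import numpy as np
import pandas as pd

REQUIRED_COLUMNS = [
    "Open", "High", "Low", "Close", "Adj Close",
    "Volume", "aggregated_text", "target_return",
]


def make_synthetic_frame(n_days=3, tickers=(("AAA", "Energy"), ("BBB", "Health Care")), seed=0):
    """Build a toy frame indexed by (date, ticker, sector) with random market data."""
    rng = np.random.default_rng(seed)
    dates = pd.bdate_range("2020-01-01", periods=n_days)
    index = pd.MultiIndex.from_tuples(
        [(d, t, s) for d in dates for t, s in tickers],
        names=["date", "ticker", "sector"],
    )
    n = len(index)
    close = rng.uniform(50.0, 150.0, size=n)
    return pd.DataFrame(
        {
            "Open": close * 0.99,
            "High": close * 1.02,
            "Low": close * 0.98,
            "Close": close,
            "Adj Close": close,
            "Volume": rng.integers(1_000, 1_000_000, size=n).astype("int64"),
            # Empty strings are valid: some days have no news.
            "aggregated_text": ["" if i % 2 else "placeholder headline" for i in range(n)],
            "target_return": rng.normal(0.0, 0.01, size=n),
        },
        index=index,
    )


def check_schema(df):
    """Return a list of human-readable schema problems; an empty list means the frame conforms."""
    problems = []
    if list(df.index.names) != ["date", "ticker", "sector"]:
        problems.append("index levels must be date, ticker, sector")
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    return problems


df = make_synthetic_frame()
print(df.shape, check_schema(df))
```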
The quantifying_semantic_shift_financial_nlp_draft.ipynb notebook provides a complete, step-by-step guide. The primary workflow is to call the top-level orchestrator from a main.py script or the final cell of the notebook:
# main.py
from pathlib import Path
import pandas as pd
import yaml
# Assuming all pipeline functions are in `pipeline.py`
from pipeline import execute_quantifying_semantic_shift_study
# Load configuration
with open("config.yaml", 'r') as f:
    study_config = yaml.safe_load(f)
# Load data (or generate synthetic data)
raw_df = pd.read_pickle("data/financial_data.pkl")
# Run the entire study
final_artifacts = execute_quantifying_semantic_shift_study(
    raw_df=raw_df,
    study_config=study_config
)

The execute_quantifying_semantic_shift_study function creates a results/ directory and returns a dictionary of artifact paths:
{
'data_splits': Path('results/data_splits.pkl'),
'training_results': Path('results/training_results.pkl'),
'enriched_predictions': Path('results/enriched_predictions.pkl'),
'robustness_profile': Path('results/robustness_profile.csv'),
'js_divergence_matrix': Path('results/js_divergence_matrix.csv'),
'nli_benchmark_for_annotation': Path('results/nli_benchmark_for_annotation.csv'),
...
}
quantifying_semantic_shift_financial_nlp/
│
├── quantifying_semantic_shift_financial_nlp_draft.ipynb # Main implementation notebook
├── config.yaml # Master configuration file
├── requirements.txt # Python package dependencies
├── LICENSE # MIT license file
└── README.md # This documentation file
The pipeline is highly customizable via the config.yaml file. Users can easily modify all experimental parameters, including regime dates, model architectures, feature engineering settings, and LLM prompts, without altering the core Python code.
Contributions are welcome. Please fork the repository, create a feature branch, and submit a pull request with a clear description of your changes. Adherence to PEP 8, type hinting, and comprehensive docstrings is required.
Future extensions could include:
- Additional Model Architectures: Integrating other models like FinBERT or more advanced transformer architectures.
- Alternative Diagnostic Metrics: Implementing other measures of model robustness, such as influence functions or prediction confidence calibration.
- Automated Retraining Triggers: Building a system that uses the computed drift metrics (like TSV or J-S Divergence) to automatically trigger model retraining when a significant regime shift is detected.
- Dynamic Feature Selection: Exploring methods for dynamically adjusting feature importance based on the detected market regime.
This project is licensed under the MIT License. See the LICENSE file for details.
If you use this code or the methodology in your research, please cite the original paper:
@inproceedings{sun2025quantifying,
author = {Sun, Zhongtian and Xiao, Chenghao and Harit, Anoushka and Yu, Jongmin},
title = {Quantifying Semantic Shift in Financial NLP: Robust Metrics for Market Prediction Stability},
booktitle = {Proceedings of the 6th ACM International Conference on AI in Finance},
series = {ICAIF '25},
year = {2025},
publisher = {ACM}
}

For the implementation itself, you may cite this repository:
Chirinda, C. (2025). A Professional-Grade Implementation of the "Quantifying Semantic Shift" Framework.
GitHub repository: https://github.com/chirindaopensource/quantifying_semantic_shift_financial_nlp
- Credit to Zhongtian Sun, Chenghao Xiao, Anoushka Harit, and Jongmin Yu for the foundational research that forms the entire basis for this computational replication.
- This project is built upon the exceptional tools provided by the open-source community. Sincere thanks to the developers of the scientific Python ecosystem, including Pandas, NumPy, SciPy, Scikit-learn, PyTorch, HuggingFace, and Jupyter, whose work makes complex computational analysis accessible and robust.
--
This README was generated based on the structure and content of quantifying_semantic_shift_financial_nlp_draft.ipynb and follows best practices for research software documentation.