This repository contains the code for Experiment 2 of a research project on prefix-based session type prediction. The experiment tests whether enriching early session prefixes with temporal intervals (delta t) and global context (device, geo, traffic source) improves session classification, under a unified evaluation protocol across three model families: higher-order Markov chains, LightGBM with engineered features (plus SHAP analysis), and SASRec with temporal/contextual ablations.
Large artifacts (datasets, trained models, prediction files) are not stored in GitHub. The notebook writes them to a local folder (`/content/exp2_artifacts` in Colab) and can optionally back them up to a Google Cloud Storage (GCS) bucket. The artifacts are also available via Google Drive – see the Reproducibility section below.
.
├── exp2.ipynb # Main notebook with the full experiment pipeline
├── README.md # This file
└── requirements.txt # Python dependencies
Data:
- BigQuery public GA4 e-commerce sample: `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` – 4,295,584 events, aggregated into 360,129 sessions
- Session key: `(user_pseudo_id, ga_session_id)`
- Temporal split by `session_end_ts` (70/15/15): 252,090 train, 54,019 val, 54,020 test
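The chronological split can be sketched in plain Python (an illustrative helper, not the notebook's actual code, which operates on the BigQuery-derived session table; `temporal_split` is a hypothetical name):

```python
def temporal_split(sessions, frac_train=0.70, frac_val=0.15):
    """Chronological 70/15/15 split of (session_key, session_end_ts) pairs.

    Sessions are ordered by session_end_ts so that train strictly
    precedes validation, which precedes test (no temporal leakage).
    """
    ordered = sorted(sessions, key=lambda s: s[1])  # sort by end timestamp
    n = len(ordered)
    n_train = int(n * frac_train)
    n_val = int(n * frac_val)
    return (ordered[:n_train],
            ordered[n_train:n_train + n_val],
            ordered[n_train + n_val:])
```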
Task:
- Given the first `t` events of a session, predict the final session type (prefix-level classification)
- Prefix lengths: `t = 1..43`, where `T_MAX = 43` (p95 train session length)
- Test evaluation covers 54,020 sessions and 497,852 prefixes
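Prefix expansion for this task can be sketched as follows (a minimal illustration; `session_prefixes` is a hypothetical helper, while the notebook builds prefix tables as part of its pipeline):

```python
T_MAX = 43  # p95 of train session length, as fixed in the notebook

def session_prefixes(events, t_max=T_MAX):
    """Yield (t, events[:t]) for t = 1..min(len(events), t_max).

    Each prefix becomes one classification example, all sharing
    the label assigned from the full session.
    """
    for t in range(1, min(len(events), t_max) + 1):
        yield t, events[:t]
```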
Labels (rule-based hierarchy, assigned from the full session):
- Buyer: at least one `purchase`
- Intent: at least one of `add_to_cart`, `begin_checkout`, `add_payment_info`, `add_shipping_info` (but no purchase)
- Researcher: no Buyer/Intent signals, but at least 3 product interactions from `{view_item, view_item_list, select_item}`
- Browser: all remaining sessions
Label distribution (session-level):
- Browser 88.81%, Researcher 5.54%, Intent 4.30%, Buyer 1.35%
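The label hierarchy above maps directly to a short rule function (a sketch of the labeling logic; the notebook's own implementation may differ in details):

```python
INTENT_EVENTS = {"add_to_cart", "begin_checkout",
                 "add_payment_info", "add_shipping_info"}
RESEARCH_EVENTS = {"view_item", "view_item_list", "select_item"}

def label_session(event_names):
    """Assign the session label from the full event sequence.

    Rules are applied top-down: Buyer > Intent > Researcher > Browser.
    """
    events = list(event_names)
    if "purchase" in events:
        return "Buyer"
    if any(e in INTENT_EVENTS for e in events):
        return "Intent"
    if sum(e in RESEARCH_EVENTS for e in events) >= 3:
        return "Researcher"
    return "Browser"
```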
Models:
- Markov-2 / Markov-3 (count-based higher-order Markov baselines)
- LightGBM (engineered prefix features, weighted training, SHAP analysis)
- SASRec transformer with ablations: Base, +Time, +Context, +Time+Context
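A count-based higher-order Markov baseline of the kind listed above can be sketched like this (a minimal illustration with additive smoothing `alpha`; the notebook's version, whose tuned smoothing is stored in `markov_best_alphas.json`, may differ in details):

```python
from collections import Counter, defaultdict

class MarkovClassifier:
    """Order-k count baseline: predict label from the last k prefix events."""

    def __init__(self, k=2, alpha=1.0,
                 labels=("Buyer", "Intent", "Researcher", "Browser")):
        self.k, self.alpha, self.labels = k, alpha, labels
        self.counts = defaultdict(Counter)  # context -> per-label counts
        self.prior = Counter()              # label counts for back-off

    def fit(self, prefixes, labels):
        for events, y in zip(prefixes, labels):
            self.counts[tuple(events[-self.k:])][y] += 1
            self.prior[y] += 1

    def predict(self, events):
        c = self.counts.get(tuple(events[-self.k:]))
        if not c:
            c = self.prior  # back off to the label prior for unseen contexts
        return max(self.labels, key=lambda y: c[y] + self.alpha)
```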
Metric:
- Macro-F1 (unweighted mean over the 4 classes), computed on all test prefixes
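For reference, Macro-F1 as used here is the unweighted mean of per-class F1 scores; a plain-Python equivalent (the notebook presumably relies on `sklearn.metrics.f1_score` with `average="macro"`):

```python
def macro_f1(y_true, y_pred,
             labels=("Buyer", "Intent", "Researcher", "Browser")):
    """Unweighted mean of per-class F1 over the 4 labels.

    A class with zero true positives contributes an F1 of 0.
    """
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(labels)
```

Because the mean is unweighted, the rare Buyer class (1.35% of sessions) counts as much as the dominant Browser class, which is why majority-class predictors score poorly on this metric.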
Aggregate results below correspond to the representative prediction artifacts used for cross-model diagnostics (Section 7). SASRec variants use the best-seed run selected by validation Macro-F1 (out of 3 seeds).
| Model | Test Macro-F1 | Buyer F1 | Intent F1 | Researcher F1 | Browser F1 |
|---|---|---|---|---|---|
| SASRec +Time+Context | 0.5759 | 0.3334 | 0.4819 | 0.5931 | 0.8952 |
| SASRec Base | 0.5731 | 0.2857 | 0.5207 | 0.5911 | 0.8948 |
| SASRec +Time | 0.5729 | 0.3078 | 0.5100 | 0.5804 | 0.8936 |
| SASRec +Context | 0.5718 | 0.3454 | 0.4569 | 0.5890 | 0.8959 |
| LightGBM | 0.5632 | 0.2974 | 0.4875 | 0.5772 | 0.8906 |
| Markov-3 | 0.4221 | 0.1106 | 0.2843 | 0.4417 | 0.8518 |
| Markov-2 | 0.3794 | 0.0572 | 0.2372 | 0.3857 | 0.8372 |
SASRec ablation stability across 3 seeds (42, 43, 44), mean +/- std:
| Variant | Val Macro-F1 (mean +/- std) | Test Macro-F1 (mean +/- std) |
|---|---|---|
| Base | 0.5660 +/- 0.0069 | 0.5668 +/- 0.0047 |
| +Time | 0.5699 +/- 0.0073 | 0.5731 +/- 0.0021 |
| +Context | 0.5673 +/- 0.0004 | 0.5694 +/- 0.0019 |
| +Time+Context | 0.5675 +/- 0.0026 | 0.5688 +/- 0.0059 |
- SASRec variants lead overall, with small differences between Base, +Time, +Context, and +Time+Context.
- LightGBM is competitive but below SASRec on Macro-F1.
- Markov baselines perform substantially worse and rely heavily on Browser predictions.
- Buyer remains the hardest class across all models, and the main non-Browser boundary is Buyer-Intent ambiguity.
The notebook writes artifacts to a fixed local directory:
- Colab default:
/content/exp2_artifacts/
Optionally, the notebook also uploads timestamped backups to a GCS bucket (gs://<your-bucket>/) for versioned recovery.
Data:
- `sessions.parquet`, `split_boundaries.json`
- `prefix_train.parquet`, `prefix_val.parquet`, `prefix_test.parquet` (LightGBM feature matrices)
Models:
- `markov_best_alphas.json`
- `lgbm_final.pkl`, `lgbm_best_params.json`
- `sasrec_{variant}_seeds3.pt`, `sasrec_ablation_summary.json`
Predictions:
- `markov2_test_predictions.parquet`, `markov3_test_predictions.parquet`
- `lgbm_test_predictions.parquet`
- `sasrec_{variant}_seed{seed}_test_predictions.parquet`, `sasrec_{variant}_bestseed_test_predictions.parquet`
- `confusion_matrices.npz`
- Clone the repository.
- Install dependencies: `pip install -r requirements.txt`
- Open `exp2.ipynb` in Jupyter or Colab.
- If you want to pull raw data from BigQuery and/or upload backups to GCS, authenticate with Google Cloud in your environment and set the config variables in Section 1.1 (`PROJECT_ID`, `GCS_BUCKET`).
A full rerun executes all cells sequentially. GPU is required for SASRec training in Section 6.
Fixed constants used in the notebook:
- `T_MAX = 43`
- SASRec: `N_RUNS = 3`, seeds `[42, 43, 44]`
- SASRec: `HIDDEN_DIM=64`, `BATCH_SIZE=512`, `dropout=0.1`, `lr=1e-3`, `MAX_EPOCHS=30`, `PATIENCE=3`, Adam (no weight decay), left-pad to `T_MAX`
- Optuna: `TPESampler(seed=42)`, `n_jobs=1`
- LightGBM final fit: `n_jobs=1`
- Label mapping: Buyer=0, Intent=1, Researcher=2, Browser=3
MIT
- Google Cloud for the public GA4 sample dataset.
- The open‑source community for the incredible tools used in this work.