This repository is designed to scrape historical version information for 10 major applications across both iOS and Android platforms.
The project workflow is divided into two major phases:
- Testing and validating available public scraping sources
- Running the actual scraping pipelines to generate structured datasets
The final output of the project is a pair of formatted Excel spreadsheets, one per platform, containing application version history data.
The repository scrapes version history for:
- YouTube
- TikTok
- ChatGPT
- Claude
- CapCut
- Tinder
- Spotify
.
├── test_scrap.py
├── test_source_android.py
├── test_source_iOS.py
├── scraping_android.py
└── scraping_iOS.py
test_scrap.py is a basic iOS API validation script.
This script:
- tests the iTunes Lookup API
- verifies API access
- retrieves:
  - app names
  - developer information
  - categories
  - current versions
  - release dates
  - release notes
This is primarily used as a quick connectivity and metadata validation step.
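As an illustration of this validation step, here is a minimal sketch of querying the iTunes Lookup API and extracting the fields listed above. It is not taken from test_scrap.py: the function names are hypothetical, and the JSON keys (`trackName`, `artistName`, `primaryGenreName`, `version`, `currentVersionReleaseDate`, `releaseNotes`) are the ones the Lookup API is generally understood to return, so check them against a live response before relying on them.

```python
import json
import urllib.parse
import urllib.request

LOOKUP_URL = "https://itunes.apple.com/lookup"


def fetch_lookup(app_id: str, country: str = "us", timeout: float = 10.0) -> dict:
    """Call the iTunes Lookup API for one numeric App Store ID (needs network)."""
    query = urllib.parse.urlencode({"id": app_id, "country": country})
    with urllib.request.urlopen(f"{LOOKUP_URL}?{query}", timeout=timeout) as resp:
        return json.load(resp)


def summarize_lookup(payload: dict) -> list[dict]:
    """Pick the metadata fields of interest out of a Lookup API response."""
    rows = []
    for item in payload.get("results", []):
        rows.append({
            "app_name": item.get("trackName"),
            "developer": item.get("artistName"),
            "category": item.get("primaryGenreName"),
            "current_version": item.get("version"),
            "release_date": item.get("currentVersionReleaseDate"),
            # Assumed key; it may be absent for some apps, hence .get().
            "release_notes": item.get("releaseNotes"),
        })
    return rows
```

Calling `summarize_lookup(fetch_lookup("<app-id>"))` with a real App Store ID would perform the connectivity check and the metadata extraction in one step. An empty `results` list yields an empty summary rather than an error.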
test_source_android.py is the Android source diagnostic script.
This file tests multiple Android scraping sources including:
- Google Play Store
- APKMirror
- APKPure
- AppBrain
- Apptopia
- Wayback Machine
- AppFollow
- AppShopper
The script evaluates:
- source accessibility
- version history availability
- release date availability
- release notes availability
This script is used for diagnostics only and does not generate the final dataset.
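The four checks above could be recorded per source along the following lines. This is a sketch only: the `SourceDiagnostic` class, its field names, and the scoring rule are assumptions made for illustration, not the structure test_source_android.py actually uses.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SourceDiagnostic:
    """Result of probing one candidate source (hypothetical structure)."""
    source: str
    accessible: bool
    has_version_history: bool = False
    has_release_dates: bool = False
    has_release_notes: bool = False

    def score(self) -> int:
        """Count how many of the three data checks passed (0 if unreachable)."""
        if not self.accessible:
            return 0
        return sum([self.has_version_history,
                    self.has_release_dates,
                    self.has_release_notes])


def build_report(results: list[SourceDiagnostic]) -> str:
    """Serialize diagnostics to JSON, best-scoring sources first."""
    ranked = sorted(results, key=lambda r: r.score(), reverse=True)
    return json.dumps([{**asdict(r), "score": r.score()} for r in ranked], indent=2)
```

Ranking by a simple count keeps the report easy to read. A real diagnostic might weight version history more heavily than release notes.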
test_source_iOS.py is the iOS source diagnostic script.
This file tests multiple iOS scraping sources including:
- iTunes API
- AppShopper
- AppAdvice
- AppAgg
- AppFollow
- AppPure
- iOSNoops
- Wayback Machine
- AppRaven
- AppMagic
- MobileAction
The script validates:
- historical version availability
- release notes availability
- release dates
- source quality
This script is only used for testing and diagnostics.
scraping_android.py is the main Android scraping pipeline.
This script:
- scrapes Android app version history
- combines multiple public data sources
- extracts release notes
- extracts release dates
- validates version numbers
- formats the final dataset
- exports the results into Excel
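The version-number validation step might look like the sketch below. It assumes versions are dot-separated integers with two to four components (for example "19.45.38"). The real pipeline may accept other formats, and the helper names here are hypothetical.

```python
import re

# Assumption: two to four dot-separated integer components, e.g. "1.2" or "19.45.38.1".
VERSION_RE = re.compile(r"^\d+(?:\.\d+){1,3}$")


def is_valid_version(version: str) -> bool:
    """Return True if the string looks like a dotted numeric version."""
    return bool(VERSION_RE.match(version.strip()))


def version_key(version: str) -> tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so versions sort numerically, not as text."""
    return tuple(int(part) for part in version.strip().split("."))


def clean_versions(raw: list[str]) -> list[str]:
    """Drop malformed and duplicate versions and return the rest, newest first."""
    unique = {v.strip() for v in raw if is_valid_version(v)}
    return sorted(unique, key=version_key, reverse=True)
```

Sorting on the integer tuple matters because a plain string sort would place "1.10.0" before "1.9".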
Output file: android_app_version_history_v3.xlsx
scraping_iOS.py is the main iOS scraping pipeline.
This script:
- scrapes iOS app version history
- retrieves metadata from the iTunes API
- combines multiple historical sources
- extracts release notes
- extracts release dates
- structures and formats the dataset
- exports the results into Excel
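One way to combine records from several historical sources is to key them by version and let later sources fill only the fields that earlier ones left empty. The sketch below assumes each record is a dict with at least a `version` key. The function name and the "first source wins" rule are illustrative choices, not a description of scraping_iOS.py.

```python
def merge_sources(*sources: list[dict]) -> list[dict]:
    """Merge per-version records; earlier sources win, later ones fill gaps."""
    merged: dict[str, dict] = {}
    for records in sources:
        for record in records:
            version = record["version"]
            target = merged.setdefault(version, {"version": version})
            for field, value in record.items():
                # Only fill fields that are still missing or empty.
                if value and not target.get(field):
                    target[field] = value
    return list(merged.values())
```

Passing the most trusted source first means its dates and notes are kept whenever two sources disagree.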
Output file: ios_app_version_history_v3.xlsx
Requirements:
- Python 3.9+
- Internet connection
- pip
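Create and activate a virtual environment.
macOS / Linux:
python -m venv venv
source venv/bin/activate
Windows:
python -m venv venv
venv\Scripts\activate
Install the dependencies with the following command:
pip install pandas requests beautifulsoup4 lxml openpyxl tabulate
To test API access, run:
python test_scrap.py
This verifies: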
- iTunes API access
- metadata retrieval
- JSON response structure
Expected result:
- JSON output printed in terminal
Run:
python test_source_android.py
This script:
- checks Android scraping sources
- tests version history availability
- validates release notes and dates
- prints source diagnostics
Expected outputs:
- terminal diagnostic summary
- JSON diagnostic report
Run:
python test_source_iOS.py
This script:
- validates iOS scraping sources
- checks historical version availability
- evaluates release notes and release dates
- prints diagnostic summaries
Expected outputs:
- terminal diagnostic summary
- JSON diagnostic report
Run:
python scraping_android.py
The script will:
- scrape Android version history
- merge multiple sources
- clean and validate records
- generate a formatted Excel spreadsheet
Generated output:
android_app_version_history_v3.xlsx
Run:
python scraping_iOS.py
The script will:
- scrape iOS version history
- merge metadata from multiple sources
- clean and structure records
- generate a formatted Excel spreadsheet
Generated output:
ios_app_version_history_v3.xlsx
The final outputs are Excel spreadsheets containing structured application version history.
The spreadsheets include:
- App Name
- Platform
- Developer / Company
- App Category
- Version Number
- Version Release Date
- Current Version Indicator
- Initial App Release Date
- Update Description / Release Notes
- Update History Source
- Data Quality Notes
The spreadsheets are automatically formatted using openpyxl.
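To make the column layout concrete, the sketch below builds one row per version using the column names listed above. The input dict keys (`name`, `current_version`, and so on), the "Yes"/"No" indicator values, and the wording of the quality note are all assumptions for illustration. The actual scripts may encode these differently.

```python
COLUMNS = [
    "App Name", "Platform", "Developer / Company", "App Category",
    "Version Number", "Version Release Date", "Current Version Indicator",
    "Initial App Release Date", "Update Description / Release Notes",
    "Update History Source", "Data Quality Notes",
]


def build_rows(app: dict, versions: list[dict]) -> list[dict]:
    """Return one row per version, keyed by the spreadsheet column names."""
    rows = []
    for entry in versions:
        notes = entry.get("release_notes") or ""
        is_current = entry["version"] == app["current_version"]
        rows.append({
            "App Name": app["name"],
            "Platform": app["platform"],
            "Developer / Company": app["developer"],
            "App Category": app["category"],
            "Version Number": entry["version"],
            "Version Release Date": entry.get("release_date", ""),
            "Current Version Indicator": "Yes" if is_current else "No",
            "Initial App Release Date": app.get("initial_release_date", ""),
            "Update Description / Release Notes": notes,
            "Update History Source": entry.get("source", ""),
            # Flag gaps so they stay visible in the final sheet.
            "Data Quality Notes": "" if notes else "Release notes missing",
        })
    return rows
```

Rows shaped like this can be handed to a tabular writer with `COLUMNS` as the header order.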
# Clone repository
git clone <repository-url>
# Enter repository
cd <repository-folder>
# Create virtual environment
python -m venv venv
# Activate virtual environment
# macOS / Linux
source venv/bin/activate
# Windows
venv\Scripts\activate
# Install dependencies
pip install pandas requests beautifulsoup4 lxml openpyxl tabulate
# Test API access
python test_scrap.py
# Test Android sources
python test_source_android.py
# Test iOS sources
python test_source_iOS.py
# Generate Android Excel output
python scraping_android.py
# Generate iOS Excel output
python scraping_iOS.py
Notes:
- Some public sources may rate-limit requests
- Certain websites may change their HTML structure over time
- Historical version coverage varies by source
- Release notes availability depends on source quality
- Some sources may become temporarily unavailable
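Rate limits and temporary outages are commonly handled by retrying with an increasing delay. The following is a generic sketch of that idea, not the repository's own code: `fetch_with_retry`, its parameters, and the default delays are hypothetical. The `fetch` argument stands in for whatever function performs the actual request.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(); on a network-style error wait base_delay * 2**n and retry.

    Returns None if every attempt fails, so one bad source does not stop the run.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            # OSError covers urllib's URLError and requests' RequestException.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None
```

With the defaults, a failing source is tried three times with waits of 1 and 2 seconds in between. The `sleep` parameter exists so the delay can be replaced in tests.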
After running the scraping scripts, the repository generates:
android_app_version_history_v3.xlsx
ios_app_version_history_v3.xlsx
These Excel files contain the structured historical version datasets for all supported applications.