paper2data is a small local Linux app for extracting calibrated data from
vector plots in PDF papers. It renders PDF pages as SVG, lets you click the
actual SVG paths and tick marks, and exports two-column .dat files or CSV.
It is intentionally SVG-only. If a paper embeds a plot as a raster image, there will be no selectable curve path to extract.
On Ubuntu/Debian-like Linux systems:
sudo apt install nodejs npm mupdf-tools poppler-utils
git clone https://github.com/liambern/paper2data.git
cd paper2data
npm start
Open:
http://localhost:5177
The app has no npm package dependencies. It needs:
- node for the local server
- mutool from mupdf-tools to render PDF pages as SVG
- pdfinfo from poppler-utils to read page counts
If your system package manager provides an older Node version, install Node 20
or newer and rerun npm start.
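If npm start fails, it can help to first confirm that the tools above are on PATH. Below is a minimal Python sketch. It is not part of paper2data, which is a Node app. It checks for node, mutool, and pdfinfo with shutil.which, then compares the major version reported by node --version against 20. The function names are hypothetical.

```python
import re
import shutil
import subprocess

# Executables paper2data needs, mapped to the Debian/Ubuntu package that provides them.
REQUIRED_TOOLS = {
    "node": "nodejs",
    "mutool": "mupdf-tools",
    "pdfinfo": "poppler-utils",
}


def parse_node_major(version_text: str) -> int:
    """Extract the major version from node --version output such as 'v20.11.1'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node version string: {version_text!r}")
    return int(match.group(1))


def check_prerequisites(min_node_major: int = 20) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    missing = [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
    problems = [f"{tool} not found (install {REQUIRED_TOOLS[tool]})" for tool in missing]
    if "node" not in missing:
        # Only ask Node for its version when the binary actually exists.
        result = subprocess.run(
            ["node", "--version"], capture_output=True, text=True, check=True
        )
        if parse_node_major(result.stdout) < min_node_major:
            problems.append(f"Node {result.stdout.strip()} is older than {min_node_major}")
    return problems
```

Calling check_prerequisites() returns an empty list when everything is in place. Otherwise, each entry names a missing package or an outdated Node version.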
- Click Open PDF.
- Pick the page containing the plot.
- Use mouse wheel zoom and Pan mode to focus on the plot.
- In Select mode, hover SVG paths to highlight them and click the curve you want. Shift-click adds multiple path segments.
- In Calibrate mode, click X1, X2, Y1, and Y2 tick marks. When you click a selectable SVG tick/path, paper2data uses the center of that SVG object rather than the raw mouse point.
- Enter the numeric axis values for X1, X2, Y1, and Y2.
- Set optional Points, X min, and X max.
- Export DAT or CSV.
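With two ticks per axis, calibration amounts to a linear map from SVG coordinates to data values. The sketch below makes two assumptions. First, it assumes linear axes, since the steps above do not mention log scales. Second, it takes the bounding-box center as one plausible reading of "the center of that SVG object". AxisCalibration, bbox_center, and svg_to_data are hypothetical names, not paper2data's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisCalibration:
    """Two reference ticks on one axis: their SVG coordinates and the values entered for them."""
    svg1: float
    svg2: float
    value1: float
    value2: float

    def to_data(self, svg_coord: float) -> float:
        # Linear interpolation through (svg1, value1) and (svg2, value2).
        if self.svg1 == self.svg2:
            raise ValueError("calibration ticks must be at different positions")
        t = (svg_coord - self.svg1) / (self.svg2 - self.svg1)
        return self.value1 + t * (self.value2 - self.value1)


def bbox_center(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Center of the bounding box of an SVG object's points (assumed snapping rule)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


def svg_to_data(
    point: tuple[float, float], x_axis: AxisCalibration, y_axis: AxisCalibration
) -> tuple[float, float]:
    """Map an SVG point to data coordinates, calibrating each axis independently."""
    return (x_axis.to_data(point[0]), y_axis.to_data(point[1]))
```

Consider X1 at SVG x = 100 (value 0), X2 at x = 300 (value 10), Y1 at SVG y = 400 (value 0), and Y2 at y = 200 (value 1). With that calibration, the SVG point (200, 300) maps to (5.0, 0.5). Each axis is mapped through its own two ticks, so SVG's downward-growing y-axis needs no special handling.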
DAT output is the default and contains exactly two whitespace-separated columns with no header:
x y
CSV output contains:
x,y
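If you post-process exports in Python, both formats can be parsed with the standard library. Here is a minimal sketch with hypothetical helpers read_dat and read_csv. The description above does not say whether the CSV's x,y line is a header row, so read_csv skips that row if it is present.

```python
import csv
import io


def read_dat(text: str) -> list[tuple[float, float]]:
    """Parse two whitespace-separated columns (no header), skipping blank lines."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(fields)}")
        rows.append((float(fields[0]), float(fields[1])))
    return rows


def read_csv(text: str) -> list[tuple[float, float]]:
    """Parse x,y rows, skipping an 'x,y' header row if one is present."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue
        if [field.strip().lower() for field in record] == ["x", "y"]:
            continue
        x, y = record
        rows.append((float(x), float(y)))
    return rows
```

For DAT files, numpy.loadtxt also works directly because it splits on whitespace by default.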
Step controls the internal SVG-path sampling density. Points, when set,
resamples the selected curve to exactly that many evenly spaced x-values over
the requested range.
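One way to implement the Points option is linear interpolation between the curve's sampled points. This sketch rests on that assumption. Its input list stands in for the samples taken at the Step density. Several details are choices made here rather than documented behavior: the range check, and the default of using the curve's own x-range when X min or X max is unset. The name resample is hypothetical.

```python
from bisect import bisect_left


def resample(points, n, x_min=None, x_max=None):
    """Return n (x, y) pairs at evenly spaced x-values, linearly interpolated.

    points: sampled (x, y) pairs from the selected curve, in data units.
    x_min / x_max default to the curve's own x-range (an assumption).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two curve samples")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    lo = xs[0] if x_min is None else x_min
    hi = xs[-1] if x_max is None else x_max
    if not xs[0] <= lo < hi <= xs[-1]:
        raise ValueError("requested x-range must lie within the curve's x-range")

    result = []
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        # First sample with xs[j] >= x, clamped so xs[j - 1] and xs[j] both exist.
        j = min(max(bisect_left(xs, x), 1), len(xs) - 1)
        x0, x1 = xs[j - 1], xs[j]
        y0, y1 = ys[j - 1], ys[j]
        t = 0.0 if x1 == x0 else (x - x0) / (x1 - x0)
        result.append((x, y0 + t * (y1 - y0)))
    return result
```

For instance, resampling [(0, 0), (1, 2), (2, 0)] with n = 5 over 0.5 to 1.5 gives x = 0.5, 0.75, 1.0, 1.25, and 1.5. The corresponding y-values are 1.0, 1.5, 2.0, 1.5, and 1.0.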
- Best results come from PDF figures that preserve plot curves as vector paths.
- Multi-line plots may represent one visible curve as several SVG path segments; use Shift-click and the selected-path list when needed.
- The app runs locally. Uploaded PDFs stay on your machine in
.paper2data/.
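When a curve arrives as several Shift-clicked segments, the points must be combined into one series before export. Below is a minimal sketch of one way to do that. It assumes the curve is a single-valued function of x. merge_segments is a hypothetical helper, and paper2data may order or join segments differently.

```python
def merge_segments(segments):
    """Join several sampled path segments into one x-sorted curve.

    Assumes the visible curve is a function of x, so ordering by x is meaningful,
    and drops points repeated exactly where two segments meet.
    """
    merged = sorted(point for segment in segments for point in segment)
    result = []
    for point in merged:
        if not result or point != result[-1]:
            result.append(point)
    return result
```

Because the points are sorted by x, the order in which segments were clicked doesn't matter. The same sorting would scramble a curve that doubles back on itself, which is why the single-valued assumption matters.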
