BSB-publishing/usj2bsb

usj2bsb

Convert USJ (JSON) Bible files back to the BSB word-level interlinear TSV format used by bereanbible.com.

This is the reverse companion to bsb2usfm, which converts the TSV into USFM/USJ/USX.

Overview

The BSB interlinear table (bsb_tables.tsv) is a 23-column, word-level TSV where each row represents a single word aligned to the original Hebrew or Greek. The forward converter (bsb2usfm) produces USJ files from this table. This repository converts those USJ files back to TSV format and provides tools to merge, verify, and diff the results against the original source.

Two editions are supported:

Edition                  ID   Scope                 Source
Berean Standard Bible    BSB  Full Bible (OT + NT)  bereanbible.com
Majority Standard Bible  MSB  New Testament only    majoritybible.com

Quick Start

1. Place USJ files in the edition directory

Copy one or more .usj files (even a single book or partial book) into the appropriate directory:

  • content_usj/bereanbible/ -- for BSB (OT or NT books)
  • content_usj/majoritybible/ -- for MSB (NT books only)

These can be generated by bsb2usfm or any tool that produces USJ output.

2. Run the pipeline

make all              # run pipeline for all editions that have content
make bereanbible      # download BSB source + convert + merge
make majoritybible    # download MSB source + convert + merge

make all automatically detects which edition directories contain .usj files and runs only those pipelines. You can also target a specific edition directly.
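
The detection step can be pictured with a short Python sketch. This is not the Makefile's own logic, only an illustration of the same idea; the helper name `editions_with_content` is hypothetical, and the directory names are the ones listed above.

```python
from pathlib import Path

# Edition directory names as given in this README.
EDITIONS = ("bereanbible", "majoritybible")


def editions_with_content(root="content_usj"):
    """Return the editions whose directory holds at least one .usj file."""
    found = []
    for edition in EDITIONS:
        directory = Path(root) / edition
        # any() stops at the first matching file, so the check stays cheap
        if directory.is_dir() and any(directory.glob("*.usj")):
            found.append(edition)
    return found
```

Calling `editions_with_content()` from the repository root would list the editions that `make all` is expected to process.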

Each edition pipeline downloads and caches the source TSV (re-downloading only when the remote file has changed), converts the USJ files to TSV, and then runs the merge steps. Output goes to output/bereanbible/ or output/majoritybible/.
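
One common way to get this "download only if newer" behaviour with curl is its `--time-cond` option. The sketch below assumes that approach; the Makefile may do it differently. The function names are hypothetical, and the URL is a parameter because the actual download location is not stated here.

```python
import subprocess
from pathlib import Path


def curl_command(url, cache_path):
    """Build a curl call that re-downloads only when the remote file is newer."""
    cache_path = Path(cache_path)
    cmd = [
        "curl", "--fail", "--silent", "--show-error", "--location",
        "--remote-time",  # stamp the saved file with the server's timestamp
        "--output", str(cache_path),
    ]
    if cache_path.exists():
        # --time-cond FILE makes the request conditional on FILE's mtime
        cmd += ["--time-cond", str(cache_path)]
    cmd.append(url)
    return cmd


def fetch_source(url, cache_path):
    """Run the conditional download, creating the cache directory if needed."""
    Path(cache_path).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(curl_command(url, cache_path), check=True)
```

On the first run there is no cached file, so the condition is omitted and the file is always fetched.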

Note: Running make majoritybible will warn about any OT books (which are not part of the MSB) and will fail if no NT books are found.
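
A minimal sketch of that check follows. It assumes each USJ file carries a `book` node with a `code` field near the top of its `content` array, as in the USJ format; the function names are hypothetical and the real pipeline may identify books another way (for example by file name).

```python
import sys

# The 27 New Testament book codes used in USFM/USJ book identifiers.
NT_BOOKS = {
    "MAT", "MRK", "LUK", "JHN", "ACT", "ROM", "1CO", "2CO", "GAL", "EPH",
    "PHP", "COL", "1TH", "2TH", "1TI", "2TI", "TIT", "PHM", "HEB", "JAS",
    "1PE", "2PE", "1JN", "2JN", "3JN", "JUD", "REV",
}


def book_code(usj):
    """Return the book code of a parsed USJ document, or None if absent."""
    for node in usj.get("content", []):
        if isinstance(node, dict) and node.get("type") == "book":
            return node.get("code")
    return None


def check_nt_only(codes):
    """Warn about books outside the NT; stop if no NT book is left."""
    nt = [code for code in codes if code in NT_BOOKS]
    for code in codes:
        if code not in NT_BOOKS:
            print(f"warning: {code} is not part of the MSB", file=sys.stderr)
    if not nt:
        raise SystemExit("error: no NT books found")
    return nt
```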

Pipeline steps

The Makefile runs these steps in order:

  1. Download source TSV into cache/<edition>/source.tsv
  2. Convert USJ to TSV (usj2tsv.py -> output/<edition>/output.tsv)
  3. Merge with source (merge_tsv.py -> output/<edition>/merged.tsv)
  4. Restore footnotes (merge_footnotes_from_tsv.py -> output/<edition>/merged_fn.tsv)
  5. Insert missing words (insert_missing_words_from_tsv.py -> output/<edition>/merged_full.tsv)

Manual verification (optional)

python3 verify_roundtrip.py output/bereanbible/merged_full.tsv content_usj/bereanbible/ --bsb2usfm ../bsb2usfm/bsb2usfm.py
python3 diff_fast.py output/bereanbible/merged_full.tsv cache/bereanbible/source.tsv
python3 diff_order.py output/bereanbible/merged_full.tsv cache/bereanbible/source.tsv

Scripts

Script                            Purpose
usj2tsv.py                        Main converter: reads USJ files and produces a 23-column TSV
merge_tsv.py                      Merges USJ-derived TSV with the original source TSV (source-driven)
merge_footnotes_from_tsv.py       Restores original footnote styling (fq/fqa) from source TSV
insert_missing_words_from_tsv.py  Re-inserts placeholder word rows absent from the USJ variant
verify_roundtrip.py               Roundtrip verification: merged TSV -> bsb2usfm -> USJ -> diff
diff_fast.py                      Fast column-by-column diff between two TSV files
diff_order.py                     Verifies row ordering between two TSV files
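
To illustrate what a column-by-column diff reports, here is a minimal sketch. It is not the code of diff_fast.py; it assumes rows are compared by position, and the names `read_rows` and `column_mismatches` are hypothetical.

```python
import csv
from collections import Counter
from itertools import zip_longest


def read_rows(path):
    """Read a TSV file into a list of rows (each row is a list of cells)."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quotation marks in the text as ordinary characters
        return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def column_mismatches(rows_a, rows_b, width=23):
    """Count differing cells per column index, pairing rows by position."""
    counts = Counter()
    for row_a, row_b in zip_longest(rows_a, rows_b, fillvalue=[]):
        for col in range(width):
            # Short or missing rows are treated as having empty cells
            cell_a = row_a[col] if col < len(row_a) else ""
            cell_b = row_b[col] if col < len(row_b) else ""
            if cell_a != cell_b:
                counts[col] += 1
    return dict(counts)
```

A per-column count like this makes it easy to see whether differences cluster in one field, such as footnotes, or are spread across the table.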

TSV Column Structure

The 23-column format matches bsb_tables.tsv from bereanbible.com:

#   Column                        Description
0   Heb Sort                      Hebrew word sort order
1   Greek Sort                    Greek word sort order
2   BSB Sort                      BSB translation word order
3   Verse                         Verse number within chapter
4   Language                      "Hebrew" or "Greek"
5   WLC / Nestle Base             Original language text (plain)
6   WLC / Nestle Base (variants)  Original language text with textual apparatus
7   Translit                      Transliteration
8   Parsing                       Short grammatical parsing
9   Parsing                       Long grammatical parsing
10  Str Heb                       Strong's Hebrew number
11  Str Grk                       Strong's Greek number
12  VerseId                       Full verse reference (e.g. "Genesis 1:1")
13  Hdg                           Section heading (HTML)
14  Crossref                      Cross-references (HTML)
15  Par                           Paragraph/formatting marker (HTML)
16  Space                         Spacing
17  begQ                          Opening quotation mark
18  BSB version                   English translation text
19  pnc                           Punctuation
20  endQ                          Closing quotation mark
21  footnotes                     Footnote text (HTML)
22  End text                      Additional end-of-verse text

Columns 0-11 (original-language data) are left blank by usj2tsv.py and filled in by merge_tsv.py.
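
The split between the two scripts can be sketched as follows. The column names come from the table above; the helper names are hypothetical, and the sketch assumes each converted row has already been paired with its source row, since how merge_tsv.py matches rows is not described here.

```python
# Column names in index order, as listed in the table above.
COLUMNS = [
    "Heb Sort", "Greek Sort", "BSB Sort", "Verse", "Language",
    "WLC / Nestle Base", "WLC / Nestle Base (variants)", "Translit",
    "Parsing", "Parsing", "Str Heb", "Str Grk", "VerseId", "Hdg",
    "Crossref", "Par", "Space", "begQ", "BSB version", "pnc", "endQ",
    "footnotes", "End text",
]

# Columns 0-11: blank after conversion, filled during the merge.
ORIGINAL_LANGUAGE = range(0, 12)


def blank_row(verse_id, text):
    """Build a 23-cell row with only the verse reference and English text set."""
    row = [""] * len(COLUMNS)
    row[COLUMNS.index("VerseId")] = verse_id
    row[COLUMNS.index("BSB version")] = text
    return row


def fill_from_source(row, source_row):
    """Copy columns 0-11 from the matching source row into a converted row."""
    merged = list(row)
    for col in ORIGINAL_LANGUAGE:
        merged[col] = source_row[col]
    return merged
```

Columns 12-22 keep whatever the conversion produced, so edits made in the USJ files survive the merge.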

Directory Structure

content_usj/
  bereanbible/       # user-provided BSB .usj files (git-ignored)
  majoritybible/     # user-provided MSB .usj files (git-ignored)
cache/
  bereanbible/       # cached bsb_tables.tsv (git-ignored)
  majoritybible/     # cached msb_nt_tables.tsv (git-ignored)
output/              # generated TSV files (git-ignored)

Make Targets

Target              Description
make all            Run pipeline for all editions that have content (default)
make help           Show available targets and usage
make bereanbible    Full pipeline for BSB edition
make majoritybible  Full pipeline for MSB edition (NT only)
make clean          Remove generated output files
make clean-cache    Remove cached source files (forces re-download)
Requirements

  • Python 3.6+
  • No external Python dependencies (standard library only)
  • curl (for downloading source tables)
  • make (GNU Make)
  • For roundtrip verification: bsb2usfm and its usfmtc dependency

License

  • Bible text (BSB/MSB): Public Domain
  • Software tools: MIT License

See LICENSE for details.
