This repository was archived by the owner on Jun 2, 2025. It is now read-only.

What happened? Was there any interest on the OCR side? #5

@coolaj86

I'm curious as to what happened with this. It looks like you did an excellent job on the architecture and documentation.

Update

I got the main Tracks database down to 800 KB (270 KB gzipped) using CSV with these columns:

ID  Title   Tags    Posted  Length  Bitrate FileName    FileSize    YouTube

I think I might do another 2 like that, one for Games + Songs and one for Artists / Composers, and still keep the total data under 1 MB. I'd also provide a small snippet of JavaScript that fetches the 3 files and links them by numerical ID on the client side, along with hard-coded numeric IDs for Tags and the list of Mirrors.
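To make the "link them by numerical ID" step concrete, here is a minimal sketch of the idea. It is written in Python for brevity; the plan above is to do this in client-side JavaScript. Only the Tracks column names come from this issue. The comma delimiter, the space-separated ID lists, the Artists columns (`ID`, `Name`, `TrackIDs`), the tag names, and every sample value are assumptions made up for illustration.

```python
import csv
import io

# Hypothetical sample data. Only the Tracks header comes from the issue;
# the delimiter, the Artists columns, and every value are made up.
TRACKS_CSV = (
    "ID,Title,Tags,Posted,Length,Bitrate,FileName,FileSize,YouTube\n"
    "1,Example Remix A,1 3,2000-01-01,3:30,192,example_a.mp3,5040000,\n"
    "2,Example Remix B,2,2000-02-01,4:05,192,example_b.mp3,5880000,\n"
)
ARTISTS_CSV = (
    "ID,Name,TrackIDs\n"
    "10,Example Artist,1 2\n"
)
# Hard-coded numeric tag IDs, as suggested above (names are placeholders).
TAGS = {1: "piano", 2: "rock", 3: "orchestral"}


def parse_ids(field):
    """Split a space-separated list of numeric IDs into ints."""
    return [int(part) for part in field.split()]


def load_rows(text):
    """Parse CSV text into a dict of rows keyed by numeric ID."""
    return {int(row["ID"]): row for row in csv.DictReader(io.StringIO(text))}


def link(tracks_text, artists_text, tags):
    """Resolve tag IDs to names and attach artist names to each track."""
    tracks = load_rows(tracks_text)
    for track in tracks.values():
        track["TagNames"] = [tags[i] for i in parse_ids(track["Tags"])]
        track["Artists"] = []
    for artist in load_rows(artists_text).values():
        for track_id in parse_ids(artist["TrackIDs"]):
            tracks[track_id]["Artists"].append(artist["Name"])
    return tracks


linked = link(TRACKS_CSV, ARTISTS_CSV, TAGS)
print(linked[1]["Title"], linked[1]["TagNames"], linked[1]["Artists"])
```

Keying every file by its integer ID keeps the join to a couple of dictionary lookups, which should translate directly to a `Map` in the browser.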

Implementation Ideas / Notes

(for people who end up on this repo like I did)

Since there are fewer than 10,000 remixes + albums, and it's highly unlikely that there will be 100,000 items within our lifetimes, it would probably be simpler to:

  • ship the entire database as a single JSON file for GET / filter / etc. with a per-IP rate limit (to encourage proper caching)
    • could be brute-force optimally precompressed w/ gzip, zstd, and brotli
    • could also use short numeric ids for relationships
      (like a typical db rather than a typical api, though I don't like this idea as much)
    • possibly use permanent, browser-level caching for ID ranges e.g. 1-2000, 2001-4000, and dynamic for newer IDs
    • could also use multiple CSVs rather than a single JSON... maybe
  • POST by ID for atomic updates
  • have a GET that only hands back updates since a last_updated_at parameter
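A rough sketch of three of the ideas in the list above, under stated assumptions: the 2000-item range size is taken from the "1-2000, 2001-4000" example, `updated_at` is a hypothetical field assumed to hold ISO 8601 UTC timestamps in a single fixed format (so plain string comparison orders them correctly), and only gzip is shown because it ships with Python's standard library. The function names and sample items are made up for illustration.

```python
import gzip
import json

CHUNK = 2000  # range size taken from the "1-2000, 2001-4000" example

# Hypothetical items; the field names and values are placeholders.
ITEMS = [
    {"id": 1, "title": "Example A", "updated_at": "2025-01-01T00:00:00Z"},
    {"id": 2001, "title": "Example B", "updated_at": "2025-06-01T00:00:00Z"},
]


def id_range(item_id, chunk=CHUNK):
    """Return the inclusive (first, last) ID range that contains item_id."""
    first = ((item_id - 1) // chunk) * chunk + 1
    return first, first + chunk - 1


def updates_since(items, last_updated_at):
    """Return items changed strictly after the given ISO 8601 UTC timestamp."""
    return [item for item in items if item["updated_at"] > last_updated_at]


def precompress(items):
    """Serialize to compact JSON and gzip it once at the highest level."""
    raw = json.dumps(items, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw, compresslevel=9)


print(id_range(2001))
print(updates_since(ITEMS, "2025-03-01T00:00:00Z"))
print(len(precompress(ITEMS)), "bytes gzipped")
```

Older ID ranges would rarely change, so they are the natural candidates for long-lived browser caching, while the `updates_since` filter covers whatever has changed since the client's last fetch.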

Scraping

I'm going to give this a shot myself with a little help from Grok to save on the tedious HTML parsing.
