ॐ  ·  IMI Library Project

A Living Archive of
Timeless Teachings

Complete project roadmap — workflow design, tooling decisions, phased implementation, and critical gaps for building a permanent, searchable, public-ready library.

45+ Years of Teachings
~3k Edited Videos (Priority)
12k Master Records (Long-term)
6 Pipeline Steps
18 mo Full Timeline

The Core Pipeline

Six sequential steps from recordings to a searchable library entry. Immediate priority: ingest the ~3,000 recently created edited videos into the system. The 12,000 master recordings in FileMaker/Dropbox are a longer-term effort involving transcoding and renaming at scale.

1
Transcode to MP4
Convert all incoming formats (DAT, DVD, AVI, MOV, etc.) to a standardised H.264 MP4 container. Apply canonical file naming at this stage: YYYY-MM-DD_[type]_[location]_[seq].mp4
FFmpeg (batch scripts) · HandBrake (GUI fallback)
⚠ Rename every file to the canonical convention at this step — retroactively renaming thousands of records later is painful.
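The canonical name and the transcode command can both be pinned down in a few lines of Python. This is a sketch: the CRF value and the slug rules (lowercase, whitespace collapsed to hyphens) are assumptions for the team to tune, not a fixed spec.

```python
import re
from datetime import date

def canonical_name(recorded: date, rec_type: str, location: str, seq: int) -> str:
    """Build YYYY-MM-DD_[type]_[location]_[seq].mp4 with assumed slug rules."""
    def slug(s):
        return re.sub(r"\s+", "-", s.strip().lower())
    return f"{recorded.isoformat()}_{slug(rec_type)}_{slug(location)}_{seq:03d}.mp4"

def transcode_cmd(src: str, dst: str) -> list:
    """FFmpeg arguments for a standard H.264/AAC MP4.

    CRF 20 is an assumed quality target; -movflags +faststart moves the
    index to the front of the file so it streams well later.
    """
    return ["ffmpeg", "-i", src,
            "-c:v", "libx264", "-crf", "20",
            "-c:a", "aac",
            "-movflags", "+faststart", dst]
```

A batch script then just walks the source folder, renames via `canonical_name`, and runs `transcode_cmd` through `subprocess.run`.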
2
Extract Audio
Strip audio track from MP4 to high-quality MP3 for transcription. Browser-based GUI with real-time progress, batch processing, and completion reports. Runs entirely on the user's machine — no cloud uploads.
FFmpeg · Python + Flask · Browser GUI (localhost)
💡 Optionally apply light noise reduction at this stage (e.g. Audacity's Noise Reduction effect or iZotope RX) to poor-quality recordings before transcription; it measurably improves accuracy.
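The per-file FFmpeg invocation for this step is small; a minimal sketch, using the 320 kbps target specified in issue #5:

```python
from pathlib import Path

def extract_cmd(mp4_path: Path) -> list:
    """FFmpeg arguments to strip the audio track to a 320 kbps MP3.

    -vn drops the video stream entirely; the output keeps the same
    filename with a .mp3 extension, as required by issue #5.
    """
    mp3_path = mp4_path.with_suffix(".mp3")
    return ["ffmpeg", "-i", str(mp4_path),
            "-vn",                      # no video
            "-codec:a", "libmp3lame",   # MP3 encoder
            "-b:a", "320k",             # best-quality constant bitrate
            str(mp3_path)]
```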
System Design & Data Flow
System Architecture

Everything runs locally on the user's machine. The only external touchpoints are the launch page (shyamgyaan.com) and GitHub (source code). No audio or video data leaves the machine.

Cloud (minimal)
shyamgyaan.com
Launch page & OS detection
 + 
GitHub
Source code & CI
↓ downloads launcher & latest code ↓
User's Machine
Launcher
Setup & start
Python Server
localhost
Browser GUI
HTML/CSS/JS
FFmpeg
Audio extraction
MP3 Files
Local disk
Data Flow (Single File)
MP4
Validate
FFmpeg
Extract audio
MP3
Save to target
Report
Log result
Invalid files (corrupted, no audio, output exists) are logged and skipped. Batch continues.
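The skip rules above reduce to one small decision function. In this sketch the audio probe (in practice an ffprobe wrapper) is injected as a callback so the logic stays testable without FFmpeg installed; the status strings are illustrative.

```python
from pathlib import Path

def classify(mp4: Path, has_audio) -> str:
    """Decide whether a file is processed or skipped, and why.

    has_audio is a callable taking the path and returning bool,
    e.g. a wrapper around ffprobe's stream listing.
    """
    out = mp4.with_suffix(".mp3")
    if out.exists():
        return "skip: output exists"
    if not has_audio(mp4):
        return "skip: no audio stream"
    return "process"
```

A skip result is logged and the batch loop simply moves to the next file, so one bad input never halts the run.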
User Journey
Visit Site
OS detected
Download
Launcher
Auto Setup
~1 min first time
Select Folder
Configure
Process
Watch progress
Report
Download .txt
Returning users run the launcher (~5 sec) and always get the latest version.
Functional Requirements (6 issues)
#4 — Audio Extraction GUI
High
Browser-based GUI with folder picker for source directory. Display MP4 file count. Option to output to same or different folder. Transcription checkbox. Start button. Controls disabled during processing.
#5 — FFmpeg MP4 to MP3 Engine
High
Extract audio from each MP4 using FFmpeg. Save as MP3 (320kbps, best quality). Same filename with .mp3 extension. Skip corrupted files, files with no audio, and existing outputs — log each with descriptive message.
#6 — Progress Display (Batch + File)
High
Two progress bars: batch level ("3 of 47") and file level (0% → 100% based on video duration). Show current filename, elapsed time, per-file status. Real-time updates.
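FFmpeg reports progress on stderr as `time=HH:MM:SS.ss`; the file-level bar can be derived by parsing that against the known video duration. A sketch:

```python
import re

TIME_RE = re.compile(r"time=(\d+):(\d+):(\d+(?:\.\d+)?)")

def percent_done(ffmpeg_line: str, duration_s: float):
    """Parse one FFmpeg stderr line and return percent complete, or
    None if the line carries no progress information."""
    m = TIME_RE.search(ffmpeg_line)
    if not m or duration_s <= 0:
        return None
    h, mnt, s = int(m[1]), int(m[2]), float(m[3])
    return min(100.0, (h * 3600 + mnt * 60 + s) / duration_s * 100)
```

The server reads FFmpeg's stderr line by line and pushes each computed percentage to the browser GUI for the real-time update.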
#7 — Completion Report
High
Summary report in GUI: total files, successes, skips, errors. Per-file status with messages. "Download Report" button saves as .txt. Includes timestamp, folder paths, and all details.
#8 — Auto-Transcription Trigger (Stub)
Medium
When transcription checkbox is enabled, trigger job after each successful extraction. Runs in parallel with continued extraction. Stubbed for now — logs "Transcription triggered for [filename]". Status in progress display and report.
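A minimal stub matching the behaviour described in issue #8 (the function name and return strings here are illustrative, not a fixed interface):

```python
import logging

logger = logging.getLogger("pipeline")

def trigger_transcription(filename: str, enabled: bool) -> str:
    """Stub for issue #8: log that a transcription job would start.

    Returns the status string shown in the progress display and the
    completion report. Replaced by a real job submission later.
    """
    if not enabled:
        return "transcription: off"
    logger.info("Transcription triggered for %s", filename)
    return f"Transcription triggered for {filename}"
```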
#9 — Web Launcher
High
Launch page on shyamgyaan.com detects OS and browser. Downloads platform-specific launcher script. Installs Python + FFmpeg if missing. Pulls latest code. Starts local server and opens browser. Blocks mobile users.
Non-Functional Requirements (6 issues)
#10 — Performance
High
  • 1-hour MP4 extracted in under 60 seconds
  • Max 2GB RAM usage, with monitoring
  • CPU-aware parallel processing (never 100% cores)
  • Disk space check before starting, monitor during
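Both the CPU and disk checks are a few lines of stdlib; a sketch:

```python
import os
import shutil

def worker_count() -> int:
    """Leave one core free so the machine stays responsive during a batch."""
    return max(1, (os.cpu_count() or 2) - 1)

def enough_space(folder: str, needed_bytes: int) -> bool:
    """Pre-flight disk check before the batch starts; the same call can
    be polled during processing to monitor remaining space."""
    return shutil.disk_usage(folder).free >= needed_bytes
```

`worker_count` feeds directly into a `concurrent.futures.ProcessPoolExecutor(max_workers=...)` for the parallel extraction.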
#11 — Reliability
High
  • Crash recovery via checkpoint file — resume from where it left off
  • Idempotent — skip files that already have MP3 output
  • Works fully offline (extraction only)
  • Single file failure never crashes the batch
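Crash recovery and idempotency can share one checkpoint file; a sketch, where the filename and JSON format are assumptions:

```python
import json
from pathlib import Path

def load_done(checkpoint: Path) -> set:
    """Read the set of already-completed filenames, if a previous run
    left a checkpoint behind; otherwise start fresh."""
    if checkpoint.exists():
        return set(json.loads(checkpoint.read_text()))
    return set()

def mark_done(checkpoint: Path, done: set, name: str) -> None:
    """Record one completed file. Rewriting the whole small file after
    each success keeps the checkpoint valid even after a hard crash."""
    done.add(name)
    checkpoint.write_text(json.dumps(sorted(done)))
```

On startup the batch loop loads the set, skips anything already in it (or anything with an existing MP3 output), and marks each file as it finishes.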
#12 — Usability
High
  • Plain English error messages — no stack traces
  • First-time setup under 2 minutes
  • Keyboard navigable, screen reader labels, WCAG AA contrast
#13 — Security & Privacy
High
  • No data leaves the machine during extraction
  • No telemetry without explicit consent
  • Launcher script is human-readable and auditable
  • App only accesses user-selected folders
#14 — Compatibility
High
  • macOS 12+, Windows 10/11
  • Chrome, Firefox, Safari (latest 2 versions)
  • Files up to 10GB, Unicode filenames, up to 100 files per folder
  • Mobile explicitly blocked at launch page
#15 — Logging & Diagnostics
Medium
  • Timestamped log file per run in target folder
  • GUI shows plain English; log captures full technical detail
  • "Share Log" button for easy troubleshooting
3
Transcribe with Diarization
Submit audio to transcription API. Must produce: timestamped transcript, speaker identification (diarization), auto-chapters, topic detection. Submit custom Sanskrit/spiritual vocabulary glossary to every service used.
AssemblyAI (✦ recommended) · Whisper + Pyannote (local / private) · Deepgram
⚠ Sanskrit terms, Hindi-English mixing, and yogic vocabulary will degrade accuracy on all tools. A custom vocabulary glossary + mandatory human review pass is non-negotiable.
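As an illustration, all four required outputs plus the glossary can ride on a single request to AssemblyAI's transcript endpoint. The parameter names below reflect AssemblyAI's public API at the time of writing and should be verified against current docs; the glossary entries are examples only.

```python
# Example glossary entries -- the real list will be far longer.
GLOSSARY = ["satsang", "pranayama", "samadhi", "vedanta",
            "amaram hum madhuram hum"]

def transcript_request(audio_url: str, glossary: list) -> dict:
    """Request body sketch for AssemblyAI's /v2/transcript endpoint."""
    return {
        "audio_url": audio_url,
        "speaker_labels": True,   # speaker diarization
        "auto_chapters": True,    # timestamped chapter summaries
        "iab_categories": True,   # topic detection
        "word_boost": glossary,   # custom Sanskrit/spiritual vocabulary
        "boost_param": "high",    # weight the glossary strongly
    }
```

The equivalent mechanism on other services differs (Whisper uses an `initial_prompt`, Deepgram uses keywords), which is why the glossary must be maintained once and submitted to every service used.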
4
AI Post-Processing & Enrichment
Pass transcript text through an LLM to generate structured metadata: segment summaries, controlled-vocabulary topic tags, key quotes, estimated quality score, and speaker identification confirmation.
Claude API · GPT-4o (fallback)
💡 Use a structured JSON prompt so the LLM returns consistently parseable output that flows directly into the database ingest step.
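A sketch of that pattern: a prompt that pins the output shape, and a parser that validates the keys before ingest. The key names are illustrative and should match the master metadata schema once it is defined.

```python
import json

# Illustrative prompt skeleton; the transcript text is appended at the end.
PROMPT = """Return ONLY a JSON object with these keys:
  summary: string, at most 3 sentences
  topics: array of tags drawn from the controlled vocabulary
  key_quotes: array of objects with text and timestamp
  quality_score: integer 1-5
Transcript follows:
"""

def parse_enrichment(llm_output: str) -> dict:
    """Parse the LLM reply, tolerating a ```json fenced wrapper, and
    check that every expected key is present before ingest."""
    text = llm_output.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    record = json.loads(text)
    for key in ("summary", "topics", "key_quotes", "quality_score"):
        assert key in record, f"missing {key}"
    return record
```

A failed parse routes the transcript back for a retry rather than writing a half-formed record to the database.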
5
Human Review Queue
A team member reviews AI-generated transcript, summaries, and tags before the record is marked "published." Corrects Sanskrit errors, adds contextual metadata (location, series, occasion), flags quality issues.
Airtable Review View · Custom review interface (later)
⚠ Do not skip this step. At ~10 min/recording, 2,000 recordings = 333 hours of work. Staff this as an ongoing role, not a one-time task.
6
Ingest to Library Database
Approved records are written to the database with full metadata: transcript, timestamps, tags, file references (master + edits), quality, type (satsang / tea / trip / series), and access tier.
Airtable API (short-term) · Supabase / PostgreSQL (long-term)
💡 Store the parent-child relationship between master recordings and edits explicitly in the schema from day one.
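One way to sketch that relationship; the field names are illustrative, and the same shape maps to Airtable linked records now and PostgreSQL foreign keys later.

```python
from dataclasses import dataclass, field

@dataclass
class MasterRecording:
    record_id: str     # canonical filename stem, the stable identifier
    recorded_on: str   # ISO date
    rec_type: str      # satsang / tea / trip / series
    access_tier: str   # public / private / restricted
    edit_ids: list = field(default_factory=list)  # child edit records

@dataclass
class Edit:
    record_id: str
    master_id: str     # foreign key back to the master recording
    edited_by: str
    edited_on: str
    change_note: str   # what changed and why
```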

Phased Roadmap

Three phases over 18 months — from foundational infrastructure to internal tools at scale. A potential public access phase is outlined below but is subject to a future team decision and is not assumed.

Phase 1 — Foundation
Build the Base
0 – 3 months
  • Define and document the master metadata schema (field names, controlled vocabularies)
  • Establish canonical file naming conventions
  • Priority: Ingest ~3,000 recently created edited videos into the system first
  • Build FFmpeg batch transcoding + audio extraction scripts
  • Test AssemblyAI on 50 diverse recordings; build Sanskrit/spiritual vocabulary glossary
  • Build n8n workflow: transcription → LLM summary → Airtable draft record
  • Create human review queue in Airtable
  • Define quality tiers; triage recordings before processing
  • Later: Migrate 12k master records from FileMaker Pro; transcode and rename masters at scale
Phase 2 — Scale
Process the Backlog
3 – 9 months
  • Run full backlog of recordings through the automated pipeline
  • Migrate master files from Dropbox to object storage (Backblaze B2 or AWS S3)
  • Add Typesense or Algolia for full-text transcript search across all records
  • Build internal search and browse interface for the team
  • Implement master → edit parent-child schema; begin cataloguing edits
  • Add public/private/restricted access flags to all records
  • Ongoing human review staffing — target 200+ records/month reviewed
Phase 3 — Public Access (To Be Determined)
Open the Doors?
9 – 18 months
⚠ This phase is exploratory only. Whether the archive becomes publicly accessible is a future decision that requires explicit approval from project stakeholders. Nothing below is assumed or committed — it is included here only to illustrate what a public phase could look like if the team decides to pursue it.
  • (?) Launch public library on Vimeo OTT or Uscreen (fastest path to market)
  • (?) Implement user accounts, access tiers (free / subscription / pay-per-view)
  • (?) Sync metadata from internal DB to public platform
  • (?) Expose public search across transcripts and topics
  • (?) Review all records for public/private designation before launch
  • (?) Evaluate custom platform build: Supabase + Mux + Stripe (based on usage/revenue)
  • (?) Migrate to custom platform if/when needed for full control

Key Decisions

The major architectural and tooling choices — with recommendations and trade-offs clearly laid out.

🗄️
Primary Database: where records live, team collaboration, API access for automation
FileMaker Pro (current)
Expensive licensing, poor API ecosystem, limited web integration. Not suitable going forward.
Airtable Short-Term ✦
Visual, collaborative, strong API, handles 12k+ records. Non-technical team members can use it immediately.
Supabase (PostgreSQL) Long-Term ✦
Scalable, open source, built-in auth — required for the public-facing system with user accounts.
Omeka
Purpose-built for digital archives. Open source. Less modern UI; worth evaluating if developer resources are limited long-term.
🎙️
Transcription Service: accuracy, diarization, topic detection, Sanskrit vocabulary support
AssemblyAI Recommended ✦
Transcription + speaker diarization + auto-chapters + topic detection in one API. Best for archival batch work.
Whisper + Pyannote
Open source, runs locally (privacy and cost advantage). Best raw accuracy, but requires more setup; no built-in summaries.
Deepgram
Very fast, good accuracy. Better suited for real-time use cases; AssemblyAI edges it for archival batch work.
🔍
Full-Text Search Layer: searching across thousands of transcripts by keyword, topic, quote
Typesense Recommended ✦
Open source, self-hostable, fast. Excellent for transcript search. Lower cost than Algolia at scale.
Algolia
Best-in-class search UX, generous free tier. Easier to get started, but gets expensive as the corpus grows.
Postgres Full-Text (Supabase)
No separate service needed once on Supabase. Good for moderate scale; doesn't match dedicated search services.
🎬
Public Streaming & Commerce: user access, subscriptions, pay-per-view, video delivery
Vimeo OTT / Uscreen Launch ✦
Purpose-built for streaming libraries with commerce. Fastest path to public access. Use while learning user needs.
Supabase + Mux + Stripe
Full custom control. 3–6 month build. Migrate to this once revenue and user base justify the investment.
⚙️
Workflow Automation Orchestrator: connecting transcription → LLM → DB without heavy custom code
n8n Recommended ✦
Open source, self-hostable, visual workflow builder. Connects Dropbox, AssemblyAI, Claude, Airtable without heavy coding. Maintainable by non-developers.
Make (formerly Integromat)
Cloud-hosted visual automation. Easier setup than n8n, but recurring cost and less control.
Custom Python Scripts
Maximum flexibility. Best if you have a dedicated developer. Highest long-term maintenance burden.
☁️
Master File Storage: where the actual MP4, audio, and transcript files live at scale
Dropbox (current)
Not designed as a media archive at this scale. Expensive per GB, limited API for direct processing pipelines.
Backblaze B2 Recommended ✦
Very low cost per GB (~$6/TB/mo), S3-compatible API, reliable. Best price/performance for large video archives.
AWS S3
Industry standard, deep integrations. More expensive than B2 but integrates seamlessly with the broader AWS ecosystem.
Cloudflare R2
No egress fees — excellent if you're serving video directly. Worth considering for the public streaming phase.

Full Tech Stack

Every tool in the recommended stack, organized by function. Short-term choices prioritize speed; long-term choices prioritize scale and control.

File Processing
FFmpeg
Transcoding, audio extraction, batch scripts
HandBrake
GUI fallback for transcoding
iZotope RX / Audacity
Audio cleanup before transcription
Transcription & NLP
AssemblyAI
Primary: transcript + diarization + topics
Whisper (OpenAI)
Local/private processing option
Pyannote
Speaker diarization (if using Whisper)
AI Enrichment
Claude API
Summaries, tagging, structured metadata
Custom vocabulary glossary
Sanskrit / yogic terms submitted to all APIs
Automation
n8n
Visual workflow orchestration
Make
Cloud automation fallback
Dropbox API
Trigger pipeline on new file uploads
Database (Short-term)
Airtable
Main database + review queue + team views
Airtable API
Automated record creation from pipeline
Database (Long-term)
Supabase
PostgreSQL + auth + real-time + REST API
Typesense
Full-text transcript search layer
File Storage
Backblaze B2
Master file storage (video, audio, transcripts)
Cloudflare R2
Public delivery (no egress fees)
Dropbox
Team sync layer (not primary archive)
Public Platform
Vimeo OTT / Uscreen
Launch: streaming + commerce + user accounts
Mux
Custom build: adaptive video streaming
Stripe
Custom build: subscriptions + payments

Critical Gaps

Things not explicitly covered in the original workflow that must be addressed before scaling. These will cause expensive problems if ignored.

1
No Master Metadata Schema Defined
Before ingesting a single record, the team must agree on and document: canonical field names, controlled vocabularies for tags, recording types, locations, speaker names, and quality tiers. Without this, 12,000 records will be inconsistently tagged. This is the single most important action to take today.
2
File Naming Convention Not Established
Establish a canonical naming convention immediately (e.g., YYYY-MM-DD_satsang_main-hall_001.mp4) and rename all master files before processing begins. This anchors every downstream record to a predictable, unique identifier.
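The convention is easy to enforce mechanically; a sketch of a validator that can audit existing folders before processing begins:

```python
import re

# Pattern for the assumed canonical form:
# YYYY-MM-DD_[type]_[location]_[seq].mp4, lowercase slugs, 3-digit sequence.
CANONICAL = re.compile(
    r"^\d{4}-\d{2}-\d{2}_"   # recording date
    r"[a-z0-9-]+_"           # recording type slug
    r"[a-z0-9-]+_"           # location slug
    r"\d{3}\.mp4$"           # zero-padded sequence number
)

def is_canonical(name: str) -> bool:
    """True if a filename already follows the canonical convention."""
    return bool(CANONICAL.match(name))
```

Run it across a folder listing to get an instant count of compliant vs. legacy names before any batch starts.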
3
Master → Edit Relationship Not Modelled
The database schema must explicitly represent the parent-child relationship between a master recording and its edited versions, with edit metadata (who edited, when, what changed, why). Edits are a "living" part of the system and need a dedicated schema from day one.
4
No Quality Triage Step
Some recordings are degraded — poor audio, background noise, partially inaudible. There is no step in the current workflow to assess recording quality before committing transcription resources. A triage pass (even a simple 3-tier quality flag) should happen early.
5
Public vs. Private Access Not Addressed
Some recordings may contain personal conversations or content not intended for public release. A public/private/restricted access field must be added to the schema now — retroactively reviewing thousands of records before a public launch will be a major bottleneck if not done incrementally.
6
Sanskrit & Mixed-Language Vocabulary Gap
Swami Shyam's teachings contain Sanskrit terms, yogic philosophy vocabulary, and Hindi-English mixing that all transcription tools will get wrong. A comprehensive custom vocabulary glossary must be built before transcription begins and submitted to every service used.
7
No Staffing Plan for Human Review
At ~10 minutes of review per recording, processing 2,000 recordings means 333 hours of human review work. This is not a one-time task — new edits will always be added. Human review must be treated as a staffed, ongoing role with a defined capacity and throughput target.

Pitfalls to Avoid

Hard-earned lessons from similar archival projects. These are the most common ways this kind of work goes wrong.

Process Risk
Trying to automate before understanding the problem
Run 50 representative recordings through the full workflow manually first. You'll discover the Sanskrit vocabulary problem, audio quality edge cases, and metadata inconsistencies when they're cheap to fix — not after building an automated system around incorrect assumptions.
Team Risk
One-person bottleneck on database entry
If a single person controls adding or approving records, the entire pipeline stalls when they're unavailable. Build multi-user workflows from day one with clear roles and access levels.
Technical Risk
Dropbox as the permanent master archive
Dropbox is a sync tool, not an archive. It's expensive per GB at video scale, has no meaningful API for batch processing pipelines, and is a single point of failure. Migrate masters to object storage (B2 or S3) before scaling.
Quality Risk
Skipping human review to move faster
AI summaries of a spiritual teacher's talks will contain meaningful errors — misheard Sanskrit, wrong speaker attribution, summaries that miss the point. Publishing without review risks putting incorrect or misleading content in front of students and seekers.
Architecture Risk
Building a fully custom platform from day one
A custom database, custom UI, custom auth, and custom commerce layer is a 6+ month project with ongoing maintenance. Use best-of-breed services (Airtable, AssemblyAI, Vimeo OTT) and connect them. Build custom only when you've outgrown the off-the-shelf tools.
Data Risk
Inconsistent metadata across the backlog
Without a defined schema and controlled vocabularies, team members will use different tags, location names, and descriptions for the same concepts. This makes search and filtering unreliable. Define the schema before any ingestion begins — not after.
Legal / Privacy Risk
No access control framework before going public
Not all recordings are appropriate for public release. Personal conversations, informal talks, or sensitive content mixed into the archive could cause harm if published without review. Implement a per-record access flag early and review before any public launch.
Cost Risk
Underestimating transcription costs at scale
Several thousand hours of audio through a paid transcription API adds up quickly. Run a cost calculation against your full archive before committing to a service. Consider Whisper (local/free) for the bulk backlog and paid APIs for ongoing new recordings only.