Khaelen & AncestorTrack
I’ve been trying to convert some old parish rolls into a clean, searchable database—care to help me figure out the best way to handle all those messy lineages?
Sure, let’s break it down into phases, because a two‑step plan is just too efficient for my taste. First, digitize every page with OCR, but don’t trust the default settings: tweak the threshold, run a spell check on surnames, and flag any ambiguous characters for manual review. Next, map the raw text to a normalized schema: birth, baptism, marriage, and death records, each with a unique ID, with relatives linked via those IDs so you can trace lineages. Build a relational database, enforce foreign keys, and put a full‑text index on names and an ordinary index on dates for quick search. Don’t forget a data‑validation script that flags problems: duplicate IDs, missing dates, or inconsistent spelling. Finally, create a small UI, or even just a SQL view, that lets users query by name or ancestor, and you’re good to go. Happy compiling.
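A minimal sketch of the schema described above, assuming SQLite through Python's built-in sqlite3 module. The table names, column names, and the create_schema and ancestors helpers are hypothetical, chosen only for illustration:

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical person/event tables with enforced foreign keys."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS person (
            person_id  INTEGER PRIMARY KEY,   -- unique ID, duplicates rejected
            given_name TEXT,
            surname    TEXT NOT NULL,
            father_id  INTEGER REFERENCES person(person_id),
            mother_id  INTEGER REFERENCES person(person_id)
        );
        CREATE TABLE IF NOT EXISTS event (
            event_id   INTEGER PRIMARY KEY,
            person_id  INTEGER NOT NULL REFERENCES person(person_id),
            kind       TEXT NOT NULL
                       CHECK (kind IN ('birth', 'baptism', 'marriage', 'death')),
            event_date TEXT                   -- ISO 8601 text, NULL if unknown
        );
        CREATE INDEX IF NOT EXISTS idx_event_date ON event(event_date);
        """
    )


def ancestors(conn, person_id):
    """Return the IDs of all recorded ancestors of person_id, sorted."""
    rows = conn.execute(
        """
        WITH RECURSIVE line(pid) AS (
            SELECT ?
            UNION
            SELECT parent.person_id
            FROM person AS child
            JOIN line ON child.person_id = line.pid
            JOIN person AS parent
              ON parent.person_id IN (child.father_id, child.mother_id)
        )
        SELECT pid FROM line WHERE pid != ? ORDER BY pid
        """,
        (person_id, person_id),
    ).fetchall()
    return [row[0] for row in rows]
```

With this layout a parent has to be inserted before any child that references it, and an event pointing at a person who does not exist is rejected by the database rather than by a separate script.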
Your outline is solid, but let me tell you: the OCR will still butcher those old, cramped capitals, and a spell‑check can’t catch every variant. I’d add a manual verification layer: pick a handful of names, verify them by hand, then let them become a reference list for fuzzy matching. And don’t forget a quick sanity check on the dates: if a baptism shows up after a death, you know something’s off. Once the database is ready, a simple web form or even a spreadsheet front‑end will let anyone query a name or ancestor without diving into SQL. Happy compiling, but keep a pen ready. Sometimes the computer can’t read a hand‑written note, and that’s where the manual touch comes in.
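One way the reference-list idea might look in practice, sketched with Python's standard difflib module. The surnames, the match_surname helper, and the 0.8 cutoff are all assumptions for illustration; the cutoff in particular would need tuning against real rolls:

```python
from difflib import get_close_matches

# Hypothetical hand-verified surnames; in practice this list grows as
# more names are checked against the original pages.
REFERENCE_SURNAMES = ["Hargreaves", "Whitaker", "Thornton"]


def match_surname(raw, reference=REFERENCE_SURNAMES, cutoff=0.8):
    """Return the closest verified surname, or None to flag manual review."""
    # Compare case-insensitively, but return the verified spelling.
    lookup = {name.lower(): name for name in reference}
    hits = get_close_matches(raw.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None
```

A name that comes back as None is one for the pen: it matches nothing on the verified list closely enough to be corrected automatically.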
Got it. Step one: run OCR, but don’t take the raw output at face value. Use a custom threshold, pay particular attention to the capitals, then compare each name to a curated list of known variants. Flag any mismatch for a quick manual check. Step two: map the cleaned text to a structured schema of birth, baptism, marriage, and death records, with unique IDs and foreign keys so relatives link correctly. Add a script that checks date logic: a baptism after a death is an instant red flag. Finally, expose the data via a simple web form or spreadsheet front‑end so anyone can search by name or ancestor without touching SQL. Keep a pen handy for the one or two hand‑written notes that the machine can’t parse.
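The date-logic check could be sketched roughly like this, assuming each person's events arrive as a dict of datetime.date values. The check_dates name and the record layout are hypothetical:

```python
from datetime import date

# Expected chronological order of life events for one person.
EVENT_ORDER = ("birth", "baptism", "marriage", "death")


def check_dates(record):
    """Return a list of human-readable problems; empty means no red flags."""
    # Keep only the events that actually have a date, in expected order.
    known = [(kind, record[kind]) for kind in EVENT_ORDER
             if record.get(kind) is not None]
    problems = []
    # Comparing each known event with the next one is enough to catch
    # any out-of-order pair, e.g. a baptism dated after a death.
    for (earlier_kind, earlier), (later_kind, later) in zip(known, known[1:]):
        if earlier > later:
            problems.append(
                f"{earlier_kind} ({earlier}) is after {later_kind} ({later})"
            )
    return problems
```

Missing events are skipped rather than reported here, since many entries will only ever have a baptism or a burial; a separate pass could list the gaps if that turns out to be useful.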
That plan sounds like a good skeleton, but remember that even a “custom threshold” can still misread a faint "C" as a "U" and throw a date into the wrong century. Keep a small, handwritten log of those anomalies so you can trace why the machine misread them; future you will thank you. And while the web form will make the data feel “user‑friendly,” I’d keep a backup of the raw OCR in case you need to rework the spell‑check later. Happy digging, but keep that pen within arm’s reach for the occasional stubbornly curved initial.
Fine. Log every misread, keep the raw OCR as a fallback, and I’ll tweak the threshold until the “C” never turns into a “U.” Pen ready, eye on the anomalies, and a spreadsheet front‑end for the casual user. That’s the protocol.
Sounds solid—just remember that even the best threshold can miss a stubborn ink smudge, so keep that pen close and let the spreadsheet do its magic.