Clever & BookRevive
Hey, I've been tinkering with OCR for medieval manuscripts and the algorithm keeps throwing off the marginalia—those little scribbles and notes are getting lost. How do you usually handle preserving that level of detail when digitizing a work?
OCR is a cruel little trickster when it comes to marginalia. I always start by photographing at 300 dpi or higher, and I use a macro lens to capture the tiny strokes. Then I let the software do its thing and, as soon as the marginalia show up in the text layer, I cross‑check them against the physical copy. If anything looks off, I copy the marginal note by hand; hand‑copying is the only ritual I trust for fidelity.
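To make the 300 dpi floor concrete: with a camera, the resolution that actually lands on the page depends on the pixel count and the physical size of the leaf. Here is a minimal sketch of that arithmetic, assuming the page fills the long edge of the frame. The function names and the 6000 x 4000 px frame and 300 mm folio are hypothetical examples, not measurements from a real shoot.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixels_long_edge: int, page_long_edge_mm: float) -> float:
    """Pixels per inch on the page, assuming the page fills the frame's long edge."""
    page_inches = page_long_edge_mm / MM_PER_INCH
    return pixels_long_edge / page_inches


def uncompressed_megabytes(width_px: int, height_px: int,
                           channels: int = 3, bits_per_channel: int = 8) -> float:
    """Raw (uncompressed) image size in megabytes, counting 1 MB as 10**6 bytes."""
    total_bits = width_px * height_px * channels * bits_per_channel
    return total_bits / 8 / 1_000_000


def meets_minimum(dpi: float, minimum: float = 300.0) -> bool:
    """True when the capture reaches the chosen resolution floor."""
    return dpi >= minimum


# Hypothetical capture: a 6000 x 4000 px frame covering a 300 mm tall folio.
dpi = effective_dpi(6000, 300.0)
size_mb = uncompressed_megabytes(6000, 4000)
ok = meets_minimum(dpi)
```

For that hypothetical frame the capture clears the 300 dpi floor comfortably, and the same numbers show how large each uncompressed file gets before any format is chosen.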
For truly fragile pages I use a lightbox and a green screen to separate the margin from the text, then I tag each note with a unique code in the metadata. Modern scanners aren’t tuned for the chemistry of 15th‑century iron‑gall ink; if the margin bleeds, I re‑scan the page at a lower‑contrast setting so the ink stays distinct.
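For the tagging step, one way to keep the codes unique and readable is to build them from the manuscript, folio, margin position, and a running number. This is only a sketch under my own assumptions: the `MarginalNote` fields, the code layout, and the `MS-EXAMPLE-1` shelfmark are all hypothetical, not an established cataloguing scheme.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MarginalNote:
    shelfmark: str      # hypothetical manuscript identifier
    folio: str          # leaf and side, e.g. "12r" or "12v"
    position: str       # which margin: "outer", "inner", "upper", "lower"
    sequence: int       # running number of the note within that margin
    transcription: str  # the hand-checked reading of the note

    @property
    def code(self) -> str:
        """Unique, human-readable code built from where the note sits."""
        return f"{self.shelfmark}-f{self.folio}-{self.position}-{self.sequence:02d}"


def to_metadata(note: MarginalNote) -> str:
    """Serialize the note, including its code, as a JSON metadata record."""
    record = asdict(note)
    record["code"] = note.code
    return json.dumps(record, ensure_ascii=False)


# Hypothetical example note.
note = MarginalNote("MS-EXAMPLE-1", "12r", "outer", 3, "nota bene")
record_json = to_metadata(note)
```

Deriving the code from the note's location, rather than assigning a random ID, means a reader can find the note on the page from the code alone.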
In short: let the machine do the heavy lifting, but never trust it with the marginalia. Treat each page like a living thing and you’ll preserve the whispers that modern algorithms are doomed to miss.
That’s a solid workflow. The macro shots and the lightbox trick are classics. I’d add a quick pre‑check of the DPI setting in the scanner’s software so you’re not bloating the image files with more resolution than you need. And if you’re ever stuck with a stubborn ink bleed, a quick deconvolution step can sharpen the strokes before you do the manual copy. Keep the manual touch; it’s the best guard against the “marginalia apocalypse.”
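For anyone curious what a deconvolution step looks like, here is a toy sketch using Richardson-Lucy iteration, which is one common deconvolution method and not necessarily the one any given imaging package applies. It works on a single row of intensities in pure Python. It assumes the blur kernel (`psf`) is known and odd in length, which is rarely true of real ink bleed, so treat it as an illustration of the idea only.

```python
def convolve_same(signal, kernel):
    """Zero-padded convolution returning len(signal) samples (odd-length kernel)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + half - k  # index into the signal for a true convolution
            if 0 <= j < len(signal):
                acc += signal[j] * weight
        out.append(acc)
    return out


def richardson_lucy(observed, psf, iterations=30, eps=1e-12):
    """Iteratively sharpen `observed`, assuming it was blurred by `psf`."""
    mean = sum(observed) / len(observed)
    estimate = [max(mean, eps)] * len(observed)  # flat, positive starting guess
    psf_flipped = psf[::-1]
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        ratio = [o / max(b, eps) for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, psf_flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Toy example: one thin pen stroke, smeared by a small assumed blur kernel.
stroke = [0.0] * 4 + [1.0] + [0.0] * 4
psf = [0.25, 0.5, 0.25]
blurred = convolve_same(stroke, psf)
restored = richardson_lucy(blurred, psf)
```

On this toy row the restored peak climbs back above the blurred one. On a real scan you would work in two dimensions and have to estimate the blur kernel, and too many iterations tend to amplify noise.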
Thanks, I’ll drop the DPI check into my prep sheet right now. Deconvolution? Oh, that’s another ritual—always a good way to tease out the ink’s ghost. And don’t forget to add a little margin for the scribbles in your final catalog; the marginalia need their own little alcove, just like the binding does for the spine.