HN via remix.js for vilnius.js

by fngjdflmdflg 10 hours ago

These OCR improvements will almost certainly be brought to Google Books, which is great. Long term, it could enable compressing all non-digital rare books into a manageable size that can be stored for less than $5,000.[0] It would also be great for archive.org to move to this from Tesseract. I wonder what that would cost, both in raw compute to run it and via a paid API.

[0] https://annas-archive.org/blog/critical-window.html

levocardia 6 hours ago

This is a really interesting "data flywheel" -- better model >> more usable data >> even better model

tills13 5 hours ago

Surely there's an upper limit to this, though, with models literally eating themselves.

jeffbee 5 hours ago

When a human student learns to read more carefully, we don't consider that a negative.

kridsdale3 8 hours ago

More Data for the Data Gods!