by fngjdflmdflg 10 hours ago

These OCR improvements will almost certainly be brought to Google Books, which is great. Long term, they could make it possible to compress all non-digital rare books to a manageable size that can be stored for less than $5,000.[0] It would also be great for archive.org to switch to this from Tesseract. I wonder what that would cost, both in raw compute and via a paid API.

[0] https://annas-archive.org/blog/critical-window.html

levocardia 6 hours ago

This is a really interesting "data flywheel" -- better model >> more usable data >> even better model

tills13 5 hours ago

Surely there's an upper limit to this, though, with models literally eating themselves.

jeffbee 5 hours ago

When a human student learns to read more carefully, we don't consider that a negative.

kridsdale3 8 hours ago

More Data for the Data Gods!