Automatic Collation for Diversifying Corpora (ACDC) Tutorial
- Published on 27 Nov 2024
- In this video, David Smith (Khoury College of Computer Sciences, Northeastern University) gives a brief introduction to OpenITI's project Automatic Collation for Diversifying Corpora (ACDC). Many tools for handwritten text recognition assume that the user will manually transcribe a few hundred or a few thousand lines from one manuscript in order to train a model to transcribe the rest. ACDC complements that approach. It provides tools to search collections of manuscripts for widely copied texts, align the manuscripts with those texts, and use the resulting data to train new models. Because many texts survive in a variety of styles and layouts, this approach can produce more generalizable, robust models. The video gives general background on ACDC, walks through the steps of running our code, and presents the results of our experiments with it.
For the code and further instructions: github.com/Ope...
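To make the "align the manuscripts with the texts" step more concrete, here is a minimal sketch in Python. It is not ACDC's code and makes no claim about how the project implements alignment. It assumes you already have rough, error-prone line transcriptions from an existing recognition model, and it uses a brute-force sliding window with the standard library's `difflib` to find the closest passage in a clean reference text. The function names `best_match` and `build_training_pairs` and the `min_ratio` threshold are hypothetical, chosen for illustration only.

```python
from difflib import SequenceMatcher


def best_match(noisy_line, reference, min_ratio=0.6):
    """Find the reference window most similar to a noisy transcription line.

    Returns (start, end, ratio), or None if no window reaches min_ratio.
    Illustrative brute-force search; far too slow for a real corpus.
    """
    width = len(noisy_line)
    best = None
    # Slide a window of the same width as the noisy line across the reference.
    for start in range(0, max(1, len(reference) - width + 1)):
        window = reference[start:start + width]
        ratio = SequenceMatcher(None, noisy_line, window, autojunk=False).ratio()
        if best is None or ratio > best[2]:
            best = (start, start + len(window), ratio)
    if best is not None and best[2] >= min_ratio:
        return best
    return None


def build_training_pairs(noisy_lines, reference, min_ratio=0.6):
    """Pair each noisy line index with the clean reference text it matches.

    Lines with no sufficiently similar passage are skipped, so text that is
    not part of the reference work does not end up in the training data.
    """
    pairs = []
    for index, line in enumerate(noisy_lines):
        if not line.strip():
            continue  # nothing to align for empty lines
        match = best_match(line, reference, min_ratio)
        if match is not None:
            start, end, _ratio = match
            pairs.append((index, reference[start:end]))
    return pairs
```

Calling `build_training_pairs` with a list of noisy lines and a reference string returns `(line index, clean text)` pairs that could stand in for manual transcriptions when training a model. For the actual pipeline and its options, follow the repository linked above.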