id: ital-12553
author: Han, Yan; Rychlik, Marek
title: Development of a Gold-standard Pashto Dataset and a Segmentation App
date: 2021-03-11
pages: 15
extension: .pdf
mime: application/pdf
words: 6145
sentence: 371
flesch: 50
summary: To ensure that the dataset has the gold-standard one-to-one mapping of a line image to a line text, the Ph.D. student keyed in Pashto texts line by line by viewing every individual line image. Peshawar campus has been working on Pashto OCR since 2006, and its research has created a Pashto image-to-ligature dataset titled FAST-NU dataset, containing 4,000 images of 1,000 unique ligatures in a variety of font sizes.
cache: cache/ital-12553.pdf
txt: txt/ital-12553.txt