JOURNAL ARTICLE
Linguistic annotation of cuneiform texts using treebanks and deep learning.
Published In: Digital Scholarship in the Humanities, 2024, v. 39, n. 1. P. 296
Database: Academic Search Ultimate
Authored By: Ong, Matthew; Gordin, Shai
Abstract
This article presents a bootstrapping pipeline for morpho-syntactic annotation of ancient language corpora, demonstrated through the creation of a new Akkadian treebank and a spaCy-based language model named AkkParser. Designed for individual scholars working with low-resource ancient languages, the pipeline integrates human annotation with iterative machine learning to efficiently produce richly annotated corpora, using two volumes of Neo-Assyrian letters from the State Archives of Assyria project hosted on Oracc as a case study. The authors discuss linguistic challenges specific to Neo-Assyrian letters and cuneiform data encoding, and provide quantitative evaluation showing that their bootstrapped model improves annotation accuracy over generic transformer models, particularly benefiting morpho-syntactic parsing despite difficulties with syntactic dependency labeling. The pipeline and resulting resources are publicly available and intended to support further research, pedagogy, and advanced linguistic analyses in Akkadian and potentially other ancient languages.
Additional Information
- Source: Digital Scholarship in the Humanities. 2024/04, Vol. 39, Issue 1, p296
- Document Type: Article
- Subject Area: Communication and Mass Media
- Publication Date: 2024
- ISSN: 2055-768X
- DOI: 10.1093/llc/fqae002
- Accession Number: 176806366
- Copyright Statement: Copyright of Digital Scholarship in the Humanities is the property of Oxford University Press / USA and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)