JOURNAL ARTICLE

Applying Image Analysis and Machine Learning to Historical Newspaper Collections.

  • Published In: American Historical Review, 2023, v. 128, n. 3, p. 1382

  • Database: Academic Search Ultimate

  • Authored By: Soh, Leen-Kiat; Lorang, Liz; Pack, Chulwoo; Liu, Yi

Abstract

The article focuses on the challenges and advancements in analyzing digitized historical newspaper collections using image processing and machine learning techniques. It highlights common "noise effects" in digitized images, such as bleed-through and skewed orientation, which complicate both human and computer vision tasks. The research emphasizes visual cue-based methods for content identification, such as detecting poetic text through layout features, and employs deep learning models for clustering similar document images and enhancing metadata via automated pictorial element recognition. Additionally, content-aware image downscaling and the generation of pseudo-groundtruth datasets are presented as strategies to improve computational efficiency and model training. These approaches aim to expand research capabilities, improve discoverability, and support large-scale analysis of historical newspapers.

Additional Information

  • Source: American Historical Review. 2023/09, Vol. 128, Issue 3, p1382
  • Document Type: Article
  • Subject Area: Information Technology
  • Publication Date: 2023
  • ISSN: 0002-8762
  • DOI: 10.1093/ahr/rhad369
  • Accession Number: 172362162
  • Copyright Statement: Copyright of American Historical Review is the property of Oxford University Press / USA and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
