JOURNAL ARTICLE

Bridging Linguistics and Artificial Intelligence: A Phoneme-Centric Method for Assessing Synthetic Speech.

Published In: International Journal of Semantic Computing, 2025, v. 19, n. 3, p. 433
Database: Applied Science & Technology Source Ultimate
Authored By: Reynolds, Sarah; Ochoa, Omar

Abstract

This study bridges the gap between traditional linguistic studies, particularly phonology, and modern Machine Learning (ML) developments. It explores the mutually beneficial interactions between the two domains: by utilizing Speech-To-Text (STT) models to aid in phonology and by integrating phonological insights into the evaluation of Text-To-Speech (TTS) generation. Phonology, the study of phonemes — fundamental sound units in human speech — is pivotal for understanding speech variations caused by accents, speech disorders, and individual pronunciation differences. Leveraging the International Phonetic Alphabet (IPA), this work analyzes speech patterns to enhance explainability and accuracy in TTS models, particularly those employing transformer architectures. This work aims to refine ML tools, expand the phonological approach, and further integrate linguistic knowledge into TTS evaluation and improvement. This approach for phoneme analysis is tested on the Speech Accent Archive and synthetic speech samples generated by the OpenAI and Azure text-to-speech systems. By employing the wav2vec model for phoneme detection and DeepPhonemizer for grapheme-to-phoneme conversion, phoneme distances are calculated using an adapted Levenshtein distance. This analysis reveals significant differences in phoneme accuracy between native English speakers, nonnative speakers, and synthetic speech, highlighting areas for improvement in synthetic speech generation. Results indicate that while synthetic speech often rivals or slightly exceeds real speech in overall phoneme distance, it also exhibits higher error variability, thus pinpointing specific areas for targeted improvement. This approach not only aids in refining TTS systems but also offers insights into developing more unique and personalized synthetic voices. [ABSTRACT FROM AUTHOR]
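The abstract mentions computing phoneme distances with an adapted Levenshtein distance but does not specify the adaptation. The following is only a minimal sketch of the standard (unadapted) edit distance applied to sequences of IPA phoneme tokens, with a length-normalized variant. The function names, the unit edit costs, and the normalization by reference length are assumptions made for illustration, not the authors' method. In the pipeline the abstract describes, the reference sequence would come from grapheme-to-phoneme conversion and the hypothesis from phoneme detection on audio; here both are hand-written lists.

```python
from typing import Sequence


def phoneme_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Standard Levenshtein distance over phoneme tokens.

    Assumption: insertions, deletions, and substitutions all cost 1.
    The paper's "adapted" variant may weight these differently.
    """
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference phoneme
                curr[j - 1] + 1,     # insert a hypothesis phoneme
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def normalized_distance(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance divided by reference length (an illustrative choice)."""
    if not ref:
        raise ValueError("reference phoneme sequence must not be empty")
    return phoneme_distance(ref, hyp) / len(ref)


# Hypothetical example: expected phonemes for "please" vs. a detected variant.
expected = ["p", "l", "iː", "z"]
detected = ["p", "l", "ɪ", "s"]
print(phoneme_distance(expected, detected))     # 2 (two substitutions)
print(normalized_distance(expected, detected))  # 0.5
```

Treating each IPA symbol (including length marks such as "iː") as one token keeps a multi-character phoneme from being counted as several edits, which a character-level string distance would do.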

Additional Information

Source: International Journal of Semantic Computing. 2025/09, Vol. 19, Issue 3, p433
Document Type: Article
Subject Area: Language and Linguistics
Publication Date: 2025
ISSN: 1793351X
DOI: 10.1142/S1793351X25440027
Accession Number: 189015126
Copyright Statement: Copyright of International Journal of Semantic Computing is the property of World Scientific Publishing Company and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
