JOURNAL ARTICLE

Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding.

Published In: Expert Systems, 2024, v. 41, n. 7, p. 1
Database: Academic Search Ultimate
Authored By: Jain, Minni; Jindal, Rajni; Jain, Amita

Abstract

Interaction via social media involves frequent code‐mixed text, spelling errors and noisy elements, which create a bottleneck in the performance of natural language processing applications. This proposed work is the first approach for code‐mixed Hindi‐English social media text that comprises language identification, detection and correction of non‐word (Out of Vocabulary) errors as well as real‐word errors occurring simultaneously. Each identified language (Devanagari Hindi, Roman Hindi, and English) has its own complexities and challenges. Errors are detected individually for each language and a suggestive list of the erroneous words is created. After this, a fuzzy graph between different words of the suggestive lists is generated using various semantic relations in Hindi WordNet. Word embeddings and fuzzy graph‐based centrality measures are used to find the correct word. Several experiments are performed on different social media datasets taken from Instagram, Twitter, YouTube comments, Blogs, and WhatsApp. The experimental results demonstrate that the proposed system corrects out‐of‐vocabulary words as well as real‐word errors with a maximum recall of 0.90 and 0.67, respectively, for Dev_Hindi and 0.87 and 0.66, respectively, for Rom_Hindi. The proposed method is also applied to state‐of‐the‐art sentiment analysis approaches where the F1‐score has been visibly improved. [ABSTRACT FROM AUTHOR]
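
The abstract does not specify which centrality measure or edge weights the authors use. As a rough illustration of the general idea only, the sketch below treats each candidate in the suggestive lists as a node of a fuzzy graph, weights edges with a membership value in [0, 1], and picks for each erroneous token the candidate with the highest fuzzy degree (the sum of its edge memberships to candidates of the other tokens). The function name `pick_by_fuzzy_degree`, the English toy words, and every membership value are hypothetical; in the described system the relations would come from Hindi WordNet and word embeddings.

```python
def pick_by_fuzzy_degree(suggestions, membership):
    """For each erroneous token, return the candidate with the highest fuzzy degree.

    suggestions: dict mapping an erroneous token to its list of candidate words.
    membership:  dict mapping frozenset({word_a, word_b}) to an edge membership
                 in [0, 1]; missing pairs are treated as 0 (no relation).
    """
    chosen = {}
    for token, candidates in suggestions.items():
        scores = {}
        for cand in candidates:
            total = 0.0
            # Sum memberships of edges to candidates of every *other* token.
            for other_token, other_candidates in suggestions.items():
                if other_token == token:
                    continue
                for other in other_candidates:
                    total += membership.get(frozenset((cand, other)), 0.0)
            scores[cand] = total
        chosen[token] = max(scores, key=scores.get)
    return chosen


# Hypothetical toy input: two misspelled tokens and their suggestive lists.
suggestions = {
    "wether": ["weather", "whether"],
    "forcast": ["forecast", "format"],
}

# Hypothetical membership values standing in for semantic relatedness.
membership = {
    frozenset(("weather", "forecast")): 0.9,
    frozenset(("whether", "forecast")): 0.1,
    frozenset(("weather", "format")): 0.1,
    frozenset(("whether", "format")): 0.2,
}

result = pick_by_fuzzy_degree(suggestions, membership)
print(result)
```

With these made-up weights, "weather" and "forecast" reinforce each other through their strong shared edge, so both are selected over the competing candidates.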

Additional Information

Source: Expert Systems. 2024/07, Vol. 41, Issue 7, p. 1
Document Type: Article
Subject Area: Language and Linguistics
Publication Date: 2024
ISSN: 0266-4720
DOI: 10.1111/exsy.13328
Accession Number: 177626892
Copyright Statement: Copyright of Expert Systems is the property of Wiley-Blackwell and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
