
Likelihood corpus distribution: an efficient topic modelling scheme for Bengali document class identification.

  • Published In: Sādhanā: Academy Proceedings in Engineering Sciences, 2024, v. 49, n. 3, p. 1

  • Database: Academic Search Ultimate

  • Authored By: Das Dawn, Debapratim; Khan, Abhinandan; Shaikh, Soharab Hossain; Pal, Rajat Kumar

Abstract

The learning quality of humans depends on the sense of contemplation. Textual documents are a huge part of the literature on contemplation, which effortlessly creates perception. Automatic document class identification, or organisation, is a machine learning task that aims to capture the psychological and emotional content of a text concisely. The problem of document identification falls within library science, information science, and artificial intelligence. Research on document class identification has progressed in many of the most widely spoken languages, and numerous works have been published for European and Asian languages. However, there is a gap in the literature for low-resource languages, especially Bengali. Consequently, this work presents an efficient topic modelling approach for Bengali document class identification. It proposes a Dirichlet-polynomial clustering model, likelihood corpus distribution (LCD), which is based on a Bayesian numerical prototype. Experiments are conducted to demonstrate the efficiency of LCD relative to various topic modelling algorithms, such as latent Dirichlet allocation (LDA), LDA with bag-of-words (LDA-BOW), latent semantic indexing (LSI), and hierarchical Dirichlet process (HDP). For performance evaluation, five real-world datasets of Bengali corpora are considered: science, sports, computer, season, and epic. The coherence scores of the different modelling algorithms are compared to find the best model for each dataset separately. [ABSTRACT FROM AUTHOR]
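
The model comparison described in the abstract (score each topic model by coherence, then pick the best one per dataset) can be illustrated with a minimal sketch. The abstract does not say which coherence measure the authors use; the UMass measure with +1 smoothing below is an assumption, as are the function names, the toy English tokens standing in for a Bengali corpus, and the hypothetical candidates `model_a` and `model_b`. It is not the authors' LCD implementation.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic (assumed measure, not confirmed by the abstract).

    Sums log((D(w_later, w_earlier) + 1) / D(w_earlier)) over all pairs of
    top words, where D counts the documents containing the given word(s).
    Assumes every top word occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for earlier, later in combinations(top_words, 2):
        score += math.log((doc_freq(earlier, later) + 1) / doc_freq(earlier))
    return score


def model_coherence(topics, documents):
    """Mean UMass coherence over all topics of one model."""
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)


def best_model(candidates, documents):
    """Return the name of the highest-scoring model and all scores."""
    scores = {
        name: model_coherence(topics, documents)
        for name, topics in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Toy tokenised corpus (placeholder for one dataset, e.g. "sports" plus "science").
documents = [
    ["cricket", "bat", "ball", "match"],
    ["cricket", "ball", "match", "team"],
    ["planet", "orbit", "star"],
    ["planet", "star", "telescope"],
]

# Hypothetical top words per topic, as two different models might produce them.
candidates = {
    "model_a": [["cricket", "ball", "match"], ["planet", "star", "orbit"]],
    "model_b": [["cricket", "star", "match"], ["planet", "ball", "orbit"]],
}

best, scores = best_model(candidates, documents)
print(best, scores)
```

Repeating the call to `best_model` once per dataset gives the per-dataset selection the abstract describes; a higher score means the top words of each topic co-occur more often in the same documents.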

Additional Information

  • Source: Sādhanā: Academy Proceedings in Engineering Sciences. 2024/09, Vol. 49, Issue 3, p1
  • Document Type: Article
  • Subject Area: Language and Linguistics
  • Publication Date: 2024
  • ISSN: 0256-2499
  • DOI: 10.1007/s12046-024-02470-7
  • Accession Number: 178527531
  • Copyright Statement: Copyright of Sādhanā: Academy Proceedings in Engineering Sciences is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
