TEJAS Journal of Technologies and Humanitarian Science

October, 2025 | Volume 04 | Issue 04

Paper 1: Reducing FastText's Limits in Romanized Language Detection

Authors: Yashi Bajpai, Aditi Joshi and Mr. Amit Srivastava

DOI: https://doi.org/10.63920/tjths.44001

Abstract

Language identification models such as FastText are widely used to identify the language of a given text. However, these models often fail to correctly categorize text written in the Roman (Latin) script when the language has historically used a non-Latin script, as with Hindi, Japanese and Chinese. In our research, we analyze FastText's performance on romanized inputs and find a consistent pattern: such inputs are misclassified as unrelated languages and receive lower confidence scores. We address this with a score-based thresholding method: if the confidence score that FastText returns is below a set threshold (0.5), the predicted language label is suppressed and the input is classified as romanized. In tests on several languages and romanized inputs, this threshold-based method increases classification reliability. This study identifies a significant weakness in existing language identification systems and suggests a simple, adjustable modification to improve their effectiveness in multilingual, real-world situations.
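The thresholding rule described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `classify`, `fasttext_predictor` and the `"romanized"` fallback label are hypothetical, and the model file name `lid.176.bin` is assumed to be the publicly distributed fastText language identification model. Only the 0.5 threshold comes from the abstract.

```python
from typing import Callable, Tuple

ROMANIZED = "romanized"  # hypothetical fallback label, not taken from the paper
THRESHOLD = 0.5          # threshold value stated in the abstract

# A predictor maps a text to (language label, confidence score).
Predictor = Callable[[str], Tuple[str, float]]


def classify(text: str, predict: Predictor,
             threshold: float = THRESHOLD) -> Tuple[str, float]:
    """Return the predicted label, or ROMANIZED when confidence is too low."""
    label, score = predict(text)
    if score < threshold:
        # Low confidence: suppress the predicted label and flag as romanized.
        return ROMANIZED, score
    return label, score


def fasttext_predictor(model_path: str = "lid.176.bin") -> Predictor:
    """Wrap a fastText model as a Predictor (model path is an assumption)."""
    import fasttext  # imported lazily so classify() works without fastText

    model = fasttext.load_model(model_path)

    def predict(text: str) -> Tuple[str, float]:
        # fastText predicts one line at a time, so newlines are replaced.
        labels, probs = model.predict(text.replace("\n", " "), k=1)
        return labels[0].replace("__label__", ""), float(probs[0])

    return predict
```

With the model file available, `classify("some text", fasttext_predictor())` would apply the rule to real fastText scores. Because `classify` only depends on a `(label, score)` pair, the threshold can be tuned or the rule tested with a stub predictor without loading any model.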

ISSN : 2583-5599

Open Access | Quarterly | Peer Reviewed Journal
