October, 2025 | Volume 04 | Issue 04
Paper 1: Reducing FastText's Limits in Romanized Language Detection
Authors: Yashi Bajpai, Aditi Joshi and Mr. Amit Srivastava
DOI: https://doi.org/10.63920/tjths.44001
Abstract
Language identification models such as FastText are widely used to determine the language of a given text. However, these models often misclassify text written in the Roman (Latin) script for languages that have historically used non-Latin scripts, such as Hindi, Japanese and Chinese. In our research, we analyze FastText's performance on romanized inputs and find a pattern of misclassification into unrelated languages, accompanied by lower confidence scores. We address this with a score-based thresholding method: if the confidence score that FastText returns is below a set threshold (0.5), the predicted language label is withheld and the input is classified as romanized. In tests on several languages and romanized inputs, this threshold-based method improves classification reliability. This study identifies a significant weakness in existing language identification systems and proposes a simple, adjustable modification to improve their effectiveness in multilingual, real-world settings.
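The thresholding rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `classify_with_threshold`, the fallback label `"romanized"`, and the `Predictor` type are hypothetical names chosen for this example. The only thing taken from the abstract is the rule itself, namely that a confidence score below the threshold (0.5) replaces the predicted label with a romanized label.

```python
from typing import Callable, Sequence, Tuple

# A fastText-style predictor: text -> (labels, probabilities), best guess first.
# The fasttext Python package's model.predict(text, k=1) returns this shape.
Predictor = Callable[[str], Tuple[Sequence[str], Sequence[float]]]

# Hypothetical fallback label; the abstract does not specify the exact name.
ROMANIZED_LABEL = "romanized"


def classify_with_threshold(
    predict: Predictor, text: str, threshold: float = 0.5
) -> Tuple[str, float]:
    """Return (label, confidence), withholding low-confidence predictions."""
    # fastText's predict() processes one line at a time, so flatten newlines.
    labels, probs = predict(text.replace("\n", " "))
    # fastText labels carry a "__label__" prefix, e.g. "__label__hi".
    label = labels[0].replace("__label__", "")
    confidence = float(probs[0])
    # Strictly below the threshold: hide the predicted language label.
    if confidence < threshold:
        return ROMANIZED_LABEL, confidence
    return label, confidence


# Usage with the fasttext package (assumes a language identification model
# file such as lid.176.bin has been downloaded; the abstract does not say
# which model the authors used):
#
#   import fasttext
#   model = fasttext.load_model("lid.176.bin")
#   classify_with_threshold(lambda t: model.predict(t, k=1), "aap kaise hain")
```

Passing the predictor in as a callable keeps the rule independent of any particular model file, and the `threshold` parameter reflects the abstract's point that the cutoff is adjustable.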
