TEJAS Journal of Technologies and Humanitarian Science

ISSN: 2583-5599

Open Access | Quarterly | Peer Reviewed Journal

October 2022 | Volume 01 | Issue 01


Out of vocabulary words handling in morphological analysis


Amit Asthana
Department of Computer Science, Babasaheb Bhimrao Ambedkar University, Lucknow, India

Ganesh Chandra
Department of Computer Science, Babasaheb Bhimrao Ambedkar University, Lucknow, India


📌 DOI: https://doi.org/10.63920/tjths.11001

🔑 Keywords: Natural Language Processing; Morphological Analysis

📅 Publication Date: 4 October 2022

📜 License:

  • Share — Copy and Redistribute the material
  • Adapt — Remix, Transform, and build upon the material
  • The licensor cannot revoke these freedoms as long as you follow the license terms.

Abstract:

Morphological analysis is the first step in Natural Language Processing (NLP) and lays the groundwork for the analysis and processing stages that follow. It identifies the morphemes in a phrase by examining each word individually. Out-of-vocabulary (OOV) words are words that appear in a phrase but for which the morphological analyzer cannot find any morpheme. Identifying OOV words is a challenge in NLP: if they go undetected, the true meaning of the sentence may be difficult to determine. This study aims to provide a mechanism for identifying OOV words in Hindi during morphological analysis.
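
To make the definition of an OOV word concrete, the following is a minimal Python sketch, not the mechanism proposed in the paper. It assumes a toy root lexicon and suffix table (`ROOTS`, `SUFFIXES`) and two hypothetical helpers (`analyze`, `find_oov`), all invented here for illustration. A word is flagged as OOV when no root-plus-suffix split is found in the lexicon.

```python
import unicodedata
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon (an assumption, not taken from the paper):
# root -> part of speech.
ROOTS: Dict[str, str] = {
    "किताब": "noun",  # book
    "घर": "noun",     # house
    "लिख": "verb",    # write
}

# Hypothetical suffix table (an assumption): suffix -> grammatical feature.
# The empty suffix lets a bare root match as its base form.
SUFFIXES: Dict[str, str] = {
    "": "base form",
    "ता": "habitual, masculine singular",
    "ें": "plural",
}


def analyze(word: str) -> Optional[Tuple[str, str, str]]:
    """Return (root, part of speech, feature), or None if no split is found."""
    word = unicodedata.normalize("NFC", word)
    for suffix, feature in SUFFIXES.items():
        if not word.endswith(suffix):
            continue
        # Strip the suffix (a no-op for the empty suffix) and look up the root.
        root = word[: len(word) - len(suffix)]
        if root in ROOTS:
            return (root, ROOTS[root], feature)
    return None


def find_oov(tokens: List[str]) -> List[str]:
    """Collect the tokens for which the toy analyzer finds no morphemes."""
    return [token for token in tokens if analyze(token) is None]


if __name__ == "__main__":
    tokens = ["किताबें", "लिखता", "गूगल"]
    for token in tokens:
        print(token, analyze(token))
    print("OOV:", find_oov(tokens))
```

In this sketch the first two tokens split into a known root and suffix, while the transliterated name गूगल has no entry in the toy lexicon and is reported as OOV. A real analyzer for Hindi would need a far larger lexicon and paradigm tables.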

