Tag Set and Part-of-Speech-Tagging Approaches for Prakrit Language

Authors

  • Aarushi Jain
  • Vaishali Gupta

Keywords:

Indo-Aryan language, Low-resource languages, Natural language processing, POS tagging, Prakrit

Abstract

Part-of-speech tagging (POS tagging) is one of the most arduous problems in natural language processing. Prakrit is a Middle Indo-Aryan language and was one of the most widely used languages among common people in the 3rd century. Many manuscripts survive in Jain and Buddhist religious books and elsewhere. Despite its popularity among common people, very little work has been done on Prakrit, so many of the ancient manuscripts remain unexplored. The main motivation for working on Prakrit is to understand the life-transforming stories written in the 3rd century. Indian researchers have done a lot of work on several other South Asian languages, but Prakrit has hardly been touched upon in India; some work has been done in Germany and the USA. POS tagging is the first step in developing applications such as machine translators. It is an exacting task for Prakrit because no corpus is available for computational processing. This paper reports the generation of a corpus, training on the data, and testing of a tag set and a CRF-based tagger for Prakrit.

Published

2021-03-05

Issue

Section

Articles