Malayalam began its technological development well in advance. It has made use of every opportunity given to it to become suitable for digitalization and computerization. The references listed below establish its efforts in fulfilling the need of the day. Governments, both state and central, funded the technological development of Tamil liberally. This helped it to develop MT systems, a WordNet, and other NLP systems. Private organizations also contributed to this mission. Many individuals, both within the country and abroad, worked on Tamil computing. Organizations such as CIIL Mysore, Kerala University, Amrita University, and CDAC-Trivandrum deserve appreciation for their efforts in uplifting Malayalam in the era of information technology.
Malayalam is now prepared for the full-fledged digitalization envisioned by the central government. Of course, much work remains unfinished. For example, MT systems have yet to be modified to make them suitable for general users. The Malayalam WordNet needs to be expanded or augmented to bring it on par with those of European languages. There are many problems with the Unicode slots allotted to Malayalam; Malayalam needs more slots for proper grammatical analysis. Scholars still convert Malayalam into Roman script and then back into Malayalam after grammatical analysis. The WX transliteration system adopted for inputting Malayalam is notoriously bad. Many of the computational tools developed so far have to be made available as open source. Language resources such as text corpora, speech corpora, and parallel corpora have to be shared and made available to general users. Crowdsourcing can be encouraged to minimize the effort. Resources available to one have to be shared with others. There should be a combined effort toward the technological development of Malayalam.
Phrase-based Statistical Machine Translation (PBSMT) is commonly used for automatic translation. However, PBSMT runs into difficulty when either or both of the source and target languages are morphologically rich. Factored models are found to be useful in such cases, as they treat a word as a vector of factors. These factors can contain any information about the surface word, which the model uses while translating. The objective of the current work is to handle morphological inflections in Hindi, Marathi, and Malayalam using factored translation models when translating from English. Statistical MT approaches face the problem of data sparsity when translating into a morphologically rich language, because a parallel corpus is very unlikely to contain all morphological forms of words.
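The idea of a word as a vector of factors, and why it eases data sparsity, can be illustrated with a small sketch. The four-factor layout (surface, lemma, part of speech, morphology), the `|` separator, the names `FactoredWord`, `parse_factored`, and `factor_counts`, and the toy English corpus are all assumptions made for illustration; they are not taken from the work described here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FactoredWord:
    """One token represented as a vector of factors (hypothetical layout)."""
    surface: str
    lemma: str
    pos: str
    morph: str


def parse_factored(sentence: str, sep: str = "|") -> list:
    """Split a sentence of 'surface|lemma|pos|morph' tokens into FactoredWord objects."""
    words = []
    for token in sentence.split():
        parts = token.split(sep)
        if len(parts) != 4:
            raise ValueError(f"expected 4 factors, got {len(parts)} in {token!r}")
        words.append(FactoredWord(*parts))
    return words


def factor_counts(corpus, factor: str) -> Counter:
    """Count how often each value of the chosen factor occurs in the corpus."""
    counts = Counter()
    for sentence in corpus:
        for word in parse_factored(sentence):
            counts[getattr(word, factor)] += 1
    return counts


# Toy corpus, invented for illustration only.
corpus = [
    "boys|boy|NN|plural play|play|VB|present",
    "boy|boy|NN|singular played|play|VB|past",
    "playing|play|VB|progressive boys|boy|NN|plural",
]

surface_counts = factor_counts(corpus, "surface")
lemma_counts = factor_counts(corpus, "lemma")

print(len(surface_counts), "distinct surface forms")
print(len(lemma_counts), "distinct lemmas")
```

In this toy corpus every inflected form is a separate surface type with few observations, while the lemma factor pools them into a small number of well-attested types. A factored model can exploit that pooling by translating lemma and morphology separately and then generating the target surface form.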