(GIST OF YOJANA) Technology Areas for Indian Languages [DECEMBER-2018]
Technology Areas for Indian Languages
Language technology has reached a level of maturity today where it is making a mass impact on users of English and many other languages of the world. Indian language technology is also at an advanced level where it can make a mass impact on various aspects of language use. It can enable people to access material in their own languages; for example, material in English and other Indian languages can be translated automatically. Similarly, computers can read out information to the illiterate or the blind through text to speech systems, remote data can become accessible through telephonic speech interfaces, sophisticated search can be provided over the internet, and digitally scanned books and other material can be made more accessible by using optical character readers.
Technology Areas
Here are the Indian language technology areas and example tasks in each of them.
1. Localization
- Availability of Indian language support on all electronic devices
- Use of standards
2. Creating e-content in Indian languages
- Creating by original writing
- Creating through translation
3. Automatic machine translation
- English to/from Indian Languages (ILs)
- Among Indian languages
4. Cross language access to content
- Cross lingual search across Indian languages as well as English
5. Speech processing
- Text to speech for ILs
- Speech to text for ILs
6. Optical character recognition
- Optical character recognition for ILs
- Online handwriting recognition for ILs
Status and Prospects of Technology Areas
Each of the above technology areas is described below with respect to the following aspects:
- What the technology area is about
- Current status of technology for Indian languages
- What can be achieved in the foreseeable future for Indian languages
Localization
Localization in our context means that the electronic device is enabled with Indian languages using the standards. For example, when one buys a phone, it should already have the language of the region built into it, along with Hindi and English, for display, keyboard input, etc. Moreover, the customer should be able to add any other Indian language later on demand, without having to change the handset.
Creating e-Content in Indian Languages
There is an acute need to create e-content in Indian languages. While e-content is not a replacement for books, the young generation has started placing increasing reliance on the content available over the internet.
It was observed in Germany, not so long ago, that German youth were accessing English language content much more than German language content.
E-content in ILs can be created rapidly, in the short term, through translation of English content; but in the long term, it should be created originally in the Indian languages.
Translation among Indian languages can be used to generate content in all the Indian languages. Translation across ILs can be effective in conveying the original meaning and would also be suited to the Indian context.
Automatic machine translation translates a given text in one language into another, instantly. While the quality of the translation varies depending on the distance between the languages in the pair and the technology used, it gives the user instant access to text in another language.
Translation from English to Indian languages has lower quality, as expected, because English is linguistically distant from Indian languages. Machine translation among Indian languages, on the other hand, has much better quality. MT systems for Indian languages are available and produce good quality translation.
They compare favorably with similar systems across European languages, for example. However, effort needs to be put into deploying them and making them available to users, both general users and publishing houses. Systems for the language pairs that are ready can be deployed within a year. MT systems are available for about a dozen Indian languages and need to be developed for all 22 scheduled Indian languages. The technology framework is fully developed, and a new language pair can be added easily and rapidly, in a matter of 2 years. The addition of new pairs can, of course, be done in parallel.
Cross Language Access to Content
As the e-content in Indian languages increases, there will be an even greater need for users to search for and locate relevant content on the internet. Here, cross-lingual search would be needed while the content is still being created for Indian languages, because a large amount of content might not be available in all Indian languages initially. Technology is available for this task across half a dozen ILs. However, the content in these languages still needs to be indexed, and more languages need to be added.
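As a rough illustration of the idea (not a description of any deployed system), cross-lingual search can be sketched as translating the query terms into the target language and looking them up in an index of the content. The tiny English-to-Hindi dictionary, the sample documents, and the function names below are all hypothetical, chosen only for illustration; a real system would use a machine translation component in place of the hand-written dictionary.

```python
from collections import defaultdict

# Hypothetical toy dictionary: English term -> Hindi equivalent.
EN_TO_HI = {"water": "पानी", "village": "गाँव", "electricity": "बिजली"}

# Hypothetical sample documents, keyed by (language, document id).
# The Hindi texts reuse the dictionary entries so the spellings match exactly.
DOCUMENTS = {
    ("en", "doc1"): "water supply in the village",
    ("hi", "doc2"): EN_TO_HI["village"] + " में " + EN_TO_HI["water"],
    ("hi", "doc3"): "शहर में " + EN_TO_HI["electricity"],
}


def build_index(documents):
    """Build an inverted index: (language, word) -> set of document ids."""
    index = defaultdict(set)
    for (lang, doc_id), text in documents.items():
        for word in text.split():
            index[(lang, word)].add(doc_id)
    return index


def cross_lingual_search(query, index):
    """Look up each English query word in English and, via the dictionary, in Hindi."""
    results = set()
    for word in query.lower().split():
        results |= index.get(("en", word), set())
        translated = EN_TO_HI.get(word)
        if translated is not None:
            results |= index.get(("hi", translated), set())
    return sorted(results)


INDEX = build_index(DOCUMENTS)
print(cross_lingual_search("water", INDEX))
```

In this sketch, a single English query returns both the English document and the Hindi document about water, which is the kind of access the text describes for users when content is not yet available in every language.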
Speech Processing
There are two parts to this technology:
- Text to speech (TTS)
- Speech to text
The former technology allows a computer to “read out” a given text file in an IL. The latter allows the computer to “listen” to the spoken language and convert it into a text file.
TTS can be used to make a text file accessible to a blind or illiterate person. It can also allow interaction over the telephone, where the text cannot be seen by the user. TTS is a mature technology and is available for more than a dozen ILs.
Optical Character Recognition (OCR)
There are two technology areas under this head:
- Optical character recognition (OCR)
- Online handwriting recognition (OHWR)
OCR takes a printed book and converts it into text form. When a hardcopy book is scanned, the output is a set of scanned images, which cannot be used for search, machine translation, speech processing, etc. OCR takes a scanned image of a page, recognizes the characters, and converts it into text form.
This technology is available as a field prototype for about a dozen ILs. It needs to be converted into a product and provided to digital libraries that hold scanned collections of IL books, such as the Digital Library of India of the Ministry of Information Technology.
Conclusion
The conditions are most conducive for the use and proliferation of language technology for Indian languages. There are a large number of users with digital devices who wish to get information in their own languages as they do not know English. There is a large amount of content in English but not in Indian Languages. Hence, there is a large unserved need!
Several things need to be done. For example, Indian language technology should immediately be deployed to translate all central government websites into the 22 Indian languages. This will generate a demand that will help the growth of an ecosystem of academic institutions as researchers and technology developers, start-ups as technology maintainers, and others who service the demand using MT technology. There would also be a need for human post-editors who would take the output of the MT system and make it more readable. Similarly, speech processing can be combined with MT to provide spoken language translation. The National Digital Library of India should use the services of an OCR for indexing the scanned images in Indian languages, making them searchable.