(GIST OF YOJANA) Technology Areas for Indian Languages
Language technology has reached a level of maturity today where it is making
a mass impact on users of English and many other languages of the world. Indian
language technology is also at an advanced level where it can make a mass impact
on various aspects of language use. It can enable people to access material in
their own languages: for example, material in English and other Indian languages
can be translated automatically. Similarly, computers can read out information
to the illiterate or the blind through text-to-speech systems, remote data can
become accessible through telephonic speech interfaces, sophisticated search can
be provided for the internet, and digitally scanned books and other material can
be made more accessible by using optical character recognition.
Here are the Indian language technology areas and example tasks in each of
them:
- Localization
  - Availability of Indian language support on all electronic devices
  - Use of standards
- Creating e-content in Indian languages
  - Creating by original writing
  - Creating through translation
- Automatic machine translation
  - English to/from Indian Languages (ILs)
  - Among Indian languages
- Cross language access to content
  - Cross lingual search across Indian languages as well as English
- Speech processing
  - Text to speech for ILs
  - Speech to text for ILs
- Optical character recognition
  - Optical character recognition for ILs
  - Online handwriting recognition for ILs
Status and Prospects of Technology Areas
Each of the above technology areas is described below with respect to the
following aspects:
- What the technology area is about
- Current status of technology for Indian languages
- What can be achieved in the foreseeable future for Indian languages?
Localization
Localization in our context means that the electronic device is enabled with
Indian languages using the standards. For example, when one buys a phone, it
should already have the language of the region built into it, along with Hindi
and English, for display, keyboarding, etc. Moreover, the customer should be
able to add any other Indian language later on demand, without having to change
the device.
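As a small illustration of what standards-based language support can look like in software, the sketch below guesses which Indic script a piece of text is written in. It assumes the standard in question is Unicode, which the text above does not name, and it relies only on the Unicode block ranges for nine major Indic scripts. The function names are hypothetical.

```python
# Unicode block ranges for nine major Indic scripts.
# Assumption: "the standards" refers to Unicode; the article does not say so.
INDIC_BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi": (0x0A00, 0x0A7F),
    "Gujarati": (0x0A80, 0x0AFF),
    "Oriya": (0x0B00, 0x0B7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}


def count_scripts(text):
    """Count how many characters of the text fall into each Indic block."""
    counts = {}
    for ch in text:
        code_point = ord(ch)
        for name, (low, high) in INDIC_BLOCKS.items():
            if low <= code_point <= high:
                counts[name] = counts.get(name, 0) + 1
                break
    return counts


def dominant_script(text):
    """Return the most frequent Indic script in the text, or None if there is none."""
    counts = count_scripts(text)
    if not counts:
        return None
    return max(counts, key=counts.get)
```

A device could use a check of this kind to pick a suitable font or keyboard layout for incoming text; for instance, `dominant_script("नमस्ते")` returns `"Devanagari"`, while plain English text returns `None`.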
Creating e-Content in Indian Languages
There is an acute need to create e-content in Indian languages. While e-content
is not a replacement for books, the young generation has started placing
increasing reliance on the content available over the internet.
It was observed in Germany, not so long ago, that German youth were
accessing English-language content much more than German-language content.
E-content in ILs can be created rapidly, in the short term, through
translation of English content; but in the long term, it should be created
originally in the Indian languages.
Translation among Indian languages can be used to generate content in all the
Indian languages. Translation across ILs can be effective in conveying the
original meaning and would also be suitable to the Indian context.
Automatic machine translation translates a given text in one language to
another, instantly. While the quality of translation produced varies depending
on the distance between the languages in a pair and the technology used, it
provides the user with instant access to text in another language.
Translation from English to Indian languages has lower quality, as expected,
because English is linguistically distant from Indian languages. Machine
translation among Indian languages, on the other hand, has much better quality.
MT systems for Indian languages are available and produce good-quality output.
They compare favorably with similar systems across European languages, for
example. However, effort needs to be put into deploying them and making them
available to users, both general users and publication houses. Deployment
of systems for the language pairs which are ready can take place within a year.
MT systems are available for about a dozen Indian languages, and need to be
developed for all 22 scheduled Indian languages. The technology framework is
fully developed, and a new language pair can be added easily and rapidly, in a
matter of 2 years. The task of adding new pairs can, of course, be done in
parallel.
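To get a feel for the scale of this task, a rough count of language pairs may help. The sketch below assumes that each translation direction (source to target) counts as a separate pair and takes "about a dozen" as exactly 12. Both are assumptions made for illustration, not figures from the article.

```python
def directed_pairs(n_languages):
    """Number of ordered (source, target) pairs among n languages."""
    return n_languages * (n_languages - 1)


# Assumption: "about a dozen" is taken as exactly 12 languages.
covered = directed_pairs(12)     # pairs among languages that already have MT
target = directed_pairs(22)      # pairs among all 22 scheduled languages
remaining = target - covered     # pairs still to be built under this assumption

print(covered, target, remaining)  # prints: 132 462 330
```

Under these assumptions, covering all 22 scheduled languages directly would mean 462 directed pairs, of which 330 would still need to be added.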
Cross Language Access to Content
As the e-content in Indian languages increases, there will be an even greater
need for users to search for and locate relevant content on the internet.
Here, search across languages would be important while content is still being
created for Indian languages, because a large amount of content might not be
available in all Indian languages initially. Technology is available for this
task across half a dozen ILs. However, indexing of content in these languages
needs to be done. More languages also need to be added.
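The idea of cross-lingual search can be sketched with a toy example. The code below expands a Hindi query with English translations from a small term dictionary and then matches documents in either language. The dictionary, the documents, and the function names are all hypothetical; real systems would more likely rely on machine translation or transliteration than on a hand-made word list.

```python
# Hypothetical toy data: a tiny Hindi-to-English term dictionary and three documents.
HI_TO_EN = {"किसान": "farmer", "मौसम": "weather"}

DOCUMENTS = {
    "doc-en-1": "weather advisory for every farmer in the district",
    "doc-hi-1": "किसान के लिए मौसम की जानकारी",
    "doc-en-2": "railway timetable for the summer season",
}


def expand_query(query, dictionary):
    """Return the original query terms plus any translations found in the dictionary."""
    terms = set(query.split())
    terms |= {dictionary[term] for term in query.split() if term in dictionary}
    return terms


def cross_lingual_search(query, documents, dictionary):
    """Rank document ids by how many expanded query terms each document contains."""
    terms = expand_query(query, dictionary)
    scores = {}
    for doc_id, text in documents.items():
        hits = len(terms & set(text.split()))
        if hits:
            scores[doc_id] = hits
    # Highest score first; ties are broken alphabetically by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


results = cross_lingual_search("किसान मौसम", DOCUMENTS, HI_TO_EN)
```

With this toy data, a Hindi query about farmers and weather retrieves both the Hindi document and the relevant English one, while the unrelated English document is left out.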
Speech Processing
There are two parts to this technology:
- Text to speech (TTS)
- Speech to text
The former technology allows a computer to “read out” a given text file in an
IL. The latter allows the computer to “listen” to the spoken language and
convert it into a text file.
TTS can be used to allow a text file to be accessed by a blind person or an
illiterate person. It can also allow interaction over the telephone, where the
text cannot be seen by the user. TTS is a mature technology and is available for
more than a dozen ILs.
Optical Character Recognition (OCR)
There are two technology areas under this head:
- Optical character recognition (OCR)
- Online handwriting recognition (OHWR)
OCR takes a printed book and converts it into text form. When a hardcopy book
is scanned, the output is in the form of scanned images, which cannot be used
for search, machine translation, speech processing, etc. OCR takes a scanned
image of a page, recognizes the characters, and converts them into text form.
This technology is available as a field prototype for about a dozen ILs. It
needs to be converted into a product and provided to digital libraries which
hold scanned collections of IL books, such as the Digital Library of India of
the Ministry of Information Technology.
The conditions are most conducive for the use and proliferation of language
technology for Indian languages. There are a large number of users with digital
devices who wish to get information in their own languages as they do not know
English. There is a large amount of content in English but not in Indian
Languages. Hence, there is a large unserved need!
Several things need to be done. For example, Indian language technology
should immediately be deployed to translate all central government websites into
22 Indian languages. This will generate a demand which will help the growth of
an ecosystem of academic institutions as researchers and technology developers,
start-ups as technology maintainers, and others who service the demand using MT
technology. There would also be a need for human post-editors who would take
the output of an MT system and make it more readable. Similarly, speech
processing can be combined with MT to provide spoken language translation.
The National Digital Library of India should use the services of an OCR system
to index the scanned images in Indian languages, making them searchable.
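How OCR output makes scanned pages searchable can be sketched with a simple inverted index. The example assumes the OCR step has already produced text for each page; the page ids, the sample text, and the function names are hypothetical, and no particular OCR system is implied.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page ids whose recognized text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted page ids that contain every word of the query."""
    words = query.split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical OCR output: page id -> recognized text.
ocr_pages = {
    "book1-p1": "भारत एक विशाल देश है",
    "book1-p2": "भारत की भाषाएँ",
    "book2-p1": "देश की नदियाँ",
}

index = build_index(ocr_pages)
```

Once the index is built, a query such as `search(index, "भारत")` returns the ids of the pages whose recognized text contains that word, which is not possible with the scanned images alone.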