Ice Bear Soft
Searchable PDF with Devanagari texts

This is a first attempt to prepare searchable PDF files with Devanagari text using the devnag package. Searchability is achieved by adding ToUnicode maps with cmap.sty, which means that this feature is available with pdflatex only. Users of pdfplain will have to find their own way by examining cmap.sty and fontenc.sty.

The images below show the results of simple searches made with Acrobat 7.0 on Linux Fedora Core 2. Clicking a preview opens a large screenshot in its own window.

Demonstration of search capabilities
Search for mai.mne
Search for utsaah

Searching Devanagari texts is not as simple as searching texts in Latin alphabets. The first peculiarity is the i-matra, which is displayed before its consonant. You therefore cannot find words such as mis or dillii if you type them into the search field in the usual order. The characters must be entered in the same order as they are displayed. The Devanagari keyboard allows this.
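The reordering described above can be sketched in Python. This is only an illustration under stated assumptions: the function name to_display_order is hypothetical, and the sketch assumes that the i-matra is moved in front of the whole consonant cluster it belongs to, which the text above confirms only for single consonants.

```python
import re

I_MATRA = "\u093F"   # DEVANAGARI VOWEL SIGN I
VIRAMA = "\u094D"    # DEVANAGARI SIGN VIRAMA
NUKTA = "\u093C"     # DEVANAGARI SIGN NUKTA

# One consonant (KA..HA or QA..YYA), optionally followed by a nukta.
_CONSONANT = "[\u0915-\u0939\u0958-\u095F]" + NUKTA + "?"
# Assumption: a cluster is zero or more "consonant + virama" pairs
# followed by a final consonant.
_CLUSTER = "(?:" + _CONSONANT + VIRAMA + ")*" + _CONSONANT
_CLUSTER_WITH_I = re.compile("(" + _CLUSTER + ")" + I_MATRA)


def to_display_order(word: str) -> str:
    """Move every i-matra in front of the consonant cluster it follows."""
    return _CLUSTER_WITH_I.sub(lambda m: I_MATRA + m.group(1), word)


def code_points(text: str) -> str:
    """Render a string as U+XXXX code points so the order is easy to read."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    mis = "\u092E\u093F\u0938"                    # mis in logical order
    dillii = "\u0926\u093F\u0932\u094D\u0932\u0940"  # dillii in logical order
    for word in (mis, dillii):
        print(code_points(word), "->", code_points(to_display_order(word)))
```

The result is the string to type into the search field; the ii-matra in dillii is left where it is because only the short i-matra is displayed before its consonant.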

Because the way characters are combined to form words is quite complex, Acrobat Reader is often confused and adds word boundaries in the middle of a word. This happens with the u-matra and the uu-matra. You therefore cannot find kulluu as a single word; you must enter it as two words, i.e. ku lluu. Similarly, duur must be entered as duu followed by a space and r. The word huu~m can be found if you type huu followed by a space and a standalone candrabindu. This rule does not apply to words containing ru and ruu. These syllables are contained in the dvng fonts as special glyphs, so they do not create word boundaries.
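The u-matra rule can be sketched as a small Python helper. The function name split_after_u_matra is hypothetical. The sketch assumes that no space is needed when the matra ends the word (as in ku lluu) and that any uu-matra or u-matra directly following the letter ra belongs to a ru or ruu glyph; how conjuncts ending in ra behave is not stated here, so treating them like plain ru is a guess.

```python
import re

RA = "\u0930"        # DEVANAGARI LETTER RA
U_MATRA = "\u0941"   # DEVANAGARI VOWEL SIGN U
UU_MATRA = "\u0942"  # DEVANAGARI VOWEL SIGN UU

# A u-matra or uu-matra that does not directly follow RA (ru and ruu are
# single glyphs) and that is followed by another non-space character.
_BREAKING_U = re.compile(
    "(?<!" + RA + ")([" + U_MATRA + UU_MATRA + "])(?=\\S)"
)


def split_after_u_matra(word: str) -> str:
    """Insert a space after each u-matra or uu-matra that splits the word."""
    return _BREAKING_U.sub(lambda m: m.group(1) + " ", word)


if __name__ == "__main__":
    kulluu = "\u0915\u0941\u0932\u094D\u0932\u0942"
    # Print code points rather than glyphs so the inserted space is visible.
    print([f"U+{ord(ch):04X}" for ch in split_after_u_matra(kulluu)])
```

For huu~m the helper leaves the candrabindu alone after the inserted space, which corresponds to the standalone candrabindu mentioned above.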

Words with vattu present other difficulties. There is no problem if the consonant with a vattu is available as a glyph. If it is composed of pieces, it again generates a word boundary. When searching for .draaivar you must enter .dr followed by a space and aaivar. Unicode distinguishes matras from independent vowels. The Devanagari keyboard thus allows you to do what is impossible with Velthuis transliteration: start a word with an aa-matra followed by an independent i.

The dvng fonts do not contain an independent long a. It is composed of the short independent a followed by the aa-matra. You must keep this in mind when searching for aayaa. The Devanagari keyboard allows you to write it. Independent o and au are formed similarly. As a complex example, try to find the word aak.rti. First you must start with the short independent a followed by the aa-matra. The r-matra creates a word boundary, so a space must be inserted after it. Finally, the i-matra must precede the t-consonant.
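The aak.rti example can be traced step by step in Python. This is a sketch, not a complete converter: the function name search_form is hypothetical, the decomposition of o and au into short a plus the o-matra and au-matra is an assumption based on "formed similarly", and the i-matra step handles single consonants only.

```python
import re

SHORT_A = "\u0905"   # DEVANAGARI LETTER A
R_MATRA = "\u0943"   # DEVANAGARI VOWEL SIGN VOCALIC R
I_MATRA = "\u093F"   # DEVANAGARI VOWEL SIGN I

# Independent vowels that the dvng fonts compose from short a plus a matra.
# The O and AU entries are an assumption; the text only confirms AA.
_DECOMPOSE = {
    "\u0906": SHORT_A + "\u093E",  # AA -> A + aa-matra
    "\u0913": SHORT_A + "\u094B",  # O  -> A + o-matra
    "\u0914": SHORT_A + "\u094C",  # AU -> A + au-matra
}


def search_form(word: str) -> str:
    """Apply the three rules from the aak.rti example to a Unicode word."""
    # 1. Replace composed independent vowels by short a plus matra.
    word = "".join(_DECOMPOSE.get(ch, ch) for ch in word)
    # 2. The r-matra creates a word boundary unless it ends the word.
    word = re.sub(R_MATRA + r"(?=\S)", R_MATRA + " ", word)
    # 3. Put each i-matra in front of its (single) consonant.
    word = re.sub(
        "([\u0915-\u0939])" + I_MATRA,
        lambda m: I_MATRA + m.group(1),
        word,
    )
    return word


if __name__ == "__main__":
    aakrti = "\u0906\u0915\u0943\u0924\u093F"  # aak.rti in logical order
    print([f"U+{ord(ch):04X}" for ch in search_form(aakrti)])
```

The steps run in this order so that the space inserted after the r-matra is already in place when the i-matra is moved in front of the t-consonant.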

Unfortunately, I did not succeed in finding the word huaa.

Anusvaras and candrabindus do not create word boundaries. Words such as mai.mne and jaauu~mgaa are perfectly searchable.

The second test file, examples.dn, demonstrates that you can find both variants of kta no matter whether the ligature is available in your system font. Try to search for yu ktatara.m and do not forget to add a space after the u-matra (see above).

Superscript repha acts similarly to the i-matra. It is written at the end of the akshara, so you have to put it there when filling in the search dialogue. When trying to search for munibhirmata.m, mu inibhmarta.m must be entered. Similarly durj~neya.m can be found as du j~nerya.m.
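The repha rule can be sketched in Python as well. The function name move_repha is hypothetical, and the sketch rests on two assumptions that the text does not confirm: that the repha glyph is found by typing a plain ra (the repha parameter lets you try another string), and that the akshara ends after the consonant cluster and its optional vowel sign. It moves the repha only; the i-matra reordering and the space after a u-matra still have to be applied separately.

```python
import re

RA = "\u0930"      # DEVANAGARI LETTER RA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

_CONSONANT = "[\u0915-\u0939]"
# Zero or more "consonant + virama" pairs followed by a final consonant.
_CLUSTER = "(?:" + _CONSONANT + VIRAMA + ")*" + _CONSONANT
# Optional dependent vowel sign (aa-matra .. au-matra).
_MATRA = "[\u093E-\u094C]?"
# RA + virama that starts a cluster (not preceded by a virama) is a repha.
_REPHA = re.compile(
    "(?<!" + VIRAMA + ")" + RA + VIRAMA + "(" + _CLUSTER + _MATRA + ")"
)


def move_repha(word: str, repha: str = RA) -> str:
    """Move each repha behind the akshara it is drawn on.

    `repha` is the text assumed to match the repha glyph in the PDF.
    """
    return _REPHA.sub(lambda m: m.group(1) + repha, word)


if __name__ == "__main__":
    # durj~neya.m in logical order
    word = "\u0926\u0941\u0930\u094D\u091C\u094D\u091E\u0947\u092F\u0902"
    print([f"U+{ord(ch):04X}" for ch in move_repha(word)])
```

Applied to durj~neya.m, the helper yields the letter order of j~nerya.m after du; adding the space after the u-matra then gives the form du j~nerya.m quoted above.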

You can download a sample searchable file. It is the sample file from the devnag package with a date added so that I could verify searching for digits. The tools are available as a single file together with short installation instructions and the *.dn sources for making the sample PDF files.