Multilingual support is an important feature for a library catalog, and Koha offers it. It helps users who want to search library records in their preferred language. Since Koha supports the UTF-8, UTF-16, and Unicode standards, librarians can make catalog entries for regional language books in their own scripts, such as Punjabi, Hindi, Bengali, and Tamil.
The problem arises when it comes to indexing and searching records in these vernacular languages.
Koha uses Zebra as its default search engine for indexing and retrieving records. The Zebra documentation describes it as follows: “Zebra is a high-performance, general-purpose structured text indexing and retrieval engine. It reads records in a variety of input formats (e.g. email, XML, MARC) and provides access to them through a powerful combination of Boolean search expressions and relevance-ranked free-text queries.
Zebra supports large databases (tens of millions of records, tens of gigabytes of data). It allows safe, incremental database updates on live systems. Because Zebra supports the industry-standard information retrieval protocol, Z39.50, you can search Zebra databases using an enormous variety of programs and toolkits, both commercial and free, which understand this protocol…” (Zebra – User’s Guide and Reference, p. 1, http://www.indexdata.dk/zebra/doc/zebra.pdf)
By default, however, the Zebra search engine does not support indexing languages other than English. The solution is to install and enable ICU (International Components for Unicode) chains. To do this, first install the yaz-icu package:
sudo apt-get install yaz-icu
Then, in the staff interface, go to More > Administration > Global system preferences > Searching.
- In this tab, change the UseICUStyleQuotes system preference to Using.
- Change the QueryFuzzy system preference to Don’t try.
- Change the QueryStemming system preference to Don’t try.
Next, open /etc/koha/zebradb/etc/default.idx for editing with the following command:
sudo nano /etc/koha/zebradb/etc/default.idx
Change or add the icuchain lines so that the word and phrase index sections read as follows:
# Traditional word index
# Used if completenss is 'incomplete field' (@attr 6=1) and
# structure is word/phrase/word-list/free-form-text/document-text
index w
completeness 0
position 1
alwaysmatches 1
firstinfield 1
icuchain words-icu.xml
# Phrase index
# Used if completeness is 'complete {sub}field' (@attr 6=2, @attr 6=1)
# and structure is word/phrase/word-list/free-form-text/document-text
index p
completeness 1
firstinfield 1
icuchain phrases-icu.xml
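Before moving on, it can help to confirm that the edit was saved. The following is only a minimal sketch: the helper name check_icuchain is made up for illustration and is not part of Koha, and the path is the one used in the step above. It only checks that both icuchain lines are present and uncommented, not that each sits under the correct index block.

```shell
#!/bin/sh
# check_icuchain is a hypothetical helper, not a Koha tool.
# It reports whether both ICU chain lines appear in the given index file.
check_icuchain() {
    idx_file="$1"
    for chain in words-icu.xml phrases-icu.xml; do
        # Match an uncommented "icuchain <file>" line, allowing leading spaces.
        if ! grep -Eq "^[[:space:]]*icuchain[[:space:]]+$chain" "$idx_file"; then
            echo "missing: icuchain $chain"
            return 1
        fi
    done
    echo "ok"
}

# Path taken from the step above; adjust it if your installation differs.
IDX=/etc/koha/zebradb/etc/default.idx
if [ -r "$IDX" ]; then
    check_icuchain "$IDX"
fi
```

If the function reports a missing line, reopen the file and check the corresponding index section before restarting Zebra.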
Restart Zebra and rebuild the search index by running the following commands one at a time, replacing {yourinstancename} with the name of your Koha instance:
sudo koha-zebra --restart {yourinstancename}
sudo koha-rebuild-zebra -f -v {yourinstancename}
This may take some time, depending on the size of your catalog. Once indexing has finished, you will be able to search and browse regional language records in the Koha catalog.
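If you repeat the restart and rebuild steps often, they can be wrapped in a small shell function. This is a sketch under stated assumptions rather than an official Koha script: rebuild_instance and the DRY_RUN variable are names invented here, and only the two commands shown above are used.

```shell
#!/bin/sh
# rebuild_instance is a hypothetical wrapper around the two commands above.
# With DRY_RUN=1 it prints the commands instead of running them.
rebuild_instance() {
    instance="$1"
    if [ -z "$instance" ]; then
        echo "usage: rebuild_instance <instancename>" >&2
        return 2
    fi
    for cmd in "koha-zebra --restart" "koha-rebuild-zebra -f -v"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "sudo $cmd $instance"
        else
            # $cmd is left unquoted on purpose so it splits into command and flags.
            sudo $cmd "$instance" || return 1
        fi
    done
}
```

For example, DRY_RUN=1 rebuild_instance library prints the two commands for an instance called library (a placeholder name) so you can review them first. On a packaged installation, the koha-list command should show the instance names available on the server.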