1 Introduction

India has been home to a vast treasure trove of knowledge since ancient times, and was revered and celebrated as Vishwa Guru, one who provides guidance to the world. The advice enshrined in our scriptures ranges from the mundane aspects of life to arts, culture, science and philosophy, and includes deep insights into learning and the learning process itself (‘how to learn’). Repeated invasions and various other factors dethroned our country from this exalted pedestal. It is the aspiration of every Indian to see the country regain its former glory. Reviving our past requires us to go back in time: to see, read, and understand our history. This is perhaps why our forefathers thoughtfully documented the ‘contemporary knowledge’ of their day on palm leaves, which have withstood the ravages of time.

2 Knowledge preservation and transmission modes

Oral tradition mode. Let us start by looking at how knowledge was preserved and passed on from one generation to the next. Two main modes of transmission were followed: the oral mode and the manuscript mode. In the education system of the past, the Vedas, Shastras and associated subjects were taught orally by teacher to student in order to preserve the knowledge. In fact, writing or reading the Vedas was considered less scholarly; memorization and recall were what was appreciated and encouraged. To ensure that every sound and its delivery were preserved unaltered, students were taught from childhood complex recitation techniques based on tonal accents, a unique manner of pronouncing each letter, and specific speech combinations. In addition, a remarkable method was in vogue to make sure that words and syllables were not altered: the words of a mantra are strung together in eleven different patterns, namely vakya, pada, krama, jata, mala, sikha, rekha, dhvaja, danda, ratha, and ghana [1].

Each pattern had its own relative importance in ensuring that a passage is remembered in such a way that auto-correction is embedded in both storage (memory) and transmission (recall and teaching a student of the next generation). The integration of flexibility and rigidity in learning was the key to the success of the oral mode.
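The way such patterns embed self-checking redundancy can be illustrated with a small sketch. The Python below is a simplified illustration, not a description of actual recitation practice: it treats a passage as a list of placeholder word tokens, ignores accents, euphonic combination and the special handling of line endings, and assumes the commonly cited word orders for krama (1-2, 2-3, ...), jata (1-2-2-1-1-2) and ghana (1-2-2-1-1-2-3-3-2-1-1-2-3). All function names are illustrative.

```python
from typing import List


def krama(words: List[str]) -> List[List[str]]:
    """Krama pattern: overlapping pairs 1-2, 2-3, 3-4, ..."""
    return [[words[i], words[i + 1]] for i in range(len(words) - 1)]


def jata(words: List[str]) -> List[List[str]]:
    """Jata pattern: each pair recited forward, backward, forward (1-2-2-1-1-2)."""
    return [[a, b, b, a, a, b] for a, b in krama(words)]


def ghana(words: List[str]) -> List[List[str]]:
    """Ghana pattern on word triples: 1-2-2-1-1-2-3-3-2-1-1-2-3."""
    units = []
    for i in range(len(words) - 2):
        a, b, c = words[i], words[i + 1], words[i + 2]
        units.append([a, b, b, a, a, b, c, c, b, a, a, b, c])
    return units


def krama_is_consistent(units: List[List[str]]) -> bool:
    """Redundancy check: the second word of each pair must reappear
    as the first word of the next pair."""
    return all(units[i][1] == units[i + 1][0] for i in range(len(units) - 1))


words = ["w1", "w2", "w3", "w4"]  # placeholder tokens, not a real mantra
recited = krama(words)
corrupted = [pair[:] for pair in recited]
corrupted[1][0] = "wX"  # simulate a slip in one repetition
print(krama_is_consistent(recited), krama_is_consistent(corrupted))
```

Because every word is recited more than once and in more than one position, a slip in a single repetition disagrees with its neighbours and can be detected, which is the sense in which auto-correction is embedded in the recitation.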

A cornerstone of teaching the Vedas, Sanskrit, and related subjects has been the uncompromising emphasis on rigidity in aspects such as grammar and pronunciation. Rigidity was adopted as a technique to preserve the language and its contents across time and place, not as a means of curbing freedom of expression. In fact, every field of knowledge offered abundant opportunities for creative expression within the ambit of certain rigid rules; otherwise, we would not have the works of the finest poets, or the stories and dialogues, that came out of these systems. Remarkably, rigidity has ensured that what was spoken and heard a thousand years ago is spoken, heard and understood in exactly the same way today. This approach helped to minimize the obsolescence of knowledge and to avoid ambiguity. In contrast, the English language of a hundred years ago, or even of a decade ago, differs noticeably from today’s language, which puts legacy content progressively out of reach. UNESCO has recognized the Tradition of Vedic Chanting as an Intangible Cultural Heritage [2].

Manuscripts mode. While the oral mode was specific to the Vedas, Upanishads, and Sastras, a second mode of preservation communicated knowledge to the next generation in written form, as manuscripts on a unique medium: palm leaves. Palm leaves have proven to be a very good medium because of their single most important property, the ability to remain intact for a few hundred years even under ambient conditions. Manuscripts were handwritten (more precisely, engraved) on Palmyra leaves with a metal stylus. Palm-leaf manuscripts contain information from a variety of historical, cultural, scientific, and other knowledge domains, including:

  1. Indigenous medicine such as Siddha, Ayurveda, other systems.
  2. Human anatomy (Varmam, surgery).
  3. Veterinary science.
  4. Agriculture.
  5. Traditional art and architecture: Temple art, Temple architecture, Shipbuilding, Carpentry, Metalworking, Sculpture.
  6. Traditional musicology.
  7. Astrology and astronomy.
  8. Yoga.
  9. Animal husbandry.
  10. Martial arts.
  11. Physiognomy (Samudrika Laksanam).

Palm-leaf manuscripts are found across the country, inherited by families from preceding generations. However, with the passage of time, their perceived importance has greatly decreased, and at present few people understand the need to preserve and maintain them. It is now left to individual scholars, institutions, libraries, universities, and governments to take on the responsibility of preserving the palm-leaf manuscripts in their original form, and of processing them, especially through digitization.

India possesses more than five million manuscripts, the largest collection in the world, and the onus is on us to preserve this vast treasure trove of knowledge. The Antiquities and Art Treasures Act, 1972 of India, article 2(II)b, describes a manuscript as a handwritten composition which has significant scientific, historical or aesthetic value and which has been in existence for not less than seventy-five years.

3 Establishment of the manuscripts library at SCSVMV

The Manuscripts division of the SCSVMV (deemed to be University) library holds a rare treasure: a large collection of palm-leaf and paper manuscripts. The collection comprises manuscripts gathered from different parts of the country during the yatras of Their Holinesses the Shankaracharyas of Sri Kanchi Kamakoti Peetam, together with those that were in the safe custody of the Sri Sankara Matam at Kumbhakonam and later shifted to the Sri Sankara Matam at Kanchipuram. The National Mission for Manuscripts of the Government of India has recognized the centre as one of its Manuscript Resource Centres. Key aims of the Manuscript Library include:

  1. Surveying and collecting manuscripts.
  2. Conservation of manuscripts in their original form.
  3. Digitization of manuscripts.
  4. Editing and publishing books and periodicals.
  5. Conducting outreach programmes and awareness camps.

The manuscript collection spans an array of subjects including Sahitya, Vyakarana, Vedanta, Vaidya, Agama, Tantra, Nyaya and Mantra Sastra. Most of the manuscripts are in the Sanskrit language, written in the Grantha, Telugu, Tamil, Nandinagari, Devanagari, Malayalam and Kannada scripts. Some of the texts are accompanied by commentaries, while others are not. A sizable percentage of the manuscripts are considered rare and unpublished. Copies of the Ramayana and Mahabharata are available on palm leaf.

4 Nature of the rare manuscripts

There are different types of manuscripts, and they are unique in several ways. Some of the categories are pocket books, long size, short size, rare scripts, paper based, multi-leaf illustrated, and very large sets. One manuscript is unique in that it can be read both from left to right (as the story of Rama) and from right to left (as the story of Krishna). While most manuscripts are rectangular in shape, some are in the shape of a fish.

Title | Rare feature (available in SCSVMV) | Not in SCSVMV library
--- | --- | ---
Pocket Book: Saptarishi Puja | Very small complete book. No. of leaves: 17. Size: 8.5 cm × 6 cm. Language: Sanskrit. Script: Grantha |
Long Size: Vedanta Paribhasha | Very long manuscript. Author: Dharmaraja Adhwari. Size: 53 cm × 6 cm. Language: Sanskrit. Script: Grantha |
Short Size: Deva Devi Samvadha | Small manuscript (lengthwise). Size: 12 cm × 2 cm. Language: Sanskrit. Script: Grantha |
Short Size: Vaidhyam | Very small manuscript (breadthwise). Size: 14 cm × 1.5 cm. Language: Tamil. Script: Tamil |
Rare Script: Adyatma Ramayana (Uma Meheshwara Samvadha) | Script: Sharadha (script of Kashmir). Size: 8.5 cm × 6 cm. Language: Sanskrit |
Paper Based: Shivagita | Special feature: Nandinagari (a rare script in Tamil Nadu). Paper manuscript. Language: Sanskrit |
Very Large Set: Srimad Ramayanam | Complete book in one volume (very large volume). Author: Valmiki. No. of pages: 617. Language: Sanskrit. Script: Grantha |
Multi Leaf Illustrated: Gita Govindham | Illustrated manuscript (Radha & Krishna). A rare type of illustration: palm leaves have been stitched together and the picture has been drawn continuously across the leaves |

It is pertinent to note that several thousand manuscripts from India are still in the possession of individuals, museums and other organizations in Europe and elsewhere abroad.

5 Digital preservation

Keeping in mind the rich content and knowledge available in the manuscripts, the goal is to transform the manuscript content into an easily accessible form through the application of Information and Communication Technologies. The National Mission for Manuscripts (NMM) is the pioneering effort in our country for the digitization of manuscripts [3]. As part of the NMM, various Manuscript Resource Centres, Conservation Centres and Manuscript Partner Centres have been identified for the survey, documentation and conservation of manuscripts [3]. Kritisampada, an outcome of the project, is a national database of manuscripts made available on the internet with some essential metadata [4]. Guidelines for digitization have been prepared and formalized, and more than a crore (ten million) pages of manuscripts have been digitized across various centres. However, this phase of digital preservation is limited to identification and cataloguing. The larger question is conversion to a shareable, storable, retrievable, and searchable form. The rest of the paper describes the approach of SCSVMV in this direction.

6 Digital preservation challenges

Preserving palm-leaf manuscripts poses some unique challenges, including volume, variety, fragility, numbering, physical condition, and the availability of resource persons.

  A. Volume. Palm-leaf manuscripts have survived for a phenomenal period spanning hundreds of years, which means that the number of manuscripts available for transfer to the digital domain is huge. While it is heartening that so much content has been preserved over centuries, preserving it and providing access to it is a daunting task. The volume is also spread geographically: the palm leaves are found in many regions, often with individuals or families who inherited them from their forefathers.

  B. Variety. The palm-leaf manuscripts are quite diverse, as enumerated below:

    1. Scripts: Grantha, Nandinagari, Tamil, Telugu, Malayalam, Modi, etc. Many bundles contain more than one script, for example Grantha + Telugu or Grantha + Malayalam.
    2. Languages: a number of languages including Sanskrit, Tamil, and Telugu.
    3. Medium: different kinds and shapes of palm leaves, and other materials such as birch bark.
    4. Time period: the manuscripts come from different time periods, which adds to the variety in all other aspects.
    5. Physical condition: types and extent of damage.
    6. Multiple copies: several copies of the same manuscript may exist, possibly with differences in text.

  C. Fragility. In general, manuscripts are fragile and prone to damage, and a large number are already damaged. Hence they must be handled carefully and cannot be put through automated scanners or similar machines. Further, owners (both individuals and institutions) are often reluctant to share the original manuscripts, even for digitization.

  D. Numbering. The palm leaves in a manuscript are not numbered. They are tied together with a thread, and if this thread comes off, the sequence of the bundle is lost. The leaves can then be rearranged only on the basis of their contents, with the help of experts who understand what is written on them. In some cases, library staff have inscribed numbers directly on the original palm leaves with a pen, permanently altering their original state.

  E. Resource persons. Very few people are familiar with the different aspects of working with manuscripts, from handling and digitizing them to understanding the various scripts and languages needed to catalogue them. Besides, many palm-leaf manuscripts have no punctuation marks, so training is needed to read the writing correctly before it can be translated. Some of the damages that appear often are shown in the adjoining figure.
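Several of the challenges listed above (mixed scripts, varied media, damage, and lost leaf order) can be recorded in digital metadata rather than on the leaf itself. The following Python sketch shows one possible shape for such a catalogue record. The class names, field names and file names are hypothetical and are not taken from any existing cataloguing system; only the title, language and script in the usage example come from the collection described in this paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Folio:
    image_file: str   # hypothetical file name of the photographed leaf
    sequence: int     # reading order, kept in metadata and never inscribed on the leaf
    damage: List[str] = field(default_factory=list)  # e.g. ["hole", "broken edge"]


@dataclass
class ManuscriptRecord:
    title: str
    languages: List[str]
    scripts: List[str]   # a bundle may mix scripts, e.g. Grantha + Telugu
    medium: str          # e.g. "palm leaf", "birch bark", "paper"
    folios: List[Folio] = field(default_factory=list)

    def in_reading_order(self) -> List[Folio]:
        """Return the folios sorted by their recorded sequence number."""
        return sorted(self.folios, key=lambda f: f.sequence)

    def damaged_folios(self) -> List[Folio]:
        """Return the folios that have at least one damage note."""
        return [f for f in self.folios if f.damage]


record = ManuscriptRecord(
    title="Saptarishi Puja",
    languages=["Sanskrit"],
    scripts=["Grantha"],
    medium="palm leaf",
    folios=[Folio("leaf_b.tif", 2, ["hole"]), Folio("leaf_a.tif", 1)],
)
print([f.image_file for f in record.in_reading_order()])
```

Keeping the sequence number in the record means that a bundle whose thread has come off can be re-ordered digitally once an expert has established the order, without writing on the original leaves.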

7 Saaswathaiswaryam

Saaswathaiswaryam is the complete end-to-end process that spans the spectrum from knowledge preservation to knowledge dissemination. A palm-leaf manuscript is only a medium, not the knowledge itself. The following steps are deemed necessary to walk the spectrum:

  a. Palm leaf to digital image. This is the first phase: converting the palm-leaf content into its digital equivalent using appropriate techniques such as imaging. The simplest form of imaging is high-resolution photography. The result is no longer the physical written material itself, but a representation of the palm leaf as bits (1’s and 0’s) in computer storage. This form is well suited to preservation in a computer system and has further advantages of its own, described in the following phases.

  b. Image cleaning. This is the second phase. The digital image shows exactly what the human eye can see, so it vividly captures all the damage: broken edges, holes, and so on. The advantage of the digital form is that the palm leaf can be ‘cleaned’ digitally. Image cleaning algorithms can be deployed to remove the damage [5]; for example, the original shape can be reconstructed and all the holes can be filled. While this is done, the curator can watch on a high-resolution monitor as the entire palm leaf undergoes a phenomenal change (shown in Fig. 1). At the end of this phase the leaf will look as fresh as one written just now. By trial and error, the method can be refined to stay as close to the original as possible, and algorithms can be tailor-made to suit the properties of a particular palm-leaf manuscript. However, the final approval and acceptance of the newborn palm leaf has to come from a subject expert.

    Fig. 1 Original in Grantha script, recreated in Grantha script, and transliteration in Sanskrit Devanagari script

  c. Image extension. This is the third phase. Often, some portions of a manuscript are damaged or lost beyond recognition. While reconstructing the physical shape is relatively easy, reconstructing the contents is hard. However, by studying the undamaged portions of the images, one can develop a probabilistic model of the way the characters of a script ‘hang together’. Such a model can be fine-tuned using machine learning and an online dictionary, and can then predict, with some probability, the characters that were lost to damage. While obtaining the exact original is a herculean task, a near-correct version can be machine generated. Once again, the final approval and acceptance of the newborn palm leaf has to come from a subject expert.
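The idea of a probabilistic model of how characters ‘hang together’ in the third phase can be illustrated with a deliberately small example. The Python sketch below is not the model used at SCSVMV; it substitutes the simplest possible stand-in, character bigram counts, and ranks candidates for a single lost character from its two surviving neighbours. The romanised training strings and the function names are assumptions made for illustration; a real model would work on the units of the manuscript's own script and on a far larger corpus.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Model = Dict[str, Counter]


def train_bigrams(texts: List[str]) -> Model:
    """Count, for every character, which characters follow it."""
    following: Model = defaultdict(Counter)
    for text in texts:
        for left, right in zip(text, text[1:]):
            following[left][right] += 1
    return following


def suggest(model: Model, left: str, right: str, top: int = 3) -> List[Tuple[str, float]]:
    """Rank candidates c for one lost character between `left` and `right`.

    The score is count(left, c) * count(c, right), normalised so that the
    scores of all viable candidates sum to 1.
    """
    after_left = model.get(left, Counter())
    scores = {}
    for candidate, count in after_left.items():
        score = count * model.get(candidate, Counter())[right]
        if score > 0:
            scores[candidate] = score
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(candidate, score / total) for candidate, score in ranked[:top]]


corpus = ["rama", "ramayana", "karma"]  # toy romanised strings, illustration only
model = train_bigrams(corpus)
print(suggest(model, "r", "m"))  # candidates for the gap in "r?m"
```

The output is a ranked list of suggestions with probabilities, not a decision; in keeping with the process described above, a subject expert would accept or reject each suggestion.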

The published Sanskrit scriptures are to be encoded so that they are searchable. An OCR (optical character recognition) system is to be developed for the Grantha script (and for all the other scripts) so that portions of a manuscript can be read automatically by machine. Stray leaves can then be identified through OCR, along with their placement and order. In addition, the cleaned image is to be analyzed as a graphical object, and the damaged portions reconstructed using the orientation of the handwriting as well as the number of possible syllables. Experts in the subject and in manuscriptology are to be consulted for the correct reading, and the image of the manuscript with the most accurate reading is to be prepared in the manner of a critical edition.
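As a rough illustration of how stray leaves might be placed once machine-readable text is available, the Python sketch below compares the (possibly imperfect) text read from each leaf with consecutive sections of an already published text and sorts the leaves by their best match. It assumes that OCR output and an encoded reference text both exist; the strings are placeholders rather than real manuscript content, the function names are hypothetical, and the similarity measure is the generic `difflib.SequenceMatcher` ratio from the Python standard library, not a method prescribed by this paper.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def place_leaf(leaf_text: str, reference_sections: List[str]) -> Tuple[int, float]:
    """Return (index, similarity) of the reference section closest to the leaf."""
    scored = [
        (SequenceMatcher(None, leaf_text, section).ratio(), index)
        for index, section in enumerate(reference_sections)
    ]
    best_ratio, best_index = max(scored)
    return best_index, best_ratio


def order_leaves(leaves: List[str], reference_sections: List[str]) -> List[str]:
    """Sort leaves by the position of their best-matching reference section."""
    return sorted(leaves, key=lambda leaf: place_leaf(leaf, reference_sections)[0])


# Placeholder text standing in for sections of a published edition.
reference = ["alpha beta gamma", "delta epsilon zeta", "eta theta iota"]
# Placeholder OCR output for three unordered leaves, two with misread characters.
leaves = ["eta thcta iota", "alpha beta gamna", "delta epsilon zeta"]
print(order_leaves(leaves, reference))
```

A low similarity score for the best match would be a signal that the leaf belongs to a different text or is too damaged to place automatically, and should be referred to an expert.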

  d. Image corpus. This is the fourth phase. It requires a corpus to be generated for every language, every script, and every style of writing found in the palm-leaf manuscripts.

  e. Image editing. This is the fifth phase. The outcome of the third phase is edited interactively using the corpus generated in the fourth phase, keeping the palm leaf in digital image form, until an image acceptable to a subject expert is obtained. At the end of this phase, we have a palm-leaf equivalent in image form. Computer-based algorithmic processing requires the following additional steps.

  f. Image recognition. This is the sixth phase. Here the digital image that has been perfected and accepted by a subject expert is converted into a program-editable version. At this point we have two versions of the palm leaf: a digital image, which is digitally editable, and a coded representation, which is program editable. Image recognition also requires a corpus that maps each written character (of every script) to its character code.

  g. Storage, retrieval, searching, dissemination. This is the seventh phase, which lies entirely in the area of information and communication technologies. Many open-source solutions are available in bits and pieces, but SCSVMV’s view of the Saaswathaiswaryam problem requires a complete system with its own architecture and implementation. This will lead to an interesting research confluence of the oldest literature and the most fashionable, viz., AI and ML.
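Once a leaf exists in a program-editable form, searching reduces to a standard text-retrieval problem. The Python sketch below shows a minimal inverted index that maps each word to the manuscript and folio where it occurs. It is an illustration of the seventh phase under simplifying assumptions: the text is already transliterated and split into words by spaces, and the class name, the manuscript identifiers and the sample words are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Location = Tuple[str, int]  # (manuscript id, folio number)


class FolioIndex:
    """A minimal inverted index from words to the folios that contain them."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[Location]] = defaultdict(set)

    def add(self, manuscript_id: str, folio: int, text: str) -> None:
        """Index the recognised text of one folio."""
        for word in text.lower().split():
            self._postings[word].add((manuscript_id, folio))

    def search(self, query: str) -> List[Location]:
        """Return the folios that contain every word of the query."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(hits)


index = FolioIndex()
index.add("MS-001", 1, "rama sita lakshmana")
index.add("MS-001", 2, "rama hanuman")
index.add("MS-002", 1, "krishna radha")
print(index.search("rama hanuman"))
```

Each hit identifies a folio, so a search result can be linked back to the accepted digital image of that leaf as well as to its coded text.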

8 Saaswathaiswaryam research and development at SCSVMV

At SCSVMV, we have initiated research and development in this fertile area. At present we are focusing on enabling search through the ‘scripture-related manuscripts’ that were originally written on palm leaves in the Sanskrit language. Through comparative studies, we intend to establish the correctness and sequence of the available (that is, already published) manuscripts of scriptures as well as dictionaries. We hope to bring out a critical edition of a manuscript through machine-oriented processing guided by subject experts in various fields and by paleographers familiar with the various styles of writing in Sanskrit and Grantha. Modern data-processing knowledge associated with contemporary computer systems and related technologies will be deployed to the extent feasible, in areas such as character recognition and image processing. Our efforts at SCSVMV are at present focused on using modern machines for ‘syntactic’ understanding, and Ashtadhyayi (Panini’s grammar) coupled with Chandas (Sanskrit prosody) for ‘semantic’ understanding. The system that results from our efforts will be called Saaswathaiswaryam, and we expect the associated technology to unfold in stages. We, at SCSVMV, believe that we are at the threshold of a major change in unfolding the Ancient Indian Knowledge System. SCSVMVians are happy to be a part of this great change.

9 Conclusions

Palm-leaf manuscripts are a unique repository of historic knowledge. To uncover the knowledge in the manuscripts and make it easily accessible to domain experts and the common man alike, a comprehensive digitization process is required that takes into account the unique aspects of palm-leaf manuscripts. Various tools and technologies are yet to be developed to meet the challenges that arise specifically in this context. The following couplet is interesting to note:


Save me from water, protect me from oil and from loose binding, and do not give me into the hands of fools! says the manuscript.

Anonymous verse frequently found at the end of manuscripts