# Morpho-lexical annotations

The file `rigveda.csv` contains the morpho-lexical annotations, which were generated with the SanskritTagger tool and then manually validated.

The file is plain text in UTF-8 encoding and uses `#` as the field separator.

The first line of the file is the header row. The individual fields are:
* **book** of the Rigveda
* **chapter** = hymn of the Rigveda
* **strophe** = stanza
* **verse** = line in the stanza. A line is terminated by a (double) danda.
* **position** of a word in this line. Words are separated by spaces in the samhita text.
* **word** = a string of the samhita text; can contain multiple inflected lemmata.
* **substring_position**: If a word consists of multiple lemmata, this value gives the position of the lemma.
* **surface_form**: *padapatha* form of an inflected lemma
* **lemma**: dictionary form of the lemma
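* The remaining fields, listed in header order and not described further here: `verbal_root`, `verbal_prefixes`, `lemma_id`, `id_tea`, `sentence_boundary`, `coarse_pos`, `case`, `number`, `gender`, `person`, `tense_mode`, `synsets`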
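
The layout described above can be read with Python's standard `csv` module. The following is a minimal sketch, not part of the original tooling. It assumes that the header names match the field names listed above and that quoting is not used in the file. The sample rows are made up for illustration (they are not taken from `rigveda.csv`) and contain only the first nine columns; the helper names `read_annotations` and `lemmata_by_word` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows in the documented layout (first nine columns only).
# They are illustrative and NOT copied from rigveda.csv.
SAMPLE = (
    "book#chapter#strophe#verse#position#word#substring_position#surface_form#lemma\n"
    "1#1#1#1#1#agnim#1#agnim#agni\n"
    "1#1#1#1#2#cāgniḥ#1#ca#ca\n"
    "1#1#1#1#2#cāgniḥ#2#agniḥ#agni\n"
)

# Fields that together identify one word of the samhita text.
WORD_KEY = ("book", "chapter", "strophe", "verse", "position")


def read_annotations(handle):
    """Read '#'-separated rows into dicts keyed by the header names."""
    # QUOTE_NONE is an assumption: quote characters are treated as plain text.
    reader = csv.DictReader(handle, delimiter="#", quoting=csv.QUOTE_NONE)
    return list(reader)


def lemmata_by_word(rows):
    """Collect the lemmata of each samhita word, in file order."""
    grouped = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in WORD_KEY)
        grouped[key].append(row["lemma"])
    return dict(grouped)


rows = read_annotations(io.StringIO(SAMPLE))
words = lemmata_by_word(rows)
for key, lemmata in words.items():
    print(key, lemmata)
```

To run this against the real file, replace the `io.StringIO(SAMPLE)` handle with `open("rigveda.csv", encoding="utf-8", newline="")`. All values are kept as strings, since the numeric conventions of the position fields (for example, whether counting starts at 0 or 1) are not stated above.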