Commit 584067f7 authored by Oliver Hellwig

initial commit

# Morpho-lexical annotations
The file 'rigveda.csv' contains the morpho-lexical annotations, which were generated with the SanskritTagger tool and then validated manually.
It is a plain text file in UTF-8 encoding that uses # as the field separator.
The first line of the file is the header row. The individual fields are:
* **book** of the Rigveda
* **chapter** = hymn of the Rigveda
* **strophe** = stanza
* **verse** = line in the stanza. A line is terminated by a (double) danda.
* **position** of a word in this line. Words are separated by spaces in the samhita text.
* **word** = a string of the samhita text; can contain multiple inflected lemmata.
* **substring_position**: If a word consists of multiple lemmata, this value gives the position of the lemma within the word.
* **surface_form**: *padapatha* form of an inflected lemma
* **lemma**: dictionary form of the lemma
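* The remaining fields, listed without further explanation: **verbal_root**, **verbal_prefixes**, **lemma_id**, **id_tea**, **sentence_boundary**, **coarse_pos**, **case**, **number**, **gender**, **person**, **tense_mode**, **synsets**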
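
As an illustration of the format described above, the following minimal Python sketch reads a #-separated file with a header row. It assumes that the header names match the field names listed here and that they occur in this order; neither has been checked against the actual file. The function name `read_annotations` is hypothetical, and the sample row contains made-up placeholder values, not real data from 'rigveda.csv'.

```python
import csv
import io

# Assumption: the header row uses exactly these names, in this order.
FIELDS = [
    "book", "chapter", "strophe", "verse", "position", "word",
    "substring_position", "surface_form", "lemma", "verbal_root",
    "verbal_prefixes", "lemma_id", "id_tea", "sentence_boundary",
    "coarse_pos", "case", "number", "gender", "person", "tense_mode",
    "synsets",
]


def read_annotations(handle):
    """Yield one dict per data row of a '#'-separated file with a header row."""
    # QUOTE_NONE: treat quote characters as ordinary text, not as field quoting.
    reader = csv.DictReader(handle, delimiter="#", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Placeholder sample (NOT real data): one header line and one row.
placeholder_row = ["1", "1", "1", "1", "1", "WORD", "1", "FORM", "LEMMA"] + [""] * 12
sample = "#".join(FIELDS) + "\n" + "#".join(placeholder_row) + "\n"

rows = list(read_annotations(io.StringIO(sample)))
print(rows[0]["lemma"])

# For the real file, something like this should work:
# with open("rigveda.csv", encoding="utf-8", newline="") as handle:
#     for row in read_annotations(handle):
#         ...
```

Because a word can contain several inflected lemmata, each row is best read as one lemma occurrence; the rows belonging to the same word share **book**, **chapter**, **strophe**, **verse** and **position** and differ in **substring_position**.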