In morphology and lexicography, a lemma (plural: lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of word forms. In English, for example, break, breaks, broke, broken, and breaking are forms of the same lexeme, with break as the lemma by which they are indexed. In this context, a lexeme is the set of all the inflected or alternating forms in the paradigm of a single word, and the lemma is the particular form chosen by convention to represent the lexeme. Lemmas have special significance in highly inflected languages such as Arabic, Turkish, and Russian. The process of determining the lemma for a given lexeme is called lemmatisation. The lemma can be viewed as the chief of the principal parts, although lemmatisation is at least partly arbitrary.
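As a rough illustration, the relationship between a lexeme and its lemma can be modelled as a table from each lemma to the forms it indexes. Looking up the lemma of a form found in running text is then the reverse lookup. This is a minimal sketch that assumes a small hand-built word list; the names LEXICON, FORM_TO_LEMMA, and lemmatise are hypothetical. Real lemmatisers must also handle forms that belong to more than one lexeme, as well as unknown words, and this sketch ignores both problems.

```python
from typing import Optional

# Hypothetical toy lexicon: each lemma indexes the word forms of its lexeme.
LEXICON: dict[str, set[str]] = {
    "break": {"break", "breaks", "broke", "broken", "breaking"},
    "mouse": {"mouse", "mice"},
}

# Invert the lexicon so that any listed word form points back to its lemma.
# (Assumes no form belongs to two lexemes; a real lexicon cannot assume this.)
FORM_TO_LEMMA: dict[str, str] = {
    form: lemma for lemma, forms in LEXICON.items() for form in forms
}


def lemmatise(word_form: str) -> Optional[str]:
    """Return the lemma that indexes word_form, or None if the form is unlisted."""
    return FORM_TO_LEMMA.get(word_form.lower())


print(lemmatise("broke"))   # break
print(lemmatise("Mice"))    # mouse
print(lemmatise("walked"))  # None
```

Because the choice of lemma is conventional, the same lookup works for any other choice of index form. Only the keys of LEXICON would change.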
The form of a word that is chosen to serve as the lemma is usually the least marked form, but there are several exceptions, such as the use of the infinitive for verbs in some languages.
For English, the citation form of a noun is the singular (and non-possessive) form: mouse rather than mice. For multiword lexemes that contain possessive adjectives or reflexive pronouns, the citation form uses a form of the indefinite pronoun one: do one's best, perjure oneself. In European languages with grammatical gender, the citation form of regular adjectives and nouns is usually the masculine singular. If the language also has cases, the citation form is often the masculine singular nominative.
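The default rule for gendered languages can be read as a lookup in a paradigm keyed by grammatical features. In this sketch, each form is keyed by a (gender, number, case) tuple, with case set to None for a language without cases. This encoding and the names Paradigm and citation_form are assumptions made for illustration. The rule is only the usual default, so the function returns None rather than guessing when no masculine singular form exists.

```python
from typing import Optional

# Hypothetical encoding: each form is keyed by (gender, number, case),
# with case set to None for languages that have no grammatical case.
Paradigm = dict[tuple[str, str, Optional[str]], str]

SPANISH_BUENO: Paradigm = {  # 'good'
    ("masc", "sg", None): "bueno",
    ("fem", "sg", None): "buena",
    ("masc", "pl", None): "buenos",
    ("fem", "pl", None): "buenas",
}

RUSSIAN_NOVYJ: Paradigm = {  # 'new' (partial paradigm)
    ("masc", "sg", "nom"): "новый",
    ("masc", "sg", "gen"): "нового",
    ("fem", "sg", "nom"): "новая",
    ("neut", "sg", "nom"): "новое",
}


def citation_form(paradigm: Paradigm) -> Optional[str]:
    """Apply the usual default: masculine singular, nominative if cases exist."""
    for key in (("masc", "sg", "nom"), ("masc", "sg", None)):
        if key in paradigm:
            return paradigm[key]
    return None  # no masculine singular form: the default rule does not apply


print(citation_form(SPANISH_BUENO))  # bueno
print(citation_form(RUSSIAN_NOVYJ))  # новый
```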
For many languages, including French, German, Hindustani, and Spanish, the citation form of a verb is the infinitive. English verbs usually have an infinitive, whose bare form (without the particle to) is the verb's least marked form; for example, break is chosen over to break, breaks, broke, breaking, and broken. Defective verbs that have no infinitive use the present-tense form instead: must has only one form, and shall has no infinitive, so both lemmas are their lexemes' present-tense forms.
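The rule for English verbs works as a simple fallback. Use the bare infinitive if the verb has one. Otherwise, use the present-tense form. In this sketch, a paradigm is a dictionary from hypothetical slot names (infinitive, present, past, and so on) to forms. A defective verb simply lacks some slots. This layout is an assumption made for illustration, not a standard lexicographic format.

```python
from typing import Optional

# Hypothetical encoding of English verb paradigms: slot name -> form.
# Defective verbs simply lack some slots; must and shall have no infinitive.
BREAK = {
    "infinitive": "break",  # bare infinitive, stored without the particle "to"
    "present": "break",
    "present_3sg": "breaks",
    "past": "broke",
    "past_participle": "broken",
    "present_participle": "breaking",
}
MUST = {"present": "must"}
SHALL = {"present": "shall", "past": "should"}


def verb_citation_form(paradigm: dict[str, str]) -> Optional[str]:
    """Return the bare infinitive if the verb has one, else its present-tense form."""
    return paradigm.get("infinitive") or paradigm.get("present")


for verb in (BREAK, MUST, SHALL):
    print(verb_citation_form(verb))  # prints break, must, shall (one per line)
```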