Scott's pi (named after William A Scott) is a statistic for measuring inter-rater reliability for nominal data in communication studies. Textual entities are annotated with categories by different annotators, and various measures are used to assess the extent of agreement between the annotators, one of which is Scott's pi. Since automatically annotating text is a popular problem in natural language processing, and the goal is to get the computer program that is being developed to agree with the humans in the annotations it creates, assessing the extent to which humans agree with each other is important for establishing a reasonable upper limit on computer performance.
Scott's pi is similar to Cohen's kappa in that they improve on simple observed agreement by factoring in the extent of agreement that might be expected by chance. However, in each statistic, the expected agreement is calculated slightly differently. Scott's pi makes the assumption that annotators have the same distribution of responses, which makes Cohen's kappa slightly more informative. Scott's pi is extended to more than two annotators by Fleiss' kappa.
The equation for Scott's pi, as in Cohen's kappa, is:
However, Pr(e) is calculated using squared "joint proportions" which are squared arithmetic means of the marginal proportions (whereas Cohen's uses squared geometric means of them).
Confusion matrix for two annotators, three categories {Yes, No, Maybe} and 45 items rated (90 ratings for 2 annotators):
To calculate the expected agreement, sum marginals across annotators and divide by the total number of ratings to obtain joint proportions. Square and total these:
To calculate observed agreement, divide the number of items on which annotators agreed by the total number of items. In this case,
Given that Pr(e) = 0.
Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.
Krippendorff's alpha coefficient, named after academic Klaus Krippendorff, is a statistical measure of the agreement achieved when coding a set of units of analysis. Since the 1970s, alpha has been used in content analysis where textual units are categorized by trained readers, in counseling and survey research where experts code open-ended interview data into analyzable terms, in psychological testing where alternative tests of the same phenomena need to be compared, or in observational studies where unstructured happenings are recorded for subsequent analysis.
In statistics, inter-rater reliability (also called by various similar names, such as inter-rater agreement, inter-rater concordance, inter-observer reliability, inter-coder reliability, and so on) is the degree of agreement among independent observers who rate, code, or assess the same phenomenon. Assessment tools that rely on ratings must exhibit good inter-rater reliability, otherwise they are not valid tests. There are a number of statistics that can be used to determine inter-rater reliability.
Kappa de Fleiss (nommé d'après Joseph L. Fleiss) est une mesure statistique qui évalue la concordance lors de l'assignation qualitative d'objets au sein de catégories pour un certain nombre d'observateurs. Cela contraste avec d'autres kappas tel que le Kappa de Cohen, qui ne fonctionne que pour évaluer la concordance entre deux observateurs. La mesure calcule le degré de concordance de la classification par rapport à ce qui pourrait être attendu si elle était faite au hasard.
Se penche sur l'évaluation de la PNL, couvrant les normes d'or, la précision, le rappel et la signification statistique.
Anthropogenic increases in atmospheric carbon dioxide (CO2) concentrations may affect vegetation distribution both directly through changes in photosynthesis and water-use efficiency, and indirectly through CO2-induced climate change. Using an equilibrium ...