[[TracNav]]

= SimilarityMeasure: Acronym Matcher =

Developer: [mailto:endres(at)dfki.uni-kl.de Björn Endres]

== Description ==

This module uses the entities' labels to calculate the likelihood that one label is meant to be an acronym of the other. The algorithm also recognises extensions, as in ''W3C'', and basic leet, as in ''2L8''. It is meant to supplement more general similarity measures by adding the ability to detect acronyms.

The measure is symmetric, since the shorter of the two labels is always checked for being an acronym of the longer one. A set of parameters allows tuning the measure to different scenarios.

== Characteristics ==

The following examples demonstrate the abilities of this SimilarityMeasure (using the default parameter values):

|| '''Frame A name''' || '''Frame B name''' || '''Measure value''' ||
|| Graduate Management in Admission Test (Educational Testing Service) || GMAT || 1.00 ||
|| International Semantic Web Conference 2005 || ISWC05 || 1.00 ||
|| The World Wide Web Consortium || W3C || 1.00 ||
|| ventricular fibrillation || v-fib || 1.00 ||
|| Bundesrepublik Deutschland || BRD || 0.92 ||
|| Roll on the floor, laughing! || rofl || 0.89 ||
|| || || ||
|| '''false positive examples:''' || || ||
|| Bundesrepublik Deutschland || brb || 0.63 ||
|| Graduate Management in Admission Test (Educational Testing Service) || GNU || 0.49 ||
|| ventricular fibrillation || BAT || 0.35 ||

The examples suggest that a threshold of approximately 0.9 should be applied in order to get reliable results. The raw values can, however, always be used as additional evidence.
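
The suggested threshold can be illustrated with a minimal sketch. It does not call the AcronymMatcher, whose scoring method is not documented on this page; instead it takes the measure values from the examples table as given and only shows the filtering step. The class {{{AcronymThresholdSketch}}} and its members are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class AcronymThresholdSketch {

    /** One row of the examples table: both labels and the reported measure value. */
    static final class Match {
        final String frameA;
        final String frameB;
        final double value;

        Match(String frameA, String frameB, double value) {
            this.frameA = frameA;
            this.frameB = frameB;
            this.value = value;
        }
    }

    /** Returns the matches whose value is at least the given threshold. */
    static List<Match> accept(List<Match> matches, double threshold) {
        List<Match> accepted = new ArrayList<>();
        for (Match m : matches) {
            if (m.value >= threshold) {
                accepted.add(m);
            }
        }
        return accepted;
    }

    /** Measure values copied from the examples table; nothing is computed here. */
    static List<Match> tableRows() {
        String gmat = "Graduate Management in Admission Test (Educational Testing Service)";
        List<Match> rows = new ArrayList<>();
        rows.add(new Match(gmat, "GMAT", 1.00));
        rows.add(new Match("International Semantic Web Conference 2005", "ISWC05", 1.00));
        rows.add(new Match("The World Wide Web Consortium", "W3C", 1.00));
        rows.add(new Match("ventricular fibrillation", "v-fib", 1.00));
        rows.add(new Match("Bundesrepublik Deutschland", "BRD", 0.92));
        rows.add(new Match("Roll on the floor, laughing!", "rofl", 0.89));
        // The three false positive examples from the table.
        rows.add(new Match("Bundesrepublik Deutschland", "brb", 0.63));
        rows.add(new Match(gmat, "GNU", 0.49));
        rows.add(new Match("ventricular fibrillation", "BAT", 0.35));
        return rows;
    }

    public static void main(String[] args) {
        // Apply the threshold of 0.9 suggested by the examples.
        for (Match m : accept(tableRows(), 0.9)) {
            System.out.println(m.frameB + " accepted for: " + m.frameA);
        }
    }
}
```

With a threshold of 0.9, five of the nine rows are accepted. All three false positives are rejected, but so is ''rofl'' at 0.89, which shows why values just below the threshold may still be worth keeping as additional evidence.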

== Evaluation/Performance ==

TODO

== Specification ==

=== Initialisation ===

The SimilarityMeasure main class is {{{de.dfki.km.phaselib.impl.similarities.acronymMatch.AcronymMatcher}}}.

Initialisation is straightforward: {{{new AcronymMatcher()}}}

=== Parameters ===

|| '''Parameter name''' || '''Value type''' || '''Default''' || '''Description''' ||
|| PARAM_MAX_ACRONYM_LENGTH || Integer || {{{12}}} || Acronyms longer than this value are not considered; they score 0.0 ||
|| PARAM_MAX_EXTENSIONS || Integer || {{{12}}} || The maximum number of extensions extracted from a word ||
|| PARAM_CASE_PENALTY || Float || {{{0.43}}} || The penalty given per acronym letter found in the wrong case in the term ||
|| PARAM_JUMP_PENALTY || Float || {{{4.00}}} || A penalty given if a large part of the term is skipped over ||
|| PARAM_INWORD_PENALTY || Float || {{{0.33}}} || The penalty given per acronym letter found within a word (not at its beginning) ||

=== Dependencies ===

none

== License Issues ==

TODO