Changes between Initial Version and Version 1 of similarity_SimPackAdapter


Timestamp:
03/01/10 14:35:59 (14 years ago)
Author:
kiesel
Comment:

spam: reset to 2006 page version

  • similarity_SimPackAdapter

    v1 v1  
     1[[TracNav]] 
     2 
     3= SimilarityMeasure: SimPack Adapter = 
     4 
     5Developer: [mailto:endres(at)dfki.uni-kl.de Björn Endres] 
     6 
     7== Description == 
     8This module tries to use any measure provided by the SimPack project as a PhaseLibs SimilarityMeasure. Since the two projects follow completely different approaches, the adapter needs a set of parameters to work properly. The module includes a small database of these parameters, so start with one of the SimPack measure configurations listed below. If the measure configuration you need is not (yet) in the database, an autodetection process tries to guess the required parameters. This may produce unexpected results, especially since the SimPack measures are not normalised in any way. 
     9 
     10The internal measure database is extended continually, but feel free to request the integration of a specific measure if you need it. 
     11 
     12 
     13== Evaluation/Performance == 
     14 
     15The SimPack project uses one measure object to calculate a single value. Calculating a similarity matrix therefore requires a large number of such measure objects, so the adapter performs suboptimally. In return, it allows the easy integration of a large number of measures. 
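
The cost described above can be sketched as follows. This is only an illustration under stated assumptions: `PairMeasureSketch` and `SimilarityMatrixSketch` are hypothetical names, the toy position-matching similarity is not a SimPack measure, and none of this is the actual SimPack or PhaseLibs API. It only shows why a one-object-per-value design needs n * n objects for an n x n matrix.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a one-object-per-value measure.
// This is NOT the SimPack API; it only mimics the design.
final class PairMeasureSketch {
    private final String first;
    private final String second;

    PairMeasureSketch(String first, String second) {
        this.first = first;
        this.second = second;
    }

    // Toy similarity: fraction of positions that hold the same character.
    double getSimilarity() {
        int maxLen = Math.max(first.length(), second.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        int minLen = Math.min(first.length(), second.length());
        int same = 0;
        for (int i = 0; i < minLen; i++) {
            if (first.charAt(i) == second.charAt(i)) {
                same++;
            }
        }
        return (double) same / maxLen;
    }
}

final class SimilarityMatrixSketch {
    // Counts the measure objects created, to make the cost visible.
    static int objectsCreated = 0;

    static double[][] matrix(List<String> items) {
        int n = items.size();
        double[][] result = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                // A fresh measure object for every cell of the matrix.
                PairMeasureSketch measure =
                    new PairMeasureSketch(items.get(i), items.get(j));
                objectsCreated++;
                result[i][j] = measure.getSimilarity();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] sims = matrix(Arrays.asList("jaro", "jars", "tree"));
        System.out.println("objects created: " + objectsCreated);
        System.out.println("sim(jaro, jars) = " + sims[0][1]);
    }
}
```

For three items the sketch creates nine measure objects, and the count grows quadratically with the number of items.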
     16 
     17 
     18== Characteristics == 
     19The following list contains the SimPack measures that have already been included in the parameter database. 
     20 
     21|| '''Config Name''' || '''SimPack measure class used''' || '''Description''' || '''Status''' || '''Remarks''' || 
     22|| || || || || || 
     23|| '''SimPack internal measures:''' || || || 
     24|| ~~Jaro~~ || {{{simpack.measure.string.Jaro}}} || Jaro metric || buggy || Original SimPack measure seems to be '''buggy'''. Use the other Jaro implementation instead || 
     25 
     26|| || || || || || 
     27|| '''Simmetrics based:''' || ({{{simpack.measure.external.simmetrics.-}}}) || || || 
     28|| SMJaro || {{{Jaro}}} || Jaro metric || works (unconfirmed)  || none || 
     29 
     30|| || || || || || 
     31|| '''Secondstring based:''' || ({{{simpack.measure.external.secondstring.-}}}) || || || 
     32|| SSLevenshtein || {{{Levenshtein}}} || Levenshtein editing distance || works (unconfirmed) || none || 
     33|| SSJaro || {{{Jaro}}} || Jaro metric || works (unconfirmed)  || none || 
     34|| SSMongeElkan || {{{MongeElkan}}} || The match method proposed by Monge and Elkan || works (unconfirmed)  || none || 
     35|| SSSLIM || {{{SLIM}}} || Experimental, invented by SecondString author William Cohen. || works (unconfirmed) || none || 
     36|| ~~SSSmithWaterman~~ || {{{SmithWaterman}}} || Smith-Waterman string distance, following Durbin et al. || '''not working''' || normalisation fails || 
     37|| SSNeedlemanWunsch || {{{NeedlemanWunsch}}} || Needleman-Wunsch string distance, following Durbin et al. Sec 2.3. || works (unconfirmed)  || none || 
     38|| ~~SSMixture~~ || {{{Mixture}}} || Mixture-based distance metric. || '''not working''' || Bug found in com.wcohen.ss.Mixture.java:48 || 
     39|| SSWinklerRescorerOnJaro || {{{WinklerRescorer}}} || Winkler's reweighting scheme for distance metrics, applied to the Jaro metric ('An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census' by William E. Winkler and Yves Thibaudeau.) || works (unconfirmed) || none || 
     40|| ~~SSSoftTFIDF~~ || {{{SoftTFIDF}}} || TFIDF-based distance metric, extended to use "soft" token-matching. Specifically, tokens are considered a partial match if they get a good score using an inner string comparator. || '''not working''' || strange results || 
     41 
     42|| || || || || || 
     43|| '''Tree based:''' || ({{{simpack.measure.tree.-}}}) || || || 
     44|| SubTreeEditDistance || {{{TreeEditDistance}}} || This implements an edit distance calculation for the class-rooted subtrees. The algorithm is taken from Gabriel Valiente's book "Algorithms on trees and graphs" (Springer) and described in chapter 2.1 "The tree edit distance problem". || works (unconfirmed)  || none || 
     45 
     46== Specification == 
     47=== Initialisation === 
     48The SimilarityMeasure class is 
     49 {{{de.dfki.km.phaselib.impl.similarities.simPack.SimPackMeasure}}} 
     50 
     51Initialisation is straightforward in most cases: the constructor simply takes the config name of a SimPack measure from the list above as its argument.  
     52'''Example:''' 
     53{{{ 
     54   SimilarityMeasure measure =  
     55      new SimPackMeasure("SSJaro"); 
     56}}} 
     57 
     58If you want the system to guess the config for a given measure, pass the class name instead: 
     59'''Example:''' 
     60{{{ 
     61   SimilarityMeasure measure =  
     62      new SimPackMeasure(Jaro.class.getName()); 
     63}}} 
     64 
     65If the measure is not present in the internal database (i.e. not listed above), or if you want to use it with custom settings, use the full constructor: 
     66{{{ 
     67   SimilarityMeasure measure = new SimPackMeasure( 
     68 
     69      // well, of course you would still need the class 
     70      the_measureClass_I_want_to_use, // (e.g. Jaro, like above) 
     71       
     72      // this should initialise the measure in the way you want it to 
     73      my_Initialiser_object,           
     74       
     75      // this will (re)norm the values calculated 
     76      my_ValueTransformer_object,      
     77 
     78      // this will provide the measure with IAccessor-objects 
     79      my_AccessorFactory_object); 
     80}}} 
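
Since the SimPack measures are not normalised, the value transformer in the full constructor is the place where raw values get mapped to a common range. The following is a minimal sketch of one possible renormalisation, assuming the raw value is a non-negative distance. `DistanceToSimilaritySketch` is a hypothetical name and does not implement the actual PhaseLibs value transformer type, whose interface is not shown on this page.

```java
// Hypothetical illustration of a renormalisation step.
// It does NOT implement the real PhaseLibs value transformer interface.
final class DistanceToSimilaritySketch {

    // Maps a distance in [0, +inf) to a similarity in (0, 1]:
    // distance 0 gives 1.0, larger distances approach 0.0.
    static double transform(double distance) {
        if (Double.isNaN(distance) || distance < 0.0) {
            throw new IllegalArgumentException(
                "distance must be a non-negative number: " + distance);
        }
        return 1.0 / (1.0 + distance);
    }

    public static void main(String[] args) {
        System.out.println(transform(0.0)); // identical items
        System.out.println(transform(3.0)); // e.g. an edit distance of 3
    }
}
```

Other mappings, such as dividing an edit distance by the length of the longer string, may fit a given measure better; the choice depends on the range of values the wrapped measure produces.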
     81 
     82=== Parameters === 
     83TODO 
     84 
     85=== Dependencies === 
     86The dependencies vary with the SimPack measures wrapped. Refer to the SimPack dependencies.  
     87 
     88== License Issues == 
     89The SimPack project is published under the following Creative Commons license: 
     90{{{ 
     91#!html 
     92<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.5/"><img alt="Creative Commons License" src="http://www.ifi.unizh.ch/ddis/uploads/RTEmagicC_315761ebd2.png.png" border="0" height="31" width="88" /></a> 
     93}}}