Reading KMRCL (229): SCORE-MULTIWORD-MATCH - #:g1

Posted 2010-11-26 14:51:00 GMT

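Having finished reading xml-utils.lisp, this time I look at SCORE-MULTIWORD-MATCH from KMRCL's strmatch.lisp.
Judging by the name, it seems to be a function that measures how similar its two string arguments are.
The definition is: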

(defun score-multiword-match (s1 s2)
  "Score a match between two strings with s1 being reference string.
S1 can be a string or a list of strings/conses"
  (let* ((word-list-1 (if (stringp s1)
                          (split-alphanumeric-string s1)
                        s1))
         (word-list-2 (split-alphanumeric-string s2))
         (n1 (length word-list-1))
         (n2 (length word-list-2))
         (unmatched n1)
         (score 0))
    (declare (fixnum n1 n2 score unmatched))
    (decf score (* 4 (abs (- n1 n2))))
    (dotimes (iword n1)
      (declare (fixnum iword))
      (let ((w1 (nth iword word-list-1))
            pos)
        (cond
         ((consp w1)
          (let ((first t))
            (dotimes (i-alt (length w1))
              (setq pos
                (position (nth i-alt w1) word-list-2
                          :test #'string-equal))
              (when pos
                (incf score (- 30
                               (if first 0 5)
                               (abs (- iword pos))))
                (decf unmatched)
                (return))
              (setq first nil))))
         ((stringp w1)
          (kmrcl:awhen (position w1 word-list-2
                               :test #'string-equal)
                       (incf score (- 30 (abs (- kmrcl::it iword))))
                       (decf unmatched))))))
    (decf score (* 4 unmatched))
    score))
That is the definition, but I couldn't quite figure out how it is meant to be used:
(kl:score-multiword-match "foo" "foo") ;=> 30

(kl:score-multiword-match '("foo") "foo") ;=> 30

(kl:score-multiword-match '("foo" "foo") "foo") ;=> 55

(kl:score-multiword-match '("foo" "foo" "foo") "foo") ;=> 79

(kl:score-multiword-match '("foo" "foo" "foo" "foo") "foo") ;=> 102

(kl:score-multiword-match '("foo" "foo" "foo" "foa") "foo") ;=> 71

(kl:score-multiword-match '("foo" "fao" "foo" "foa") "foo") ;=> 38
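Reading the code directly, the score can be written in closed form. The notation here is introduced only for illustration and is not taken from KMRCL. $W_1 = (w_0, \dots, w_{n_1-1})$ is S1 as a word list. $W_2$ is the result of SPLIT-ALPHANUMERIC-STRING on S2, with $n_2$ words. $M$ is the set of indices $i$ for which some word of $W_2$ matches $w_i$ under STRING-EQUAL; when $w_i$ is a cons, this means the first of its alternatives that matches anything. $p_i$ is the 0-based position of that match, and POSITION returns the leftmost hit. $\delta_i$ is 1 if the match came from a non-first alternative and 0 otherwise. $u = n_1 - |M|$ is the final value of UNMATCHED:

```latex
S = -4\,\lvert n_1 - n_2 \rvert
    + \sum_{i \in M} \bigl( 30 - 5\,\delta_i - \lvert i - p_i \rvert \bigr)
    - 4u
```

This accounts for every result above. Because S2 is the single word "foo", each "foo" in S1 matches position 0, so a matching word at index $i$ contributes $30 - i$. For '("foo" "foo" "foo" "foo"): $-4 \cdot 3 + (30 + 29 + 28 + 27) - 4 \cdot 0 = 102$. For '("foo" "fao" "foo" "foa"): $-12 + (30 + 28) - 4 \cdot 2 = 38$. The score is not normalized by length, and a word in S2 can be matched any number of times, so repeating "foo" in S1 keeps raising the score. Judging by its name, the function appears intended for multi-word phrases, where the $\lvert i - p_i \rvert$ term penalizes words that appear at different positions in the two strings.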

Is this perhaps some well-known algorithm?
