tidy_stringdist.Rd
Tidy stringdist calculation
tidy_stringdist(df, v1 = V1, v2 = V2, method = c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex"), ...)
Argument | Description |
---|---|
df | a data frame containing the strings to compare |
v1 | the name of the first column |
v2 | the name of the second column |
method | one or more of the methods implemented in the stringdist package: "osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex". Defaults to all of them. See |
... | other parameters passed to |
A tibble with the string distances, one column per method.
proust <- tidy_comb_all(c("Albertine", "Françoise", "Gilberte", "Odette", "Charles"))
tidy_stringdist(proust)
#> Warning: Non-printable ascii or non-ascii characters in soundex. Results may be unreliable. See ?printable_ascii.
#> # A tibble: 10 x 12
#>    V1    V2      osa    lv    dl hamming   lcs qgram cosine jaccard    jw
#>  * <chr> <chr> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>
#>  1 Albe… Fran…     7     7     7       7    12    10  0.497   0.692 0.444
#>  2 Albe… Gilb…     4     4     4     Inf     5     3  0.142   0.333 0.194
#>  3 Albe… Odet…     6     6     6     Inf     9     9  0.428   0.8   0.389
#>  4 Albe… Char…     8     8     8     Inf    12    10  0.544   0.75  0.579
#>  5 Fran… Gilb…     8     8     8     Inf    13    11  0.578   0.769 0.588
#>  6 Fran… Odet…     8     8     8     Inf    13    13  0.789   0.917 0.574
#>  7 Fran… Char…     7     7     7     Inf    12     8  0.496   0.667 0.495
#>  8 Gilb… Odet…     5     5     5     Inf     8     8  0.4     0.778 0.375
#>  9 Gilb… Char…     7     7     7     Inf    11     9  0.522   0.727 0.565
#> 10 Odet… Char…     6     6     6     Inf    11    11  0.761   0.9   0.563
#> # ... with 1 more variable: soundex <dbl>
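
The default call computes every method, which also triggers the soundex warning on non-ASCII input such as "Françoise". The following is a minimal sketch of restricting the computation to a subset of methods. It assumes that `method` accepts a character vector naming only the wanted methods (the usage line suggests this, but it is not confirmed here) and that each method is returned as a column named after it, as in the output above. The 0.4 threshold is an arbitrary illustration.

```r
library(tidystringdist)

# Build all pairwise combinations, as in the example above.
proust <- tidy_comb_all(c("Albertine", "Françoise", "Gilberte", "Odette", "Charles"))

# Assumption: passing a subset of method names computes only those columns.
res <- tidy_stringdist(proust, method = c("lv", "jw"))

# Keep the pairs whose Jaro-Winkler distance falls below an arbitrary threshold.
close_pairs <- res[res$jw < 0.4, ]
close_pairs
```

Leaving "soundex" out of `method` should avoid the non-ASCII warning, since that method is then never called.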