Tidy stringdist calculation

tidy_stringdist(df, v1 = V1, v2 = V2, method = c("osa", "lv", "dl",
  "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex"), ...)

Arguments

df

a dataframe containing the strings to compare

v1

the name of the first column of strings, given unquoted (defaults to V1)

v2

the name of the second column of strings, given unquoted (defaults to V2)

method

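one or more of the methods implemented in the stringdist package: "osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex". By default, all of them are computed. See stringdist-metrics in the stringdist package for a description of each method.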

...

other parameters passed to stringdist
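The v1, v2 and method arguments can be combined to compare custom columns with a subset of the methods. A minimal sketch, assuming the tidystringdist and tibble packages are installed; the data frame word_pairs and its columns a and b are hypothetical, and the column names are passed unquoted, as the V1/V2 defaults in the usage suggest:

```r
library(tidystringdist)
library(tibble)

# Hypothetical input whose string columns are not called V1 and V2
word_pairs <- tibble(a = c("kitten", "flaw"), b = c("sitting", "lawn"))

# Point v1/v2 at the custom columns and compute only two of the methods
res <- tidy_stringdist(word_pairs, v1 = a, v2 = b, method = c("lv", "osa"))
res
```

Restricting method to the distances you actually need avoids computing (and printing) all ten columns.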

Value

a tibble containing the compared string columns plus one numeric column of string distances per method, named after that method
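Each distance column is expected to hold the same values as a direct call to stringdist::stringdist() with the corresponding method. A small consistency check, assuming the stringdist, tidystringdist and tibble packages are installed; the two name pairs come from the Examples section, where their osa distances are reported as 4 and 5:

```r
library(tidystringdist)
library(stringdist)
library(tibble)

# Two of the name pairs used in the Examples section
pairs <- tibble(V1 = c("Albertine", "Gilberte"), V2 = c("Gilberte", "Odette"))

res <- tidy_stringdist(pairs, method = c("osa", "jw"))

# The same distances computed directly with stringdist()
direct_osa <- stringdist(pairs$V1, pairs$V2, method = "osa")
direct_jw <- stringdist(pairs$V1, pairs$V2, method = "jw")

all.equal(res$osa, direct_osa)
all.equal(res$jw, direct_jw)
```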

Examples

library(tidystringdist)
proust <- tidy_comb_all(c("Albertine", "Françoise", "Gilberte", "Odette", "Charles"))
tidy_stringdist(proust)
#> Warning: Non-printable ascii or non-ascii characters in soundex. Results may be unreliable. See ?printable_ascii.
#> # A tibble: 10 x 12
#>    V1    V2      osa    lv    dl hamming   lcs qgram cosine jaccard    jw
#>  * <chr> <chr> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>
#>  1 Albe… Fran…     7     7     7       7    12    10  0.497   0.692 0.444
#>  2 Albe… Gilb…     4     4     4     Inf     5     3  0.142   0.333 0.194
#>  3 Albe… Odet…     6     6     6     Inf     9     9  0.428   0.8   0.389
#>  4 Albe… Char…     8     8     8     Inf    12    10  0.544   0.75  0.579
#>  5 Fran… Gilb…     8     8     8     Inf    13    11  0.578   0.769 0.588
#>  6 Fran… Odet…     8     8     8     Inf    13    13  0.789   0.917 0.574
#>  7 Fran… Char…     7     7     7     Inf    12     8  0.496   0.667 0.495
#>  8 Gilb… Odet…     5     5     5     Inf     8     8  0.4     0.778 0.375
#>  9 Gilb… Char…     7     7     7     Inf    11     9  0.522   0.727 0.565
#> 10 Odet… Char…     6     6     6     Inf    11    11  0.761   0.9   0.563
#> # ... with 1 more variable: soundex <dbl>
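The warning above is raised by the soundex method because "Françoise" contains a non-ASCII character. One way to sidestep it is to leave soundex out of method; the result can then be sorted to find the most similar pair. A sketch, assuming tidystringdist is installed; the vector ascii_safe_methods is a hypothetical helper name, and ranking by the jw column is just one possible choice:

```r
library(tidystringdist)

proust <- tidy_comb_all(c("Albertine", "Françoise", "Gilberte", "Odette", "Charles"))

# Every method from the usage except soundex, which warns on non-ASCII input
ascii_safe_methods <- c("osa", "lv", "dl", "hamming", "lcs",
                        "qgram", "cosine", "jaccard", "jw")
res <- tidy_stringdist(proust, method = ascii_safe_methods)

# Sort pairs from most to least similar by Jaro distance (smaller is closer)
ranked <- res[order(res$jw), c("V1", "V2", "jw")]
ranked
```

According to the output above, the closest pair by jw should be Albertine and Gilberte (0.194).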