2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: fuzzyjoin
Type: Package
Title: Join Tables Together on Inexact Matching
-Version: 0.1.7
+Version: 0.1.8
Authors@R: c(person("David", "Robinson", email = "admiral.david@gmail.com",
role = c("aut", "cre")),
person("Jennifer", "Bryan", email = "jenny@rstudio.com", role = "ctb"),
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,3 +1,7 @@
+# fuzzyjoin 0.1.8
+
+* Updated README links to remove stale redirects and a dead external URL that triggered CRAN URL check notes.
+
# fuzzyjoin 0.1.7

* fixing documentation to keep it on CRAN.
12 changes: 6 additions & 6 deletions R/geo_join.R
@@ -50,12 +50,12 @@
#' pairs
#'
#' # plot them
-#' library(ggplot2)
-#' ggplot(pairs, aes(x = longitude.x, y = latitude.x,
-#' xend = longitude.y, yend = latitude.y)) +
-#' geom_segment(color = "red") +
-#' annotation_borders("state") +
-#' theme_void()
+#' if (requireNamespace("ggplot2", quietly = TRUE)) {
+#' ggplot2::ggplot(pairs, ggplot2::aes(x = longitude.x, y = latitude.x,
+#' xend = longitude.y, yend = latitude.y)) +
+#' ggplot2::geom_segment(color = "red") +
+#' ggplot2::theme_void()
+#' }
#'
#' # also get distances
#' s1 %>%
23 changes: 11 additions & 12 deletions R/regex_join.R
@@ -15,21 +15,20 @@
#' @examples
#'
#' library(dplyr)
-#' library(ggplot2)
-#' data(diamonds)
+#' if (requireNamespace("ggplot2", quietly = TRUE)) {
+#' diamonds <- tibble::as_tibble(ggplot2::diamonds)
#'
-#' diamonds <- tibble::as_tibble(diamonds)
+#' d <- tibble::tibble(regex_name = c("^Idea", "mium", "Good"),
+#' type = 1:3)
#'
-#' d <- tibble::tibble(regex_name = c("^Idea", "mium", "Good"),
-#' type = 1:3)
+#' # When they are inner_joined, only Good<->Good matches
+#' diamonds %>%
+#' inner_join(d, by = c(cut = "regex_name"))
#'
-#' # When they are inner_joined, only Good<->Good matches
-#' diamonds %>%
-#' inner_join(d, by = c(cut = "regex_name"))
-#'
-#' # but we can regex match them
-#' diamonds %>%
-#' regex_inner_join(d, by = c(cut = "regex_name"))
+#' # but we can regex match them
+#' diamonds %>%
+#' regex_inner_join(d, by = c(cut = "regex_name"))
+#' }
#'
#' @export
regex_join <- function(x, y, by = NULL, mode = "inner", ignore_case = FALSE) {
23 changes: 12 additions & 11 deletions R/stringdist_join.R
@@ -22,20 +22,21 @@
#' @examples
#'
#' library(dplyr)
-#' library(ggplot2)
-#' data(diamonds)
+#' if (requireNamespace("ggplot2", quietly = TRUE)) {
+#' diamonds <- tibble::as_tibble(ggplot2::diamonds)
#'
-#' d <- tibble::tibble(approximate_name = c("Idea", "Premiums", "Premioom",
-#' "VeryGood", "VeryGood", "Faiir"),
-#' type = 1:6)
+#' d <- tibble::tibble(approximate_name = c("Idea", "Premiums", "Premioom",
+#' "VeryGood", "VeryGood", "Faiir"),
+#' type = 1:6)
#'
-#' # no matches when they are inner-joined:
-#' diamonds %>%
-#' inner_join(d, by = c(cut = "approximate_name"))
+#' # no matches when they are inner-joined:
+#' diamonds %>%
+#' inner_join(d, by = c(cut = "approximate_name"))
#'
-#' # but we can match when they're fuzzy joined
-#' diamonds %>%
-#' stringdist_inner_join(d, by = c(cut = "approximate_name"))
+#' # but we can match when they're fuzzy joined
+#' diamonds %>%
+#' stringdist_inner_join(d, by = c(cut = "approximate_name"))
+#' }
#'
#' @export
stringdist_join <- function(x, y, by = NULL, max_dist = 2,
6 changes: 3 additions & 3 deletions README.Rmd
@@ -16,15 +16,15 @@ fuzzyjoin: Join data frames on inexact matching
------------------

[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/fuzzyjoin)](https://cran.r-project.org/package=fuzzyjoin)
-[![Travis-CI Build Status](https://travis-ci.org/dgrtwo/fuzzyjoin.svg?branch=master)](https://travis-ci.org/dgrtwo/fuzzyjoin)
+[![Travis-CI Build Status](https://api.travis-ci.com/dgrtwo/fuzzyjoin.svg?branch=master)](https://app.travis-ci.com/dgrtwo/fuzzyjoin)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/dgrtwo/fuzzyjoin?branch=master&svg=true)](https://ci.appveyor.com/project/dgrtwo/fuzzyjoin)
-[![Coverage Status](https://img.shields.io/codecov/c/github/dgrtwo/fuzzyjoin/master.svg)](https://codecov.io/github/dgrtwo/fuzzyjoin?branch=master)
+[![Coverage Status](https://img.shields.io/codecov/c/github/dgrtwo/fuzzyjoin/master.svg)](https://app.codecov.io/github/dgrtwo/fuzzyjoin?branch=master)


The fuzzyjoin package is a variation on dplyr's join operations that allows matching not just on values that match between columns, but on inexact matching. This allows matching on:

* Numeric values that are within some tolerance (`difference_inner_join`)
-* Strings that are similar in Levenshtein/cosine/Jaccard distance, or [other metrics](http://finzi.psych.upenn.edu/library/stringdist/html/stringdist-metrics.html) from the [stringdist](https://cran.r-project.org/package=stringdist) package (`stringdist_inner_join`)
+* Strings that are similar in Levenshtein/cosine/Jaccard distance, or other metrics from the [stringdist](https://cran.r-project.org/package=stringdist) package (`stringdist_inner_join`)
* A regular expression in one column matching to another (`regex_inner_join`)
* Euclidean or Manhattan distance across multiple columns (`distance_inner_join`)
* Geographic distance based on longitude and latitude (`geo_inner_join`)
6 changes: 3 additions & 3 deletions README.md
@@ -6,15 +6,15 @@ fuzzyjoin: Join data frames on inexact matching
------------------

[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/fuzzyjoin)](https://cran.r-project.org/package=fuzzyjoin)
-[![Travis-CI Build Status](https://travis-ci.org/dgrtwo/fuzzyjoin.svg?branch=master)](https://travis-ci.org/dgrtwo/fuzzyjoin)
+[![Travis-CI Build Status](https://api.travis-ci.com/dgrtwo/fuzzyjoin.svg?branch=master)](https://app.travis-ci.com/dgrtwo/fuzzyjoin)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/dgrtwo/fuzzyjoin?branch=master&svg=true)](https://ci.appveyor.com/project/dgrtwo/fuzzyjoin)
-[![Coverage Status](https://img.shields.io/codecov/c/github/dgrtwo/fuzzyjoin/master.svg)](https://codecov.io/github/dgrtwo/fuzzyjoin?branch=master)
+[![Coverage Status](https://img.shields.io/codecov/c/github/dgrtwo/fuzzyjoin/master.svg)](https://app.codecov.io/github/dgrtwo/fuzzyjoin?branch=master)


The fuzzyjoin package is a variation on dplyr's join operations that allows matching not just on values that match between columns, but on inexact matching. This allows matching on:

* Numeric values that are within some tolerance (`difference_inner_join`)
-* Strings that are similar in Levenshtein/cosine/Jaccard distance, or [other metrics](http://finzi.psych.upenn.edu/library/stringdist/html/stringdist-metrics.html) from the [stringdist](https://cran.r-project.org/package=stringdist) package (`stringdist_inner_join`)
+* Strings that are similar in Levenshtein/cosine/Jaccard distance, or other metrics from the [stringdist](https://cran.r-project.org/package=stringdist) package (`stringdist_inner_join`)
* A regular expression in one column matching to another (`regex_inner_join`)
* Euclidean or Manhattan distance across multiple columns (`distance_inner_join`)
* Geographic distance based on longitude and latitude (`geo_inner_join`)
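
Reviewer note: the string-distance bullet in the README list can be illustrated without fuzzyjoin itself. The sketch below is an assumption-laden stand-in, not the package's implementation: it uses base R's `utils::adist()` (generalized Levenshtein distance) and a hypothetical helper named `levenshtein_inner_join` to keep every pair of strings within `max_dist` edits, which is the idea behind `stringdist_inner_join`. The `max_dist = 2` default mirrors the `stringdist_join` signature shown in this diff; the hard-coded cut labels are the ones from `ggplot2::diamonds`.

```r
# Hypothetical helper, NOT part of fuzzyjoin: a base-R sketch of an inner join
# on Levenshtein distance, using utils::adist() instead of the stringdist package.
levenshtein_inner_join <- function(x, y, max_dist = 2) {
  # Distance matrix: one row per element of x, one column per element of y.
  dists <- utils::adist(x, y)
  # Row/column indices of every pair within the tolerance.
  hits <- which(dists <= max_dist, arr.ind = TRUE)
  data.frame(
    x = x[hits[, "row"]],
    y = y[hits[, "col"]],
    distance = dists[hits],
    stringsAsFactors = FALSE
  )
}

# Cut labels from ggplot2::diamonds, written out so ggplot2 is not required.
cuts <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
approximate_name <- c("Idea", "Premiums", "Premioom",
                      "VeryGood", "VeryGood", "Faiir")

matches <- levenshtein_inner_join(cuts, approximate_name)
print(matches)
```

An exact join on these vectors would match nothing, because no approximate name equals a cut label; the distance-based version pairs each misspelling with its nearest label.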
26 changes: 20 additions & 6 deletions cran-comments.md
@@ -1,12 +1,26 @@
-# fuzzyjoin 0.1.6
+# fuzzyjoin 0.1.8

-This release makes fuzzyjoin compatible with dplyr v1.0.0, which is planned to be submitted to CRAN on 2020-05-15.
+Resubmission after archival.

-## Bug fixes and maintenance
+This release addresses CRAN URL check issues in the README:

-* Updates to internals to make compatible with dplyr 1.0.0 (#67, @hadley)
-* Rebuilt site with pkgdown
+* Updated stale redirected URLs (Codecov and Travis).
+* Removed a dead external link.
+* Updated one example that used a no-longer-available ggplot2 helper.

## R CMD check results

-There were no ERRORs, WARNINGs or NOTEs.
+### Local (macOS)
+
+`R CMD check --as-cran fuzzyjoin_0.1.8.tar.gz`
+
+* 0 ERRORs | 0 WARNINGs | 2 NOTEs
+* Notes:
+  * New submission / package archived on CRAN
+  * unable to verify current time
+
+### Win-builder (R-devel)
+
+* 0 ERRORs | 0 WARNINGs | 1 NOTE
+* NOTE: "Package was archived on CRAN"
+* Log: https://win-builder.r-project.org/kSocccxGdXNv/00check.log
12 changes: 6 additions & 6 deletions man/geo_join.Rd

Some generated files are not rendered by default.

23 changes: 11 additions & 12 deletions man/regex_join.Rd

Some generated files are not rendered by default.

23 changes: 12 additions & 11 deletions man/stringdist_join.Rd

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions tests/testthat/test_regex_join.R
@@ -6,8 +6,8 @@ d <- tibble::tibble(cut_regex = c("^Idea", "emiu",
dplyr::mutate(type = dplyr::row_number())

test_that("regex joins work", {
-library(ggplot2)
-data("diamonds")
+testthat::skip_if_not_installed("ggplot2")
+diamonds <- tibble::as_tibble(ggplot2::diamonds)
j <- diamonds %>%
regex_inner_join(d, by = c(cut = "cut_regex"))

7 changes: 6 additions & 1 deletion tests/testthat/test_stringdist_join.R
@@ -6,6 +6,12 @@ d <- tibble::tibble(
) %>%
dplyr::mutate(type = dplyr::row_number())

+if (!requireNamespace("ggplot2", quietly = TRUE)) {
+testthat::skip("ggplot2 not installed")
+}
+
+diamonds <- tibble::as_tibble(ggplot2::diamonds)
+
test_that("stringdist_inner_join works on a large df with multiples in each", {
# create something with names close to the cut column in the diamonds dataset
j <- stringdist_inner_join(diamonds, d, by = c(cut = "cut2"), distance_col = "distance")
@@ -332,4 +338,3 @@ test_that("stringdist_ joins where there are no overlapping rows still get a dis
result <- stringdist_anti_join(a, b, by = c(x = "y"), max_dist = 1, distance_col = "distance")
expect_equal(a, result)
})
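
Reviewer note: the roxygen examples and tests in this change all guard their use of ggplot2 with `requireNamespace("ggplot2", quietly = TRUE)`, so they no longer fail when that optional package is absent. A minimal sketch of the pattern follows, assuming a hypothetical helper named `run_if_available` that is not part of fuzzyjoin; it uses the always-installed `stats` package for the available case and a made-up package name for the missing case.

```r
# Hypothetical helper, NOT part of fuzzyjoin: run `code` only when the optional
# package `pkg` is installed; otherwise report the skip and return NULL.
run_if_available <- function(pkg, code) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    code()
  } else {
    message("Skipping: package '", pkg, "' is not installed")
    invisible(NULL)
  }
}

# `stats` ships with R, so this branch runs.
median_demo <- run_if_available("stats", function() stats::median(c(1, 5, 9)))

# This package name is invented for illustration, so the code is skipped.
skipped <- run_if_available("aPackageThatDoesNotExist", function() "never runs")
```

Calling functions through `pkg::fun` inside the guard, as the diff does with `ggplot2::ggplot`, avoids attaching the package with `library()`, which would raise an error when the package is missing.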