Using dataspice for multiple datasets #112

Open
@robitalec

Continuing our discussion from #110, I found two obvious hurdles when using dataspice for multiple datasets. In this example, I split the mtcars example data into two files with uneven, overlapping sets of columns and distinct sets of rows. I then use create_spice, prep_attributes and prep_access, followed by the edit_* functions, to set up the metadata files.

Setup

library(dataspice)

dir.create('data')
write.csv(mtcars[1:10, 1:4], 'data/mtcars1.csv')
write.csv(mtcars[11:20, 2:6], 'data/mtcars2.csv')

# Create the template metadata files in data/metadata before filling them in
create_spice()

prep_access()

# The following fileNames have been added to the access file: mtcars1.csv, mtcars2.csv

prep_attributes()

# The following variableNames have been added to the attributes file for mtcars1.csv: X1, mpg, cyl, disp, hp
# The following variableNames have been added to the attributes file for mtcars2.csv: X1, cyl, disp, hp, drat, wt
# Warning messages:
# 1: Missing column names filled in: 'X1' [1] 
# 2: Missing column names filled in: 'X1' [1] 

Then I added some filler information to the metadata. Here are those files zipped: metadata.zip

edit_access()
edit_attributes()
edit_biblio()
edit_creators()

In this example biblio.csv, I added a second row for "mtcars 2" with a right click, as suggested in the Shiny app. It looks like this:

read.csv('data/metadata/biblio.csv')

#     title description datePublished
# 1 mtcars 1          NA          1974
# 2 mtcars 2          NA          1974

#                                              citation
# 1 Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
# 2 Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.

#   keywords license funder geographicDescription northBoundCoord
# 1       NA      NA     NA                    NA              47
# 2       NA      NA     NA                    NA              57

#   eastBoundCoord southBoundCoord westBoundCoord wktString  startDate
# 1            -98              32           -120        NA 1974-01-01
# 2            -88              42           -110        NA 1974-01-01
 
#     endDate
# 1 1975-01-01
# 2 1975-01-01

Challenges

In write_spice(), we get a warning from the is.na(biblio$keywords) check, which expects keywords from a single row of data but receives one value per row.

https://github.com/ropensci/dataspice/blob/main/R/write_spice.R#L67

write_spice()
Warning message:
In if (is.na(biblio$keywords)) { :
  the condition has length > 1 and only the first element will be used
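
The warning arises because `if` needs a single TRUE/FALSE, while is.na() on a two-row column returns two values. Below is a minimal sketch of one way to make such a check safe for several rows. It uses a stand-in data frame rather than the dataspice internals, and `has_keywords` is a hypothetical helper, not a dataspice function. Note that from R 4.2.0 onwards a condition of length > 1 is an error rather than a warning.

```r
# Stand-in for the two-row biblio.csv above (only the relevant columns).
biblio <- data.frame(
  title = c("mtcars 1", "mtcars 2"),
  keywords = c(NA_character_, NA_character_)
)

# is.na() is vectorised, so it returns one value per row.
is.na(biblio$keywords)

# Hypothetical helper: collapse to a single TRUE/FALSE before using `if`.
has_keywords <- function(biblio) {
  !all(is.na(biblio$keywords))
}

if (!has_keywords(biblio)) {
  message("No keywords supplied for any dataset")
}
```

Whether `all()` or a per-row check is the right behaviour depends on how multiple datasets should map onto the JSON-LD output, which is the open question of this issue.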

In build_site(), we get an error when parsing the boxes described in data/metadata/biblio.csv. I expected this to generate two boxes, rather than the single box produced for a single dataset.

build_site()

# Error: Failed to parse box in spatialCoverage$geo$box of '47 -98 32 -12057 -88 42 -110'. 

If you remove the second set of north/east/south/west coordinates, the same error occurs:

build_site()

# Error: Failed to parse box in spatialCoverage$geo$box of '47 -98 32 -120NA NA NA NA'. 
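
The run-together coordinates in both messages come from stop() pasting all of its arguments together with no separator, so a `box` vector of length 2 is concatenated inside the quotes. A small sketch that reproduces the message text, assuming the two boxes from the biblio.csv shown earlier:

```r
box <- c("47 -98 32 -120", "57 -88 42 -110")

# stop() concatenates every element of every argument with no separator,
# so both boxes end up fused between the single quotes.
msg <- tryCatch(
  stop("Failed to parse box in spatialCoverage$geo$box of '",
       box, "'.", call. = FALSE),
  error = function(e) conditionMessage(e)
)
msg
# [1] "Failed to parse box in spatialCoverage$geo$box of '47 -98 32 -12057 -88 42 -110'."
```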

This error surfaces in build_site() but originates in write_spice() (L88), where the box in the output spatialCoverage is unexpectedly an array of length 2 instead of a single string.

write_spice()

# In dataspice.json
# ...
#  "spatialCoverage": {
#     "type": "Place",
#     "name": [null, null],
#     "geo": {
#       "type": "GeoShape",
#       "box": ["47 -98 32 -120", "57 -88 42 -110"]
#     }
#  }

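Within build_site(), the error is raised by the length check in the function parse_GeoShape_box(), which stops unless splitting the box yields exactly one element. With two rows in biblio.csv, the split yields two: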

biblio <- read.csv('data/metadata/biblio.csv')

box <- paste(biblio$northBoundCoord, biblio$eastBoundCoord,
            biblio$southBoundCoord, biblio$westBoundCoord)
box

# [1] "47 -98 32 -120" "57 -88 42 -110"

tokens <- stringr::str_split(box, " ")

tokens

# [[1]]
# [1] "47"   "-98"  "32"   "-120"

# [[2]]
# [1] "57"   "-88"  "42"   "-110"

if (!length(tokens) == 1) {
  stop("Failed to parse box in spatialCoverage$geo$box of '", 
       box, "'.", call. = FALSE)
}
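
One possible direction, sketched here under assumptions rather than as a proposed patch: parse each box string on its own instead of requiring exactly one. `parse_boxes` is a hypothetical helper written in base R, not the existing parse_GeoShape_box(), and the coordinate order simply follows the paste() call above.

```r
# Hypothetical helper, not part of dataspice: parse each "N E S W" string
# separately and return one named numeric vector per box.
parse_boxes <- function(box) {
  tokens <- strsplit(box, " ", fixed = TRUE)
  lapply(tokens, function(tok) {
    coords <- suppressWarnings(as.numeric(tok))
    # Each box must have exactly four numeric coordinates.
    if (length(coords) != 4 || anyNA(coords)) {
      stop("Failed to parse box '", paste(tok, collapse = " "), "'.",
           call. = FALSE)
    }
    # Order follows the paste() call above: north, east, south, west.
    stats::setNames(coords, c("north", "east", "south", "west"))
  })
}

boxes <- parse_boxes(c("47 -98 32 -120", "57 -88 42 -110"))
length(boxes)
boxes[[2]][["north"]]
```

A box with missing coordinates, such as "NA NA NA NA", still fails here with a message naming only the offending box, which may or may not be the desired behaviour for datasets without spatial coverage.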
