Commit c713c62

Adding join challenge
1 parent be83cd8 commit c713c62

1 file changed, +27 -0 lines changed

episodes/05-dplyr.Rmd

Lines changed: 27 additions & 0 deletions
@@ -699,6 +699,33 @@ left %>%
 ```
 This result makes it easier to see the accumulation of more SNPs at later generations, without us having to know the sample IDs.
 
+::::::::::: challenge
+
+## What about right joins?
+
+1. How many rows and columns would you expect from the following right join?
+
+`right_join(variants, metadata_sub, by = join_by(sample_id == run))`
+
+2. How many rows and columns would you expect from the following right join?
+
+`right_join(metadata_sub, variants, by = join_by(run == sample_id))`
+
+Think carefully about the data in question and which data frame is on the right and which is on the left.
+
+:::::::: solution
+
+**Part 1** There will be 860 rows and 31 variables, just like the full join.
+All of the `sample_id` values in the `variants` data frame have matches and will be kept. The join also adds the `run` values from `metadata_sub` that have no match, with `NA` in the other columns, since there are no matching rows in the `variants` data frame.
+
+**Part 2** There will be 801 rows and 31 variables, just like the inner and left joins.
+This join should always contain the same rows as the left join, because it is that join mirrored as a right join.
+It will match the inner join only if every sample in the `by` columns of the right data frame is also present in the left data frame; otherwise, the inner join drops the rows that have no match in the left data frame.
+
+:::::::::::::::::
+
+:::::::::::::::::::::
+
 
 ### Reshaping data frames - Extra
 
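
The reasoning in the added solution can be checked on a small scale. The sketch below is illustrative only: `variants_toy` and `metadata_toy` are hypothetical stand-ins invented for this example, not the lesson's `variants` and `metadata_sub` data, and it assumes dplyr 1.1.0 or later for `join_by()`.

```r
library(dplyr)  # join_by() requires dplyr >= 1.1.0

# Hypothetical stand-ins for the lesson's data frames (not the real data)
variants_toy <- tibble(
  sample_id = c("A", "A", "B"),
  pos       = c(10, 20, 30)
)
metadata_toy <- tibble(
  run        = c("A", "B", "C"),      # "C" has no rows in variants_toy
  generation = c(5000, 15000, 50000)
)

# Part 1 pattern: metadata is on the right, so every run is kept,
# including "C", which gets NA in the columns that come from variants_toy
part1 <- right_join(variants_toy, metadata_toy, by = join_by(sample_id == run))

# Part 2 pattern: variants is on the right, so only its rows are kept
part2 <- right_join(metadata_toy, variants_toy, by = join_by(run == sample_id))

# The mirrored left join keeps the same rows as part2
mirror <- left_join(variants_toy, metadata_toy, by = join_by(sample_id == run))

dim(part1)   # 4 rows, 3 columns
dim(part2)   # 3 rows, 3 columns
dim(mirror)  # 3 rows, 3 columns
```

In this toy case `part2` and `mirror` hold the same rows, but the key column is named after the left data frame in each call (`run` versus `sample_id`) and the column order differs, which is worth keeping in mind when comparing the two results.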
