Create a vector inflation_rates containing these values: 2.1, 3.4, 1.8, 4.2, 2.9. What is the result of inflation_rates[3]?
inflation_rates <- c(2.1, 3.4, 1.8, 4.2, 2.9)
inflation_rates[3]
[1] 1.8
Create a vector countries with elements: “Netherlands”, “Germany”, “France”, “Italy”. Use negative indexing to return all countries except “Germany” (the second element). Then confirm you get the same result using positive indexing. When would negative indexing be more convenient than positive indexing?
[1] "Netherlands" "France" "Italy"
[1] "Netherlands" "France" "Italy"
Both return "Netherlands" "France" "Italy". Negative indexing is more convenient when you want to exclude a small number of elements from a long vector — specifying what to drop is simpler than listing all positions to keep.
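A minimal sketch of the comparison, using the countries vector from the exercise. The names without_germany_neg and without_germany_pos are illustrative, not part of the original solution.

```r
countries <- c("Netherlands", "Germany", "France", "Italy")

# Negative index: drop the second element, keep everything else
without_germany_neg <- countries[-2]

# Positive index: spell out every position to keep
without_germany_pos <- countries[c(1, 3, 4)]

print(without_germany_neg)

# Check that both approaches select the same elements
identical(without_germany_neg, without_germany_pos)
```

With four elements the two forms are about equally long; with hundreds of elements the negative form stays the same length while the positive one grows.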
Given x <- c(10, 20, 30, 40, 50), what does x[c(2, 4)] return? What does x[x > 25] return?
[1] 20 40
[1] 30 40 50
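The second result is easier to follow when the logical condition is stored first. This sketch splits x[x > 25] into two steps; the name keep is an illustrative choice.

```r
x <- c(10, 20, 30, 40, 50)

# Positional indexing: elements 2 and 4
by_position <- x[c(2, 4)]

# Logical indexing: first build a TRUE/FALSE vector, one entry per element
keep <- x > 25

# Then keep only the elements where the condition is TRUE
by_condition <- x[keep]

print(by_position)
print(keep)
print(by_condition)
```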
Create a data frame cities with three columns:
name: “Amsterdam”, “Rotterdam”, “The Hague”
population: 872680, 651406, 545838
province: “Noord-Holland”, “Zuid-Holland”, “Zuid-Holland”
[1] 651406
[1] 651406
[1] 651406
All three return 651406.
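One way to arrive at 651406 three times, assuming the task is to extract Rotterdam's population (row 2) from cities. The three access forms chosen here are an assumption about the intended solution.

```r
cities <- data.frame(
  name = c("Amsterdam", "Rotterdam", "The Hague"),
  population = c(872680, 651406, 545838),
  province = c("Noord-Holland", "Zuid-Holland", "Zuid-Holland")
)

# Row 2, column 2, both by position
by_position <- cities[2, 2]

# Row by position, column by name
by_name <- cities[2, "population"]

# Extract the column as a vector first, then take its second element
by_dollar <- cities$population[2]

print(c(by_position, by_name, by_dollar))
```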
Given a data frame df with columns x and y, compare:
df$x
df[["x"]]
df[, "x"]
Try running all three on your cities data frame. Is the output always identical?
df$x: Uses dollar-sign notation; convenient but does not support programmatic column names (e.g., if var is a character object containing "x", df$var looks for a column literally named var and returns NULL).
df[["x"]]: Extracts a single column as a vector; supports programmatic access (e.g., col <- "x"; df[[col]]) and is safer with non-syntactic column names.
df[, "x"]: Uses matrix-style subsetting. For a regular data frame it returns a vector, because drop = TRUE is the default when a single column is selected; use drop = FALSE to keep a one-column data frame. On a tibble it returns a one-column tibble.
[1] 872680 651406 545838
[1] 872680 651406 545838
[1] 872680 651406 545838
The output is identical for a regular data frame: all three return the population column as a vector. With a tibble, however, df[, "x"] returns a one-column tibble instead of a vector, while df$x and df[["x"]] still return vectors.
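The differences between the three access forms show up once the column name lives in a variable. This is a small base R sketch with a made-up df; the object names are illustrative.

```r
df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
col <- "x"   # column name stored in a variable

# `$` looks for a column literally named "col", so this is NULL
by_dollar <- df$col

# `[[` evaluates the variable and extracts column "x" as a vector
by_double <- df[[col]]

# Matrix-style subsetting also returns a vector for a regular data frame
by_matrix <- df[, col]

# drop = FALSE keeps the result as a one-column data frame
kept_as_df <- df[, col, drop = FALSE]

print(by_dollar)
print(by_double)
print(class(kept_as_df))
```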
Your project folder structure looks like this:
my_project/
├── data/
│ └── gdp.csv
├── scripts/
│ └── analysis.R
└── report.qmd
If your working directory is my_project/scripts/, write the relative path to access gdp.csv. Then write the equivalent here() call (assuming my_project is your Positron project root).
[1] "../data/gdp.csv"
[1] "/home/bas/Documents/git/iads_website/data/gdp.csv"
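To see why the relative path depends on the working directory, the folder layout can be rebuilt in a temporary directory. This is a base R sketch; the temporary location and the one-line file content are stand-ins, not the real project.

```r
# Rebuild the folder layout inside a temporary directory
root <- file.path(tempdir(), "my_project")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "scripts"), showWarnings = FALSE)
writeLines("country,gdp", file.path(root, "data", "gdp.csv"))  # placeholder content

# From scripts/, "../data/gdp.csv" goes one level up and into data/
old_wd <- setwd(file.path(root, "scripts"))
found_from_scripts <- file.exists("../data/gdp.csv")

# From the project root, the same string points outside the project
setwd(root)
found_from_root <- file.exists("../data/gdp.csv")

setwd(old_wd)  # restore the original working directory

print(found_from_scripts)
print(found_from_root)
```

The same string resolves to different files depending on where R is currently working, which is the problem a path anchored at the project root avoids.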
Using the here package approach, what single command would reliably read gdp.csv regardless of your current working directory (assuming you’ve opened my_project as your Positron project)?
What does list.files("../data") do when your working directory is my_project/scripts/?
Lists all files in the ../data directory relative to scripts/, i.e., it lists files in my_project/data/.
Why is here("data", "filename.csv") generally safer than "./data/filename.csv" in scripts?
here() constructs paths relative to the project root (detected via .Rproj, .here, or version control files), making scripts portable across sessions and users. "./data/..." depends on the current working directory, which may change during an R session and cause path failures.
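A hedged sketch of the here-based approach, assuming the here and readr packages are installed. The file is only read if it exists, so the sketch also runs outside the example project.

```r
library(here)

# Build an absolute path starting from the detected project root
gdp_path <- here("data", "gdp.csv")
print(gdp_path)

# Guarded read: only attempted when the file is actually present
if (file.exists(gdp_path)) {
  gdp <- readr::read_csv(gdp_path)
}
```

Because gdp_path is absolute, the read works the same way whether the script is run from scripts/, from the project root, or from a rendered report.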
List all .csv files in your project’s data/ folder.
Download the students.csv file here and put it into a folder named tutorials in your working directory. Read the file students.csv from your tutorials/ folder using read_delim(). How many rows and columns does the resulting data frame have?
Rows: 3 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr (1): name
dbl (1): id
num (1): gpa
lgl (1): has_job
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
[1] 3
[1] 4
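The column specification above reports a semicolon delimiter, so the call needs delim = ";". The sketch below uses an inline stand-in for the file (readr 2.0 or later accepts literal data wrapped in I()); the rows are invented for illustration and only the column names follow the output above.

```r
library(readr)

# Inline stand-in for tutorials/students.csv; values are made up
students_text <- I("id;name;gpa;has_job\n1;Anna;3.5;TRUE\n2;Bram;3.1;FALSE\n3;Chen;3.8;TRUE\n")

# delim tells read_delim() which character separates the fields
students <- read_delim(students_text, delim = ";", show_col_types = FALSE)

nrow(students)  # number of rows
ncol(students)  # number of columns
```

For the real file, the first argument would be the path, for example "tutorials/students.csv", in place of students_text.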
The students.csv file is available online at: http://basm92.quarto.pub/intro-to-applied-data-science/tutorials/students.csv. Write the code to read this remote file directly into R.
After importing with read_csv(), you notice that a column containing postal codes (e.g., “1012 AB”) is being converted to numeric. How would you prevent this and keep it as character/text?
What is the difference between read_csv() (from readr) and read.csv() (base R) when importing large datasets? Name at least two advantages of read_csv().
read_csv() is significantly faster and more memory-efficient for large files.
It returns a tibble, which prints neatly and avoids row name complications.
After running gdp_data <- read_csv(here("tutorials", "gdp.csv")), write code to:
Rows: 30 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): country
dbl (4): year, gdp_per_capita, population_millions, inflation_rate
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 6 × 5
country year gdp_per_capita population_millions inflation_rate
<chr> <dbl> <dbl> <dbl> <dbl>
1 Netherlands 2019 52300 17.3 4.8
2 Netherlands 2020 51800 17.4 7.9
3 Netherlands 2021 54200 17.5 3.8
4 Netherlands 2022 56700 17.6 3.9
5 Netherlands 2023 58900 17.7 6.5
6 Belgium 2019 47800 11.4 4.5
[1] "country" "year" "gdp_per_capita"
[4] "population_millions" "inflation_rate"
spc_tbl_ [30 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country : chr [1:30] "Netherlands" "Netherlands" "Netherlands" "Netherlands" ...
$ year : num [1:30] 2019 2020 2021 2022 2023 ...
$ gdp_per_capita : num [1:30] 52300 51800 54200 56700 58900 47800 NA 48900 50100 51300 ...
$ population_millions: num [1:30] 17.3 17.4 17.5 17.6 17.7 11.4 11.5 11.6 11.7 11.8 ...
$ inflation_rate : num [1:30] 4.8 7.9 3.8 3.9 6.5 4.5 8.2 1.5 7.6 2.8 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. year = col_double(),
.. gdp_per_capita = col_double(),
.. population_millions = col_double(),
.. inflation_rate = col_double()
.. )
- attr(*, "problems")=<pointer: 0x5559ee2d88a0>
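The three outputs above have the shape produced by head(), names(), and str(). A minimal sketch on a small data frame, built from three of the rows shown above, illustrates what each call reports; gdp_sample is an illustrative name.

```r
# Small stand-in built from rows visible in the printed output
gdp_sample <- data.frame(
  country = c("Netherlands", "Netherlands", "Belgium"),
  year = c(2019, 2020, 2019),
  gdp_per_capita = c(52300, 51800, 47800)
)

# First rows (head() shows 6 by default; 2 are requested here)
first_rows <- head(gdp_sample, 2)
print(first_rows)

# Column names as a character vector
column_names <- names(gdp_sample)
print(column_names)

# Compact overview: dimensions, column types, and first values
str(gdp_sample)
```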
You have an Excel file eurostat.xlsx that contains multiple sheets.¹ How would you:
here() starts at /home/bas/Documents/git/iads_website
[1] "Sheet1" "Population_2023"
# A tibble: 5 × 2
country Population_2023
<chr> <dbl>
1 Netherlands 18
2 Germany 83
3 Belgium 11
4 France 67
5 Italy 63
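Listing the sheets and then reading one of them by name can be sketched with readxl. Since eurostat.xlsx is not available here, the example uses the datasets.xlsx workbook bundled with readxl as a stand-in; for the exercise, the path and the sheet name "Population_2023" would take their place.

```r
library(readxl)

# Bundled example workbook, standing in for eurostat.xlsx
path <- readxl_example("datasets.xlsx")

# Character vector with the name of every sheet in the workbook
sheet_names <- excel_sheets(path)
print(sheet_names)

# Read one sheet by name; a position such as sheet = 2 also works
mtcars_sheet <- read_excel(path, sheet = "mtcars")
dim(mtcars_sheet)
```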
When importing with read_excel(), you notice that date columns are being imported as character strings instead of proper dates. What parameter would you use to specify the correct column type during import?
Use the col_types parameter:
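A sketch of how col_types is passed, again using the datasets.xlsx workbook bundled with readxl as a stand-in because it contains no date column. The vector has one entry per column, in column order; for a date column, the entry at that column's position would be "date".

```r
library(readxl)

# Bundled example workbook; its iris sheet has four numeric columns and one text column
path <- readxl_example("datasets.xlsx")

# One type per column, in the order the columns appear in the sheet
iris_typed <- read_excel(
  path,
  sheet = "iris",
  col_types = c("numeric", "numeric", "numeric", "numeric", "text")
)

str(iris_typed)
```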
Your Excel file trade_data.xlsx has column headers starting on row 3 (rows 1-2 contain metadata). How would you skip these first two rows when reading the data?
You have a text file survey_results.txt where missing values are coded as “NA” and “MISSING”.² Write code to read this file while treating both codes as missing values (NA).
Rows: 10 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2): gender, comments
dbl (4): respondent_id, age, satisfaction, score
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 10 × 6
respondent_id age gender satisfaction score comments
<dbl> <dbl> <chr> <dbl> <dbl> <chr>
1 101 28 Male 5 92 Great service
2 102 35 Female 4 85 <NA>
3 103 42 Male NA 78 <NA>
4 104 29 Female 5 95 Excellent experience
5 105 51 Male 3 70 <NA>
6 106 33 Female NA NA <NA>
7 107 45 Male 4 88 Good but slow
8 108 39 Female 2 NA Needs improvement
9 109 26 Male 5 97 Perfect
10 110 48 Female NA 75 <NA>
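The na argument of the readr import functions takes a character vector of strings to treat as missing. The sketch below uses an inline, tab-separated stand-in for survey_results.txt with three of the respondents shown above; which cells were coded “MISSING” rather than “NA” in the real file is an assumption.

```r
library(readr)

# Inline stand-in for survey_results.txt (tab-separated); the codes used are assumed
survey_text <- I("respondent_id\tage\tsatisfaction\n101\t28\t5\n103\t42\tMISSING\n106\t33\tNA\n")

# Both "NA" and "MISSING" are converted to NA during import
survey <- read_delim(
  survey_text,
  delim = "\t",
  na = c("NA", "MISSING"),
  show_col_types = FALSE
)

print(survey)
sum(is.na(survey$satisfaction))  # count of missing satisfaction scores
```

Without the na argument, “MISSING” would be read as text and the whole satisfaction column would become character instead of numeric.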
Read gdp.csv from your tutorials folder
high_income_avg
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Rows: 30 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): country
dbl (4): year, gdp_per_capita, population_millions, inflation_rate
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
[1] 48738.89