Create a numeric vector called exam_scores containing the values 65, 78, 92, 88, and 73. Calculate the mean and standard deviation using built-in functions.

exam_scores <- c(65, 78, 92, 88, 73)
mean_score <- mean(exam_scores)
sd_score <- sd(exam_scores)
mean_score
[1] 79.2
sd_score
[1] 10.98636
Given a data frame df with a column named “price”:

df$price
df[["price"]]
df["price"]

df$price: Returns a vector (an atomic vector of the column’s underlying type). Most convenient for interactive use, but it does not support programmatic column names well.
df[["price"]]: Returns a vector (same type as $). Preferred for programmatic access because it accepts a character string, including one stored in a variable, and handles special characters in column names.
df["price"]: Returns a data.frame containing only the “price” column (a one-column subset), so the data frame structure is maintained.

student name <- "Maria"
age <- twenty five

Errors:
Object names cannot contain spaces (student name is invalid).
Numbers must be written as digits (twenty five is read as unquoted object names, not as the value 25, and the space between the two words makes the line a syntax error).

Fixed code:
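A corrected version might look like the sketch below. It assumes the intended object name is student_name (an underscore in place of the space) and that the age was meant to be the number 25; both are guesses at the intent of the broken code.

```r
# Assumed fix: replace the space in the name with an underscore
student_name <- "Maria"

# Assumed fix: write the age as a numeric literal instead of words
age <- 25

student_name
age
```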
Create a data frame called countries with three columns: name (character), population (numeric in millions), and continent (character). Include data for at least three countries.

         name population     continent
1 Netherlands       17.8        Europe
2      Brazil      214.3 South America
3       Japan      125.7          Asia
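One way to build a data frame that prints as shown above is sketched here. The values are taken from the printed output; the stringsAsFactors argument is an explicit safeguard for older R versions and is not required from R 4.0 onward.

```r
countries <- data.frame(
  name = c("Netherlands", "Brazil", "Japan"),
  population = c(17.8, 214.3, 125.7),  # in millions
  continent = c("Europe", "South America", "Asia"),
  stringsAsFactors = FALSE  # keep text columns as character vectors
)

countries       # print the data frame
str(countries)  # confirm the column types
```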
What is the value of x after running x <- 10 followed by x <- x + 5? Explain what happens in memory during this operation.

After execution, x equals 15.
Memory behavior:
The first statement (x <- 10) allocates memory for the value 10 and binds the symbol x to it.
The second statement (x <- x + 5) retrieves the current value of x (10), computes 10 + 5 = 15, allocates new memory for 15, and rebinds x to this new location. The original memory holding 10 becomes eligible for garbage collection. R uses copy-on-modify semantics: the existing value is not changed in place, so the reassignment creates a new object and points x at it.

You run ls() and see objects named data, data_clean, and data_final. Why is this naming convention preferable to repeatedly overwriting a single object called data?

This convention preserves data provenance and enables reproducibility: each intermediate object stays available, so any step can be inspected or rerun without reloading the raw data.
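The staged naming convention can be illustrated with a small sketch. The data frame, its columns (id, value), and the cleaning steps are hypothetical and chosen only to show that each stage survives as its own object.

```r
# Hypothetical raw data with one missing value
data <- data.frame(id = 1:4, value = c(10, NA, 30, 40))

# Step 1: drop incomplete rows; the raw object is left untouched
data_clean <- data[complete.cases(data), ]

# Step 2: derive the analysis-ready version from the cleaned one
data_final <- transform(data_clean, value_scaled = value / max(value))

# Every stage can still be inspected or compared
nrow(data)
nrow(data_clean)
names(data_final)
```

Had each step overwritten data, the row with the missing value could not be examined later without rerunning the import.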
Given the vector v <- c(5, 10, 15, 20, 25), write R code to:
[1] 15
[1] 10 15 20
[1] 20 25
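The three outputs above can be reproduced with the subsetting calls below. This is a sketch that assumes the tasks were to extract the third element, elements 2 through 4, and all values greater than 15; the last output would also match selecting the final two positions with v[4:5].

```r
v <- c(5, 10, 15, 20, 25)

v[3]       # single element by position
v[2:4]     # a range of positions
v[v > 15]  # logical condition (assumed task)
```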
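Given a data frame employees with columns name, department, and salary:
a) Select all employees in the Finance department.
b) Return the names of employees with a salary above 70000.
c) Explain the difference between employees[3, 2] and employees[3, "department"].

# a) All Finance employees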
employees[employees$department == "Finance", ]
# b) Names of high earners
employees$name[employees$salary > 70000]
# c) Explanation:
# employees[3, 2] accesses row 3, column 2 by POSITION (numeric index)
# employees[3, "department"] accesses row 3, column named "department" by NAME
# Both return the same value if "department" is the second column, but the named approach is safer against column reordering

Given experiment <- list(trial1 = c(1.2, 1.5, 1.3), trial2 = c(2.1, 2.4, 2.0), success = TRUE), how would you:
Access the entire trial1 vector?
Access the second element of trial2?
Access the value of success?

[1] 1.2 1.5 1.3
[1] 2.4
[1] TRUE
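A minimal sketch of list access that yields the three outputs above. It assumes the second and third tasks were to extract the second element of trial2 and the value of success, which is what the printed results suggest.

```r
experiment <- list(
  trial1 = c(1.2, 1.5, 1.3),
  trial2 = c(2.1, 2.4, 2.0),
  success = TRUE
)

experiment$trial1        # whole vector by name
experiment$trial2[2]     # extract the vector, then index into it
experiment[["success"]]  # [[ ]] is equivalent to $ for a named element
```

Note that experiment["trial1"] with single brackets would return a one-element list rather than the vector itself.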
R uses 1-based indexing primarily because of its origins in statistical computing environments (such as the S language), where human readability was prioritized: researchers naturally count starting from 1.
Common error: Accessing vector[0] returns an empty vector rather than an error, which can cause silent failures in loops or subsetting. For example, with a five-element vector v, for (i in 0:4) print(v[i]) prints an empty result (numeric(0)) on the first iteration, then the first four elements, and never reaches the fifth.
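The zero-index behavior can be checked directly. The sketch below reuses a five-element numeric vector as an assumed example and shows seq_along() as one common way to avoid the off-by-one loop.

```r
v <- c(5, 10, 15, 20, 25)

v[0]          # empty numeric vector, no error raised
length(v[0])  # 0

# Off-by-one loop: visits index 0 and never reaches the fifth element
for (i in 0:4) print(v[i])

# Safer: seq_along() generates 1, 2, ..., length(v)
for (i in seq_along(v)) print(v[i])
```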
Create a logical vector indicating which students in students_df from Part A have grades above 8.0. Use this vector to subset the data frame to show only high-performing students.

This analogy clarifies that APIs act as intermediaries: you don’t need to know kitchen operations (server implementation details) to get your meal (data). You simply make a request through the waiter (API) using a standard protocol (menu), and receive a prepared response.
429 Too Many Requests: Client has exceeded rate limits. Response: Implement exponential backoff, reduce request frequency, or check API documentation for quota limits.
503 Service Unavailable: The server is temporarily down or under maintenance (a server-side issue). Response: Retry later with backoff; the error is not caused by client behavior.
Key difference: 429 is client-induced (fix by throttling requests); 503 is server-induced (requires waiting for service restoration).
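Exponential backoff, the response suggested for both codes, can be sketched as follows. The helper retry_with_backoff() and its parameters are hypothetical, not part of httr or any other package; fetch stands for any function that performs a request and returns its numeric status code, and the sleep argument exists only so the example can run without real waiting.

```r
# Hypothetical helper: retry on 429 or 503, doubling the wait each time
retry_with_backoff <- function(fetch, max_tries = 5, base_delay = 1,
                               sleep = Sys.sleep) {
  for (attempt in seq_len(max_tries)) {
    status <- fetch()
    # Any other status is handed back to the caller immediately
    if (!status %in% c(429, 503)) return(status)
    # Wait 1, 2, 4, ... times base_delay seconds before the next try
    if (attempt < max_tries) sleep(base_delay * 2^(attempt - 1))
  }
  status  # still failing after max_tries attempts
}

# Usage with a fake request that succeeds on the third call
responses <- c(429, 503, 200)
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  responses[calls]
}

waits <- numeric(0)  # record the delays instead of sleeping
result <- retry_with_backoff(fake_fetch,
                             sleep = function(s) waits <<- c(waits, s))
result
waits
```

In real use, fetch might wrap a call such as httr::status_code(httr::GET(url)), and the default Sys.sleep would perform the actual waiting.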
https://api.example.com/v2/products?category=electronics&limit=10&api_key=abc123

Protocol: https
Host and path: api.example.com/v2/products
Query parameters: category = "electronics", limit = "10", api_key = "abc123"

Why do most APIs require authentication via API keys rather than allowing completely open access? Name two legitimate reasons API providers implement this requirement.
You need weather data for Paris, Berlin, and Rome. Why is it better to make three separate API requests (one per city) rather than downloading a complete global weather dataset containing billions of records?
Convert this JSON structure into its equivalent R objects (specify whether each becomes a vector, list, or data frame):
"university" → character vector (length 1): "Utrecht"
"departments" → character vector: c("Economics", "Computer Science", "Law")
"enrollment" → data frame (after conversion):
Note: When parsed with jsonlite::fromJSON(), the entire structure becomes a list containing these components.
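The conversion can be tried with jsonlite on a small document. The JSON below is a hypothetical stand-in: the university and departments values come from the answer above, while the enrollment records (year, students) are made up purely to show how an array of objects is converted.

```r
library(jsonlite)

# Hypothetical JSON; the "enrollment" records are illustrative only
json_text <- '{
  "university": "Utrecht",
  "departments": ["Economics", "Computer Science", "Law"],
  "enrollment": [
    {"year": 2022, "students": 100},
    {"year": 2023, "students": 120}
  ]
}'

parsed <- fromJSON(json_text)

class(parsed)              # the top level is a list
class(parsed$university)   # character vector of length 1
class(parsed$departments)  # character vector
class(parsed$enrollment)   # array of objects becomes a data frame
```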
When using GET() from the httr package, why should you always check status_code(response) before attempting to parse the content? What error might occur if you skip this step?

Checking the status code prevents you from parsing error responses (e.g., HTML error pages) as valid data. Skipping this check may cause:
jsonlite::fromJSON() to fail with cryptic parsing errors when receiving HTML instead of JSON
Always verify status_code(response) == 200 before parsing.

Your script fails with Error in open.connection(con, "rb") : HTTP error 401. What is the most likely cause, and what steps should you take to resolve it?

Cause: 401 Unauthorized indicates missing, invalid, or expired authentication credentials (e.g., an incorrect API key).
Resolution steps:
/status)

Design a safe workflow for using an API key in your R project that prevents accidental exposure when sharing code on GitHub. Describe two specific techniques you would implement.
Store keys in .Renviron:
Use usethis::edit_r_environ() to add API_KEY="your_key" to your user-level .Renviron file (outside project directory). Access via Sys.getenv("API_KEY") in scripts. Never commit .Renviron to version control.
Include startup validation in scripts:
This fails safely if keys are missing while preventing accidental commits of credentials.
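The startup validation described above might look like the following sketch. The helper name require_env() is hypothetical; it relies only on base R's Sys.getenv() and assumes the key is stored under the name API_KEY as in the .Renviron example.

```r
# Hypothetical helper: stop early if a required environment variable is unset
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name,
         " is not set; add it to .Renviron and restart R.",
         call. = FALSE)
  }
  value
}

# At the top of a script (left commented so the sketch runs without a key):
# api_key <- require_env("API_KEY")
```

Because the key is read from the environment at run time, the script itself never contains the secret and can be committed safely.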