---
title: "Logic and types in R"
subtitle: "Lecture 02"
author: "Dr. Colin Rundel"
footer: "Sta 523 - Fall 2022"
format:
  revealjs:
    theme: slides.scss
    transition: fade
    slide-number: true
    self-contained: true
execute: 
  echo: true
---


```{r setup, message=FALSE, warning=FALSE, include=FALSE}
options(
  htmltools.dir.version = FALSE, # for blogdown
  width=80
)

```


# In R (almost) <br/> everything is a vector


## Vectors

The fundamental building block of data in R are vectors (collections of related values, objects, data structures, etc). 

. . .

R has two types of vectors:

* **atomic** vectors (*vectors*)

    - homogeneous collections of the *same* type (e.g. all `true`/`false` values, all numbers, or all character strings).

* **generic** vectors (*lists*)
  
    - heterogeneous collections of *any* type of R object, even other lists
    (meaning they can have a hierarchical/tree-like structure).


# Atomic Vectors


## Atomic Vectors

R has six atomic vector types, we can check the type of any object in R using the `typeof()` function

| `typeof()`  |  `mode()`  |
|:------------|:-----------|
| logical     |  logical   |
| double      |  numeric   |
| integer     |  numeric   |
| character   |  character |
| complex     |  complex   |
| raw         |  raw       |



::: {.aside}
Mode is a higher level abstraction, we will discuss this in detail a bit later.

There are additional types in R, e.g. `list`, `closure`, `environment`, etc. we will see these in the next couple of weeks. See `?typeof` for more information.
:::




## `logical` - boolean values (`TRUE` and `FALSE`)

:::: {.columns}

::: {.column width='50%'}
```{r}
typeof(TRUE)
typeof(FALSE)
```
:::

::: {.column width='50%'}
```{r}
mode(TRUE)
mode(FALSE)
```
:::

::::

. . .

::: {.medium}
R will let you use `T` and `F` as shortcuts to `TRUE` and `FALSE`, this is a bad practice as these values are actually *global variables* that can be overwritten.
:::

```{r}
T
T = FALSE
T
```



## `character` - text strings

Either single or double quotes are fine, opening and closing quote must match.

:::: {.columns}
::: {.column width='50%'}
```{r}
typeof("hello")
typeof('world')
```
:::

::: {.column width='50%'}
```{r}
mode("hello")
mode('world')
```
:::
::::

. . .

<br/>

Quote characters can be included by escaping or using a non-matching quote.

:::: {.columns}
::: {.column width='50%'}
```{r}
"abc'123"
'abc"123'
```
:::

::: {.column width='50%'}
```{r}
"abc\"123"
'abc\'123'
```
:::
::::

::: {.aside}
RStudio's syntax highlighting is helpful here to indicate where it thinks a string begins and ends.
:::




## Numeric types

`double` - floating point values (these are the default numerical type)

:::: {.columns}
::: {.column width='50%'}
```{r}
typeof(1.33)
typeof(7)
```
:::

::: {.column width='50%'}
```{r}
mode(1.33)
mode(7)
```
:::
::::

. . .

<br/>

`integer` - integer values (literals are indicated with an `L` suffix)

<div>

:::: {.columns}
::: {.column width='50%'}
```{r}
typeof( 7L )
typeof( 1:3 )
```
:::

::: {.column width='50%'}
```{r}
mode( 7L )
mode( 1:3 )
```
:::
::::

</div>



## Concatenation

Atomic vectors can be grown (combined) using the concatenate `c()` function.

```{r}
c(1, 2, 3)
```

. . .

```{r}
c("Hello", "World!")
```

. . .

```{r}
c(1, 1:10)
```

. . .

```{r}
c(1,c(2, c(3)))
```

::: {.aside}
**Note** - atomic vectors are inherently flat.
:::




## Inspecting types

* `typeof(x)` - returns a character vector (length 1) of the *type* of object `x`.

* `mode(x)` - returns a character vector (length 1) of the *mode* of object `x`.


:::: {.columns}

::: {.column width='50%'}
```{r}
typeof(1)
typeof(1L)
typeof("A")
typeof(TRUE)
```
:::

::: {.column width='50%'}
```{r}
mode(1)
mode(1L)
mode("A")
mode(TRUE)
```
:::
::::



## Type Predicates

* `is.logical(x)`   - returns `TRUE` if `x` has *type* `logical`.
* `is.character(x)` - returns `TRUE` if `x` has *type* `character`.
* `is.double(x)`    - returns `TRUE` if `x` has *type* `double`.
* `is.integer(x)`   - returns `TRUE` if `x` has *type* `integer`.
* `is.numeric(x)`   - returns `TRUE` if `x` has *mode* `numeric`.

::::: {.medium}
:::: {.columns}
::: {.column width='33%'}
```{r}
is.integer(1)
is.integer(1L)
is.integer(3:7)
```
:::

::: {.column width='33%'}
```{r}
is.double(1)
is.double(1L)
is.double(3:8)
```
:::

::: {.column width='33%'}
```{r}
is.numeric(1)
is.numeric(1L)
is.numeric(3:7)
```
:::
::::
:::::



## Other useful predicates

* `is.atomic(x)` - returns `TRUE` if `x` is an *atomic vector*.
* `is.list(x)` - returns `TRUE` if `x` is a *list* (generic vector).
* `is.vector(x)` - returns `TRUE` if `x` is either an *atomic* or *generic* vector.

:::: {.columns}
::: {.column width='50%'}
```{r}
is.atomic(c(1,2,3))
is.list(c(1,2,3))
is.vector(c(1,2,3))
```
:::

::: {.column width='50%'}
```{r}
is.atomic(list(1,2,3))
is.list(list(1,2,3))
is.vector(list(1,2,3))
```
:::
::::




## Type Coercion

R is a dynamically typed language -- it will automatically convert between most types without raising warnings or errors. Keep in mind the rule that atomic vectors must always contain values of the same type.

```{r}
c(1, "Hello")
```

. . .

```{r}
c(FALSE, 3L)
```

. . .

```{r}
c(1.2, 3L)
```

. . .

```{r}
c(FALSE, "Hello")
```



## Operator coercion

Builtin operators and functions (e.g. `+`, `&`, `log()`, etc.) will generally attempt to coerce values to an appropriate type for the given operation

<div>
:::: {.columns}
::: {.column width='50%'}
```{r}
3.1+1L
5 + FALSE
```
:::

::: {.column width='50%'}
```{r}
log(1)
log(TRUE)
```
:::
::::
</div>

. . .

<br/><br/>

:::: {.columns}

::: {.column width='50%'}
```{r}
TRUE & FALSE
TRUE & 7
```
:::

::: {.column width='50%'}
```{r}
TRUE | FALSE
FALSE | !5
```
:::
::::




## Explicit Coercion

Most of the `is` functions we just saw have an `as` variant which can be used for *explicit* coercion.

:::: {.columns}

::: {.column width='50%'}
```{r}
as.logical(5.2)
as.character(TRUE)
as.integer(pi)
```
:::

::: {.column width='50%'}
```{r}
as.numeric(FALSE)
as.double("7.2")
as.double("one")
```
:::
::::

# Missing Values

## Missing Values
  
R uses `NA` to represent missing values in its data structures, what may not be obvious is that there are different `NA`s for different atomic types.

:::: {.columns}
::: {.column width='50%'}
```{r}
typeof(NA)
typeof(NA+1)
typeof(NA+1L)
typeof(c(NA,""))
```
:::

::: {.column width='50%'}
```{r}
typeof(NA_character_)
typeof(NA_real_)
typeof(NA_integer_)
typeof(NA_complex_)
```
:::
::::


## NA "stickiness" 

Because `NA`s represent missing values it makes sense that any calculation using them should also be missing.

:::: {.columns}
::: {.column width='50%'}
```{r}
1 + NA
1 / NA
NA * 5
```
:::

::: {.column width='50%'}
```{r}
sqrt(NA)
3^NA
sum(c(1, 2, 3, NA))
```
:::
::::

<br/>

Summarizing functions (e.g. `sum()`, `mean()`, `sd()`, etc.) will often have a `na.rm` argument which will allow you to *drop* missing values.

```{r}
sum(c(1, 2, 3, NA), na.rm = TRUE)
mean(c(1, 2, 3, NA), na.rm = TRUE)
```

  
## NAs are not always sticky 
  
A useful mental model for `NA`s is to consider them as a unknown value that could take any of the possible values for a type. 

. . .

For numbers or characters this isn't very helpful, but for a logical value we know that the value must either be `TRUE` or `FALSE` and we can use that when deciding what value to return.

. . .

```{r}
TRUE & NA
```

. . .

```{r}
FALSE & NA
```

. . .

```{r}
TRUE | NA
```

. . .

```{r}
FALSE | NA
```


## Other Special values (double)

These are defined as part of the IEEE floating point standard (not unique to R)

* `NaN` - Not a number
* `Inf` - Positive infinity
* `-Inf` - Negative infinity

:::: {.columns}
::: {.column width='50%'}
```{r}
pi / 0
0 / 0
1/0 + 1/0
```
:::

::: {.column width='50%'}
```{r}
1/0 - 1/0
NaN / NA
NaN * NA
```
:::
::::


## Testing for `Inf` and `NaN`

`NaN` and `Inf` don't have the same testing issues that `NA`s do, but there are still convenience functions for testing for these types of values

:::: {.columns}
::: {.column width='50%'}
```{r}
is.finite(Inf)
is.infinite(-Inf)
is.nan(Inf)
is.nan(-Inf)
Inf > 1
-Inf > 1
```
:::

::: {.column width='50%'}
```{r}
is.finite(NaN)
is.infinite(NaN)
is.nan(NaN)
is.finite(NA)
is.infinite(NA)
is.nan(NA)
```
:::
::::


## Coercion for infinity and NaN
  
First remember that `Inf`, `-Inf`, and `NaN` are doubles, however their coercion behavior is not the same as other doubles

```{r}
as.integer(Inf)
as.integer(NaN)
```

. . .

:::: {.columns}
::: {.column width='50%'}
```{r}
as.logical(Inf)
as.logical(-Inf)
as.logical(NaN)
```
:::

::: {.column width='50%'}
```{r}
as.character(Inf)
as.character(-Inf)
as.character(NaN)
```
:::
::::


## Exercise 1

### **Part 1**

What is the type of the following vectors? Explain why they have that type.

* `c(1, NA+1L, "C")`
* `c(1L / 0, NA)`
* `c(1:3, 5)`
* `c(3L, NaN+1L)`
* `c(NA, TRUE)`


### **Part 2**

Considering only the four (common) data types, what is R's implicit type conversion hierarchy (from highest priority to lowest priority)? 

::: {.aside}
*Hint* - think about the pairwise interactions between types.
:::

```{r echo=FALSE}
countdown::countdown(minutes=5)
```



# Conditionals & Control Flow


## Logical (boolean) operators

<br/>

|  Operator                     |  Operation    |  Vectorized? |
|:-----------------------------:|:-------------:|:------------:|
| <code>x &#124; y</code>       |  or           |   Yes        |
| `x & y`                       |  and          |   Yes        |
| `!x`                          |  not          |   Yes        |
| <code>x &#124;&#124; y</code> |  or           |   No         |
| `x && y`                      |  and          |   No         |
|`xor(x, y)`                    |  exclusive or |   Yes        |




## Vectorized?

```{r}
x = c(TRUE,FALSE,TRUE)
y = c(FALSE,TRUE,TRUE)
```

. . .

:::: {.columns}
::: {.column width='50%'}
```{r}
x | y
x & y
```
:::

::: {.column width='50%'}
```{r}
x || y
x && y
```
:::
::::

::: {.aside}
**Note** both `||` and `&&` only use the *first* value in the vector, all other values are ignored, there is no warning about the ignored values.
:::


## Vectorization and math

Almost all of the basic mathematical operations (and many other functions) in R are vectorized.

:::: {.columns}
::: {.column width='50%'}
```{r}
c(1, 2, 3) + c(3, 2, 1)
c(1, 2, 3) / c(3, 2, 1)
```
:::

::: {.column width='50%'}
```{r}
log(c(1, 3, 0))
sin(c(1, 2, 3))
```
:::
::::



## Length coercion (aka recycling)

If the lengths of the vector do not match, then the shorter vector has its values recycled to match the length of the longer vector.

```{r}
x = c(TRUE, FALSE, TRUE)
y = c(TRUE)
z = c(FALSE, TRUE)
```

. . .

:::: {.columns}
::: {.column width='50%'}
```{r}
x | y
x & y
```
:::

::: {.column width='50%'}
```{r}
y | z
y & z
```
:::
::::

<div/>

. . .

```{r}
x | z
```



## Length coercion and math

The same length coercion rules apply for most basic mathematical operators,

```{r}
x = c(1, 2, 3)
y = c(5, 4)
z = 10L
```

:::: {.columns}
::: {.column width='50%'}
```{r}
x + x
x + z
```
:::

::: {.column width='50%'}
```{r}
y / z
log(x)+z
```
:::
::::

. . .

```{r}
x %% y
```




## Comparison operators

<br/>

|  Operator  |  Comparison                |  Vectorized?
|:----------:|:---------------------------:|:---------------:
| `x < y`    |  less than                 |  Yes
| `x > y`    |  greater than              |  Yes
| `x <= y`   |  less than or equal to     |  Yes
| `x >= y`   |  greater than or equal to  |  Yes
| `x != y`   |  not equal to              |  Yes
| `x == y`   |  equal to                  |  Yes
| `x %in% y` |  contains                  |  Yes (over `x`)

::: {.aside}
over `x` here means the returned value will have the length of `x` regardless of the length of `y`
:::



## Comparisons

```{r}
x = c("A","B","C")
y = c("A")
```



:::: {.columns}
::: {.column width='50%'}
```{r}
x == y
x != y
```
:::

::: {.column width='50%'}
```{r}
x %in% y
y %in% x
```
:::
::::

. . .

Type coercion also applies for comparison opperators which can result in *interesting behavior*

:::: {.columns}
::: {.column width='50%'}
```{r}
TRUE == "TRUE"
FALSE == 1
```
:::

::: {.column width='50%'}
```{r}
TRUE == 1
TRUE == 5
```
:::
::::


## `>` & `<` with characters

While maybe somewhat unexpected, these comparison operators can be used character values.

:::: {.columns}
::: {.column width='50%'}
```{r}
"A" < "B"
"A" > "B"
"A" < "a"
"a" > "!"
```
:::

::: {.column width='50%'}
```{r}
"Good" < "Goodbye"
c("Alice", "Bob", "Carol") <= "B"
```
:::
::::

::: {.aside}
Note - to better understand how this works / the ordering used take a look at [ASCII code](https://www.ascii-code.com/)
:::


## Conditional Control Flow

Conditional execution of code blocks is achieved via `if` statements. 

```{r}
x = c(1, 3)
```

. . .

:::: {.columns}
::: {.column width='50%'}
```{r}
if (3 %in% x) {
  print("Contains 3!")
}
```
:::

::: {.column width='50%'}
```{r}
if (1 %in% x)
  print("Contains 1!")
```
:::
::::

. . .

:::: {.columns}
::: {.column width='50%'}
```{r}
if (5 %in% x) {
  print("Contains 5!")
}
```
:::

::: {.column width='50%'}
```{r}
if (5 %in% x) {
  print("Contains 5!")
} else {
  print("Does not contain 5!")
}
```
:::
::::







## `if` is not vectorized

```{r}
x = c(1, 3)
```

. . .

```{r error=TRUE}
if (x == 1)
  print("x is 1!")
```

. . .

```{r error=TRUE}
if (x == 3)
  print("x is 3!")
```

. . .

<br/>

Note that the behavior seen above (thrown errors) is new in R 4.2, previous versions (e.g. on the server) will only throw warnings (using on the first value in the condition vector).



## Collapsing logical vectors

There are a couple of helper functions for collapsing a logical vector down to a single value: `any`, `all`

```{r}
x = c(3,4,1)
```



:::: {.columns}
::: {.column width='50%'}
```{r}
x >= 2
any(x >= 2)
all(x >= 2)
```
:::

::: {.column width='50%'}
```{r}
x <= 4
any(x <= 4)
all(x <= 4)
```
:::
::::

<div/>

. . .

```{r}
if (any(x == 3)) 
  print("x contains 3!")
```



## `else if` and `else`

:::: {.columns}
::: {.column width='50%'}
```{r}
x = 3

if (x < 0) {
  "x is negative"
} else if (x > 0) {
  "x is positive"
} else {
  "x is zero"
}
```
:::

::: {.column width='50%'}
```{r}
x = 0

if (x < 0) {
  "x is negative"
} else if (x > 0) {
  "x is positive"
} else {
  "x is zero"
}
```
:::
::::



## `if` and `return`

R's `if` conditional statements return a value (invisibly), the two following implementations are equivalent.

:::: {.columns}
::: {.column width='50%'}
```{r}
x = 5
```

```{r}
s = if (x %% 2 == 0) {
  x / 2
} else {
  3*x + 1
}
```

```{r}
s
```
:::

::: {.column width='50%'}
```{r}
x = 5
```

```{r}
if (x %% 2 == 0) {
  s = x / 2
} else {
  s = 3*x + 1
}
```

```{r}
s
```
:::
::::

::: {.aside}
Notice that conditional expressions are evaluated in the parent scope.
:::



## Exercise 2

Take a look at the following code below on the left, without running it in R what do you expect the outcome will be for each call on the right?

:::: {.columns}
::: {.column width='50%'}
```r
f = function(x) {
  # Check small prime
  if (x > 10 || x < -10) {
    stop("Input too big")
  } else if (x %in% c(2, 3, 5, 7)) {
    cat("Input is prime!\n")
  } else if (x %% 2 == 0) {
    cat("Input is even!\n")
  } else if (x %% 2 == 1) {
    cat("Input is odd!\n")
  }
}
```
:::

::: {.column width='50%'}
```r
f(1)
f(3)
f(8)
f(-1)
f(-3)
f(1:2)
f("0")
f("3")
f("zero")
```
:::
::::

::: {.aside}
More on functions next time
:::

```{r echo=FALSE}
countdown::countdown(minutes = 5)
```



## Conditionals and missing values

`NA`s can be particularly problematic for control flow,

:::: {.columns}
::: {.column width='50%'}
```{r error=TRUE}
if (2 != NA) {
  "Here"
}
```
:::

::: {.column width='50%'}
```{r error=TRUE}
2 != NA
```
:::
::::


. . .

:::: {.columns}
::: {.column width='50%'}

```{r error=TRUE}
if (all(c(1,2,NA,4) >= 1)) {
  "There"
}
```
:::

::: {.column width='50%'}

```{r error=TRUE}
all(c(1,2,NA,4) >= 1)
```
:::
::::

. . .

:::: {.columns}
::: {.column width='50%'}
```{r error=TRUE}
if (any(c(1,2,NA,4) >= 1)) {
  "There"
}
```
:::

::: {.column width='50%'}
```{r error=TRUE}
any(c(1,2,NA,4) >= 1)
```
:::
::::





## Testing for `NA`

To explicitly test if a value is missing it is necessary to use `is.na` (often along with `any` or `all`).

:::: {.columns}
::: {.column width='50%'}
```{r}
NA == NA
is.na(NA)
is.na(1)
```
:::

::: {.column width='50%'}
```{r}
is.na(c(1,2,3,NA))
any(is.na(c(1,2,3,NA)))
all(is.na(c(1,2,3,NA)))
```
:::
::::

