# Utilities in R

In this tutorial, you will learn about several utility functions that are easy to use and come up often in the R programming language. First, we’ll discuss some mathematical functions. Then you will look at functions that relate more closely to R’s data structures, such as lists and vectors, followed by regular expressions and dates and times.

### Mathematical Functions

Let’s look at the following cells of code, which use several mathematical functions.

You will first define two vectors, `x` and `y`, that contain both positive and negative values.

``````x <- c(1.1, -2.3, -4.5)
y <- c(2.4,-44, -2.2)
``````
``````print(x)
``````
``````  1.1 -2.3 -4.5
``````
``````print(y)
``````
``````   2.4 -44.0  -2.2
``````

Let’s take these two vectors `x` and `y`:

• Take their absolute values,
• Round them to zero decimal places,
• Sum each of them, and
• Finally, take the average of the two sums.
``````mean(c(sum(round(abs(x))),sum(round(abs(y)))))
``````
``````27.5
``````

Let’s break the above line of code into small pieces and take a closer look at each function.

• The `abs()` function returns the absolute value of each element of a vector, so negative values become positive.

For example, after applying the `abs()` function to `x` and `y`, you would expect all positive values, as shown below.

``````abs(x)
``````
1. 1.1
2. 2.3
3. 4.5
``````abs(y)
``````
1. 2.4
2. 44
3. 2.2
• The next function in the list is the `round()` function, which rounds its input. It takes an optional second argument, `digits`, that specifies how many decimal places the input should be rounded to. By default, `round()` rounds to zero decimal places. Note that a value exactly halfway between two integers is rounded to the even one, which is why `-4.5` becomes `-4` below.

For example, after you apply the `round()` function to the vectors `x` and `y`, the output looks like the one shown below:

``````round(x)
``````
1. 1
2. -2
3. -4
``````round(y)
``````
1. 2
2. -44
3. -2
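
As a minimal sketch of the two points above, the `digits` argument and the rounding of exact halves can be checked directly. The variable names `one_decimal`, `two_decimals`, and `half_cases` are only illustrative and do not appear elsewhere in this tutorial.

```r
# digits controls the number of decimal places (the default is 0)
one_decimal <- round(3.14159, digits = 1)
two_decimals <- round(3.14159, digits = 2)
print(one_decimal)   # 3.1
print(two_decimals)  # 3.14

# A value exactly halfway between two integers goes to the even one
half_cases <- round(c(0.5, 1.5, 2.5, -4.5))
print(half_cases)    # 0 2 2 -4
```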
• The `sum()` function computes the sum of the elements of a vector or matrix. For example, if you pass a numeric vector to `sum()`, the sum of all its elements is returned as a single number.

In this case, the rounded absolute values of `x` and `y` are passed to the `sum()` function, so R adds up the elements of each vector and returns a single number. (For these vectors, `abs(round(x))` gives the same result as `round(abs(x))`.)

``````sum(abs(round(x)))
``````
``````7
``````
``````sum(abs(round(y)))
``````
``````48
``````
• Finally, the `mean()` function calculates the arithmetic mean: it adds a set of numerical values together and divides the total by the number of values.

In this case, the input to the `mean()` function is a vector of length two containing the numbers `7` and `48`. So the mean is the sum of these values, 55, divided by the number of elements, 2.

``````mean(c(7,48))
``````
``````27.5
``````
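
The whole walkthrough can also be written with one intermediate variable per step, which makes it easier to inspect each result. This is only a sketch; the intermediate names (`abs_x`, `rounded_x`, `sum_x`, and so on) are introduced here for illustration.

```r
x <- c(1.1, -2.3, -4.5)
y <- c(2.4, -44, -2.2)

# Step 1: absolute values
abs_x <- abs(x)             # 1.1 2.3 4.5
abs_y <- abs(y)             # 2.4 44.0 2.2

# Step 2: round to zero decimal places
rounded_x <- round(abs_x)   # 1 2 4
rounded_y <- round(abs_y)   # 2 44 2

# Step 3: sum each vector
sum_x <- sum(rounded_x)     # 7
sum_y <- sum(rounded_y)     # 48

# Step 4: average the two sums
result <- mean(c(sum_x, sum_y))
print(result)               # 27.5
```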

This was pretty easy, wasn’t it?

Let’s now move onto the next and most interesting segment of this tutorial!

### Functions for Data Structures

Now let’s look at some data structures like lists and vectors: different ways in which you can operate on a `list`, how to reverse a `list`, and how to convert a list to a vector and vice versa.

The `list()` function creates a list, whose elements can be of different types, such as logical values, numerical values, and strings. A list can also contain other lists, in which case it is known as a nested list.

In the example below, the list has three named elements: `log`, `ch`, and `int_vec`. Their types are logical, character, and numeric, respectively.

``````list_define <- list(log = TRUE, ch = "hello_datacamp", int_vec = sort(rep(seq(8,2, by = -2), times = 2)))
``````
``````list_define
``````
\$log
TRUE
\$ch
‘hello_datacamp’
\$int_vec
1. 2
2. 2
3. 4
4. 4
5. 6
6. 6
7. 8
8. 8
• `log` is simply a logical value, TRUE or FALSE
• `ch` is a character string `hello_datacamp`
• `int_vec` is a sequence of numerical values.

Since the `log` and `ch` are pretty straightforward, let’s have a closer look at `int_vec`.

``````int_vec = sort(rep(seq(8,2, by = -2), times = 2))
``````
``````int_vec
``````
1. 2
2. 2
3. 4
4. 4
5. 6
6. 6
7. 8
8. 8

Let’s understand the above expression step-by-step.

The `seq()` function here produces a sequence of numbers in descending order, ranging from 8 to 2.

The syntax of the `seq()` function is: `seq(x1, x2, by = y)`

The first two arguments, `x1` and `x2`, tell R the range of the sequence, i.e., where to start and end. The `by` argument specifies the increment or decrement applied at each step.

For example, the below line of code will generate a sequence starting from 100 to 200 with an increment step of 20.

``````seq(100,200, by = 20)
``````
1. 100
2. 120
3. 140
4. 160
5. 180
6. 200

In our example, the sequence function will output a sequence from 8 till 2 (inclusive) with a decrement step of 2, which returns a vector of length 4.

``````a = seq(8,2, by = -2)
``````
``````a
``````
1. 8
2. 6
3. 4
4. 2
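
A few more `seq()` calls may help make the role of `by` concrete. This is a small sketch: the `length.out` argument is part of base R’s `seq()` but is not used elsewhere in this tutorial, and the variable names are illustrative.

```r
# With by, the sequence stops at the last value that does not pass the end point
stepped <- seq(1, 10, by = 4)
print(stepped)          # 1 5 9

# length.out asks for a given number of evenly spaced values instead of a step
evenly_spaced <- seq(0, 1, length.out = 5)
print(evenly_spaced)    # 0.00 0.25 0.50 0.75 1.00

# A positive step cannot count down from 8 to 2, so R signals an error;
# tryCatch() is used here only so that the script keeps running
wrong_sign <- tryCatch(seq(8, 2, by = 2), error = function(e) 'error')
print(wrong_sign)       # "error"
```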

Let’s now understand the `rep` function.

It repeats its input, which is usually a vector or a list. Here, the `rep()` function takes two arguments: the input and `times`, an integer that specifies how many times the whole input should be repeated.

Applying the `rep()` function with `times = 2` to our example, which is a vector of length 4, yields an output vector of length 8.

``````b = rep(a, times = 2)
``````

If you want each element of the vector or list to be repeated instead of the complete vector, you can use the `each` argument instead of `times`.

The difference between `times` and `each` is the order of the output: `times` repeats the whole vector end to end, while `each` repeats every element in place before moving on to the next one.

``````rep(a, each = 2)
``````
1. 8
2. 8
3. 6
4. 6
5. 4
6. 4
7. 2
8. 2
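
To see the two orderings side by side, here is a short sketch that rebuilds `a` so it runs on its own; `rep_times` and `rep_each` are illustrative names.

```r
a <- seq(8, 2, by = -2)

rep_times <- rep(a, times = 2)  # whole vector repeated
rep_each <- rep(a, each = 2)    # every element repeated in place

print(rep_times)  # 8 6 4 2 8 6 4 2
print(rep_each)   # 8 8 6 6 4 4 2 2

# Both contain the same values, so they are identical once sorted
print(identical(sort(rep_times), sort(rep_each)))  # TRUE
```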

Last but not least, the `sort()` function. It is a generic function used for sorting vectors. It is not limited to numerical values but can also be used on logical values and characters. By default, it sorts the elements in ascending order.

Let’s pass the output of the `rep()` function to the `sort()` function to arrive at the final output.

``````sort(b)
``````
1. 2
2. 2
3. 4
4. 4
5. 6
6. 6
7. 8
8. 8
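
As a brief sketch of the other options mentioned above, `sort()` accepts a `decreasing` argument and also works on character and logical vectors. The variable names below are illustrative.

```r
b <- rep(seq(8, 2, by = -2), times = 2)

# decreasing = TRUE reverses the default ascending order
sorted_desc <- sort(b, decreasing = TRUE)
print(sorted_desc)    # 8 8 6 6 4 4 2 2

# Character vectors are sorted alphabetically
sorted_words <- sort(c('lion', 'cat', 'dog'))
print(sorted_words)   # "cat" "dog" "lion"

# For logical vectors, FALSE sorts before TRUE
sorted_flags <- sort(c(TRUE, FALSE, TRUE))
print(sorted_flags)   # FALSE TRUE TRUE
```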

Great! You have now worked through the lengthy expression for `int_vec`, which was inside the list `list_define`, one step at a time.

Further, let’s find out the contents of the list `list_define` for which you will make use of the `str()` function. The `str()` function in R allows you to display the structure of R objects.

``````str(list_define)
``````
``````List of 3
$ log    : logi TRUE
$ ch     : chr "hello_datacamp"
$ int_vec: num [1:8] 2 2 4 4 6 6 8 8
``````

Let’s look at some of the cool R expressions:

• The `is.*()` family of functions, such as `is.list()`, can be used to check the type of your data structure. Each of them returns a logical value, which comes in handy when dealing with conditional statements.
``````is.list(list_define) #returns true if the argument passed is a list.
``````
``````TRUE
``````

In contrast, it returns FALSE if you pass an atomic vector, which is not a list, as shown in the cell below.

``````is.list(c(1,2,3)) #returns false since you passed a vector.
``````
``````FALSE
``````
• Converting a vector to a list is simple in R. All you need to do is call the `as.list()` function and pass the vector as an argument.
``````vec_to_list <- as.list(c(1,2,3))
``````
``````is.list(vec_to_list) #verify it with is.list()
``````
``````TRUE
``````
• On the other hand, a list can be unrolled into a vector by using the `unlist` function. R does this conversion by simply flattening the entire list structure and finally outputting a single vector.
``````list_to_vec <- unlist(vec_to_list)
``````
``````list_to_vec
``````
1. 1
2. 2
3. 3

Similar to `is.list()`, you can use `is.vector()` to find out whether the given argument is a vector or not.

``````is.vector(list_to_vec)
``````
``````TRUE
``````
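
One caveat is worth a short sketch: in R, a list is itself a kind of vector, so `is.vector()` also returns TRUE for a plain list. If you need to tell the two apart, `is.atomic()` or `is.list()` is the safer check. The names `num_vec` and `some_list` are illustrative.

```r
num_vec <- c(1, 2, 3)
some_list <- list(1, 'a', TRUE)

print(is.vector(num_vec))    # TRUE
print(is.vector(some_list))  # TRUE, because a list is a (generic) vector

# is.atomic() separates atomic vectors from lists
print(is.atomic(num_vec))    # TRUE
print(is.atomic(some_list))  # FALSE

# Other members of the is.*() family check the element type
print(is.numeric(num_vec))   # TRUE
print(is.character(num_vec)) # FALSE
```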

Let’s convert the big list `list_define` to a vector. The point to note here is that an atomic vector can only contain a single data type. Hence, both the `logical` and the `numeric` values will be converted to character strings.

``````unlist(list_define)
``````
log
‘TRUE’
ch
‘hello_datacamp’
int_vec1
‘2’
int_vec2
‘2’
int_vec3
‘4’
int_vec4
‘4’
int_vec5
‘6’
int_vec6
‘6’
int_vec7
‘8’
int_vec8
‘8’
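
The type that `unlist()` ends up with depends on what the list contains: R picks the most general type among the elements. A minimal sketch with small throwaway lists (the names are illustrative):

```r
# Only logical values: the result stays logical
only_logical <- unlist(list(TRUE, FALSE))
print(typeof(only_logical))   # "logical"

# Logical and numeric: TRUE is converted to 1
logical_and_num <- unlist(list(TRUE, 2))
print(logical_and_num)        # 1 2
print(typeof(logical_and_num))  # "double"

# As soon as a string is present, everything becomes character
with_string <- unlist(list(TRUE, 2, 'a'))
print(with_string)            # "TRUE" "2" "a"
print(typeof(with_string))    # "character"
```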

Before moving on to the next topic, let’s look at the `append()` and `rev()` functions.

• As the name suggests, the `append()` function lets you add the elements of one vector or list to another vector or list and returns the combined result.

Let’s try out the `append()` function on the `list_define` list. You will notice that the result consists of 6 elements instead of 3, since you appended the list to itself.

``````str(append(list_define, list_define))
``````
``````List of 6
$ log    : logi TRUE
$ ch     : chr "hello_datacamp"
$ int_vec: num [1:8] 2 2 4 4 6 6 8 8
$ log    : logi TRUE
$ ch     : chr "hello_datacamp"
$ int_vec: num [1:8] 2 2 4 4 6 6 8 8
``````
• Finally, let’s reverse or change the order of the `list_define` list with the help of the `rev()` function in R.
``````str(rev(list_define))
``````
``````List of 3
$ int_vec: num [1:8] 2 2 4 4 6 6 8 8
$ ch     : chr "hello_datacamp"
$ log    : logi TRUE
``````
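
Both `append()` and `rev()` work on plain vectors as well. The sketch below also uses the `after` argument of `append()`, which is part of base R but not covered above; it inserts the new values at a given position instead of at the end. The variable names are illustrative.

```r
v <- c(1, 2, 5)

# By default, new values go at the end
at_end <- append(v, c(3, 4))
print(at_end)     # 1 2 5 3 4

# after = 2 inserts the new values after the second element
inserted <- append(v, c(3, 4), after = 2)
print(inserted)   # 1 2 3 4 5

# rev() reverses the order of the elements
reversed <- rev(inserted)
print(reversed)   # 5 4 3 2 1
```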

### Regular Expressions

A lot of people find regular expressions a complex topic to learn. However, it is an essential topic not only in R but across various programming languages like Python. Many programming languages, including R, provide built-in regular expression capability.

A regular expression can be used in so many applications, and it comes in handy, especially when you want to preprocess text data. It is used in various Natural Language Processing (NLP) problems.

It is also used in query search engines and text editors.

A regular expression, also known as a `regex`, `regexp`, or `rational expression`, is a sequence of characters that defines a search pattern. Generally, such search patterns are used by string-searching algorithms to find a pattern, to find and replace it, or to filter the input by it.

Let’s start this topic by understanding the use of the `grep()` and `grepl()` functions.

For simplicity, let’s define a vector `animals_regex` of length 5 on which you will learn to apply regex patterns.

``````animals_regex <- c('cat','dog','cheetah','lion','mice')
``````

First, let’s understand the `grepl()` function. It returns a logical output: for each string, it returns TRUE if the string matches the pattern and FALSE otherwise.

Below is the syntax of the `grepl()` function, where the first argument is the pattern you want to match and the second argument is the string or vector of strings in which you want to search for the pattern. You can ignore the remaining arguments for now.

`grepl(pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)`

Let’s find out which of the above five animals have a `c` in them with the help of the `grepl()` function.

In this case, you are looking for the animals that have a `c` in them, which means that the pattern is simply `c` itself.

``````grepl(pattern = 'c', x = animals_regex)
``````

Since there is a `c` in cat, cheetah, and mice, TRUE is returned for those positions in the vector `animals_regex`, while FALSE is returned for the ones that do not match the pattern `c`.
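
For reference, here is a minimal sketch of the same call with its result, re-creating `animals_regex` so that the block runs on its own. The helper name `has_c` is introduced only for illustration; it also shows how the logical result can be used to filter the vector.

```r
animals_regex <- c('cat', 'dog', 'cheetah', 'lion', 'mice')

# One TRUE/FALSE per element of the input vector
has_c <- grepl(pattern = 'c', x = animals_regex)
print(has_c)                  # TRUE FALSE TRUE FALSE TRUE

# A logical vector can be used directly to subset the original vector
print(animals_regex[has_c])   # "cat" "cheetah" "mice"
```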

Let’s find the elements that start with `c`, not just those that have a `c` somewhere in their name.

To achieve this, all you need to do is put a `^` (caret) sign at the beginning of the pattern you would like to find.

``````grepl(pattern = '^c', x = animals_regex) #only cat and cheetah start with `c`.
``````

Since only cat and cheetah start with a `c`, only those positions are returned as TRUE.

Similar to the `^` sign, the `$` sign can be used at the end of the pattern to match the elements that end with the specified pattern. To find an animal that ends with an `n`, you can simply use `n` followed by the `$` sign.

``````grepl(pattern = 'n$', x = animals_regex) #only lion ends with an `n`.
``````
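
The two anchors can be compared in one short sketch. The last call combines both anchors to require an exact match; that example vector (`'cat'`, `'cats'`, `'tomcat'`) is made up for illustration and is not part of the tutorial’s data.

```r
animals_regex <- c('cat', 'dog', 'cheetah', 'lion', 'mice')

# ^ anchors the pattern to the start of each string
starts_with_c <- grepl(pattern = '^c', x = animals_regex)
print(starts_with_c)   # TRUE FALSE TRUE FALSE FALSE

# $ anchors the pattern to the end of each string
ends_with_n <- grepl(pattern = 'n$', x = animals_regex)
print(ends_with_n)     # FALSE FALSE FALSE TRUE FALSE

# Using both anchors matches only the exact string
exact_cat <- grepl(pattern = '^cat$', x = c('cat', 'cats', 'tomcat'))
print(exact_cat)       # TRUE FALSE FALSE
```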

Note: To learn more about regular expressions, simply type `?regex` in a Jupyter notebook code cell or the R console, and the documentation on regex will pop up. You can also use an online regex tester, which gives you a tool to design your search patterns and then lets you test them on your input strings.

• Similar to the `grepl()` function, there is a `grep()` function which, instead of a logical output, returns the indices of the elements of the vector that match the given pattern.

The syntax of the `grep()` function is almost the same as that of `grepl()`, with the additional arguments `value` and `invert`:

`grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)`

Let’s take the same example as above, but this time apply the `grep()` function to it!

``````grep(pattern = 'c', x = animals_regex)
``````
1. 1
2. 3
3. 5

As you would expect, the above output contains the indices of the elements cat, cheetah, and mice, and not TRUE/FALSE. And that’s pretty much it!

Let’s use the `which()` function to compare the `grep()` and `grepl()` functions. The `which()` function returns the indices of the `TRUE` elements of a logical vector.

Connecting the dots: since `grepl()` returns a logical vector, passing its result to `which()` gives the same output you would expect from `grep()`.

``````which(grepl(pattern = 'c', x = animals_regex))
``````
1. 1
2. 3
3. 5

Similar to the `grepl()` function, the `grep()` function also knows how to handle different types of regular expression patterns.

If you apply the `grep()` function to find the elements in the `animals_regex` vector that end with `n`, you would expect an output of 4, since only `lion` ends with `n`, as shown below.

``````grep(pattern = 'n$', x = animals_regex)
``````
``````4
``````
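
The two extra arguments in the `grep()` signature, `value` and `invert`, can be sketched as follows. With `value = TRUE`, the matching elements themselves are returned instead of their indices, and `invert = TRUE` selects the elements that do not match. The variable names are illustrative.

```r
animals_regex <- c('cat', 'dog', 'cheetah', 'lion', 'mice')

# Return the matching elements rather than their positions
with_c <- grep(pattern = 'c', x = animals_regex, value = TRUE)
print(with_c)      # "cat" "cheetah" "mice"

# Return the elements that do NOT contain a c
without_c <- grep(pattern = 'c', x = animals_regex, value = TRUE, invert = TRUE)
print(without_c)   # "dog" "lion"
```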

Well done!

You have learned some basics of regular expressions, like how to filter the elements of a vector that match a given pattern. However, R is not limited to just pattern matching. It has a handful of other regex functions, and `sub()` is one of them.

The `sub()` function, instead of filtering by the matched pattern, replaces the match with another string. Let’s look at it more closely!

The `sub()` function primarily takes three arguments as input:

• `pattern`, the regular expression you would like to match,
• `replacement`, the value that will replace the matched part of each element, and
• `x`, the input character vector on which you will apply the regex.

The syntax is given as:

`sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)`

Let’s take the same example as above once more to understand how it works!

``````sub(pattern = 'c', replacement = 'a', x = animals_regex)
``````
1. ‘aat’
2. ‘dog’
3. ‘aheetah’
4. ‘lion’
5. ‘miae’

From the above output, you can observe that `cat` becomes `aat`, `cheetah` becomes `aheetah`, and `mice` becomes `miae`. For these elements, the pattern was successfully matched.

Also, note that `sub()` replaces only the first match in each string, which means that if there are two `c` characters in a string, only the first occurrence of `c` will be replaced with `a` while the second one will remain unchanged.

If you want to replace every match of a pattern in each string, use the `gsub()` function instead.
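
The difference is easiest to see on strings that contain the pattern more than once. The vector `words` below is made up for this sketch and is not part of the tutorial’s data.

```r
words <- c('cactus', 'cricket')

# sub() replaces only the first c in each string
first_only <- sub(pattern = 'c', replacement = 'a', x = words)
print(first_only)    # "aactus" "aricket"

# gsub() replaces every c in each string
all_matches <- gsub(pattern = 'c', replacement = 'a', x = words)
print(all_matches)   # "aaatus" "ariaket"
```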

Before moving on to the next topic, let’s try one more interesting expression.

This time you will make use of the `|` (or) operator, which matches any of the listed patterns; each match is replaced with `-`. Remember, since you use the `gsub()` function, it will replace every single match of the pattern in a string.

``````gsub(pattern = 'c|d|l', replacement = '-', x = animals_regex) #every `c`, `d`, or `l` gets replaced with `-`.
``````
1. ‘-at’
2. ‘-og’
3. ‘-heetah’
4. ‘-ion’
5. ‘mi-e’

Let’s move onto the final topic of today’s tutorial, i.e., Time and Dates!

### Time and Dates

Time and date information can be quite useful in various scenarios. For example, let’s say you are working on a computer vision problem and you would like to find out the FPS (frames per second) at which your algorithm is running. In such a use case, you could record the time before and after processing to measure the speed of your algorithm. R’s date and time tools are also useful for problems like time-series forecasting and seasonality studies.

For starters, let’s quickly print today’s date using R with a simple command, `Sys.Date()`. Here `Sys` refers to the system, which means it returns the current date according to your system’s clock.

``````Sys.Date()
``````

Simple, isn’t it?

Dates in R are objects of class `Date`. You can verify this using the `class()` function that you learned about in the Data Types in R tutorial.

Similar to `Sys.Date()`, there is a `Sys.time()` function, which returns the system’s current time; in fact, it returns both the date and the time, as an object of class `POSIXct`.

``````Sys.time()
``````
`````` "2020-02-04 02:05:04 IST"
``````
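
Coming back to the FPS example, a rough timing sketch could look like the following. The loop body is only a placeholder workload standing in for per-frame processing, and `n_frames`, `elapsed_secs`, and `fps` are hypothetical names chosen for this illustration.

```r
n_frames <- 100   # hypothetical number of frames to process

start_time <- Sys.time()
for (frame in seq_len(n_frames)) {
  # Placeholder workload standing in for the real per-frame processing
  invisible(sum(sqrt(seq_len(10000))))
}
end_time <- Sys.time()

# Subtracting two times gives a difftime; convert it to plain seconds
elapsed_secs <- as.numeric(difftime(end_time, start_time, units = 'secs'))

# Frames per second (can be Inf if the clock resolution is too coarse)
fps <- n_frames / elapsed_secs

print(class(start_time))   # "POSIXct" "POSIXt"
print(elapsed_secs)
print(fps)
```

The measured values depend on your machine, so no particular numbers should be expected here.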

#### Creating Date Objects

You learned how to get the current date and time. Let’s now find out how you can create dates for other days by passing a mere string as an argument.

To create a date object for 10th May 1993, you will use the following syntax:

``````date_may <- as.Date('1993-05-10') #converts character string to a date object
``````
``````date_may
``````

``````class(date_may)
``````
``````'Date'
``````

One important point to note here is that R’s `as.Date()` function by default expects the date in `YYYY-MM-DD` (or `YYYY/MM/DD`) format. If you swap the year with the month or day, as in the example below, it results in an error. Let’s try it out!

``````date_may <- as.Date('05-1993-10') #R follows the ISO date format by default
``````
``````Error in charToDate(x): character string is not in a standard unambiguous format
Traceback:

1. as.Date("05-1993-10")

2. as.Date.character("05-1993-10")

3. charToDate(x)

4. stop("character string is not in a standard unambiguous format")
``````

But the good thing is, you could change the format explicitly by passing an argument `format` and customize it accordingly.

``````date_may <- as.Date('05-1993-10', format = '%m-%Y-%d')
``````

The `as.Date()` function accepts different date formats, but in the end it stores and prints the date in the ISO format. You can see that by printing the `date_may` variable.

``````date_may
``````

``````date_may <- as.Date('05-10-1993', format = '%m-%d-%Y')
``````
``````date_may
``````
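
Going the other way, from a `Date` object back to a string in a custom layout, can be sketched with the base R `format()` function, which uses the same `%d`, `%m`, and `%Y` codes. The variable names below are illustrative.

```r
date_may <- as.Date('05-10-1993', format = '%m-%d-%Y')

# Without a format, the ISO representation is used
iso_string <- format(date_may)
print(iso_string)      # "1993-05-10"

# Day/month/year layout
dmy_string <- format(date_may, '%d/%m/%Y')
print(dmy_string)      # "10/05/1993"

# Only the year
year_string <- format(date_may, '%Y')
print(year_string)     # "1993"
```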

#### Date Arithmetic

Wouldn’t it be awesome if you could apply mathematical operations like addition and subtraction to the Date objects in R?

Let’s add 1 to the `date_may` variable; the result is the date one day later.

``````date_may + 1
``````

Great, adding one changes the date from 10th May 1993 to 11th May 1993. Similarly, you could subtract one from the date.

Let’s say you want to find out the time difference between your date of birth and your elder sibling’s.

``````elder_sib <- as.Date('1989-03-21')
``````
``````date_may - elder_sib
``````
``````Time difference of 1511 days
``````
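
The result of subtracting two dates is a `difftime` object. As a small sketch, it can be converted to a plain number or expressed in other units with the base R `difftime()` function; the names `age_gap`, `gap_days`, and `gap_weeks` are illustrative.

```r
date_may <- as.Date('1993-05-10')
elder_sib <- as.Date('1989-03-21')

age_gap <- date_may - elder_sib
print(class(age_gap))   # "difftime"

# Drop the units to get a plain number of days
gap_days <- as.numeric(age_gap)
print(gap_days)         # 1511

# The same gap expressed in weeks (1511 / 7, about 215.86)
gap_weeks <- as.numeric(difftime(date_may, elder_sib, units = 'weeks'))
print(gap_weeks)

# Subtracting a number moves the date back by that many days
print(date_may - 1)     # "1993-05-09"
```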

Congratulations on finishing the tutorial.

This tutorial was a good starting point for beginners who are eager to learn about various utility functions in R. As a good exercise, you might want to learn more about regular expressions, as they are used in a variety of applications and are a very powerful tool when it comes to cleaning or preprocessing text data.