(Tutorial) Data Reshaping in R
Data reshaping in R means rearranging the rows and columns of a dataset to suit your requirements. In R, data is usually held in a data frame and processed with functions such as ‘rbind()’ and ‘cbind()’.
In this process, you reorganize the data into the rows and columns that the next processing step needs.
Creating your data
Let’s generate some random numbers, store them in columns, and combine the columns into a data frame. An id column is also added for later use. ‘set.seed(123)’ fixes the state of the random number generator so that the same sample is reproduced on any machine running the same version of R. Three variables, colm1, colm2, and colm3, are created. ‘sample()’ draws 15 values from 1 to 15 with replacement, so the same value can appear more than once. ‘data.frame()’ stores the three columns as a table, and an id column holding a unique number for each row is appended.
set.seed(123)
N <- 15
colm1 <- sample(1:15, N, replace=TRUE)
colm2 <- sample(1:15, N, replace=TRUE)
colm3 <- sample(1:15, N, replace=TRUE)
df_Temp <- data.frame(colm1, colm2,colm3)
df_Temp$id<-seq(nrow(df_Temp))
df_Temp
The above code gives the following output, with three columns named ‘colm1’, ‘colm2’, and ‘colm3’ and an id column running from 1 to 15. Each column holds values between 1 and 15; some numbers are repeated, and some may not occur at all.
colm1 colm2 colm3 id
15 11 14 1
15 5 3 2
3 3 4 3
14 11 14 4
3 9 1 5
10 12 11 6
2 9 7 7
6 9 5 8
11 13 12 9
5 3 15 10
4 8 10 11
14 10 13 12
6 7 7 13
9 10 9 14
10 9 9 15
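To illustrate what ‘set.seed()’ does, here is a minimal sketch that draws the same sample twice after resetting the seed. The variable names first_draw, second_draw, and third_draw are made up for this illustration.

```r
# Reproducibility check: the same seed yields the same draws.
set.seed(123)
first_draw <- sample(1:15, 15, replace = TRUE)

set.seed(123)  # reset the generator to the same state
second_draw <- sample(1:15, 15, replace = TRUE)

identical(first_draw, second_draw)  # TRUE

# Without resetting the seed, the generator keeps advancing,
# so the next draw will generally differ from the first two.
third_draw <- sample(1:15, 15, replace = TRUE)
```

This is why the seed is set once at the top of the script: everyone who runs the script from the start gets the same three columns.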
cbind function
Usage: combines vectors, matrices, and data frames by columns.
Parameters: cbind(v1, v2): v1 and v2 can be vectors, matrices, or data frames.
Let’s see the example of binding in action.
‘df_Temp[,4]’ selects the id column and ‘df_Temp[,2]’ selects ‘colm2’ of df_Temp. Both are passed to ‘cbind()’, and the result is stored in the ‘cbindexample’ variable. The column names are then changed with ‘colnames()’, which is assigned a character vector built with ‘c()’ containing the new names ‘newid’ and ‘new_colm2’.
cbindexample<-cbind(df_Temp[,4],df_Temp[,2])
colnames(cbindexample)<- c('newid','new_colm2')
cbindexample
The above code gives the following output, where the id column and colm2 are bound together with the ‘cbind’ function. Because both inputs are plain vectors, the result is a matrix rather than a data frame. The columns have been renamed to ‘newid’ and ‘new_colm2’.
newid new_colm2
1 11
2 5
3 3
4 11
5 9
6 12
7 9
8 9
9 13
10 3
11 8
12 10
13 7
14 10
15 9
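The type of object that ‘cbind()’ returns depends on what is passed in. The following small sketch, using a made-up data frame df with columns a and b, contrasts binding plain vectors with binding one-column data frames.

```r
df <- data.frame(a = 1:3, b = c(10, 20, 30))

# Extracting with [, j] yields plain vectors, so cbind() builds a matrix.
as_matrix <- cbind(df[, 1], df[, 2])
is.matrix(as_matrix)      # TRUE

# Extracting with ["name"] keeps one-column data frames,
# so cbind() returns a data frame and keeps the column names.
as_frame <- cbind(df["a"], df["b"])
is.data.frame(as_frame)   # TRUE
names(as_frame)           # "a" "b"
```

If later steps expect a data frame, the second form (or wrapping the result in ‘as.data.frame()’) may be the safer choice.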
rbind function
Usage: combines vectors, matrices, or data frames by rows.
Parameters: rbind(v1, v2): v1 and v2 can be vectors, matrices, or data frames.
Let’s create a new vector called ‘new_vector’ and combine it with the two-column ‘cbindexample’ using ‘rbind’. The vector is appended as a new row, and the result is stored in ‘rbindexample’.
new_vector<- c(16,15)
rbindexample<- rbind(cbindexample,new_vector)
rbindexample
The above code gives the output below, where a new row is added with the values 16 and 15 in ‘newid’ and ‘new_colm2’, respectively.
newid new_colm2
1 11
2 5
3 3
4 11
5 9
6 12
7 9
8 9
9 13
10 3
11 8
12 10
13 7
14 10
15 9
16 15
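‘rbind()’ also works on data frames, in which case the columns are matched by name rather than by position. Here is a brief sketch; the data frames scores, extra, and reordered are invented for the example.

```r
scores <- data.frame(newid = 1:3, new_colm2 = c(11, 5, 3))
extra  <- data.frame(newid = 4L, new_colm2 = 11)

# Append one row; both data frames must have the same column names.
combined <- rbind(scores, extra)
nrow(combined)    # 4

# The columns may come in a different order: rbind() for data frames
# lines them up by name, not by position.
reordered <- data.frame(new_colm2 = 9, newid = 5L)
combined2 <- rbind(combined, reordered)
combined2$newid   # 1 2 3 4 5
```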
Melt function
Usage: the melt function converts an object into a molten state, meaning that it takes multiple columns of data and stacks them into a single column.
Parameters: melt(data, ..., na.rm = FALSE, value.name = "value"),
data: the input that you are going to melt.
...: further arguments passed to or from other methods.
na.rm: whether NA values should be removed from the data; this converts explicit missing values into implicit ones.
value.name: the name of the column used to store the values.
Let’s look at code that melts the data, using the id variable, into one column of column names and one column of values:
Let’s load the library named ‘reshape2’ using ‘library()’ and use ‘melt()’ to gather the ‘df_Temp’ columns colm1, colm2, and colm3 into a single column called ‘variable’, keyed by the ‘id’ variable.
library(reshape2)
molted=melt(df_Temp,id.vars=c("id"))
molted
id variable value
1 colm1 15
2 colm1 15
3 colm1 3
4 colm1 14
5 colm1 3
6 colm1 10
7 colm1 2
8 colm1 6
9 colm1 11
10 colm1 5
11 colm1 4
12 colm1 14
13 colm1 6
14 colm1 9
15 colm1 10
1 colm2 11
2 colm2 5
3 colm2 3
4 colm2 11
5 colm2 9
6 colm2 12
7 colm2 9
8 colm2 9
9 colm2 13
10 colm2 3
11 colm2 8
12 colm2 10
13 colm2 7
14 colm2 10
15 colm2 9
1 colm3 14
2 colm3 3
3 colm3 4
4 colm3 14
5 colm3 1
6 colm3 11
7 colm3 7
8 colm3 5
9 colm3 12
10 colm3 15
11 colm3 10
12 colm3 13
13 colm3 7
14 colm3 9
15 colm3 9
The above output shows that when you melt colm1, colm2, and colm3 by the id variable, the column names are combined into one column named ‘variable’, and the corresponding values are stored in ‘value’.
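The ‘na.rm’ and ‘value.name’ parameters described earlier can be seen in a small sketch like the one below. It assumes the ‘reshape2’ package is installed; the data frame wide and the names "column" and "reading" are chosen only for illustration.

```r
library(reshape2)

# A small wide table with one missing value.
wide <- data.frame(id = 1:3, colm1 = c(15, NA, 3), colm2 = c(11, 5, 3))

# Rename the two output columns and drop the row holding the NA.
long <- melt(wide, id.vars = "id",
             variable.name = "column", value.name = "reading",
             na.rm = TRUE)

names(long)   # "id" "column" "reading"
nrow(long)    # 5: six measured values minus the one NA
```

With ‘na.rm = FALSE’ (the default), the missing value would stay in the molten data as an explicit NA row.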
dcast function
Usage: when you have a molten dataset, you can convert it back into its original format using this function.
Parameters: dcast(data, id_variable ~ variable),
data: the molten data that needs to be converted back into its original form.
id_variable: the single or multiple columns that were used as id variables when melting the other columns into one column.
~: after this sign comes the molten column (here ‘variable’) whose distinct values become the new column names.
In the code below, ‘reshape2’ is loaded using ‘library()’, and ‘dcast()’ takes as its first parameter the data produced by ‘melt()’, followed by a formula with id before the ‘~’ sign and the molten ‘variable’ column after it.
library(reshape2)
dcast(molted,id~variable)
id colm1 colm2 colm3
1 15 11 14
2 15 5 3
3 3 3 4
4 14 11 14
5 3 9 1
6 10 12 11
7 2 9 7
8 6 9 5
9 11 13 12
10 5 3 15
11 4 8 10
12 14 10 13
13 6 7 7
14 9 10 9
15 10 9 9
You can see that the data produced by ‘melt()’ is changed back to the original layout: three columns with their respective values, plus id in a separate column.
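One way to convince yourself that ‘dcast()’ undoes ‘melt()’ is a round-trip check. The sketch below assumes ‘reshape2’ is installed and uses a small made-up data frame wide; ‘value.var’ is given explicitly so that ‘dcast()’ does not have to guess which column holds the values.

```r
library(reshape2)

wide <- data.frame(id = 1:4,
                   colm1 = c(15, 15, 3, 14),
                   colm2 = c(11, 5, 3, 11))

# Wide -> long -> wide again.
long     <- melt(wide, id.vars = "id")
restored <- dcast(long, id ~ variable, value.var = "value")

names(restored)        # "id" "colm1" "colm2"
all(restored == wide)  # TRUE when every cell matches the original
```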
Transpose function
Usage: It is used to change the rows into columns and columns into rows.
Parameters: t(data), where data is the matrix or data frame that you want to transpose.
Let’s change the rows to columns and vice versa by using the transpose function. This is done simply by calling ‘t(df_Temp)’, as below.
trans <- t(df_Temp)
trans
The above code gives following output:
colm1 15 15 3 14 3 10 2 6 11 5 4 14 6 9 10
colm2 11 5 3 11 9 12 9 9 13 3 8 10 7 10 9
colm3 14 3 4 14 1 11 7 5 12 15 10 13 7 9 9
id 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
You can see that the data frame with 15 rows and 4 columns is changed into 4 rows and 15 columns by transposing the data.
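Note that ‘t()’ applied to a data frame returns a matrix, which has consequences when the columns have different types. A short sketch with two invented data frames, df and mixed:

```r
df <- data.frame(colm1 = c(15, 3), colm2 = c(11, 9), id = 1:2)

trans <- t(df)
is.matrix(trans)    # TRUE: the result is a matrix, not a data frame
dim(trans)          # 3 2
rownames(trans)     # "colm1" "colm2" "id"

# A matrix holds a single type, so mixed columns are coerced:
# here the numbers become character strings.
mixed <- data.frame(id = 1:2, label = c("a", "b"))
is.character(t(mixed))   # TRUE
```

For all-numeric data such as df_Temp this coercion is harmless, but it is worth keeping in mind before transposing tables that contain text columns.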
Congratulations
Congratulations, you have made it to the end of this tutorial!
In this tutorial, you have covered several of R’s reshaping functions: ‘rbind()’ and ‘cbind()’, along with ‘melt()’, ‘dcast()’, and finally the transpose function ‘t()’.
If you would like to learn more about R, take DataCamp’s Introduction to R course.