Review a Data Frame in RStudio with R
Several functions and commands let you examine a data frame quickly. They help to visualize its structure and to explore its content, its columns, and the formats and types of data it holds.
Inspect the structure of a data frame
Let’s start with str(), a function already explored in the article “Examining a Data Frame”. It displays the details of the structure of a data frame along with a short description of its content. The data frame created previously has the following structure:
str(df)
## 'data.frame': 6 obs. of 2 variables:
## $ taille: num 177 167 181 179 168 175
## $ poids : num 71 68 78 75 68 64
It contains 6 observations (the rows) and 2 variables (the columns). The columns “taille” (height) and “poids” (weight) are both of type “numeric”.
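To check the dimensions directly, here is a minimal sketch in base R. It assumes the data frame can be rebuilt with data.frame() from the values printed by str(df); the original construction code lives in the earlier article and is not reproduced here.

```r
# Assumption: df is rebuilt from the values printed by str(df) above;
# the original construction code is not shown in this article.
df <- data.frame(
  taille = c(177, 167, 181, 179, 168, 175),
  poids  = c(71, 68, 78, 75, 68, 64)
)

n_obs     <- nrow(df)           # number of rows (observations)
n_vars    <- ncol(df)           # number of columns (variables)
col_types <- sapply(df, class)  # class of each column

print(dim(df))    # rows and columns in a single call
print(col_types)
```

nrow() and ncol() are handy when only one of the two numbers is needed, for example inside a loop or a condition.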
Visualize the column headers
It is sometimes useful to display only the column headers, for example when a data frame is particularly large, with many entries. In that case, the names() function is the most suitable:
library(dslabs)  # the murders dataset ships with the dslabs package
data(murders)
names(murders)
## [1] "state" "abb" "region" "population" "total"
The head() function displays the first 6 rows of a data frame by default, which gives a quick overview of it. Optionally, the parameter n = x sets the number of rows to display.
head(murders, n = 3)
## state abb region population total
## 1 Alabama AL South 4779736 135
## 2 Alaska AK West 710231 19
## 3 Arizona AZ West 6392017 232
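The difference between the default and an explicit n can be seen on any data frame with more than 6 rows. The sketch below uses a hypothetical 10-row data frame named demo, invented only for this illustration, so that it runs without the dslabs package.

```r
# Hypothetical 10-row data frame, used only to illustrate names() and head()
demo <- data.frame(id = 1:10, value = (1:10)^2)

col_names     <- names(demo)         # column headers only
first_default <- head(demo)          # no n given: first 6 rows
first_three   <- head(demo, n = 3)   # explicit n: first 3 rows

print(col_names)
print(first_three)
```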
Knowing the levels and length of a variable in a data frame
Several functions give details about the size and the number of entries of the different variables.
The levels() function gives the list of unique categories (levels) of a factor variable,
The length() function gives the length, i.e. the number of obs., of a variable.
levels(murders$region)
## [1] "Northeast" "South" "North Central" "West"
length(murders$region)
## [1] 51
These two functions show that in the variable “region” of the data frame murders (from the dslabs package), the unique values are “Northeast”, “South”, “North Central” and “West”, and that the variable contains 51 entries, one per row.
Combining the two functions gives the number of unique entries in the variable, rather than their list or the total number of rows. There are 4 different entries (those listed above):
length(levels(murders$region))
## [1] 4
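One caveat is that levels() only works on factors. The sketch below uses a hypothetical small factor standing in for murders$region (so it runs without dslabs) and shows the base R shortcut nlevels(), plus unique() as an alternative for character vectors.

```r
# Hypothetical small factor, standing in for murders$region
region <- factor(c("South", "West", "West", "Northeast", "South"))

n_levels_long  <- length(levels(region))  # approach used in the article
n_levels_short <- nlevels(region)         # base R shortcut, same result

# levels() returns NULL for a character vector, so unique() is used instead
region_chr   <- as.character(region)
n_unique_chr <- length(unique(region_chr))

print(table(region))  # number of rows in each level
```

table() is a natural next step once the levels are known, since it counts how many rows fall into each one.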
Knowing the minimum and maximum values of a numeric variable in RStudio
There are 4 main functions to get the min/max values of a variable or their position.
The max() and min() functions return the largest and smallest values of a variable,
while the functions which.max() and which.min() return the positions (row numbers) of the maximum and minimum values. If the value occurs several times, only the first position is returned.
max(murders$total)
## [1] 1257
which.max(murders$total)
## [1] 5
# The largest entry is in row 5; its value is 1257.
min(murders$total)
## [1] 2
which.min(murders$total)
## [1] 46
# The smallest entry is in row 46; its value is 2.
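A row number is rarely the end goal; it is usually used as an index to retrieve the matching row or label. The sketch below illustrates this on the three rows of murders printed by head() earlier, rebuilt by hand under the hypothetical name m3 so that the example runs without dslabs. The results therefore apply to these three rows only, not to the full dataset.

```r
# First three rows of murders, copied from the head() output above
# (m3 is a hypothetical name used only for this illustration)
m3 <- data.frame(
  state = c("Alabama", "Alaska", "Arizona"),
  total = c(135, 19, 232)
)

# Use the position returned by which.max()/which.min() as an index
state_max <- m3$state[which.max(m3$total)]
state_min <- m3$state[which.min(m3$total)]
row_max   <- m3[which.max(m3$total), ]   # the full row of the maximum

value_range <- range(m3$total)           # minimum and maximum in one call

print(row_max)
```

On the full dataset, the same pattern would be murders$state[which.max(murders$total)].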