Review a data Frame in RStudio with R - Rstudio-data

Several functions let you examine a data frame quickly. They help you visualize its structure and explore its content: its columns and the formats and types of data they contain.

Inspect the structure of a data frame

Let’s start with a function already covered in the article “Examining a Data Frame”: str(). It displays the structure of a data frame and gives a compact description of its content. The data frame created previously has the following structure:

str(df)
## 'data.frame':    6 obs. of  2 variables:
##  $ taille: num  177 167 181 179 168 175
##  $ poids : num  71 68 78 75 68 64

It contains 6 observations (the rows) and 2 variables (the columns). Both columns, “taille” (height) and “poids” (weight), are of type numeric (num).
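The data frame df comes from the earlier article and is not defined here. As a minimal sketch, assuming it holds only the values printed by str() above, it can be rebuilt and its dimensions and column types checked directly:

```r
# Rebuild the example data frame from the values printed by str(df)
# (assumption: the original df held exactly these two numeric columns)
df <- data.frame(
  taille = c(177, 167, 181, 179, 168, 175),
  poids  = c(71, 68, 78, 75, 68, 64)
)

str(df)            # 'data.frame': 6 obs. of 2 variables
dim(df)            # rows (observations) first, then columns (variables): 6 2
nrow(df)           # 6 observations
ncol(df)           # 2 variables
sapply(df, class)  # type of each column: both "numeric"
```

dim() reports rows before columns, which matches the str() summary: observations are rows and variables are columns.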

Visualize the column headers

It is sometimes useful to display only the column headers, for example when a data frame is particularly large, with many entries. In this case, the names() function is the most suitable.

library(dslabs)  # the murders dataset comes from the dslabs package
data(murders)
names(murders)
## [1] "state"      "abb"        "region"     "population" "total"
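As a minimal sketch that runs without dslabs (assuming the small df rebuilt from the str() output above), names() can also be used to count columns or to check that a column exists before using it; colnames() returns the same result on a data frame:

```r
# Small data frame rebuilt from the str() output so the block runs on its own
df <- data.frame(
  taille = c(177, 167, 181, 179, 168, 175),
  poids  = c(71, 68, 78, 75, 68, 64)
)

names(df)               # "taille" "poids"
colnames(df)            # same result for a data frame
length(names(df))       # 2: number of columns, same as ncol(df)
"poids" %in% names(df)  # TRUE: the column exists
```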

By default, the head() function displays the first 6 rows of a data frame, which gives a quick overview of it. Add the argument n = x to set the number of rows to display instead.

head(murders, n = 3)
##     state abb region population total
## 1 Alabama  AL  South    4779736   135
## 2  Alaska  AK   West     710231    19
## 3 Arizona  AZ   West    6392017   232
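To see what head() returns, a minimal sketch on the small df (rebuilt here, an assumption based on the str() output) compares it with plain row indexing and shows tail(), its counterpart for the last rows:

```r
df <- data.frame(
  taille = c(177, 167, 181, 179, 168, 175),
  poids  = c(71, 68, 78, 75, 68, 64)
)

first3 <- head(df, n = 3)   # first 3 rows
same3  <- df[1:3, ]         # the same rows, selected by index
isTRUE(all.equal(first3, same3))  # TRUE

tail(df, n = 2)             # last 2 rows (rows 5 and 6)
```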

Knowing the levels and length of a variable in a data frame

Several functions give details about the size and the number of entries of a variable:
The levels() function returns the distinct categories (levels) of a factor variable.
The length() function returns the number of elements (observations) in a variable.

levels(murders$region)
## [1] "Northeast"     "South"         "North Central" "West"

length(murders$region)
## [1] 51
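levels() only applies to factors; region in murders is one, which is why the call above returns a list of categories. A minimal sketch with a hypothetical vector (values invented for illustration) shows what happens with a plain character vector, and how unique() lists the distinct values in that case:

```r
# Hypothetical region values, stored once as character and once as factor
region_chr <- c("South", "West", "South", "Northeast", "West")
region_fct <- factor(region_chr)

levels(region_fct)  # "Northeast" "South" "West" (sorted alphabetically by default)
levels(region_chr)  # NULL: a character vector has no levels
unique(region_chr)  # "South" "West" "Northeast" (order of first appearance)

length(region_fct)  # 5: every element is counted, repeats included
```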

These two functions show that the variable “region” of the murders data frame (from the dslabs package) has four distinct values, or levels: Northeast, South, North Central and West. Its length is 51, one entry per row.
Combining the two functions gives the number of distinct levels in the variable, rather than the list of levels or the number of rows. As expected, there are 4:

length(levels(murders$region))
## [1] 4
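length(levels(x)) counts the declared levels of a factor, which is not always the number of values actually present. A minimal sketch with a hypothetical factor (levels borrowed from the murders output, values invented) contrasts it with nlevels(), unique() and droplevels(), all from base R:

```r
# Hypothetical factor whose declared levels include two that never occur
region <- factor(c("South", "West", "South"),
                 levels = c("Northeast", "South", "North Central", "West"))

length(levels(region))       # 4: counts declared levels, used or not
nlevels(region)              # 4: base R shortcut for length(levels(x))
length(unique(region))       # 2: counts only the values actually present
nlevels(droplevels(region))  # 2: after removing unused levels
```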

Knowing the minimum and maximum values of a numeric variable in RStudio

Four main functions return the minimum and maximum values of a numeric variable, or their positions:
The max() and min() functions return the largest and smallest values of a variable.
The which.max() and which.min() functions return the index (row number) of the maximum and minimum values.

max(murders$total)
## [1] 1257

which.max(murders$total)
## [1] 5
# The largest value, 1257, is in row 5.

min(murders$total)
## [1] 2

which.min(murders$total)
## [1] 46
# The smallest value, 2, is in row 46.
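Two details matter when using these functions on real data: a single NA makes max() and min() return NA unless na.rm = TRUE is set, and which.max()/which.min() report only the first position when the extreme value is tied. A minimal sketch on a hypothetical data frame (state and total values invented for illustration):

```r
# Hypothetical data: the maximum total (9) occurs twice and one value is missing
d <- data.frame(state = c("A", "B", "C", "D", "E"),
                total = c(4, 9, 2, 9, NA))

max(d$total)                    # NA: a missing value propagates
max(d$total, na.rm = TRUE)      # 9
which.max(d$total)              # 2: only the FIRST position of the maximum; NA is skipped
which(d$total == max(d$total, na.rm = TRUE))  # 2 4: every position of the maximum
d[which.min(d$total), "state"]  # "C": use the position to look up the matching row
range(d$total, na.rm = TRUE)    # 2 9: minimum and maximum in one call
```

Using the returned position as a row index is how you retrieve the full record, for example murders[which.max(murders$total), ] for the murders data.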