Another common problem in data entry is missing values, which show up in a DataFrame as values of the NA type. They create havoc when we try to work with the data. Fortunately, there are a few ways to get rid of rows that contain NA values.
$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.0 (2017-06-19 13:05 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu
julia> using DataFrames
julia> numeric_df = DataFrame(Col1= 1:10, Col2 = 11:20, Col3 = 21:30)
10×3 DataFrames.DataFrame
│ Row │ Col1 │ Col2 │ Col3 │
├─────┼──────┼──────┼──────┤
│ 1 │ 1 │ 11 │ 21 │
│ 2 │ 2 │ 12 │ 22 │
│ 3 │ 3 │ 13 │ 23 │
│ 4 │ 4 │ 14 │ 24 │
│ 5 │ 5 │ 15 │ 25 │
│ 6 │ 6 │ 16 │ 26 │
│ 7 │ 7 │ 17 │ 27 │
│ 8 │ 8 │ 18 │ 28 │
│ 9 │ 9 │ 19 │ 29 │
│ 10 │ 10 │ 20 │ 30 │
julia>
Let us set some NA values:
julia> numeric_df[3,:Col1] = NA
NA
julia> numeric_df[4,:Col2] = NA
NA
julia> numeric_df[[5,9],:Col3] = NA # rows 5 and 9 in column Col3
NA
julia> numeric_df
10×3 DataFrames.DataFrame
│ Row │ Col1 │ Col2 │ Col3 │
├─────┼──────┼──────┼──────┤
│ 1 │ 1 │ 11 │ 21 │
│ 2 │ 2 │ 12 │ 22 │
│ 3 │ NA │ 13 │ 23 │
│ 4 │ 4 │ NA │ 24 │
│ 5 │ 5 │ 15 │ NA │
│ 6 │ 6 │ 16 │ 26 │
│ 7 │ 7 │ 17 │ 27 │
│ 8 │ 8 │ 18 │ 28 │
│ 9 │ 9 │ 19 │ NA │
│ 10 │ 10 │ 20 │ 30 │
julia>
The completecases() function returns one Boolean value per row: true if the row is complete, and false if the row contains an NA value.
julia> completecases(numeric_df)
10-element DataArrays.DataArray{Bool,1}:
true
true
false
false
false
true
true
true
false
true
julia>
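The row-wise check that completecases() performs can be sketched in plain Julia. This is only an illustration of the idea, not the library's implementation. It assumes Julia 1.x, where `missing` takes the place of NA, and the names `columns` and `complete` are made up for the example.

```julia
# Columns as plain vectors; `missing` plays the role of NA (assumes Julia 1.x).
col1 = [1, 2, missing, 4, 5, 6, 7, 8, 9, 10]
col2 = [11, 12, 13, missing, 15, 16, 17, 18, 19, 20]
col3 = [21, 22, 23, 24, missing, 26, 27, 28, missing, 30]
columns = [col1, col2, col3]   # hypothetical stand-in for the DataFrame

# A row is complete when no column holds `missing` at that position.
complete = [!any(ismissing(col[i]) for col in columns) for i in eachindex(col1)]

println(complete)
```

The resulting vector is false exactly at rows 3, 4, 5 and 9, which matches the completecases() output above.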
The completecases!() function permanently deletes the rows with NA values, modifying the DataFrame in place.
julia> completecases!(numeric_df)
6×3 DataFrames.DataFrame
│ Row │ Col1 │ Col2 │ Col3 │
├─────┼──────┼──────┼──────┤
│ 1 │ 1 │ 11 │ 21 │
│ 2 │ 2 │ 12 │ 22 │
│ 3 │ 6 │ 16 │ 26 │
│ 4 │ 7 │ 17 │ 27 │
│ 5 │ 8 │ 18 │ 28 │
│ 6 │ 10 │ 20 │ 30 │
julia> numeric_df
6×3 DataFrames.DataFrame
│ Row │ Col1 │ Col2 │ Col3 │
├─────┼──────┼──────┼──────┤
│ 1 │ 1 │ 11 │ 21 │
│ 2 │ 2 │ 12 │ 22 │
│ 3 │ 6 │ 16 │ 26 │
│ 4 │ 7 │ 17 │ 27 │
│ 5 │ 8 │ 18 │ 28 │
│ 6 │ 10 │ 20 │ 30 │
julia>
We observe that the rows containing NA values are deleted. Let us now recreate the DataFrame numeric_df to use another way of deleting rows with NA values.
julia> numeric_df = DataFrame(Col1= 1:10, Col2 = 11:20, Col3 = 21:30);
julia> numeric_df
10×3 DataFrames.DataFrame
│ Row │ Col1 │ Col2 │ Col3 │
├─────┼──────┼──────┼──────┤
│ 1 │ 1 │ 11 │ 21 │
│ 2 │ 2 │ 12 │ 22 │
│ 3 │ 3 │ 13 │ 23 │
│ 4 │ 4 │ 14 │ 24 │
│ 5 │ 5 │ 15 │ 25 │
│ 6 │ 6 │ 16 │ 26 │
│ 7 │ 7 │ 17 │ 27 │
│ 8 │ 8 │ 18 │ 28 │
│ 9 │ 9 │ 19 │ 29 │
│ 10 │ 10 │ 20 │ 30 │
Let us also set the NA values again:
julia> numeric_df[3,:Col1] = NA
NA
julia> numeric_df[4,:Col2] = NA
NA
julia> numeric_df[[5,9],:Col3] = NA # rows 5 and 9 in column Col3
NA
julia>
With the original DataFrame restored, we can use the isna() function to show whether each value in a column is of the NA type:
julia> isna(numeric_df[:Col1])
10-element BitArray{1}:
false
false
true
false
false
false
false
false
false
false
julia>
By adding the findin() function we can pick out only the NA rows. findin() takes the collection to search and the values to look for, here the true entries returned by isna(), and returns the positions where they occur:
julia> findin(isna(numeric_df[:Col1]),true)
1-element Array{Int64,1}:
3
julia>
We can also use the find() function, which simply returns the positions of the true entries, that is, the rows with NA values.
julia> find(isna(numeric_df[:Col1]))
1-element Array{Int64,1}:
3
julia>
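For readers on a newer Julia, the same two lookups can be sketched without the 0.6 functions. This assumes Julia 1.x, where `missing` replaces NA, `ismissing` replaces isna(), and `findall` takes over from both find() and findin(). The variable names are made up for the example.

```julia
# One column with a missing entry in row 3 (assumes Julia 1.x).
col1 = [1, 2, missing, 4, 5, 6, 7, 8, 9, 10]

# Element-wise test, the counterpart of isna(numeric_df[:Col1]).
mask = ismissing.(col1)

# Positions of the true entries, the counterpart of find(...).
na_rows = findall(mask)

# Searching explicitly for `true`, the counterpart of findin(..., true).
same_rows = findall(isequal(true), mask)

println(na_rows)
```

Both calls return the single index 3. Passing the predicate directly, as in `findall(ismissing, col1)`, gives the same answer without building the mask first.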
This gives us a way to delete all the rows that contain NA values. First, we get the number of rows and columns:
julia> rows,cols = size(numeric_df)
(10, 3)
julia> rows
10
julia> cols
3
julia>
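We then create a for loop that goes through all the columns and deletes the rows with NA values. Because the NA rows are looked up again for each column, after the earlier deletions have already shifted the row numbers, the indices passed to deleterows!() are always current: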
julia> for i in 1:cols
           deleterows!(numeric_df, find(isna(numeric_df[:,i])))
       end
julia> numeric_df
6×3 DataFrames.DataFrame
│ Row │ Col1 │ Col2 │ Col3 │
├─────┼──────┼──────┼──────┤
│ 1 │ 1 │ 11 │ 21 │
│ 2 │ 2 │ 12 │ 22 │
│ 3 │ 6 │ 16 │ 26 │
│ 4 │ 7 │ 17 │ 27 │
│ 5 │ 8 │ 18 │ 28 │
│ 6 │ 10 │ 20 │ 30 │
julia>
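The column-by-column deletion can also be traced on plain vectors, which makes the shifting row numbers easier to follow. This is a sketch under assumptions: it uses Julia 1.x with `missing` in place of NA, Base's `deleteat!` in place of deleterows!(), and a hypothetical `columns` vector standing in for the DataFrame.

```julia
# Columns as plain vectors; `missing` plays the role of NA (assumes Julia 1.x).
col1 = [1, 2, missing, 4, 5, 6, 7, 8, 9, 10]
col2 = [11, 12, 13, missing, 15, 16, 17, 18, 19, 20]
col3 = [21, 22, 23, 24, missing, 26, 27, 28, missing, 30]
columns = [col1, col2, col3]   # hypothetical stand-in for the DataFrame

for j in eachindex(columns)
    # Look up the missing rows in the current, possibly shortened, column j.
    idx = findall(ismissing, columns[j])
    # Remove those rows from every column so all columns stay the same length.
    for col in columns
        deleteat!(col, idx)
    end
end

println(col1)
println(col2)
println(col3)
```

Six rows remain, holding the same values as the DataFrame above. With a current DataFrames.jl release, `dropmissing!` is the in-place call for this job, so a hand-written loop like this is mainly useful for seeing how the deletion works.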