Sometimes we want to sort a DataFrame in ascending or descending order. Sorting the values in one or more columns is an important part of data analysis, and the sort() function offers many possibilities.
$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.0 (2017-06-19 13:05 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu
julia> using DataFrames;
julia> movies_df = DataFrame(Group=rand(["Amistad","Batman Begins","Catch Me If You Can","Dunkirk","Empire of the Sun","Firelight","Gladiator","Hannibal","Interstellar"],15), Variable1 = randn(15), Variable2=rand(15));
julia> movies_df
15×3 DataFrames.DataFrame
│ Row │ Group                 │ Variable1 │ Variable2   │
├─────┼───────────────────────┼───────────┼─────────────┤
│ 1   │ "Gladiator"           │ 1.59811   │ 0.870574    │
│ 2   │ "Batman Begins"       │ -0.735571 │ 0.146472    │
│ 3   │ "Gladiator"           │ -1.18307  │ 0.000210997 │
│ 4   │ "Catch Me If You Can" │ 1.30825   │ 0.813809    │
│ 5   │ "Interstellar"        │ 0.510712  │ 0.428202    │
│ 6   │ "Dunkirk"             │ 0.0282217 │ 0.26914     │
│ 7   │ "Firelight"           │ -0.615212 │ 0.659826    │
│ 8   │ "Interstellar"        │ -2.12227  │ 0.239937    │
│ 9   │ "Firelight"           │ -2.19173  │ 0.643379    │
│ 10  │ "Empire of the Sun"   │ -0.509172 │ 0.548794    │
│ 11  │ "Firelight"           │ -0.485559 │ 0.542529    │
│ 12  │ "Amistad"             │ -1.11516  │ 0.896659    │
│ 13  │ "Dunkirk"             │ -0.647961 │ 0.969086    │
│ 14  │ "Interstellar"        │ -0.785771 │ 0.870153    │
│ 15  │ "Dunkirk"             │ 0.488741  │ 0.697622    │
julia>
Let us sort this DataFrame. Note the sort!() function used below: its name ends with a bang (!). By Julia convention, this means the function modifies its argument, so movies_df itself is reordered instead of a sorted copy being returned. This is called in-place sorting.
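The bang convention is not specific to DataFrames: Base Julia's own sort and sort! follow the same rule. Here is a minimal sketch on a plain vector, assuming Julia 1.x. The variable names scores and sorted_copy are only for illustration, and the numbers are Variable1 values copied from the table above.

```julia
# Five Variable1 values copied from the movies_df table above
scores = [1.59811, -0.735571, -1.18307, 1.30825, 0.510712]

# sort (no bang) returns a new sorted vector; `scores` keeps its original order
sorted_copy = sort(scores)
println("copy:     ", sorted_copy)
println("original: ", scores)

# sort! (with bang) rearranges `scores` itself, in place
sort!(scores)
println("in place: ", scores)
```

With DataFrames the pairing is the same: sort(movies_df, ...) hands back a sorted copy, while sort!(movies_df, ...) rearranges movies_df itself.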
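Sort Descending:
## rev = true sorts numerical values in descending order and strings in reverse alphabetical order.
## Rows are ordered by Group first; Variable1 only breaks ties between rows that share the same Group.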
julia> sort!(movies_df,cols=[:Group,:Variable1],rev=true)
15×3 DataFrames.DataFrame
│ Row │ Group                 │ Variable1 │ Variable2   │
├─────┼───────────────────────┼───────────┼─────────────┤
│ 1   │ "Interstellar"        │ 0.510712  │ 0.428202    │
│ 2   │ "Interstellar"        │ -0.785771 │ 0.870153    │
│ 3   │ "Interstellar"        │ -2.12227  │ 0.239937    │
│ 4   │ "Gladiator"           │ 1.59811   │ 0.870574    │
│ 5   │ "Gladiator"           │ -1.18307  │ 0.000210997 │
│ 6   │ "Firelight"           │ -0.485559 │ 0.542529    │
│ 7   │ "Firelight"           │ -0.615212 │ 0.659826    │
│ 8   │ "Firelight"           │ -2.19173  │ 0.643379    │
│ 9   │ "Empire of the Sun"   │ -0.509172 │ 0.548794    │
│ 10  │ "Dunkirk"             │ 0.488741  │ 0.697622    │
│ 11  │ "Dunkirk"             │ 0.0282217 │ 0.26914     │
│ 12  │ "Dunkirk"             │ -0.647961 │ 0.969086    │
│ 13  │ "Catch Me If You Can" │ 1.30825   │ 0.813809    │
│ 14  │ "Batman Begins"       │ -0.735571 │ 0.146472    │
│ 15  │ "Amistad"             │ -1.11516  │ 0.896659    │
julia>
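With several columns in cols, rows are compared lexicographically: Group decides first, and Variable1 is consulted only when two rows share the same Group. rev = true reverses that whole comparison, which is why both columns come out descending. A minimal Base-Julia sketch of the same rule, assuming each row is represented as a (Group, Variable1) tuple; this is an illustration, not how DataFrames stores or sorts rows internally.

```julia
# A few (Group, Variable1) pairs copied from movies_df above
rows = [("Gladiator", 1.59811),
        ("Dunkirk", 0.0282217),
        ("Gladiator", -1.18307),
        ("Amistad", -1.11516),
        ("Dunkirk", -0.647961)]

# Tuples compare lexicographically: first element, then the second on ties.
# rev = true reverses the comparison, so both elements end up descending.
desc = sort(rows; rev = true)
foreach(println, desc)
```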
Sort Ascending:
## rev = false sorts numerical values in ascending order and strings in alphabetical order. This is also the default.
julia> sort!(movies_df,cols=[:Group,:Variable1],rev=false)
15×3 DataFrames.DataFrame
│ Row │ Group                 │ Variable1 │ Variable2   │
├─────┼───────────────────────┼───────────┼─────────────┤
│ 1   │ "Amistad"             │ -1.11516  │ 0.896659    │
│ 2   │ "Batman Begins"       │ -0.735571 │ 0.146472    │
│ 3   │ "Catch Me If You Can" │ 1.30825   │ 0.813809    │
│ 4   │ "Dunkirk"             │ -0.647961 │ 0.969086    │
│ 5   │ "Dunkirk"             │ 0.0282217 │ 0.26914     │
│ 6   │ "Dunkirk"             │ 0.488741  │ 0.697622    │
│ 7   │ "Empire of the Sun"   │ -0.509172 │ 0.548794    │
│ 8   │ "Firelight"           │ -2.19173  │ 0.643379    │
│ 9   │ "Firelight"           │ -0.615212 │ 0.659826    │
│ 10  │ "Firelight"           │ -0.485559 │ 0.542529    │
│ 11  │ "Gladiator"           │ -1.18307  │ 0.000210997 │
│ 12  │ "Gladiator"           │ 1.59811   │ 0.870574    │
│ 13  │ "Interstellar"        │ -2.12227  │ 0.239937    │
│ 14  │ "Interstellar"        │ -0.785771 │ 0.870153    │
│ 15  │ "Interstellar"        │ 0.510712  │ 0.428202    │
julia>
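Because rev = false is the default, it can simply be left out. One caveat on "alphabetical": Julia compares strings character by character by Unicode code point, so every uppercase letter sorts before any lowercase one. All the movie titles here start with a capital letter, so this does not show up in the output above. A small Base-Julia sketch, where titles is a made-up example vector and rows is the same hypothetical tuple representation as before:

```julia
# (Group, Variable1) pairs copied from movies_df above
rows = [("Gladiator", 1.59811),
        ("Dunkirk", 0.0282217),
        ("Gladiator", -1.18307),
        ("Amistad", -1.11516),
        ("Dunkirk", -0.647961)]

# rev = false gives the same order as calling sort with no rev at all
asc = sort(rows; rev = false)
same_as_default = asc == sort(rows)   # true: rev = false is the default
foreach(println, asc)

# Code-point order puts "Banana" before "apple" ...
titles = ["apple", "Banana", "cherry"]
println(sort(titles))
# ... while by = lowercase gives a case-insensitive ordering
println(sort(titles; by = lowercase))
```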
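Providing a sort order for each column:
## rev can also be a vector with one entry per column in cols: here Group is sorted ascending, while Variable1 and Variable2 are sorted descending.
julia> sort!(movies_df,cols=[:Group,:Variable1,:Variable2],rev=[false,true,true])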
15×3 DataFrames.DataFrame
│ Row │ Group                 │ Variable1 │ Variable2   │
├─────┼───────────────────────┼───────────┼─────────────┤
│ 1   │ "Amistad"             │ -1.11516  │ 0.896659    │
│ 2   │ "Batman Begins"       │ -0.735571 │ 0.146472    │
│ 3   │ "Catch Me If You Can" │ 1.30825   │ 0.813809    │
│ 4   │ "Dunkirk"             │ 0.488741  │ 0.697622    │
│ 5   │ "Dunkirk"             │ 0.0282217 │ 0.26914     │
│ 6   │ "Dunkirk"             │ -0.647961 │ 0.969086    │
│ 7   │ "Empire of the Sun"   │ -0.509172 │ 0.548794    │
│ 8   │ "Firelight"           │ -0.485559 │ 0.542529    │
│ 9   │ "Firelight"           │ -0.615212 │ 0.659826    │
│ 10  │ "Firelight"           │ -2.19173  │ 0.643379    │
│ 11  │ "Gladiator"           │ 1.59811   │ 0.870574    │
│ 12  │ "Gladiator"           │ -1.18307  │ 0.000210997 │
│ 13  │ "Interstellar"        │ 0.510712  │ 0.428202    │
│ 14  │ "Interstellar"        │ -0.785771 │ 0.870153    │
│ 15  │ "Interstellar"        │ -2.12227  │ 0.239937    │
julia>
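rev = [false, true, true] asks for a mixed order: Group ascending, then Variable1 descending within each group, with Variable2 descending only as a last tie-breaker. One way to express the same rule by hand is a custom lt comparison. The sketch below assumes rows are (Group, Variable1, Variable2) tuples and uses a hypothetical helper mixed_lt; it is not the DataFrames implementation.

```julia
# (Group, Variable1, Variable2) rows copied from movies_df above
rows = [("Gladiator", 1.59811, 0.870574),
        ("Dunkirk", 0.0282217, 0.26914),
        ("Gladiator", -1.18307, 0.000210997),
        ("Amistad", -1.11516, 0.896659),
        ("Dunkirk", 0.488741, 0.697622)]

# Hypothetical helper answering "does row a come before row b?"
# Column 1 ascending, columns 2 and 3 descending (mirrors rev = [false, true, true]).
function mixed_lt(a, b)
    a[1] != b[1] && return isless(a[1], b[1])   # ascending on Group
    a[2] != b[2] && return isless(b[2], a[2])   # descending on Variable1
    return isless(b[3], a[3])                   # descending on Variable2
end

mixed = sort(rows; lt = mixed_lt)
foreach(println, mixed)
```

If you follow along on a recent DataFrames.jl release instead of the Julia 0.6 setup used here, note that the cols keyword is no longer accepted there. As far as I know the columns are passed positionally, e.g. sort!(movies_df, [:Group, :Variable1], rev = true), and the mixed order can be written as sort!(movies_df, [:Group, order(:Variable1, rev = true), order(:Variable2, rev = true)]). Please check the documentation for your installed version.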