Monday, September 11, 2017

Julia - Language - DataFrames - Sorting

Sometimes we want to sort a DataFrame in ascending or descending order. Sorting the values in one or more columns is an important part of data analysis, and the sort() function offers many options.

 $ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.0 (2017-06-19 13:05 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia> using DataFrames;

julia> movies_df = DataFrame(Group=rand(["Amistad","Batman Begins","Catch Me If You Can","Dunkirk","Empire of the Sun","Firelight","Gladiator","Hannibal","Interstellar"],15), Variable1 = randn(15), Variable2=rand(15));

julia> movies_df
15×3 DataFrames.DataFrame
│ Row │ Group                 │ Variable1 │ Variable2   │
├─────┼───────────────────────┼───────────┼─────────────┤
│ 1   │ "Gladiator"           │ 1.59811   │ 0.870574    │
│ 2   │ "Batman Begins"       │ -0.735571 │ 0.146472    │
│ 3   │ "Gladiator"           │ -1.18307  │ 0.000210997 │
│ 4   │ "Catch Me If You Can" │ 1.30825   │ 0.813809    │
│ 5   │ "Interstellar"        │ 0.510712  │ 0.428202    │
│ 6   │ "Dunkirk"             │ 0.0282217 │ 0.26914     │
│ 7   │ "Firelight"           │ -0.615212 │ 0.659826    │
│ 8   │ "Interstellar"        │ -2.12227  │ 0.239937    │
│ 9   │ "Firelight"           │ -2.19173  │ 0.643379    │
│ 10  │ "Empire of the Sun"   │ -0.509172 │ 0.548794    │
│ 11  │ "Firelight"           │ -0.485559 │ 0.542529    │
│ 12  │ "Amistad"             │ -1.11516  │ 0.896659    │
│ 13  │ "Dunkirk"             │ -0.647961 │ 0.969086    │
│ 14  │ "Interstellar"        │ -0.785771 │ 0.870153    │
│ 15  │ "Dunkirk"             │ 0.488741  │ 0.697622    │

julia> 

Let us sort this DataFrame. Note that the examples below use sort!() rather than sort(). The bang (!) in the name means that the function modifies its argument: the rows of movies_df itself are reordered. This is called in-place sorting. The plain sort() function returns a sorted copy instead and leaves the original DataFrame untouched.
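To make the difference between sort() and sort!() concrete, here is a minimal sketch on a small made-up DataFrame (the name df and its values are hypothetical, not the movies_df above). It assumes a recent DataFrames.jl release, where the sort columns are passed as a positional argument; the cols= keyword used in the session in this post belongs to the older version shown in the banner.

```julia
using DataFrames

# Hypothetical three-row frame, used only for illustration
df = DataFrame(Group = ["Dunkirk", "Amistad", "Gladiator"],
               Variable1 = [0.5, -1.1, 1.6])

# sort() returns a new, sorted DataFrame and leaves df as it was
sorted_copy = sort(df, :Group)
first_after_copy = df.Group[1]      # df has not been reordered yet

# sort!() reorders the rows of df itself (in-place sorting)
sort!(df, :Group)
first_after_inplace = df.Group[1]   # df is now sorted by Group
```

After the first call only sorted_copy is in alphabetical order; after the second call df itself is sorted as well.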


Sort Descending:


## rev = true sorts numerical values in descending order and strings in reverse alphabetical order.

julia> sort!(movies_df,cols=[:Group,:Variable1],rev=true)
15×3 DataFrames.DataFrame
│ Row │ Group                 │ Variable1 │ Variable2   │
├─────┼───────────────────────┼───────────┼─────────────┤
│ 1   │ "Interstellar"        │ 0.510712  │ 0.428202    │
│ 2   │ "Interstellar"        │ -0.785771 │ 0.870153    │
│ 3   │ "Interstellar"        │ -2.12227  │ 0.239937    │
│ 4   │ "Gladiator"           │ 1.59811   │ 0.870574    │
│ 5   │ "Gladiator"           │ -1.18307  │ 0.000210997 │
│ 6   │ "Firelight"           │ -0.485559 │ 0.542529    │
│ 7   │ "Firelight"           │ -0.615212 │ 0.659826    │
│ 8   │ "Firelight"           │ -2.19173  │ 0.643379    │
│ 9   │ "Empire of the Sun"   │ -0.509172 │ 0.548794    │
│ 10  │ "Dunkirk"             │ 0.488741  │ 0.697622    │
│ 11  │ "Dunkirk"             │ 0.0282217 │ 0.26914     │
│ 12  │ "Dunkirk"             │ -0.647961 │ 0.969086    │
│ 13  │ "Catch Me If You Can" │ 1.30825   │ 0.813809    │
│ 14  │ "Batman Begins"       │ -0.735571 │ 0.146472    │
│ 15  │ "Amistad"             │ -1.11516  │ 0.896659    │

julia> 

Sort Ascending:


## rev = false sorts numerical values in ascending order and strings in alphabetical order.

julia> sort!(movies_df,cols=[:Group,:Variable1],rev=false)
15×3 DataFrames.DataFrame
│ Row │ Group                 │ Variable1 │ Variable2   │
├─────┼───────────────────────┼───────────┼─────────────┤
│ 1   │ "Amistad"             │ -1.11516  │ 0.896659    │
│ 2   │ "Batman Begins"       │ -0.735571 │ 0.146472    │
│ 3   │ "Catch Me If You Can" │ 1.30825   │ 0.813809    │
│ 4   │ "Dunkirk"             │ -0.647961 │ 0.969086    │
│ 5   │ "Dunkirk"             │ 0.0282217 │ 0.26914     │
│ 6   │ "Dunkirk"             │ 0.488741  │ 0.697622    │
│ 7   │ "Empire of the Sun"   │ -0.509172 │ 0.548794    │
│ 8   │ "Firelight"           │ -2.19173  │ 0.643379    │
│ 9   │ "Firelight"           │ -0.615212 │ 0.659826    │
│ 10  │ "Firelight"           │ -0.485559 │ 0.542529    │
│ 11  │ "Gladiator"           │ -1.18307  │ 0.000210997 │
│ 12  │ "Gladiator"           │ 1.59811   │ 0.870574    │
│ 13  │ "Interstellar"        │ -2.12227  │ 0.239937    │
│ 14  │ "Interstellar"        │ -0.785771 │ 0.870153    │
│ 15  │ "Interstellar"        │ 0.510712  │ 0.428202    │

julia> 

Providing a sort order for each column:


julia> sort!(movies_df,cols=[:Group,:Variable1,:Variable2],rev=[false,true,true])
15×3 DataFrames.DataFrame
│ Row │ Group                 │ Variable1 │ Variable2   │
├─────┼───────────────────────┼───────────┼─────────────┤
│ 1   │ "Amistad"             │ -1.11516  │ 0.896659    │
│ 2   │ "Batman Begins"       │ -0.735571 │ 0.146472    │
│ 3   │ "Catch Me If You Can" │ 1.30825   │ 0.813809    │
│ 4   │ "Dunkirk"             │ 0.488741  │ 0.697622    │
│ 5   │ "Dunkirk"             │ 0.0282217 │ 0.26914     │
│ 6   │ "Dunkirk"             │ -0.647961 │ 0.969086    │
│ 7   │ "Empire of the Sun"   │ -0.509172 │ 0.548794    │
│ 8   │ "Firelight"           │ -0.485559 │ 0.542529    │
│ 9   │ "Firelight"           │ -0.615212 │ 0.659826    │
│ 10  │ "Firelight"           │ -2.19173  │ 0.643379    │
│ 11  │ "Gladiator"           │ 1.59811   │ 0.870574    │
│ 12  │ "Gladiator"           │ -1.18307  │ 0.000210997 │
│ 13  │ "Interstellar"        │ 0.510712  │ 0.428202    │
│ 14  │ "Interstellar"        │ -0.785771 │ 0.870153    │
│ 15  │ "Interstellar"        │ -2.12227  │ 0.239937    │

julia> 
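The per-column ordering above can also be sketched with a recent DataFrames.jl release. This is an illustration under assumptions, not the syntax used in the session: the frame df and its values are made up, the columns are passed positionally instead of through cols=, and order() is the DataFrames.jl helper for attaching an ordering to a single column.

```julia
using DataFrames

# Hypothetical four-row frame with two rows per group
df = DataFrame(Group = ["Dunkirk", "Amistad", "Dunkirk", "Amistad"],
               Variable1 = [0.49, -1.12, -0.65, 0.30])

# One rev flag per sort column: Group ascending, Variable1 descending
by_vector = sort(df, [:Group, :Variable1], rev = [false, true])

# The same ordering, stated on the column it applies to
by_order = sort(df, [:Group, order(:Variable1, rev = true)])
```

Both calls sort the groups alphabetically and, within each group, put the largest Variable1 first. The order() form may read more clearly when only one of several columns needs a reversed order.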
