Monday, September 11, 2017

Julia - Language - DataFrames - Renaming Columns

Another common problem in data science is the naming convention that data collectors use for column names (variables). These names often need to be changed, sometimes to help de-identify data so that it complies with privacy regulations. The rename() function and its in-place counterpart rename!() do exactly this.

$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.0 (2017-06-19 13:05 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia> using DataFrames;

julia> my_small_df = DataFrame(x=1:10,y=rand(10),z=rand(["Afghanistan","Brazil","China","Denmark","England","Fiji","Guatemala"],10));

julia> my_small_df
10×3 DataFrames.DataFrame
│ Row │ x  │ y         │ z           │
├─────┼────┼───────────┼─────────────┤
│ 1   │ 1  │ 0.22024   │ "Guatemala" │
│ 2   │ 2  │ 0.0271676 │ "Denmark"   │
│ 3   │ 3  │ 0.757901  │ "China"     │
│ 4   │ 4  │ 0.605231  │ "China"     │
│ 5   │ 5  │ 0.779193  │ "Guatemala" │
│ 6   │ 6  │ 0.01555   │ "Brazil"    │
│ 7   │ 7  │ 0.441247  │ "England"   │
│ 8   │ 8  │ 0.35073   │ "Guatemala" │
│ 9   │ 9  │ 0.63757   │ "Denmark"   │
│ 10  │ 10 │ 0.922693  │ "China"     │


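The rename() function returns a renamed copy of the DataFrame and leaves the original untouched. Because the result below is not assigned to a variable, the new column name only shows up in the displayed output: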

julia> rename(my_small_df,:z,:Countries)
10×3 DataFrames.DataFrame
│ Row │ x  │ y         │ Countries   │
├─────┼────┼───────────┼─────────────┤
│ 1   │ 1  │ 0.22024   │ "Guatemala" │
│ 2   │ 2  │ 0.0271676 │ "Denmark"   │
│ 3   │ 3  │ 0.757901  │ "China"     │
│ 4   │ 4  │ 0.605231  │ "China"     │
│ 5   │ 5  │ 0.779193  │ "Guatemala" │
│ 6   │ 6  │ 0.01555   │ "Brazil"    │
│ 7   │ 7  │ 0.441247  │ "England"   │
│ 8   │ 8  │ 0.35073   │ "Guatemala" │
│ 9   │ 9  │ 0.63757   │ "Denmark"   │
│ 10  │ 10 │ 0.922693  │ "China"     │

julia> my_small_df
10×3 DataFrames.DataFrame
│ Row │ x  │ y         │ z           │
├─────┼────┼───────────┼─────────────┤
│ 1   │ 1  │ 0.22024   │ "Guatemala" │
│ 2   │ 2  │ 0.0271676 │ "Denmark"   │
│ 3   │ 3  │ 0.757901  │ "China"     │
│ 4   │ 4  │ 0.605231  │ "China"     │
│ 5   │ 5  │ 0.779193  │ "Guatemala" │
│ 6   │ 6  │ 0.01555   │ "Brazil"    │
│ 7   │ 7  │ 0.441247  │ "England"   │
│ 8   │ 8  │ 0.35073   │ "Guatemala" │
│ 9   │ 9  │ 0.63757   │ "Denmark"   │
│ 10  │ 10 │ 0.922693  │ "China"     │
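For readers on a Julia 1.x release, here is a minimal sketch of the same non-mutating rename. It assumes DataFrames.jl 1.x and uses the `old => new` pair form that current DataFrames.jl documentation shows, rather than the separate old and new arguments in the 0.6 session above. Assigning the result to a variable is what keeps the renamed copy around.

```julia
using DataFrames

df = DataFrame(x = 1:10, y = rand(10),
               z = rand(["Afghanistan", "Brazil", "China"], 10))

# rename (no !) builds a new DataFrame; bind it to a name to keep it
renamed = rename(df, :z => :Countries)

println(propertynames(renamed))  # [:x, :y, :Countries]
println(propertynames(df))       # [:x, :y, :z]  (original is unchanged)
```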

To make the change permanent, use rename!(), which modifies the DataFrame in place. Here a Dict maps each old column name to its new name, so several columns can be renamed in a single call:

julia> rename!(my_small_df,Dict(:x => :Col1, :y => :Col2, :z=>:Countries));

julia> my_small_df
10×3 DataFrames.DataFrame
│ Row │ Col1 │ Col2      │ Countries   │
├─────┼──────┼───────────┼─────────────┤
│ 1   │ 1    │ 0.22024   │ "Guatemala" │
│ 2   │ 2    │ 0.0271676 │ "Denmark"   │
│ 3   │ 3    │ 0.757901  │ "China"     │
│ 4   │ 4    │ 0.605231  │ "China"     │
│ 5   │ 5    │ 0.779193  │ "Guatemala" │
│ 6   │ 6    │ 0.01555   │ "Brazil"    │
│ 7   │ 7    │ 0.441247  │ "England"   │
│ 8   │ 8    │ 0.35073   │ "Guatemala" │
│ 9   │ 9    │ 0.63757   │ "Denmark"   │
│ 10  │ 10   │ 0.922693  │ "China"     │
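Conceptually, a Dict-based rename sends each existing column name through the dictionary and leaves names that are not keys unchanged. The sketch below shows that mapping in plain Julia on a vector of Symbols. `apply_renames` is a hypothetical helper, not a DataFrames.jl function, and its two error checks are assumptions chosen for illustration, not a description of what rename!() itself validates.

```julia
# Hypothetical helper (not part of DataFrames.jl): maps each column name
# through `mapping`, keeping names that are not keys as they are.
function apply_renames(cols::Vector{Symbol}, mapping::Dict{Symbol,Symbol})
    # Catch typos: every key must be an existing column name
    for old in keys(mapping)
        old in cols || throw(ArgumentError("no column named $old"))
    end
    newcols = [get(mapping, c, c) for c in cols]
    # Renaming must not produce two columns with the same name
    allunique(newcols) || throw(ArgumentError("duplicate column names"))
    return newcols
end

cols = [:x, :y, :z]
println(apply_renames(cols, Dict(:x => :Col1, :y => :Col2, :z => :Countries)))
# [:Col1, :Col2, :Countries]
println(apply_renames(cols, Dict(:z => :Countries)))
# [:x, :y, :Countries]
```

Only the listed columns change, which is why a Dict is convenient when just a few of many columns need new names.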


We can also use the names!() function, which replaces all column names at once. It takes a vector with one new name per column, in column order:

julia> names!(my_small_df,[:Col01,:Col02,:Country]);

julia> my_small_df
10×3 DataFrames.DataFrame
│ Row │ Col01 │ Col02     │ Country     │
├─────┼───────┼───────────┼─────────────┤
│ 1   │ 1     │ 0.22024   │ "Guatemala" │
│ 2   │ 2     │ 0.0271676 │ "Denmark"   │
│ 3   │ 3     │ 0.757901  │ "China"     │
│ 4   │ 4     │ 0.605231  │ "China"     │
│ 5   │ 5     │ 0.779193  │ "Guatemala" │
│ 6   │ 6     │ 0.01555   │ "Brazil"    │
│ 7   │ 7     │ 0.441247  │ "England"   │
│ 8   │ 8     │ 0.35073   │ "Guatemala" │
│ 9   │ 9     │ 0.63757   │ "Denmark"   │
│ 10  │ 10    │ 0.922693  │ "China"     │
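names!() is not available in current DataFrames.jl releases. A minimal sketch of the equivalent call, assuming DataFrames.jl 1.x, passes the vector of new names straight to rename!(). The second step, generating neutral names such as Var1, Var2, ..., is an illustrative assumption tied to the de-identification use case mentioned at the start of this post.

```julia
using DataFrames

df = DataFrame(Col1 = 1:3, Col2 = rand(3), Countries = ["China", "Fiji", "Brazil"])

# Positional rename: one new name per column, in column order
rename!(df, [:Col01, :Col02, :Country])
println(propertynames(df))  # [:Col01, :Col02, :Country]

# The vector can also be built programmatically, e.g. neutral names
# when the original column names themselves reveal too much
rename!(df, Symbol.("Var", 1:ncol(df)))
println(propertynames(df))  # [:Var1, :Var2, :Var3]
```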
