Wednesday, July 12, 2017

Julia - Language - Installing RDatasets

Since RDatasets is a registered Julia package, installation is straightforward.

$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.0 (2017-06-19 13:05 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu


julia> Pkg.update()
INFO: Updating METADATA...
INFO: Computing changes...
INFO: Upgrading DataArrays: v0.5.3 => v0.6.1
INFO: Upgrading DataStructures: v0.5.3 => v0.6.0
INFO: Upgrading FileIO: v0.4.1 => v0.5.0
INFO: Upgrading StatsBase: v0.16.0 => v0.17.0

julia> Pkg.add("RDatasets")
INFO: Cloning cache of RData from https://github.com/JuliaStats/RData.jl.git
INFO: Cloning cache of RDatasets from https://github.com/johnmyleswhite/RDatasets.jl.git
INFO: Installing RData v0.1.0
INFO: Installing RDatasets v0.2.0
INFO: Package database updated

This installs the RDatasets package. To load it into the current session:

julia> using RDatasets
INFO: Recompiling stale cache file /home/ubuntu/.julia/lib/v0.6/FileIO.ji for module FileIO.
INFO: Recompiling stale cache file /home/ubuntu/.julia/lib/v0.6/DataStructures.ji for module DataStructures.
INFO: Recompiling stale cache file /home/ubuntu/.julia/lib/v0.6/DataFrames.ji for module DataFrames.

julia> 
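
The session above is from Julia 0.6, where Pkg is available without loading anything. On Julia 1.0 and later, Pkg is a standard library that has to be loaded first. A minimal sketch of the same two steps, assuming a 1.x installation with network access:

```julia
# Assumes Julia 1.0 or later; on 0.6 the `using Pkg` line is not needed.
using Pkg

# Install the registered package (downloads it on first use).
Pkg.add("RDatasets")

# Load it into the current session.
using RDatasets
```

The version numbers and log messages printed will differ from the 0.6 output shown above.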

To see the available R packages:

julia> RDatasets.packages()
33x2 DataFrames.DataFrame
| Row | Package        | Title                                                                     |
---------------------------------------------------------------------------------------------------------------
| 1   | "COUNT"        | "Functions, data and code for count data."                                |
| 2   | "Ecdat"        | "Data sets for econometrics"                                              |
| 3   | "HSAUR"        | "A Handbook of Statistical Analyses Using R (1st Edition)"                |
| 4   | "HistData"     | "Data sets from the history of statistics and data visualization"         |
| 5   | "ISLR"         | "Data for An Introduction to Statistical Learning with Applications in R" |
| 6   | "KMsurv"       | "Data sets from Klein and Moeschberger (1997), Survival Analysis"         |
| 7   | "MASS"         | "Support Functions and Datasets for Venables and Ripley's MASS"           |
| 8   | "SASmixed"     | "Data sets from \"SAS System for Mixed Models\""                            |
| 9   | "Zelig"        | "Everyone's Statistical Software"                                         |
| 10  | "adehabitatLT" | "Analysis of Animal Movements"                                            |
| 11  | "boot"         | "Bootstrap Functions (Originally by Angelo Canty for S)"                  |
| 12  | "car"          | "Companion to Applied Regression"                                         |
| 13  | "cluster"      | "Cluster Analysis Extended Rousseeuw et al."                              |
| 14  | "datasets"     | "The R Datasets Package"                                                  |
| 15  | "gap"          | "Genetic analysis package"                                                |
| 16  | "ggplot2"      | "An Implementation of the Grammar of Graphics"                            |
| 17  | "lattice"      | "Lattice Graphics"                                                        |
| 18  | "lme4"         | "Linear mixed-effects models using Eigen and S4"                          |
| 19  | "mgcv"         | "Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness estimation"   |
| 20  | "mlmRev"       | "Examples from Multilevel Modelling Software Review"                      |
| 21  | "nlreg"        | "Higher Order Inference for Nonlinear Heteroscedastic Models"             |
| 22  | "plm"          | "Linear Models for Panel Data"                                            |
| 23  | "plyr"         | "Tools for splitting, applying and combining data"                        |
| 24  | "pscl"         | "Political Science Computational Laboratory, Stanford University"         |
| 25  | "psych"        | "Procedures for Psychological, Psychometric, and Personality Research"    |
| 26  | "quantreg"     | "Quantile Regression"                                                     |
| 27  | "reshape2"     | "Flexibly Reshape Data: A Reboot of the Reshape Package."                 |
| 28  | "robustbase"   | "Basic Robust Statistics"                                                 |
| 29  | "rpart"        | "Recursive Partitioning and Regression Trees"                             |
| 30  | "sandwich"     | "Robust Covariance Matrix Estimators"                                     |
| 31  | "sem"          | "Structural Equation Models"                                              |
| 32  | "survival"     | "Survival Analysis"                                                       |
| 33  | "vcd"          | "Visualizing Categorical Data"                                            |

julia> 
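
To see which data sets a single package provides, RDatasets also has a datasets function. The sketch below assumes the installed RDatasets version accepts a package name as the argument, which may not hold for very old releases; the variable name available is just for illustration.

```julia
using RDatasets

# List the data sets that belong to the "datasets" package (row 14 above).
# Assumption: this RDatasets version supports filtering by package name.
available = RDatasets.datasets("datasets")

# Number of data sets listed for that package.
println(size(available, 1))
```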

Row 14, the "datasets" package, holds the data sets that ship with R. To load a data set from it:

julia> iris_dataset = dataset("datasets","iris")
INFO: Precompiling module RData.
150x5 DataFrames.DataFrame
| Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species     |
-------------------------------------------------------------------------------------------
| 1   | 5.1         | 3.5        | 1.4         | 0.2        | "setosa"    |
| 2   | 4.9         | 3.0        | 1.4         | 0.2        | "setosa"    |
| 3   | 4.7         | 3.2        | 1.3         | 0.2        | "setosa"    |
| 4   | 4.6         | 3.1        | 1.5         | 0.2        | "setosa"    |
| 5   | 5.0         | 3.6        | 1.4         | 0.2        | "setosa"    |
| 6   | 5.4         | 3.9        | 1.7         | 0.4        | "setosa"    |
| 7   | 4.6         | 3.4        | 1.4         | 0.3        | "setosa"    |
| 8   | 5.0         | 3.4        | 1.5         | 0.2        | "setosa"    |
| 9   | 4.4         | 2.9        | 1.4         | 0.2        | "setosa"    |
| 10  | 4.9         | 3.1        | 1.5         | 0.1        | "setosa"    |
| 11  | 5.4         | 3.7        | 1.5         | 0.2        | "setosa"    |
| 12  | 4.8         | 3.4        | 1.6         | 0.2        | "setosa"    |
| 13  | 4.8         | 3.0        | 1.4         | 0.1        | "setosa"    |
| 14  | 4.3         | 3.0        | 1.1         | 0.1        | "setosa"    |
| 15  | 5.8         | 4.0        | 1.2         | 0.2        | "setosa"    |
| 16  | 5.7         | 4.4        | 1.5         | 0.4        | "setosa"    |
| 17  | 5.4         | 3.9        | 1.3         | 0.4        | "setosa"    |
⋮
| 133 | 6.4         | 2.8        | 5.6         | 2.2        | "virginica" |
| 134 | 6.3         | 2.8        | 5.1         | 1.5        | "virginica" |
| 135 | 6.1         | 2.6        | 5.6         | 1.4        | "virginica" |
| 136 | 7.7         | 3.0        | 6.1         | 2.3        | "virginica" |
| 137 | 6.3         | 3.4        | 5.6         | 2.4        | "virginica" |
| 138 | 6.4         | 3.1        | 5.5         | 1.8        | "virginica" |
| 139 | 6.0         | 3.0        | 4.8         | 1.8        | "virginica" |
| 140 | 6.9         | 3.1        | 5.4         | 2.1        | "virginica" |
| 141 | 6.7         | 3.1        | 5.6         | 2.4        | "virginica" |
| 142 | 6.9         | 3.1        | 5.1         | 2.3        | "virginica" |
| 143 | 5.8         | 2.7        | 5.1         | 1.9        | "virginica" |
| 144 | 6.8         | 3.2        | 5.9         | 2.3        | "virginica" |
| 145 | 6.7         | 3.3        | 5.7         | 2.5        | "virginica" |
| 146 | 6.7         | 3.0        | 5.2         | 2.3        | "virginica" |
| 147 | 6.3         | 2.5        | 5.0         | 1.9        | "virginica" |
| 148 | 6.5         | 3.0        | 5.2         | 2.0        | "virginica" |
| 149 | 6.2         | 3.4        | 5.4         | 2.3        | "virginica" |
| 150 | 5.9         | 3.0        | 5.1         | 1.8        | "virginica" |

julia> 

As we can see, dataset is a function that takes two arguments:
  1. the package name
  2. the name of the dataset within that package that we intend to load
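
The two arguments can be sketched with named variables. The names pkg_name and data_name are only for illustration; the values are the ones used in the session above.

```julia
using RDatasets

pkg_name  = "datasets"   # first argument: the R package
data_name = "iris"       # second argument: the data set inside that package

iris = dataset(pkg_name, data_name)

# Rows and columns of the returned table; the session above reports 150x5.
println(size(iris))
```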

Thus, we have loaded the famous iris dataset into memory. The dataset function returns a DataFrame, which here contains 5 columns:
  1. SepalLength
  2. SepalWidth
  3. PetalLength
  4. PetalWidth
  5. Species

The data is also easy to interpret. Each of the 150 rows is one flower: 50 samples were taken for each of three species, and the length and width of the sepal and petal were measured. These measurements can later be used to distinguish between the species.
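
As a rough check of that description, the samples can be counted per species and one measurement averaged. This is a minimal sketch that sticks to Base functions and the df[rows, column] indexing form, which should work on both old and recent DataFrames versions; the names species and counts are only for illustration.

```julia
using RDatasets

iris = dataset("datasets", "iris")

# Pull out the Species column and count how often each label occurs.
species = iris[:, :Species]
counts = Dict{String,Int}()
for s in species
    key = string(s)
    counts[key] = get(counts, key, 0) + 1
end

# For each species, average the petal length over its rows.
for sp in sort(collect(keys(counts)))
    rows = [string(s) == sp for s in species]
    petal = iris[rows, :PetalLength]
    println(sp, ": ", counts[sp], " samples, mean petal length ", sum(petal) / length(petal))
end
```

The mean is computed by hand with sum and length so that the same lines run whether mean lives in Base (0.6) or in the Statistics standard library (1.0 and later).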

For more Julia packages, see https://pkg.julialang.org/


