Technology Works: July 2017

Sunday, July 23, 2017

Loading CSV files using D3.js in Chrome Browser

When we try to load a local file (JSON, CSV, TSV or DSV) with D3.js from the JavaScript console in the Chrome browser:

d3.json('/opt/pupils_museum.json', function(data){console.log(data[0]); });

we receive the following errors:

XMLHttpRequest cannot load file://opt/pupils_museum.json. Cross origin requests are only supported for protocol schemes: http, data, chrome, chrome-extension, https.
XMLHttpRequest cannot load http://localhost:8888/pupils_museum.json. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'null' is therefore not allowed access.

For the first error message, I consulted this Stack Overflow question: https://stackoverflow.com/questions/10752055/cross-origin-requests-are-only-supported-for-http-error-when-loading-a-local and the solution provided is:

Serve the files with a simple HTTP server (this is the Python 2 module; on Python 3 the equivalent is python3 -m http.server 8888):

/opt/ $ python -m SimpleHTTPServer 8888
Serving HTTP on 0.0.0.0 port 8888 ...

Now we try to access the JSON file from the console:

d3.json('http://localhost:8888/pupils_museum.json', function(data){console.log(data[0]);});

We receive the second error message mentioned above. To solve this problem, I consulted this Stack Overflow question: https://stackoverflow.com/questions/20035101/no-access-control-allow-origin-header-is-present-on-the-requested-resource

For this, we install the following Chrome extension: https://chrome.google.com/webstore/detail/allow-control-allow-origi/nlfbmbojpeacfghkpbjhddihlkkiljbi?hl=en-US

Now we try to access the JSON file again from localhost:

d3.json('http://localhost:8888/pupils_museum.json', function(data){console.log(data[0]);});
Object {header: function, mimeType: function, responseType: function, response: function, get: function…}
VM244:1 Object {name: "Annie Bell", gender: "girl", age: 10, attending: true}

Success!!
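
As an alternative to the browser extension, the local server itself can send the missing header. The following is a minimal sketch in Python 3, assuming the standard http.server module; the class name CORSRequestHandler and the helper make_server are made up for this example, and the wildcard origin is only reasonable for local experiments.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory and allow cross-origin reads."""

    def end_headers(self):
        # Added to every response: any origin (including 'null') may read it.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_server(port=8888):
    # Bind to localhost only; port 8888 matches the example above.
    return HTTPServer(("127.0.0.1", port), CORSRequestHandler)


if __name__ == "__main__":
    server = make_server()
    print("Serving with CORS header on port", server.server_address[1])
    server.serve_forever()
```

Run from /opt/, this would stand in for the SimpleHTTPServer command, so the d3.json call against http://localhost:8888/ should no longer trigger the second error.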

Wednesday, July 12, 2017

Julia - Language - Read and Write from files into DataFrames

Check the current working directory - where we will download the iris.csv file:

$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.0 (2017-06-19 13:05 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia>

julia> ;

shell> pwd
/home/ubuntu

Download the iris.csv dataset file from GitHub:

julia> ;

shell> wget https://raw.githubusercontent.com/scikit-learn/scikit-learn/master/sklearn/datasets/data/iris.csv
--2017-07-12 15:00:08-- https://raw.githubusercontent.com/scikit-learn/scikit-learn/master/sklearn/datasets/data/iris.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.8.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.8.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2734 (2.7K) [text/plain]
Saving to: iris.csv

iris.csv 100%[=====================================================================================>] 2.67K --.-KB/s in 0s

2017-07-12 15:00:08 (32.1 MB/s) - iris.csv saved [2734/2734]

In real data science problems, data is loaded from files or some other external source. These files are often in a delimited text format such as:

CSV - comma separated values
TSV - tab separated values
WSV - whitespace separated values

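Now, let us load the iris dataset into a DataFrame using the readtable() function. Even though the iris dataset is available in the RDatasets package, we will use the downloaded CSV file to practise working with external datasets. Note that the first line of this particular file holds the sample count, the feature count and the class names (150,4,setosa,versicolor,virginica) rather than real column names, so with header=true the columns come out as x150, x4, setosa, versicolor and virginica.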

julia> using DataFrames

julia> df_iris = readtable("iris.csv",header=true,separator=',');

julia> df_iris
150x5 DataFrames.DataFrame
| Row | x150 | x4 | setosa | versicolor | virginica |
-------------------------------------------------------
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | 0 |
| 7 | 4.6 | 3.4 | 1.4 | 0.3 | 0 |
| 8 | 5.0 | 3.4 | 1.5 | 0.2 | 0 |
| 9 | 4.4 | 2.9 | 1.4 | 0.2 | 0 |
| 10 | 4.9 | 3.1 | 1.5 | 0.1 | 0 |
| 11 | 5.4 | 3.7 | 1.5 | 0.2 | 0 |
| 12 | 4.8 | 3.4 | 1.6 | 0.2 | 0 |
| 13 | 4.8 | 3.0 | 1.4 | 0.1 | 0 |
| 14 | 4.3 | 3.0 | 1.1 | 0.1 | 0 |
| 15 | 5.8 | 4.0 | 1.2 | 0.2 | 0 |
| 16 | 5.7 | 4.4 | 1.5 | 0.4 | 0 |
| 17 | 5.4 | 3.9 | 1.3 | 0.4 | 0 |
⋮
| 133 | 6.4 | 2.8 | 5.6 | 2.2 | 2 |
| 134 | 6.3 | 2.8 | 5.1 | 1.5 | 2 |
| 135 | 6.1 | 2.6 | 5.6 | 1.4 | 2 |
| 136 | 7.7 | 3.0 | 6.1 | 2.3 | 2 |
| 137 | 6.3 | 3.4 | 5.6 | 2.4 | 2 |
| 138 | 6.4 | 3.1 | 5.5 | 1.8 | 2 |
| 139 | 6.0 | 3.0 | 4.8 | 1.8 | 2 |
| 140 | 6.9 | 3.1 | 5.4 | 2.1 | 2 |
| 141 | 6.7 | 3.1 | 5.6 | 2.4 | 2 |
| 142 | 6.9 | 3.1 | 5.1 | 2.3 | 2 |
| 143 | 5.8 | 2.7 | 5.1 | 1.9 | 2 |
| 144 | 6.8 | 3.2 | 5.9 | 2.3 | 2 |
| 145 | 6.7 | 3.3 | 5.7 | 2.5 | 2 |
| 146 | 6.7 | 3.0 | 5.2 | 2.3 | 2 |
| 147 | 6.3 | 2.5 | 5.0 | 1.9 | 2 |
| 148 | 6.5 | 3.0 | 5.2 | 2.0 | 2 |
| 149 | 6.2 | 3.4 | 5.4 | 2.3 | 2 |
| 150 | 5.9 | 3.0 | 5.1 | 1.8 | 2 |
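
To read the same file with meaningful column names, the first line can be treated as metadata and handled separately. The sketch below does this in Python (not Julia) with the standard csv module. The sample text is the file's first line plus the first three rows shown above; the column names in COLUMNS and the function name parse_iris are assumptions made for this example, since the file itself carries no measurement names.

```python
import csv
import io

# First line plus three data rows, copied from the output above.
SAMPLE = """150,4,setosa,versicolor,virginica
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
4.7,3.2,1.3,0.2,0
"""

# Hypothetical column names chosen for this sketch.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species_code"]


def parse_iris(text):
    """Split the metadata line from the data rows and name the columns."""
    reader = csv.reader(io.StringIO(text))
    first = next(reader)
    n_samples, n_features = int(first[0]), int(first[1])
    class_names = first[2:]
    rows = []
    for record in reader:
        # Four float measurements followed by an integer class code.
        values = [float(v) for v in record[:-1]] + [int(record[-1])]
        rows.append(dict(zip(COLUMNS, values)))
    return n_samples, n_features, class_names, rows


if __name__ == "__main__":
    n_samples, n_features, class_names, rows = parse_iris(SAMPLE)
    print(n_samples, n_features, class_names)
    print(rows[0])
```

The integer in the last column indexes into the class names from the first line, so 0 stands for setosa and 2 for virginica.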

The readtable() function is implemented as several methods, so Julia's multiple dispatch selects the right one based on the types of the arguments:

julia> methods(readtable)
# 3 methods for generic function "readtable":
readtable(io::IO) in DataFrames at /home/ubuntu/.julia/v0.6/DataFrames/src/dataframe/io.jl:820
readtable(io::IO, nbytes::Integer; header, separator, quotemark, decimal, nastrings, truestrings, falsestrings, makefactors, nrows, names, eltypes, allowcomments, commentmark, ignorepadding, skipstart, skiprows, skipblanks, encoding, allowescapes, normalizenames) in DataFrames at /home/ubuntu/.julia/v0.6/DataFrames/src/dataframe/io.jl:820
readtable(pathname::AbstractString; header, separator, quotemark, decimal, nastrings, truestrings, falsestrings, makefactors, nrows, names, eltypes, allowcomments, commentmark, ignorepadding, skipstart, skiprows, skipblanks, encoding, allowescapes, normalizenames) in DataFrames at /home/ubuntu/.julia/v0.6/DataFrames/src/dataframe/io.jl:930

We can see that there are three methods for the readtable() function: two read from an IO stream and one reads from a file path. For the meanings of the keyword arguments, refer to:
https://juliastats.github.io/DataFrames.jl/stable/man/io/
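
For readers coming from Python, the idea of one function name with a method per argument type can be roughly sketched with functools.singledispatch. This is only an analogy: singledispatch dispatches on the first argument alone, whereas Julia dispatches on all arguments. The function name read_rows is hypothetical and is not part of any library.

```python
import csv
import io
from functools import singledispatch


@singledispatch
def read_rows(source):
    # Fallback when no registered type matches.
    raise TypeError("unsupported source type: %r" % type(source))


@read_rows.register(io.TextIOBase)
def _(source):
    # Method for an already opened text stream, like readtable(io::IO).
    return list(csv.reader(source))


@read_rows.register(str)
def _(source):
    # Method for a file path, like readtable(pathname::AbstractString).
    with open(source, newline="") as handle:
        return read_rows(handle)


if __name__ == "__main__":
    print(read_rows(io.StringIO("a,b\n1,2\n")))
```

As in the Julia output above, the path method simply opens the file and hands the stream to the stream method.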

We may want to output the results to a file. We do this by using the writetable() function.

julia> writetable("output_iris.csv",df_iris,header=true,separator=',')

julia> ;

shell> ls -l | grep iris
-rw-rw-r-- 1 ubuntu ubuntu 2734 Jul 12 15:00 iris.csv
-rw-rw-r-- 1 ubuntu ubuntu 2746 Jul 12 16:00 output_iris.csv

julia>

Julia - Language - Installing RDatasets

Since RDatasets is a registered Julia Package - installation is straightforward.

$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.0 (2017-06-19 13:05 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia> Pkg.update()
INFO: Updating METADATA...
INFO: Computing changes...
INFO: Upgrading DataArrays: v0.5.3 => v0.6.1
INFO: Upgrading DataStructures: v0.5.3 => v0.6.0
INFO: Upgrading FileIO: v0.4.1 => v0.5.0
INFO: Upgrading StatsBase: v0.16.0 => v0.17.0

julia> Pkg.add("RDatasets")
INFO: Cloning cache of RData from https://github.com/JuliaStats/RData.jl.git
INFO: Cloning cache of RDatasets from https://github.com/johnmyleswhite/RDatasets.jl.git
INFO: Installing RData v0.1.0
INFO: Installing RDatasets v0.2.0
INFO: Package database updated

This installs the RDatasets package. To load it into the current session:

julia> using RDatasets
INFO: Recompiling stale cache file /home/ubuntu/.julia/lib/v0.6/FileIO.ji for module FileIO.
INFO: Recompiling stale cache file /home/ubuntu/.julia/lib/v0.6/DataStructures.ji for module DataStructures.
INFO: Recompiling stale cache file /home/ubuntu/.julia/lib/v0.6/DataFrames.ji for module DataFrames.

julia>

To see the available R packages:

julia> RDatasets.packages()
33x2 DataFrames.DataFrame
| Row | Package | Title |
---------------------------------------------------------------------------------------------------------------
| 1 | "COUNT" | "Functions, data and code for count data." |
| 2 | "Ecdat" | "Data sets for econometrics" |
| 3 | "HSAUR" | "A Handbook of Statistical Analyses Using R (1st Edition)" |
| 4 | "HistData" | "Data sets from the history of statistics and data visualization" |
| 5 | "ISLR" | "Data for An Introduction to Statistical Learning with Applications in R" |
| 6 | "KMsurv" | "Data sets from Klein and Moeschberger (1997), Survival Analysis" |
| 7 | "MASS" | "Support Functions and Datasets for Venables and Ripley's MASS" |
| 8 | "SASmixed" | "Data sets from \"SAS System for Mixed Models\"" |
| 9 | "Zelig" | "Everyone's Statistical Software" |
| 10 | "adehabitatLT" | "Analysis of Animal Movements" |
| 11 | "boot" | "Bootstrap Functions (Originally by Angelo Canty for S)" |
| 12 | "car" | "Companion to Applied Regression" |
| 13 | "cluster" | "Cluster Analysis Extended Rousseeuw et al." |
| 14 | "datasets" | "The R Datasets Package" |
| 15 | "gap" | "Genetic analysis package" |
| 16 | "ggplot2" | "An Implementation of the Grammar of Graphics" |
| 17 | "lattice" | "Lattice Graphics" |
| 18 | "lme4" | "Linear mixed-effects models using Eigen and S4" |
| 19 | "mgcv" | "Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness estimation" |
| 20 | "mlmRev" | "Examples from Multilevel Modelling Software Review" |
| 21 | "nlreg" | "Higher Order Inference for Nonlinear Heteroscedastic Models" |
| 22 | "plm" | "Linear Models for Panel Data" |
| 23 | "plyr" | "Tools for splitting, applying and combining data" |
| 24 | "pscl" | "Political Science Computational Laboratory, Stanford University" |
| 25 | "psych" | "Procedures for Psychological, Psychometric, and Personality Research" |
| 26 | "quantreg" | "Quantile Regression" |
| 27 | "reshape2" | "Flexibly Reshape Data: A Reboot of the Reshape Package." |
| 28 | "robustbase" | "Basic Robust Statistics" |
| 29 | "rpart" | "Recursive Partitioning and Regression Trees" |
| 30 | "sandwich" | "Robust Covariance Matrix Estimators" |
| 31 | "sem" | "Structural Equation Models" |
| 32 | "survival" | "Survival Analysis" |
| 33 | "vcd" | "Visualizing Categorical Data" |

julia>

Row 14, the datasets package, holds the datasets that ship with R. To load the iris dataset from it:

julia> iris_dataset = dataset("datasets","iris")
INFO: Precompiling module RData.
150x5 DataFrames.DataFrame
| Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
-------------------------------------------------------------------------------------------
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | "setosa" |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | "setosa" |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | "setosa" |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | "setosa" |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | "setosa" |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | "setosa" |
| 7 | 4.6 | 3.4 | 1.4 | 0.3 | "setosa" |
| 8 | 5.0 | 3.4 | 1.5 | 0.2 | "setosa" |
| 9 | 4.4 | 2.9 | 1.4 | 0.2 | "setosa" |
| 10 | 4.9 | 3.1 | 1.5 | 0.1 | "setosa" |
| 11 | 5.4 | 3.7 | 1.5 | 0.2 | "setosa" |
| 12 | 4.8 | 3.4 | 1.6 | 0.2 | "setosa" |
| 13 | 4.8 | 3.0 | 1.4 | 0.1 | "setosa" |
| 14 | 4.3 | 3.0 | 1.1 | 0.1 | "setosa" |
| 15 | 5.8 | 4.0 | 1.2 | 0.2 | "setosa" |
| 16 | 5.7 | 4.4 | 1.5 | 0.4 | "setosa" |
| 17 | 5.4 | 3.9 | 1.3 | 0.4 | "setosa" |
⋮
| 133 | 6.4 | 2.8 | 5.6 | 2.2 | "virginica" |
| 134 | 6.3 | 2.8 | 5.1 | 1.5 | "virginica" |
| 135 | 6.1 | 2.6 | 5.6 | 1.4 | "virginica" |
| 136 | 7.7 | 3.0 | 6.1 | 2.3 | "virginica" |
| 137 | 6.3 | 3.4 | 5.6 | 2.4 | "virginica" |
| 138 | 6.4 | 3.1 | 5.5 | 1.8 | "virginica" |
| 139 | 6.0 | 3.0 | 4.8 | 1.8 | "virginica" |
| 140 | 6.9 | 3.1 | 5.4 | 2.1 | "virginica" |
| 141 | 6.7 | 3.1 | 5.6 | 2.4 | "virginica" |
| 142 | 6.9 | 3.1 | 5.1 | 2.3 | "virginica" |
| 143 | 5.8 | 2.7 | 5.1 | 1.9 | "virginica" |
| 144 | 6.8 | 3.2 | 5.9 | 2.3 | "virginica" |
| 145 | 6.7 | 3.3 | 5.7 | 2.5 | "virginica" |
| 146 | 6.7 | 3.0 | 5.2 | 2.3 | "virginica" |
| 147 | 6.3 | 2.5 | 5.0 | 1.9 | "virginica" |
| 148 | 6.5 | 3.0 | 5.2 | 2.0 | "virginica" |
| 149 | 6.2 | 3.4 | 5.4 | 2.3 | "virginica" |
| 150 | 5.9 | 3.0 | 5.1 | 1.8 | "virginica" |

julia>

We see that dataset is a function which takes two arguments:

the package name
the name of the dataset within that package which we intend to load

Thus, we have loaded the famous iris dataset into memory. The dataset function has returned a DataFrame with 5 columns:

SepalLength
SepalWidth
PetalLength
PetalWidth
Species

The data is easy to understand. Fifty samples were taken for each of the three species, and the length and width of the sepal and petal were measured; these measurements can be used later to distinguish between the species.
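
As a small illustration of how the measurements separate the species, the sketch below averages the petal length per species. It is written in Python (not Julia) and uses only six rows copied from the output above (rows 1 to 3 and 148 to 150), so the averages describe those six rows and not the whole dataset. The function name mean_petal_length is made up for this example.

```python
from collections import defaultdict

# Six rows copied from the DataFrame output above:
# (SepalLength, SepalWidth, PetalLength, PetalWidth, Species)
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (6.5, 3.0, 5.2, 2.0, "virginica"),
    (6.2, 3.4, 5.4, 2.3, "virginica"),
    (5.9, 3.0, 5.1, 1.8, "virginica"),
]


def mean_petal_length(rows):
    """Group rows by species and average the third column (PetalLength)."""
    grouped = defaultdict(list)
    for _, _, petal_length, _, species in rows:
        grouped[species].append(petal_length)
    return {species: sum(vals) / len(vals) for species, vals in grouped.items()}


if __name__ == "__main__":
    for species, mean in mean_petal_length(ROWS).items():
        print(species, round(mean, 2))
```

Even in this tiny sample, the setosa petals are much shorter than the virginica petals.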

For packages in Julia, refer to: https://pkg.julialang.org/

Tuesday, July 11, 2017

Julia - Language - Install DataFrames.jl

DataFrames are recommended data structures for statistical analysis. Julia provides a package called DataFrames.jl, which has all necessary functions to work with DataFrames.

To install DataFrames.jl in Julia, follow these steps:

$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.0 (2017-06-19 13:05 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia> Pkg.add("DataFrames")
INFO: Cloning cache of DataArrays from https://github.com/JuliaStats/DataArrays.jl.git
INFO: Cloning cache of DataFrames from https://github.com/JuliaStats/DataFrames.jl.git
INFO: Cloning cache of DataStructures from https://github.com/JuliaCollections/DataStructures.jl.git
INFO: Cloning cache of FileIO from https://github.com/JuliaIO/FileIO.jl.git
INFO: Cloning cache of GZip from https://github.com/JuliaIO/GZip.jl.git
INFO: Cloning cache of SortingAlgorithms from https://github.com/JuliaCollections/SortingAlgorithms.jl.git
INFO: Cloning cache of SpecialFunctions from https://github.com/JuliaMath/SpecialFunctions.jl.git
INFO: Cloning cache of StatsBase from https://github.com/JuliaStats/StatsBase.jl.git
INFO: Installing Compat v0.26.0
INFO: Installing DataArrays v0.5.3
INFO: Installing DataFrames v0.10.0
INFO: Installing DataStructures v0.5.3
INFO: Installing FileIO v0.4.1
INFO: Installing GZip v0.3.0
INFO: Installing Reexport v0.0.3
INFO: Installing SortingAlgorithms v0.1.1
INFO: Installing SpecialFunctions v0.1.1
INFO: Installing StatsBase v0.16.0
INFO: Package database updated
INFO: METADATA is out-of-date you may not have the latest version of DataFrames
INFO: Use `Pkg.update()` to get the latest versions of your packages

julia> using DataArrays
INFO: Precompiling module DataArrays.

julia> using DataFrames
INFO: Precompiling module DataFrames.

julia>

For packages in Julia, refer to: https://pkg.julialang.org/

Saturday, July 1, 2017

Django - 027 - Philosophies and Limitations

First and foremost, the limitations to the DTL (Django Template Language) are intentional.

Django was developed in the high-volume, ever-changing environment of an online newsroom. The original creators of Django had a very definite set of philosophies in creating the DTL.

These philosophies remain core to Django today. They are:

Separate logic from presentation
Discourage redundancy
Be decoupled from HTML
XML should not be used for template languages
Assume designer competence
Treat whitespace obviously
Don't invent a programming language
Ensure safety and security
Extensibility

Each of these is explained below:

1. Separate Logic from Presentation:
A template system is a tool that controls presentation and presentation related logic and that is it. The template system should not support functionality that goes beyond this basic goal.

2. Discourage Redundancy:
The majority of dynamic websites use some sort of common site-wide design - a common header, footer, navigation bar and so on. The Django template system should make it easy to store those elements in a single place, eliminating duplicate code. This is the philosophy behind template inheritance.

3. Be decoupled from HTML:
The template system should not be designed so that it only outputs HTML. It should be equally good at generating other text based formats, or just plain text.

4. XML should not be used for template languages:
Using an XML engine to parse templates introduces a whole new world of human error in editing templates, and incurs an unacceptable level of overhead in template processing.

5. Assume Designer competence:
The template system should not be designed so that templates necessarily are displayed nicely in WYSIWYG editors such as Dreamweaver. That is too severe of a limitation and wouldn't allow the syntax to be as nice as it is.

Django expects template authors to be comfortable editing HTML directly.

6. Treat whitespace obviously:
The template system should not do magic things with whitespace. If a template includes whitespace, the system should treat the whitespace as it treats text: just display it. Any whitespace that is not in a template tag should be displayed.

7. Do not invent a programming language:
The template system intentionally does not allow the following:
Assignment to variables.
Advanced Logic

The goal is not to invent a programming language. The goal is to offer just enough programming-esque functionality, such as branching and looping, that is essential for making presentation-related decisions.

The Django template system recognizes that templates are most often written by designers, not programmers, and therefore should not assume Python knowledge.

8. Safety and Security:
The template system, out of the box, should forbid the inclusion of malicious code - such as commands that delete database records. This is another reason the template system does not allow arbitrary Python code.

9. Extensibility:
The template system should recognize that advanced template authors may want to extend its technology. This is the philosophy behind custom template tags and filters.

When the pressure is on to GetStuffDone, and you have both designers and programmers trying to communicate and get all of the last minute tasks done, Django just gets out of the way and lets each team concentrate on what they are good at.

Once you have found this out for yourself through real-life practice, you will find out very quickly why Django really is the framework for perfectionists with deadlines.

With all this in mind, Django is flexible - it does not require you to use the DTL. More than any other component of web applications, template syntax is highly subjective, and programmers' opinions vary wildly. The fact that Python alone has dozens, if not hundreds, of open-source template-language implementations supports this point. Each was likely created because its developer deemed all existing template languages inadequate.

Because Django is intended to be a full-stack web framework that provides all the pieces necessary for web developers to be productive, most times it is more convenient to use the DTL, but it is not a strict requirement in any sense.
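
As an illustration of that flexibility, a project can point Django at another template engine through the TEMPLATES setting. The fragment below is a sketch of a settings.py entry that selects Django's built-in Jinja2 backend, assuming the jinja2 package is installed; the empty DIRS and OPTIONS values are placeholders.

```python
# settings.py fragment (sketch): use Jinja2 instead of the DTL.
TEMPLATES = [
    {
        "BACKEND": "django.template.backends.jinja2.Jinja2",
        "DIRS": [],        # placeholder: extra template directories go here
        "APP_DIRS": True,  # look in each installed app's "jinja2" subdirectory
        "OPTIONS": {},     # placeholder: engine-specific options
    },
]
```

Both backends can also be listed side by side in TEMPLATES, so a project may mix the DTL and Jinja2 if it needs to.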