Technology Works: August 2017

Monday, August 28, 2017

Julia - Language - Functions - Passing functions as arguments

In Julia, functions are first-class values, so a function can be passed as an argument to another function, which can then call it. Let us write an example to see this in use:

$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.0 (2017-06-19 13:05 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia> function myStringFunc(xStr)
           yStr = xStr()                      # call the function that was passed in
           println("My String is $(yStr)!")
       end
myStringFunc (generic function with 1 method)

Now we create a second function to pass as the argument of the first:

julia> function myInputFunc()
return(":: julialang.com")
end
myInputFunc (generic function with 1 method)

julia> myStringFunc(myInputFunc)
My String is :: julialang.com!

julia>

Hence, it is possible to pass a function as an argument to another function.
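As a further illustration of the same idea, here is a minimal sketch in which the receiving function calls its argument more than once. The names apply_twice and add_three are hypothetical and chosen only for this example; the function passed in can be a named function or an anonymous one.

```julia
# Hypothetical helper: applies the function `f` to `x` two times in a row.
apply_twice(f, x) = f(f(x))

# A small named function to pass in.
add_three(n) = n + 3

# Passing a named function: add_three(add_three(10)).
result_named = apply_twice(add_three, 10)

# Passing an anonymous function that appends "!" to a string.
result_anon = apply_twice(s -> s * "!", "hi")

println(result_named)
println(result_anon)
```

The caller decides which behaviour to plug in, while apply_twice only needs to know that its first argument can be called.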

Julia - Language - Functions - Stabby and Do construct to create anonymous functions

There are two types of anonymous functions in Julia:

Stabby Functions
Do Block

These are quick, throwaway functions. They are called anonymous functions because they have no name. Unless we assign one to a variable, we cannot refer to it again later the way we can with a named function.

Examples:

$ julia

julia> y -> 3y^2 + 2y -2
(::#1) (generic function with 1 method)

julia>

Since this stabby function was not assigned to a name, we cannot call it again. Instead, we can pass it directly to the map() function, which applies it to each value in an array.

julia> map(y -> 3y^2 + 2y -2,[1,2,3,4,5])
5-element Array{Int64,1}:
3
14
31
54
83

julia>
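An anonymous function can also be kept for later use by assigning it to a variable, or called on the spot by wrapping it in parentheses. The following is a small sketch of both options; the variable name quad is an assumption made for this example.

```julia
# Bind the anonymous function to a variable so it can be reused.
quad = y -> 3y^2 + 2y - 2

# Call it through the variable.
single = quad(2)

# Call an anonymous function immediately, without naming it.
immediate = (y -> 3y^2 + 2y - 2)(1)

# Pass the bound function to map, as with any other function.
mapped = map(quad, [1, 2, 3])

println(single)
println(immediate)
println(mapped)
```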

Using a do block to create an anonymous function:

julia> map([1,2,3,4,5]) do y
3y^2 + 2y -2
end
5-element Array{Int64,1}:
3
14
31
54
83

julia>

The beauty of the do block is that its body can span several lines, so it can contain statements such as if / elseif / else.

julia> map([3,6,9,10,11]) do y
           if mod(y,3) == 0
               100y
           elseif mod(y,3) == 1
               200y
           else                  # remaining case: mod(y,3) == 2
               300y
           end
       end
5-element Array{Int64,1}:
300
600
900
2000
3300

julia>
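The do block is not specific to map(): it works with any function that takes a function as its first argument. The sketch below assumes two hypothetical helpers, apply_to_each and scale, to show that the do form and the ordinary call form are two ways of writing the same call.

```julia
# Hypothetical function that takes a function as its FIRST argument.
function apply_to_each(f, values)
    return [f(v) for v in values]
end

# Named version of the branching logic, for comparison.
function scale(y)
    r = mod(y, 3)
    if r == 0
        return 100y
    elseif r == 1
        return 200y
    else
        return 300y
    end
end

# Ordinary call: the function is passed explicitly as the first argument.
scaled_named = apply_to_each(scale, [3, 10, 11])

# do-block call: the block becomes the first argument of apply_to_each.
scaled_do = apply_to_each([3, 10, 11]) do y
    r = mod(y, 3)
    if r == 0
        100y
    elseif r == 1
        200y
    else
        300y
    end
end

println(scaled_named == scaled_do)
```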

Julia - Language - Functions - Type parameters as function input

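Julia lets us restrict a function to accept only certain data types by annotating its arguments. This makes the code stricter and lets Julia select a method based on the argument types. Note that such annotations do not by themselves make the function faster, since Julia compiles specialized code for the concrete argument types either way.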

$ julia

julia> function myFunc(myIntInput::Int)
return 5*myIntInput
end
myFunc (generic function with 1 method)

julia> myFunc(4.0)
ERROR: MethodError: no method matching myFunc(::Float64)
Closest candidates are:
myFunc(::Int64) at REPL[1]:2

julia> myFunc(4)
20

julia> methods(myFunc)
# 1 method for generic function "myFunc":
myFunc(myIntInput::Int64) in Main at REPL[1]:2

julia>

We can also make the type itself a parameter of the method definition:

julia> function myArgTest{T<:Real}(x::T)
print("The value $x is of type $T")
end
myArgTest (generic function with 1 method)

julia>

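Explanation of this function:
The curly braces {} are placed between the function name and the argument parentheses. T is used by convention as the name of the type parameter. In the above function definition we have used the {T<:Real} syntax, which allows only Real or any subtype of Real. We can be more specific, say, to allow only integers: {T<:Integer}. Note that this curly-brace form is the older syntax used here with Julia 0.6; it is no longer accepted in Julia 1.0 and later, where the same method is written as function myArgTest(x::T) where {T<:Real}.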

julia> myArgTest(5.6)
The value 5.6 is of type Float64
julia> myArgTest(pi)
The value π = 3.1415926535897... is of type Irrational{:π}
julia> myArgTest(4)
The value 4 is of type Int64
julia> myArgTest(4//9)
The value 4//9 is of type Rational{Int64}
julia> myArgTest(4/9)
The value 0.4444444444444444 is of type Float64
julia>
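For readers on a newer Julia version, the same kind of method can be sketched with the where syntax, which is also accepted by Julia 0.6. The function names describe_real and describe_integer are hypothetical, and the functions return strings rather than printing so the results are easy to inspect.

```julia
# Accepts any subtype of Real; T is bound to the concrete type of x.
function describe_real(x::T) where {T<:Real}
    return "The value $x is of type $T"
end

# A stricter variant: only subtypes of Integer are accepted.
function describe_integer(x::T) where {T<:Integer}
    return "Integer $x of type $T"
end

msg_float = describe_real(5.6)

# applicable() reports whether a method exists for the given arguments,
# without raising a MethodError.
accepts_int = applicable(describe_integer, 4)
accepts_float = applicable(describe_integer, 4.0)

println(msg_float)
println(accepts_int)
println(accepts_float)
```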

Here is an example of using two arguments that can be of any type, as long as they are the same.

julia> function myTypeAdd{T}(a::T,b::T)
return +(a,b)
end
myTypeAdd (generic function with 1 method)

julia>

Let us now add two complex numbers:

julia> myTypeAdd(4+3im,0+2im)
4 + 5im

julia>

Now let us try to add one complex number to an integer:

julia> myTypeAdd(4+3im,0)
ERROR: MethodError: no method matching myTypeAdd(::Complex{Int64}, ::Int64)
Closest candidates are:
myTypeAdd(::T, ::T) where T at REPL[15]:2

julia>

As expected, Julia has thrown an error, because the two arguments are of different types.
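If mixed-type calls should be supported as well, one possible approach (an assumption for illustration, not part of the original example) is to add a second, less specific method that promotes both arguments to a common type before calling the strict method. The name same_type_add is hypothetical.

```julia
# Strict method: both arguments must have exactly the same type T.
same_type_add(a::T, b::T) where {T} = a + b

# Assumed fallback: promote mixed arguments to a common type first, then
# dispatch to the strict method. promote() raises an error if no common
# type exists, so this does not loop forever.
same_type_add(a, b) = same_type_add(promote(a, b)...)

sum_same = same_type_add(4 + 3im, 0 + 2im)   # same types, strict method
sum_mixed = same_type_add(4 + 3im, 0)        # Int is promoted to Complex{Int}
sum_real = same_type_add(1, 2.5)             # Int is promoted to Float64

println(sum_same)
println(sum_mixed)
println(sum_real)
```

Whether such a fallback is desirable depends on the use case; leaving it out keeps the strict behaviour shown in the transcript.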

With this, we have demonstrated how to tighten up your code using type parameters in Julia.

Julia - Language - Functions - Passing arrays as Arguments

Normally, if we want to apply a function to every element of an array, we write a loop. Julia provides a cleaner way: the map function. Let us see how map is used:

$ julia

julia> inputArr = [-3,-2.5,-2,-1.5,-1,-0.5,0,0.5,1.0,1.5,2.0,2.5,3];

julia> function mySquare(a)
return a*a
end
mySquare (generic function with 1 method)

Now let us map the mySquare function over the array:

julia> map(mySquare,inputArr)
13-element Array{Float64,1}:
9.0
6.25
4.0
2.25
1.0
0.25
0.0
0.25
1.0
2.25
4.0
6.25
9.0

julia>

Voila! We have received the squared outputs of all the values in the array without using a loop. But mapping is not always required: Julia's dot syntax, f.(x), applies a function element-wise to an array. In the first example we map the array of integers from 1 to 10 to the trigonometric sine function, and then repeat the exercise using the element-wise dot syntax. Afterwards we use the @time macro to time both approaches on a much larger array.

julia> map(sin,collect(1:10))
10-element Array{Float64,1}:
0.841471
0.909297
0.14112
-0.756802
-0.958924
-0.279415
0.656987
0.989358
0.412118
-0.544021

julia> sin.(collect(1:10))
10-element Array{Float64,1}:
0.841471
0.909297
0.14112
-0.756802
-0.958924
-0.279415
0.656987
0.989358
0.412118
-0.544021

julia>

Refer: https://docs.julialang.org/en/stable/manual/functions/#man-vectorized-1
Let us time both approaches on 100,000,000 elements using the @time macro. In the run below, the two take essentially the same time:

julia> @time map(sin,collect(1:100000000));
46.356702 seconds (11 allocations: 1.490 GiB, 0.40% gc time)

julia> @time sin.(collect(1:100000000));
46.524224 seconds (33 allocations: 1.490 GiB, 0.34% gc time)

julia>
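The dot syntax is not limited to built-in functions such as sin. The following sketch applies it to a user-defined function and also combines several dotted operations in one expression; the names square, inputs and fused are assumptions made for this example. When timing such code with @time, keep in mind that the first call of a function also includes compilation time.

```julia
# A user-defined scalar function.
square(a) = a * a

inputs = [-1.5, 0.0, 2.0]

# Two equivalent ways to apply it element-wise.
via_map = map(square, inputs)
via_dot = square.(inputs)

# Several dotted calls and operators in a single expression:
# sin(x)^2 + cos(x)^2 for every element x.
fused = sin.(inputs) .^ 2 .+ cos.(inputs) .^ 2

println(via_map == via_dot)
println(fused)
```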

Now, say we create an array and a tuple, each with two elements:

julia> array_2inp = [3,4]
2-element Array{Int64,1}:
3
4

julia> tuple_2inp = (3,4)
(3, 4)

julia>

We have a function which accepts 2 inputs as arguments:

julia> function LinearCompute(x,y)
5*x+2*y
end
LinearCompute (generic function with 1 method)

julia>

Now, let us pass the array and tuple to the function:

julia> LinearCompute(array_2inp)
ERROR: MethodError: no method matching LinearCompute(::Array{Int64,1})
Closest candidates are:
LinearCompute(::Any, ::Any) at REPL[4]:2

julia> LinearCompute(tuple_2inp)
ERROR: MethodError: no method matching LinearCompute(::Tuple{Int64,Int64})
Closest candidates are:
LinearCompute(::Any, ::Any) at REPL[4]:2

julia>

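Both calls fail, because the function expects two separate arguments but receives a single collection. To avoid this problem, we can use the ellipsis, or splat, operator (...), which expands a collection into individual arguments: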

julia> LinearCompute(array_2inp...)
23

julia> LinearCompute(tuple_2inp...)
23

julia>

Now we receive the correct output from both calls.
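Splatting is handy when the arguments already live in a collection. As a minimal sketch, assuming a hypothetical function linear and a list of coordinate pairs, each tuple can be expanded into separate arguments inside a comprehension; the same works for built-in functions such as max.

```julia
# Function of two separate arguments, same formula as LinearCompute.
linear(x, y) = 5x + 2y

# Each tuple holds the two arguments for one call.
pairs_in = [(3, 4), (1, 1), (0, 2)]

# p... expands the tuple into the two positional arguments.
results = [linear(p...) for p in pairs_in]

# Splatting an array into a built-in function that takes many arguments.
largest = max([1, 5, 3]...)

println(results)
println(largest)
```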

Julia passes arguments by sharing: values are not copied when they are passed to a function. Immutable values such as numbers and characters cannot be modified, but mutable values such as arrays can be altered by the function, and the change is visible to the caller. Have a look at this example:

julia> function add_myEle(a)
push!(a,14)
end
add_myEle (generic function with 1 method)

julia>

Here, the bang (!) at the end of push! is a naming convention indicating that the function modifies its argument, in this case the array.

julia> add_myEle(array_2inp)
3-element Array{Int64,1}:
3
4
14

julia> array_2inp
3-element Array{Int64,1}:
3
4
14

julia>
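It helps to separate two cases: mutating the object an argument refers to, and rebinding the argument name inside the function. The sketch below uses three hypothetical functions (mutate!, rebind and bump) to show that only mutation is visible to the caller.

```julia
# Mutates the array the caller passed in.
function mutate!(v)
    push!(v, 14)
    return nothing
end

# Rebinds the local name `v` to a new array; the caller's array is untouched.
function rebind(v)
    v = [0, 0]
    return v
end

# Numbers are immutable: `n += 1` rebinds the local name only.
function bump(n)
    n += 1
    return n
end

arr = [3, 4]
mutate!(arr)               # arr now has a third element
rebound = rebind(arr)      # arr is unchanged by this call

x = 5
bumped = bump(x)           # x is unchanged by this call

println(arr)
println(rebound)
println((x, bumped))
```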

Saturday, August 26, 2017

Julia - Language - Variable Number of Arguments using Ellipses / Splat (...)

Consider a scenario in which we want to create a function without knowing in advance how many arguments we will pass to it. How do we do this?
To do this, we add three dots (...) after the argument name.

... - These three dots are called the splat or ellipsis. The ellipsis indicates that zero, one or more arguments may be passed.

Example - function with ellipsis:

julia> function how_many_args(myarguments...)
println("the number of arguments passed are $(length(myarguments))")
end
how_many_args (generic function with 1 method)

julia> how_many_args()
the number of arguments passed are 0

julia> how_many_args(1,2,3)
the number of arguments passed are 3

julia>

julia> a = 5
5

julia> b = 6
6

julia> c="sdf"
"sdf"

julia> how_many_args(a,b,c)
the number of arguments passed are 3

julia>
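Inside the function, the splatted argument is an ordinary tuple, so it can be measured with length and iterated with a loop. Here is a minimal sketch, assuming a hypothetical function total that requires one argument and accepts any number of additional ones.

```julia
# `first_value` is required; `rest` collects all remaining arguments in a tuple.
function total(first_value, rest...)
    acc = first_value
    for r in rest
        acc += r
    end
    # Return the sum and how many extra arguments were received.
    return acc, length(rest)
end

only_one = total(1)
three_args = total(1, 2, 3)

println(only_one)
println(three_args)
```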

Let us see some other areas where this facility to handle an unknown number of arguments in a function is useful.

Example: Usage of the join function.

We will now write a function which accepts an array of strings and outputs them as one meaningful string.

julia> function join_string(input_array)
           string_elements = join(input_array," , "," and ")
           println("The elements in the string are: $(string_elements) !")
       end
join_string (generic function with 1 method)

For details on the usage of the join function, refer: https://docs.julialang.org/en/stable/stdlib/strings/#Base.join

julia> join_string(["Steve Waugh","Mark Waugh","Ricky Ponting"])
The elements in the string are: Steve Waugh , Mark Waugh and Ricky Ponting !

julia> join_string(["Steve Waugh","Mark Waugh"])
The elements in the string are: Steve Waugh and Mark Waugh !

julia>

Let us see, what happens when you pass a single string as input to this function: join_string

julia> join_string("Ramesh")
The elements in the string are: R , a , m , e , s and h !

julia>

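The single string input has been split into its individual characters, because join iterates over whatever it is given and a string iterates character by character. To overcome this problem, let us create one more function using the ellipsis or splat: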

julia> function join_string_splat(input_string...)
           string_elements = join(input_string," , "," and ")
           println("The elements in the string are: $(string_elements) !")
       end
join_string_splat (generic function with 1 method)

julia>

julia> join_string_splat("Ramesh")
The elements in the string are: Ramesh !

julia> join_string_splat("Steve Waugh","Mark Waugh","Ricky Ponting")
The elements in the string are: Steve Waugh , Mark Waugh and Ricky Ponting !

julia>

Note: This time, we have not passed the input as a list of strings, but as separate strings.

To improve our understanding, let us try one more example of multiple arguments passed using the ellipsis / splat (...).

julia> function multiple_args(a,b,c...)
println("The argument list is $a , $b, $c")
end
multiple_args (generic function with 1 method)

julia> multiple_args(1,2,3,4,5,6,7,8,9,10,"sachin")
The argument list is 1 , 2, (3, 4, 5, 6, 7, 8, 9, 10, "sachin")

In the above example, the variable c has collected all the remaining arguments into a tuple, due to the presence of the ellipsis.

Let us now check if it is possible to combine keyword argument with splat arguments:

julia> function keyword_splat_arg(;a...)
a
end
keyword_splat_arg (generic function with 1 method)

julia> keyword_splat_arg(var1=1,var2=2,var3=3)
3-element Array{Any,1}:
(:var1, 1)
(:var2, 2)
(:var3, 3)

julia>

We can see that the return value is an array of tuples holding key-value pairs (this is the behaviour in Julia 0.6; Julia 1.0 and later return a pairs iterator instead). The key is the name we gave the keyword argument. It is a Symbol, as shown by the colon preceding it.
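Whatever container the keyword arguments arrive in, they can be iterated as name and value. The following sketch, with the hypothetical function collect_keywords, builds an explicit array of (name, value) tuples, which should behave the same way on Julia 0.6 and on later versions.

```julia
# Collect all keyword arguments into an array of (name, value) tuples.
function collect_keywords(; options...)
    return [(name, value) for (name, value) in options]
end

collected = collect_keywords(var1 = 1, var2 = 2, var3 = 3)

# Each name is a Symbol, each value is whatever was passed.
for (name, value) in collected
    println("$name => $value")
end
```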

Monday, August 21, 2017

Julia - Language - Reshaping of Datasets - stackdf() Function - dump() function

In the DataFrames library, there is a function similar to stack(): the stackdf() function. Its output is the same as that of stack().

So why do we have the stackdf() function at all? The important difference is that stackdf() returns a view into the original dataframe, whereas stack() returns actual copies of the data.

Refer: http://juliastats.github.io/DataFrames.jl/latest/lib/manipulation/#DataFrames.stackdf
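The difference between a view and a copy can be illustrated with plain arrays, independently of DataFrames. This is only an analogy under that assumption, not the internals of stackdf(): a copy owns its data, while a view refers back to the original storage.

```julia
original = [5.1, 4.9, 4.7, 4.6]

# Indexing with a range makes a copy of the selected elements.
copied = original[1:2]

# view() creates a lightweight object that refers to the original data.
viewed = view(original, 1:2)

# Change the original after both were created.
original[1] = 9.9

println(copied[1])   # the copy still holds the old value
println(viewed[1])   # the view reflects the change
```

A view avoids duplicating memory, at the cost of staying tied to the original data.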

Now, let us see how we can put stackdf() function to use:

Let us import the necessary packages:

$ julia

julia> using RDatasets,DataFrames

julia>

Let us have a look at RDatasets Library:

julia> RDatasets.packages()
33×2 DataFrames.DataFrame
│ Row │ Package │ Title │
├─────┼────────────────┼───────────────────────────────────────────────────────────────────────────┤
│ 1 │ "COUNT" │ "Functions, data and code for count data." │
│ 2 │ "Ecdat" │ "Data sets for econometrics" │
│ 3 │ "HSAUR" │ "A Handbook of Statistical Analyses Using R (1st Edition)" │
│ 4 │ "HistData" │ "Data sets from the history of statistics and data visualization" │
│ 5 │ "ISLR" │ "Data for An Introduction to Statistical Learning with Applications in R" │
│ 6 │ "KMsurv" │ "Data sets from Klein and Moeschberger (1997), Survival Analysis" │
│ 7 │ "MASS" │ "Support Functions and Datasets for Venables and Ripley's MASS" │
│ 8 │ "SASmixed" │ "Data sets from \"SAS System for Mixed Models\"" │
│ 9 │ "Zelig" │ "Everyone's Statistical Software" │
│ 10 │ "adehabitatLT" │ "Analysis of Animal Movements" │
│ 11 │ "boot" │ "Bootstrap Functions (Originally by Angelo Canty for S)" │
│ 12 │ "car" │ "Companion to Applied Regression" │
│ 13 │ "cluster" │ "Cluster Analysis Extended Rousseeuw et al." │
│ 14 │ "datasets" │ "The R Datasets Package" │
│ 15 │ "gap" │ "Genetic analysis package" │
│ 16 │ "ggplot2" │ "An Implementation of the Grammar of Graphics" │
│ 17 │ "lattice" │ "Lattice Graphics" │
│ 18 │ "lme4" │ "Linear mixed-effects models using Eigen and S4" │
│ 19 │ "mgcv" │ "Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness estimation" │
│ 20 │ "mlmRev" │ "Examples from Multilevel Modelling Software Review" │
│ 21 │ "nlreg" │ "Higher Order Inference for Nonlinear Heteroscedastic Models" │
│ 22 │ "plm" │ "Linear Models for Panel Data" │
│ 23 │ "plyr" │ "Tools for splitting, applying and combining data" │
│ 24 │ "pscl" │ "Political Science Computational Laboratory, Stanford University" │
│ 25 │ "psych" │ "Procedures for Psychological, Psychometric, and Personality Research" │
│ 26 │ "quantreg" │ "Quantile Regression" │
│ 27 │ "reshape2" │ "Flexibly Reshape Data: A Reboot of the Reshape Package." │
│ 28 │ "robustbase" │ "Basic Robust Statistics" │
│ 29 │ "rpart" │ "Recursive Partitioning and Regression Trees" │
│ 30 │ "sandwich" │ "Robust Covariance Matrix Estimators" │
│ 31 │ "sem" │ "Structural Equation Models" │
│ 32 │ "survival" │ "Survival Analysis" │
│ 33 │ "vcd" │ "Visualizing Categorical Data" │

julia>

The package of interest to us is the datasets package. Let us fetch the famous iris dataset from this package into a dataframe:

julia> iris_df = dataset("datasets","iris");

julia> head(iris_df)
6×5 DataFrames.DataFrame
│ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
├─────┼─────────────┼────────────┼─────────────┼────────────┼──────────┤
│ 1 │ 5.1 │ 3.5 │ 1.4 │ 0.2 │ "setosa" │
│ 2 │ 4.9 │ 3.0 │ 1.4 │ 0.2 │ "setosa" │
│ 3 │ 4.7 │ 3.2 │ 1.3 │ 0.2 │ "setosa" │
│ 4 │ 4.6 │ 3.1 │ 1.5 │ 0.2 │ "setosa" │
│ 5 │ 5.0 │ 3.6 │ 1.4 │ 0.2 │ "setosa" │
│ 6 │ 5.4 │ 3.9 │ 1.7 │ 0.4 │ "setosa" │

julia>

Using stackdf() function:

julia> iris_df_stackdf = stackdf(iris_df);

julia> head(iris_df_stackdf)
6×3 DataFrames.DataFrame
│ Row │ variable │ value │ Species │
├─────┼─────────────┼───────┼──────────┤
│ 1 │ SepalLength │ 5.1 │ "setosa" │
│ 2 │ SepalLength │ 4.9 │ "setosa" │
│ 3 │ SepalLength │ 4.7 │ "setosa" │
│ 4 │ SepalLength │ 4.6 │ "setosa" │
│ 5 │ SepalLength │ 5.0 │ "setosa" │
│ 6 │ SepalLength │ 5.4 │ "setosa" │

Stack 2 columns only:

julia> iris_df_stackdf2 = stackdf(iris_df,1:2);

julia> unique(iris_df_stackdf2[1])
2-element Array{Symbol,1}:
:SepalLength
:SepalWidth

julia>

Stack 3 columns only:

julia> iris_df_stackdf3 = stackdf(iris_df,1:3);

julia> unique(iris_df_stackdf3[1])
3-element Array{Symbol,1}:
:SepalLength
:SepalWidth
:PetalLength

Stack 4 columns only:

julia> iris_df_stackdf4 = stackdf(iris_df,1:4);

julia> unique(iris_df_stackdf4[1])
4-element Array{Symbol,1}:
:SepalLength
:SepalWidth
:PetalLength
:PetalWidth

julia>

Checking the size of the original DataFrame:

julia> size(iris_df)
(150, 5)

Checking the size after stacking the original DataFrame:

julia> size(iris_df_stackdf)
(600, 3)

julia> size(iris_df_stackdf2)
(300, 5)

julia> size(iris_df_stackdf3)
(450, 4)

julia> size(iris_df_stackdf4)
(600, 3)

julia>

We can see that stackdf() provides the same output as the stack() function. The difference lies only in how the underlying data is stored.

To see the nature of the underlying data, we need to use the dump() function.

Dump of the DataFrame produced by the stackdf() function:

julia> dump(iris_df_stackdf)
DataFrames.DataFrame 600 observations of 3 variables
  variable: DataFrames.RepeatedVector{Symbol}
    parent: Array{Symbol}((4,))
      1: Symbol SepalLength
      2: Symbol SepalWidth
      3: Symbol PetalLength
      4: Symbol PetalWidth
    inner: Int64 150
    outer: Int64 1
  value: DataFrames.StackedVector
    components: Array{Any}((4,))
      1: DataArrays.DataArray{Float64,1}(150) [5.1, 4.9, 4.7, 4.6]
      2: DataArrays.DataArray{Float64,1}(150) [3.5, 3.0, 3.2, 3.1]
      3: DataArrays.DataArray{Float64,1}(150) [1.4, 1.4, 1.3, 1.5]
      4: DataArrays.DataArray{Float64,1}(150) [0.2, 0.2, 0.2, 0.2]
  Species: DataFrames.RepeatedVector{String}
    parent: DataArrays.PooledDataArray{String,UInt8,1}(150) String["setosa", "setosa", "setosa", "setosa"]
    inner: Int64 1
    outer: Int64 4

julia>

Dump of the DataFrame produced by the stack() function (iris_df_stack is the result of stack(iris_df)):

julia> dump(iris_df_stack)
DataFrames.DataFrame 600 observations of 3 variables
  variable: Array{Symbol}((600,))
    1: Symbol SepalLength
    2: Symbol SepalLength
    3: Symbol SepalLength
    4: Symbol SepalLength
    5: Symbol SepalLength
    ...
    596: Symbol PetalWidth
    597: Symbol PetalWidth
    598: Symbol PetalWidth
    599: Symbol PetalWidth
    600: Symbol PetalWidth
  value: DataArrays.DataArray{Float64,1}(600) [5.1, 4.9, 4.7, 4.6]
  Species: DataArrays.PooledDataArray{String,UInt8,1}(600) String["setosa", "setosa", "setosa", "setosa"]

julia>

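To understand the difference, compare the datatypes of the columns in the two dumps. The columns produced by stackdf() are RepeatedVector and StackedVector objects that refer back to the original 150-element columns, whereas the columns produced by stack() are full 600-element arrays holding copies of the data.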

Julia - Language - Reshaping of Datasets - Stack Function

Sometimes, we may need to work on a dataset in a different form. For this purpose, we reshape the data. Let us illustrate this using the iris dataset:

https://github.com/johnmyleswhite/RDatasets.jl#rdatasetsjl

julia> using DataFrames

julia> using RDatasets

julia> iris_df = dataset("datasets","iris");

julia> head(iris_df)
6×5 DataFrames.DataFrame
│ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
├─────┼─────────────┼────────────┼─────────────┼────────────┼──────────┤
│ 1 │ 5.1 │ 3.5 │ 1.4 │ 0.2 │ "setosa" │
│ 2 │ 4.9 │ 3.0 │ 1.4 │ 0.2 │ "setosa" │
│ 3 │ 4.7 │ 3.2 │ 1.3 │ 0.2 │ "setosa" │
│ 4 │ 4.6 │ 3.1 │ 1.5 │ 0.2 │ "setosa" │
│ 5 │ 5.0 │ 3.6 │ 1.4 │ 0.2 │ "setosa" │
│ 6 │ 5.4 │ 3.9 │ 1.7 │ 0.4 │ "setosa" │

julia>

Using stack() function to reshape the dataset:

julia> iris_df_stack = stack(iris_df);

julia> head(iris_df_stack)
6×3 DataFrames.DataFrame
│ Row │ variable    │ value │ Species  │
├─────┼─────────────┼───────┼──────────┤
│ 1   │ SepalLength │ 5.1   │ "setosa" │
│ 2   │ SepalLength │ 4.9   │ "setosa" │
│ 3   │ SepalLength │ 4.7   │ "setosa" │
│ 4   │ SepalLength │ 4.6   │ "setosa" │
│ 5   │ SepalLength │ 5.0   │ "setosa" │
│ 6   │ SepalLength │ 5.4   │ "setosa" │

julia>

Observe how the DataFrame iris_df has been transformed into iris_df_stack. The following columns of the DataFrame iris_df:

SepalLength
SepalWidth
PetalLength
PetalWidth

are now available as rows in the DataFrame iris_df_stack, under the column: variable. The values that were under these columns are now available under the column: value in the iris_df_stack DataFrame.

We can now say that: our dataset has been stacked - as we have stacked all our columns. To get more hold over the stack() function, let us try to stack only specific columns and observe the output:

Stacking 2 columns:

julia> iris_df_stack2 = stack(iris_df,1:2);

Get the unique values in the first column (variable) to verify:

julia> unique(iris_df_stack2,1)
2×5 DataFrames.DataFrame
│ Row │ variable    │ value │ PetalLength │ PetalWidth │ Species  │
├─────┼─────────────┼───────┼─────────────┼────────────┼──────────┤
│ 1   │ SepalLength │ 5.1   │ 1.4         │ 0.2        │ "setosa" │
│ 2   │ SepalWidth  │ 3.5   │ 1.4         │ 0.2        │ "setosa" │

Stacking 3 columns:

julia> iris_df_stack3 = stack(iris_df,1:3);

Get the unique values in the first column (variable) to verify:

julia> unique(iris_df_stack3,1)
3×4 DataFrames.DataFrame
│ Row │ variable    │ value │ PetalWidth │ Species  │
├─────┼─────────────┼───────┼────────────┼──────────┤
│ 1   │ SepalLength │ 5.1   │ 0.2        │ "setosa" │
│ 2   │ SepalWidth  │ 3.5   │ 0.2        │ "setosa" │
│ 3   │ PetalLength │ 1.4   │ 0.2        │ "setosa" │

Stacking 4 columns:

julia> iris_df_stack4 = stack(iris_df,1:4);

Get the unique values in the first column (variable) to verify:

julia> unique(iris_df_stack4,1)
4×3 DataFrames.DataFrame
│ Row │ variable    │ value │ Species  │
├─────┼─────────────┼───────┼──────────┤
│ 1   │ SepalLength │ 5.1   │ "setosa" │
│ 2   │ SepalWidth  │ 3.5   │ "setosa" │
│ 3   │ PetalLength │ 1.4   │ "setosa" │
│ 4   │ PetalWidth  │ 0.2   │ "setosa" │

julia>

We can see that the columns we have chosen in the stack function are the ones which are available in the output DataFrame.

Let us observe the size of these DataFrames.

julia> size(iris_df)
(150, 5)

julia> size(iris_df_stack)
(600, 3)

julia> size(iris_df_stack2)
(300, 5)

julia> size(iris_df_stack3)
(450, 4)

julia> size(iris_df_stack4)
(600, 3)

julia>

Since the original DataFrame has 150 rows, each column added to the stack increases the size of the stacked DataFrame by 150 rows, while removing one column from its width.
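The row arithmetic can be reproduced with plain arrays. The sketch below is a simplified stand-in for stacking, not the DataFrames implementation: the hypothetical function stack_columns places the chosen columns end to end in one value vector and records the source column name for every row.

```julia
# Simplified stand-in for stacking: returns a `variable` vector of column
# names and a `value` vector with all chosen columns placed end to end.
function stack_columns(col_names, columns)
    variable = Symbol[]
    value = Float64[]
    for (name, column) in zip(col_names, columns)
        append!(variable, fill(name, length(column)))
        append!(value, column)
    end
    return variable, value
end

# Three rows of two measurement columns (first iris rows, for illustration).
col_names = [:SepalLength, :SepalWidth]
columns = [[5.1, 4.9, 4.7], [3.5, 3.0, 3.2]]

variable, value = stack_columns(col_names, columns)

println(length(value))      # rows after stacking = 3 rows x 2 columns
println(unique(variable))
```

With 150 rows per column, stacking 2, 3 or 4 columns in the same way gives 300, 450 or 600 rows, matching the sizes shown above.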

Refer: http://juliastats.github.io/DataFrames.jl/latest/lib/manipulation/#DataFrames.stack