NumPy (short for Numerical Python) is the fundamental package for scientific computing with Python, and libraries like Pandas and Matplotlib are built on top of NumPy.

This tutorial is code-oriented, so I'm gonna keep it short on theory.

In Machine Learning, arrays are the main data structure, so let's start with NumPy arrays.

NumPy Arrays

In NumPy, the main data structure is the n-dimensional array (ndarray). A simple way to create an array from a Python list is to pass it to the array() function.

First, we import numpy. We import it as np so that later in the code we don't have to write numpy in full; we just use the alias np whenever we call one of its functions.

import numpy as np

# create a python list 
l = [1, 2, 3, 4, 5]

# convert python list to array
a = np.array(l)
print(a)

# find array shape 
print(a.shape)

# find array datatype
print(a.dtype)    # prints int64 on most platforms; the default integer type depends on the platform and NumPy version

The example code creates a Python list of 5 int values, converts it to a NumPy array, and then prints the array, its shape, and its data type.

Output

[1 2 3 4 5]
(5,)
int64
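
As a small extension of the example above, here is a sketch that passes the optional dtype argument to array(), so the element type no longer depends on the platform default. The variable names (values, as_int, as_float, grid) are just illustrative.

```python
import numpy as np

# Build arrays from the same Python list with an explicit element type.
values = [1, 2, 3, 4, 5]
as_int = np.array(values, dtype=np.int32)
as_float = np.array(values, dtype=np.float64)

print(as_int.dtype)    # int32
print(as_float.dtype)  # float64
print(as_float)        # [1. 2. 3. 4. 5.]

# A nested list produces a two-dimensional array.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.ndim)       # 2
print(grid.shape)      # (2, 3)
```

Setting dtype explicitly is handy when the same script has to behave identically on different machines.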

NumPy's Array Functions

NumPy provides several functions for creating arrays of a given shape:

Empty

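Creates a new array of a given shape and type without initializing its entries, so it holds arbitrary values (whatever was already in memory) rather than truly random ones.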

import numpy as np

a = np.empty([3,3])
print(a)

The example code creates a 3x3 two-dimensional array without initializing its entries.

Output

The printed values are whatever happened to be in memory, so they differ from run to run.
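
Because the contents of an empty() array are arbitrary, the usual pattern is to allocate it and then overwrite every entry before reading anything back. The sketch below shows one way to do that; the fill rule (row * 3 + col) is only an assumption made up for the example.

```python
import numpy as np

# np.empty only allocates memory; it does not set the values.
buffer = np.empty((3, 3))
print(buffer.shape)    # (3, 3)
print(buffer.dtype)    # float64 (the default dtype)

# Overwrite every element before reading from the array.
for row in range(buffer.shape[0]):
    for col in range(buffer.shape[1]):
        buffer[row, col] = row * 3 + col

print(buffer)
# [[0. 1. 2.]
#  [3. 4. 5.]
#  [6. 7. 8.]]
```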

Zeros

Creates a new array of a given shape, filled with zeros.

import numpy as np

a = np.zeros([2, 3])
print(a)

The example code creates a 2x3 two-dimensional array of zeros.

Output

[[0. 0. 0.]
 [0. 0. 0.]]

Ones

Creates a new array of a given shape, filled with ones.

import numpy as np

a = np.ones([2, 3])
print(a)

The example code creates a 2x3 two-dimensional array of ones.

Output

[[1. 1. 1.]
 [1. 1. 1.]]
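
The zeros() and ones() functions return floats by default. As a rough sketch, the snippet below shows the optional dtype argument and the related full() function, which fills an array with any constant. The value 7 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Ask for integer zeros instead of the default float zeros.
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros)
# [[0 0 0]
#  [0 0 0]]

# np.full creates an array filled with a constant of our choosing.
sevens = np.full((2, 3), 7)
print(sevens)
# [[7 7 7]
#  [7 7 7]]
```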

Combining NumPy Arrays

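If we have two arrays, we can combine them in two different ways:

  • Horizontal stacking, with hstack(), which joins the arrays side by side.
  • Vertical stacking, with vstack(), which places one array on top of the other.

from numpy import array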
from numpy import hstack 
from numpy import vstack 

# sample arrays 
a = array([1, 2, 3])
b = array([4, 5, 6])

# horizontal stacking 
h = hstack((a, b))        # stacks array a & b horizontally together.
print(h)
print(h.shape)

# vertical stacking
v = vstack((a, b))        # stacks array a & b vertically together.
print(v)
print(v.shape)

Output

[1 2 3 4 5 6]
(6,)
[[1 2 3]
 [4 5 6]]
(2, 3)
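
The same two functions also work on 2-D arrays. Here is a minimal sketch with two made-up 2x2 arrays, showing how the shapes change in each direction.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Horizontal stacking keeps the row count and adds columns.
side_by_side = np.hstack((a, b))
print(side_by_side)
# [[1 2 5 6]
#  [3 4 7 8]]
print(side_by_side.shape)  # (2, 4)

# Vertical stacking keeps the column count and adds rows.
stacked = np.vstack((a, b))
print(stacked)
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]
print(stacked.shape)       # (4, 2)
```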

Array Indexing

Once you have created your NumPy array, you can access its contents via indexing.

The simplest case is one-dimensional indexing, which accesses the contents of a 1-D array. Indexing is zero-based: the first item in the array has index 0, the second has index 1, and so on. Here is one-dimensional indexing in code:

import numpy as np

# define an array 
a = np.array([1, 2, 3, 4, 5, 6])

# index array
print(a[0])        # prints 1
print(a[5])        # prints 6

You can also use negative indexes to retrieve values from the end of the array. For example, to retrieve the last item in the array, you can use the index -1. Similarly, the index -2 returns the second-to-last item in the array.

import numpy as np

# define the array
data = np.array([1, 2, 3, 4, 5])

print(data[-1])        #=> prints 5 since it is the last item in the array.
print(data[-5])        #=> prints 1 since it is the 5th last item in the array.

Note:

  1. An index that is out of bounds for the array raises an IndexError.

  2. Indexes counted from the beginning of the array start at 0, but indexes counted from the end start at -1 (there is no -0). That's why you use -1 to retrieve the last item in the array.
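
Both notes can be checked with a few lines of code. This is just a sketch; the caught flag is a helper variable introduced for the example.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# A negative index -k refers to the same item as index len(data) - k.
print(data[-1] == data[len(data) - 1])  # True
print(data[-5] == data[0])              # True

# An index outside the valid range raises IndexError.
caught = False
try:
    data[5]
except IndexError as err:
    caught = True
    print("IndexError:", err)
```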

There is also two-dimensional indexing, which works when we are dealing with 2-D data. A comma separates the index of each dimension, as in data[row, column].
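
Here is a short sketch of two-dimensional indexing on a made-up 3x2 array. The first index picks the row and the second picks the column.

```python
import numpy as np

data = np.array([
    [11, 22],
    [33, 44],
    [55, 66]
])

print(data[0, 0])    # 11 (first row, first column)
print(data[2, 1])    # 66 (third row, second column)
print(data[-1, -1])  # 66 (negative indexes work per dimension)

# Giving only the row index returns the whole row.
print(data[0])       # [11 22]
```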


Array Slicing

A closely related concept is array slicing, which retrieves a subsequence of an array. In Machine Learning, it is pretty useful for specifying input and output variables, and for the train-test split (a technique for separating the training set from the test set).

Slicing is specified using the colon (:) operator with a from and a to index, like so: data[from:to]. The expression slices the data array starting at the from index, up to but not including the to index. Let's see this in code:

import numpy as np 

# define the array 
data = np.array([1, 2, 3, 4, 5])
print(data[0:2])        #=> prints the items from index 0 up to, but not including, index 2.

Output

[1 2]
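
To make the train-test split mentioned earlier a bit more concrete, here is a minimal sketch that uses two slices around a split point. The data values and the split point of 8 are assumptions picked for the example, not a recommended ratio.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# Hypothetical split point: the first 8 items are for training.
split = 8
train = data[:split]   # from the start up to, but not including, index 8
test = data[split:]    # from index 8 to the end

print(train)  # [10 20 30 40 50 60 70 80]
print(test)   # [ 90 100]
```

Because the to index is excluded and the from index is included, the two slices never overlap and never skip an item.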

You can also slice the entire array, like so:

import numpy as np

# define the array 
data = np.array([1, 2, 3, 4, 5])
print(data[:])        #=> With no index specified, the : operator gives the entire array (in case of a 1-D array)

Output

[1 2 3 4 5]

As we have seen with indexing, we can also use negative indexes in slicing. For example, if we want to slice the last three items in an array, we can do it by starting the slice at index -3.

import numpy as np 

data = np.array([1, 2, 3, 4, 5])
print(data[-3: ])

Notice how we didn't specify the to index here. That's because we want the slice to continue to the end of the array, giving us every item from index -3 onward.

Output

[3 4 5]
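
Negative indexes can appear on either side of the colon. The sketch below shows a few combinations on the same array.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

last_two = data[-2:]      # from the second-to-last item to the end
all_but_two = data[:-2]   # everything except the last two items
middle = data[-3:-1]      # from index -3 up to, but not including, index -1

print(last_two)     # [4 5]
print(all_but_two)  # [1 2 3]
print(middle)       # [3 4]
```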

Two-Dimensional Slicing

So far we have seen 1-Dimensional Slicing. Now, let's see how 2-Dimensional Slicing works.

To understand 2-Dimensional Slicing, let's work through an example. Let's assume we have a big table consisting of rows and columns (a 5x5 2-D array). We want to retrieve all of its rows and only the first 3 columns.

import numpy as np

data = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25]
])

X = data[:, 0:3]
print(X)

Output

[[ 1  2  3]
 [ 6  7  8]
 [11 12 13]
 [16 17 18]
 [21 22 23]]

You could also have used a negative index in the above example. Here, :-2 selects every column except the last two:

X = data[:, :-2]
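
A common use of 2-D slicing in Machine Learning is separating input columns from an output column. Here is a sketch that assumes the last column holds the output variable; the small 3x3 array and the names X and y are made up for the example.

```python
import numpy as np

data = np.array([
    [11, 22, 33],
    [44, 55, 66],
    [77, 88, 99]
])

# All rows, every column except the last: the inputs.
X = data[:, :-1]
# All rows, only the last column: the output.
y = data[:, -1]

print(X)
# [[11 22]
#  [44 55]
#  [77 88]]
print(y)        # [33 66 99]
print(X.shape)  # (3, 2)
print(y.shape)  # (3,)
```

Notice that y comes back as a 1-D array, because a single index (rather than a slice) was used for the column.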

Array Reshaping

After slicing, you may need to reshape your data. Many Machine Learning libraries require your 1-Dimensional data to be reshaped as 2-Dimensional data.

NumPy arrays have a shape attribute which returns a tuple of the length of each dimension of the array.

import numpy as np

data = np.array([1, 2, 3, 4, 5])
print(data.shape)

Output

(5,)

For a one-dimensional array, the tuple has one element, and for a two-dimensional array, it has two.

import numpy as np

data = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print(data.shape)

Output

(3, 3)

Here the shape attribute returns (3, 3) which means the data array has 3 rows, and 3 columns.
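
Since shape is an ordinary tuple, it can be unpacked into separate variables. The sketch below does that for a made-up 2x3 array and also prints the related ndim and size attributes.

```python
import numpy as np

data = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

# Unpack the shape tuple into the number of rows and columns.
rows, cols = data.shape
print(rows)       # 2
print(cols)       # 3

print(data.ndim)  # 2 (number of dimensions, same as len(data.shape))
print(data.size)  # 6 (total number of items, rows * cols)
```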

Now that we know how to find the shape of an array, it's time to learn about reshaping.

Reshaping 1D Array to 2D Array

To reshape a one-dimensional array into a two-dimensional array, we call the reshape() method on the NumPy array. In the example, the method takes a single argument, a tuple that specifies the new shape of the array. It is quite common to reshape a 1D array into a 2D array with one column.

import numpy as np

data = np.array([1, 2, 3, 4, 5])
print(data.shape)

# reshape 
data = data.reshape((data.shape[0], 1))
print(data.shape)

Here, we kept the original length as the first dimension (rows) and just added 1 for the second dimension (i.e. column).

Output

(5,)
(5, 1)
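
As an alternative to spelling out data.shape[0], reshape() also accepts -1 for one dimension, which tells NumPy to work out that length from the size of the array. Here is a short sketch; the names column and row are just illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# -1 lets NumPy infer the number of rows from the array size.
column = data.reshape(-1, 1)
print(column.shape)  # (5, 1)

# The same trick gives a 2D array with a single row.
row = data.reshape(1, -1)
print(row.shape)     # (1, 5)
```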

Reshaping 2D Array to 3D Array

To reshape a 2D array into a 3D array, we follow the same procedure as above. The 2D array already has two dimensions; all we need now is the third. As in the previous case, we provide 1 as the third dimension.

import numpy as np

data = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])
print(data.shape)

# reshape 
data = data.reshape((data.shape[0], data.shape[1], 1))
print(data.shape)

Output

(3, 2)
(3, 2, 1)

Algorithms such as the LSTM models in the Keras library require 2D data, in which each row represents a sequence, to be reshaped into a 3D array. This is why it's so important to understand reshaping.
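
To illustrate, here is a NumPy-only sketch that reshapes rows of sequences into a 3D array. The names samples, timesteps, and features are assumptions borrowed from the usual way recurrent layers describe their input; no Keras code is involved.

```python
import numpy as np

# Each row is one sequence of two time steps.
data = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])

samples, timesteps = data.shape
features = 1  # assumed: one value observed at each time step

sequences = data.reshape((samples, timesteps, features))
print(sequences.shape)  # (3, 2, 1)

# The first sequence is now a 2x1 array: two time steps, one feature each.
print(sequences[0])
# [[1]
#  [2]]
```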

Outro

So, that's it for this tutorial. I know this isn't a very comprehensive tutorial on NumPy; it's just an introduction to the basics, but it's a good starting point. In the upcoming tutorials, as we go through Linear Algebra, we will use NumPy quite extensively, so we will encounter more NumPy concepts along the way.

A journey of a thousand miles begins with a single step. ~ Lao Tzu

Until next time, Adiós!