NumPy (short for Numerical Python) is the fundamental package for scientific computing with Python, and libraries like Pandas and Matplotlib are built on top of NumPy.
This tutorial is code-oriented, so I'm gonna keep it short on theory.
In Machine Learning, arrays are the main data structure. So, let's start things off with NumPy arrays.
NumPy's main data structure is the n-dimensional array (ndarray). A simple way to create an array from a Python list is by using the array() function.
First off, we import numpy. We import it as np so that later in the code we can write the alias np instead of numpy whenever we call one of its built-in functions.
    import numpy as np

    # create a Python list
    l = [1, 2, 3, 4, 5]

    # convert the Python list to an array
    a = np.array(l)
    print(a)

    # find the array shape
    print(a.shape)

    # find the array datatype
    print(a.dtype)  # int64 on most 64-bit platforms (the default integer type is platform-dependent)

The example code creates a Python list of 5 int values, converts it to a NumPy array, and prints the array along with its shape, (5,), and its datatype.
NumPy's Array Functions
NumPy provides a couple of functions to create fixed-size arrays:
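To create a new array of a given shape and type without initializing its values:

    import numpy as np

    a = np.empty([3, 3])
    print(a)

The example code creates a 3x3 two-dimensional array. Its contents are uninitialized: they are whatever happened to be in memory, so they may look random and must be assigned before use.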
To create a new array of a specific size, filled with zeros:

    import numpy as np

    a = np.zeros([2, 3])
    print(a)

The example code creates a 2x3 two-dimensional array of zeros.
To create a new array of a specific size, filled with ones:

    import numpy as np

    a = np.ones([2, 3])
    print(a)

The example code creates a 2x3 two-dimensional array of ones.
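As a small sketch of these three functions side by side, the snippet below also passes the optional dtype argument, which each of them accepts. The variable names are only illustrative.

```python
import numpy as np

# The shape can be given as a tuple; dtype is optional and defaults to float64.
zeros_int = np.zeros((2, 3), dtype=int)
ones_float = np.ones((2, 3))
scratch = np.empty((2, 3))  # contents are uninitialized, so do not rely on them

print(zeros_int)
print(ones_float.dtype)  # float64
print(scratch.shape)     # (2, 3)
```

Only the shape and dtype of the np.empty() result are predictable, which is why the sketch prints its shape rather than its values.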
Combining NumPy Arrays
If we have two arrays, we can combine them in two different ways:
- Horizontal Stacking.
- Vertical Stacking.
    from numpy import array
    from numpy import hstack
    from numpy import vstack

    # sample arrays
    a = array([1, 2, 3])
    b = array([4, 5, 6])

    # horizontal stacking
    h = hstack((a, b))  # stacks arrays a and b horizontally
    print(h)
    print(h.shape)

    # vertical stacking
    v = vstack((a, b))  # stacks arrays a and b vertically
    print(v)
    print(v.shape)
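The same two functions also work on 2-D arrays, where the difference between them is perhaps easier to see. This is a minimal sketch with made-up sample values:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])  # shape (2, 2)
b = np.array([[5, 6], [7, 8]])  # shape (2, 2)

h = np.hstack((a, b))  # places the columns of b next to the columns of a
v = np.vstack((a, b))  # places the rows of b below the rows of a

print(h.shape)  # (2, 4)
print(v.shape)  # (4, 2)
```

Horizontal stacking grows the number of columns, while vertical stacking grows the number of rows.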
Once you have created your NumPy array, you can also access the contents of the array via Indexing.
The simplest case is One-Dimensional Indexing, which is how you access the contents of a 1-D array. Indexing is zero-based, so the first item in the array has index 0, the second item has index 1, and so on. Below is the syntax for One-Dimensional Indexing:

    import numpy as np

    # define an array
    a = np.array([1, 2, 3, 4, 5, 6])

    # index the array
    print(a[0])  # prints 1
    print(a[5])  # prints 6
You can also use negative indexes to retrieve values from the end of the array. For example, to retrieve the last item in the array, you can use the index -1. Similarly, the index -2 returns the second-to-last item in the array.

    import numpy as np

    # define the array
    data = np.array([1, 2, 3, 4, 5])

    print(data[-1])  #=> prints 5 since it is the last item in the array.
    print(data[-5])  #=> prints 1 since it is the 5th item from the end of the array.
Providing an index that is outside the bounds of the array raises an IndexError.
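Here is a short sketch of what that looks like in practice. The try/except is only there so the snippet keeps running after the error; the variable names are illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

try:
    value = data[5]  # valid indexes are 0 to 4 (or -5 to -1)
except IndexError as err:
    value = None
    print("IndexError:", err)
```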
Positive indexes count from the beginning of the array and start at 0, while negative indexes count from the end and start at -1. There is no -0, which is why -1 retrieves the last item in the array.
There is also Two-Dimensional Indexing which works when we are dealing with 2-D data, and a comma is used to separate the index of each dimension.
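A minimal sketch of Two-Dimensional Indexing, using a small made-up 3x2 array:

```python
import numpy as np

data = np.array([
    [11, 22],
    [33, 44],
    [55, 66],
])

first = data[0, 0]   # row 0, column 0
last = data[-1, -1]  # last row, last column
row = data[1]        # a single index returns the whole row

print(first)  # 11
print(last)   # 66
print(row)    # [33 44]
```

The first index selects the row and the second selects the column.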
A closely related concept is Array Slicing, which retrieves a subsequence of an array. In Machine Learning, it is pretty useful for specifying input and output variables, or for the train-test split (a technique for separating the training set from the test set).
Slicing is specified using the colon (:) operator with a from and to index, like so: data[from:to]. The expression slices the data array starting at the from index up to, but not including, the to index. Let's see this in code:

    import numpy as np

    # define the array
    data = np.array([1, 2, 3, 4, 5])

    print(data[0:2])  #=> prints [1 2], the items from index 0 up to but not including index 2.
You can also slice the entire array, like so:
    import numpy as np

    # define the array
    data = np.array([1, 2, 3, 4, 5])

    print(data[:])  #=> the : operator without any index gives the entire array (in the case of a 1-D array)
As we have seen with Indexing, we can use negative indexes in Slicing as well. For example, if we want to slice the last three items in an array, we can do it by starting the slice at index -3.

    import numpy as np

    data = np.array([1, 2, 3, 4, 5])

    print(data[-3:])  #=> prints [3 4 5]

Notice how we didn't specify the to index here. That's because we want the slice to continue to the end of the array, giving us every item from index -3 onwards.
So far we have seen 1-Dimensional Slicing. Now, let's see how 2-Dimensional Slicing works.
To understand 2-Dimensional Slicing, let's take help of an example. Let's assume we have a big table consisting of rows and columns (a 2-D array of 5x5 dimension). We want to retrieve all of its rows and only the first 3 columns.
    import numpy as np

    data = np.array([
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20],
        [21, 22, 23, 24, 25]
    ])

    X = data[:, 0:3]
    print(X)
You could also have used negative index in the above example, like so:
X = data[:, :-2]
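The same kind of slicing covers the two Machine Learning uses of slicing: separating input and output columns, and splitting rows into train and test sets. The sketch below assumes, purely for illustration, that the last column holds the output variable and that the first three rows are the training set.

```python
import numpy as np

data = np.array([
    [11, 22, 33],
    [44, 55, 66],
    [77, 88, 99],
    [10, 20, 30],
])

# Assumption: the last column is the output variable, the rest are inputs.
X, y = data[:, :-1], data[:, -1]

# Assumption: the first 3 rows are for training, the remainder for testing.
split = 3
train, test = data[:split, :], data[split:, :]

print(X.shape, y.shape)         # (4, 2) (4,)
print(train.shape, test.shape)  # (3, 3) (1, 3)
```

Note that y comes back as a 1-D array because a single integer index, unlike a slice, removes that dimension.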
After slicing, you may need to reshape your data. Many Machine Learning libraries require 1-Dimensional data to be reshaped as 2-Dimensional data.
NumPy arrays have a shape attribute which returns a tuple of the length of each dimension of the array.
    import numpy as np

    data = np.array([1, 2, 3, 4, 5])
    print(data.shape)  # (5,)

For a one-dimensional array, the tuple contains a single length, and for a two-dimensional array, it contains two lengths.

    import numpy as np

    data = np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ])
    print(data.shape)  # (3, 3)
Here the shape attribute returns (3, 3) which means the data array has 3 rows, and 3 columns.
Now that we know how to find the shape of an array, it's time to learn about reshaping.
Reshaping 1D Array to 2D Array
To reshape a one-dimensional array to a two-dimensional array, we call the reshape() method on the NumPy array. It takes the new shape of the array as its argument, usually as a tuple. It is quite common to reshape a 1D array to a 2D array with one column.

    import numpy as np

    data = np.array([1, 2, 3, 4, 5])
    print(data.shape)  # (5,)

    # reshape: keep the 5 items as rows and add 1 column
    data = data.reshape((data.shape[0], 1))
    print(data.shape)  # (5, 1)
Here, we just added 1 for the second dimension (i.e. column).
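As an alternative sketch, reshape() also accepts -1 for one dimension, in which case NumPy infers that length from the size of the array. The variable name column is only illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# -1 asks NumPy to infer this dimension from the array's size.
column = data.reshape(-1, 1)

print(column.shape)  # (5, 1)
```

This gives the same single-column result without having to look up the number of rows first.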
Reshaping 2D Array to 3D Array
To reshape a 2D array to a 3D array, we follow the same procedure as above. The 2D array already has two dimensions, so all we need is the third one. As in the previous case, we provide 1 as the new dimension.

    import numpy as np

    data = np.array([
        [1, 2],
        [3, 4],
        [5, 6]
    ])
    print(data.shape)  # (3, 2)

    # reshape: keep the rows and columns and add a third dimension of length 1
    data = data.reshape((data.shape[0], data.shape[1], 1))
    print(data.shape)  # (3, 2, 1)
Algorithms such as the LSTM models in the Keras library expect 3D input, so 2D data in which each row represents a sequence must first be reshaped into a 3D array. This is why it's so important to understand reshaping.
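To make that concrete, here is a hedged sketch using only NumPy. It assumes the common (samples, timesteps, features) layout for LSTM input and a single feature per time step; the sample values and variable names are made up.

```python
import numpy as np

# 2 sequences (rows), each with 3 time steps (columns)
data = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

samples, timesteps = data.shape
# Assumption: one feature per time step, hence the trailing 1.
sequences = data.reshape((samples, timesteps, 1))

print(sequences.shape)     # (2, 3, 1)
print(sequences[0, :, 0])  # [1 2 3]
```

Reshaping only changes how the values are grouped, so the first sequence still reads 1, 2, 3.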
So, that's it for now in this tutorial. I know this isn't a very comprehensive tutorial on NumPy. It's just an introduction to the basics, but it's a good starting point. In the upcoming tutorials, as we go through Linear Algebra, we will use NumPy quite extensively, so we will encounter more NumPy concepts along the way.
A journey of a thousand miles begins with a single step. ~ Lao Tzu
Until next time, Adiós!