Numpy Functions Cheat Sheet



This post updates a previous, very popular post, "50+ Data Science, Machine Learning Cheat Sheets", by Bhavya Geethika. If we missed some popular cheat sheets, add them in the comments below.

Cheat sheets on Python, R, NumPy, SciPy and Pandas

The NumPy library is the core library for scientific computation in Python. It provides a high-performance multidimensional array object and tools for working with arrays. Check out the different sections of the cheat sheet to learn the various array functions and tools NumPy offers: 1. Creating Arrays, 2. Inspecting Your Array, 3. Array Mathematics, 4. …

Python NumPy cheat sheet. What is NumPy? A library consisting of multidimensional array objects and a collection of routines for processing those arrays, with which mathematical and logical operations on arrays can be performed. Import convention: import numpy as np.

NumPy Cheat Sheet: Python for Data Science. NumPy is the library that gives Python its ability to work with data at speed.

Python For Data Science Cheat Sheet: SciPy, Linear Algebra (www.datacamp.com). The SciPy library is one of the core packages for scientific computing that provides mathematical algorithms and convenience functions built on the NumPy extension of Python. The cheat sheet also covers interacting with NumPy.

numpy.linspace: this function creates an array of evenly spaced numbers over a given interval. You can also set the number of samples to generate; this parameter (num) is optional and defaults to 50 samples.
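A quick way to see the default in action: a minimal sketch, assuming NumPy is installed, that calls np.linspace once without num and once with num=5. The interval from 0 to 1 is an arbitrary choice for illustration.

```python
import numpy as np

# num is omitted, so the default of 50 samples is used
default_samples = np.linspace(0.0, 1.0)
print(default_samples.shape)  # (50,)

# num given explicitly; both endpoints are included by default (endpoint=True)
five_samples = np.linspace(0.0, 1.0, num=5)
print(five_samples)  # values 0.0, 0.25, 0.5, 0.75, 1.0
```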

Data science is a multidisciplinary field, so there are thousands of packages and hundreds of programming functions in the data science world. An aspiring data enthusiast need not know them all. A cheat sheet, or reference card, is a compilation of the most commonly used commands that helps you learn a language's syntax faster. Here are the most important ones, brainstormed and captured in a few compact pages.

Mastering data science involves an understanding of statistics and mathematics, programming knowledge (especially in R, Python and SQL), and then combining all of these to derive insights, using business understanding and the human instinct that drives decisions.

Here are the cheat sheets by category:

Cheat sheets for Python:

Python is a popular choice for beginners, yet still powerful enough to back some of the world's most popular products and applications. Its design makes the programming experience feel almost as natural as writing in English. The Python basics and Python debugger cheat sheets for beginners cover the important syntax to get started. Community-provided libraries such as NumPy, SciPy, scikit-learn and pandas are heavily relied on, and the NumPy/SciPy/Pandas cheat sheet provides a quick refresher on them.

  1. Python Cheat Sheet by DaveChild via cheatography.com
  2. Python Basics Reference sheet via cogsci.rpi.edu
  3. OverAPI.com Python cheatsheet
  4. Python 3 Cheat Sheet by Laurent Pointal

Cheat sheets for R:

R's ecosystem has been expanding so much that a lot of referencing is needed. The R Reference Card covers most of the R world in a few pages. RStudio has also published a series of cheat sheets to make things easier for the R community. The data visualization with ggplot2 cheat sheet seems to be a favorite, as it helps when you are creating graphs of your results.

At cran.r-project.org:

At RStudio.com:

  1. R markdown cheatsheet, part 2

Others:

  1. DataCamp’s Data Analysis the data.table way

Cheat sheets for MySQL & SQL:

For a data scientist, the basics of SQL are as important as any other language. Both Pig and Hive Query Language are closely associated with SQL, the original Structured Query Language. SQL cheat sheets provide a 5-minute quick guide to learning it, after which you can explore Hive and MySQL.

  1. SQL for dummies cheat sheet

Cheat sheets for Spark, Scala, Java:

Apache Spark is an engine for large-scale data processing. For certain applications, such as iterative machine learning, Spark can be up to 100x faster than Hadoop (using MapReduce). The essentials of Apache Spark cheatsheet explains its place in the big data ecosystem, walks through setup and creation of a basic Spark application, and explains commonly used actions and operations.

  1. Dzone.com’s Apache Spark reference card
  2. DZone.com’s Scala reference card
  3. Openkd.info’s Scala on Spark cheat sheet
  4. Java cheat sheet at MIT.edu
  5. Cheat Sheets for Java at Princeton.edu

Cheat sheets for Hadoop & Hive:

Hadoop emerged as an untraditional tool to solve what was thought to be unsolvable, providing an open source software framework for the parallel processing of massive amounts of data. Explore the Hadoop cheat sheets to find useful commands for using Hadoop on the command line. A combination of SQL and Hive functions is another one to check out.

Cheat sheets for web application framework Django:

Django is a free and open source web application framework, written in Python. If you are new to Django, you can go over these cheat sheets to pick up the key concepts quickly and then dive deeper into each one.

  1. Django cheat sheet part 1, part 2, part 3, part 4

Cheat sheets for Machine learning:

We often find ourselves spending time wondering which algorithm is best, and then going back to our big books for reference. These cheat sheets give an idea about both the nature of your data and the problem you're working to address, and then suggest an algorithm for you to try.

  1. Machine Learning cheat sheet at scikit-learn.org
  2. Scikit-Learn Cheat Sheet: Python Machine Learning from yhat (added by GP)
  3. Patterns for Predictive Learning cheat sheet at Dzone.com
  4. Equations and tricks Machine Learning cheat sheet at Github.com
  5. Supervised learning superstitions cheatsheet at Github.com

Cheat sheets for Matlab/Octave

MATLAB (MATrix LABoratory) was developed by MathWorks in 1984. MATLAB has been the most popular language for numeric computation in academia. It is suitable for tackling almost any science and engineering task, with several highly optimized toolboxes. MATLAB is not open source; however, there is a free alternative, GNU Octave, a re-implementation that follows the same syntactic rules, so most code is compatible with MATLAB.

Cheat sheets for Cross Reference between languages

Related:

  • Python cheatsheet

Operators

Command

Description

*

multiplication operation: 2*3 returns 6

**

power operation: 2**3 returns 8

@

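matrix multiplication: for 2-dimensional numpy arrays A and B, A @ B returns the matrix product of A and B (the same result as np.dot(A, B)).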
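The three operators can be tried side by side. This is a minimal sketch, assuming NumPy is installed; the 2×2 arrays are arbitrary values chosen for illustration. It also contrasts @ with *, which on NumPy arrays multiplies elementwise.

```python
import numpy as np

print(2 * 3)   # multiplication: 6
print(2 ** 3)  # power: 8

# Illustrative 2x2 arrays (arbitrary values chosen for this sketch)
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = A @ B      # matrix product, same as np.dot(A, B) for 2-D arrays
E = A * B      # on NumPy arrays, * is elementwise, not matrix multiplication
print(C.tolist())  # [[19, 22], [43, 50]]
print(E.tolist())  # [[5, 12], [21, 32]]
```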

Data Types

Command

Description

l = [a1, a2, ..., an]

Constructs a list containing the objects a1, a2, ..., an. You can append to the list using l.append(). The i-th element of l can be accessed using l[i] (indexing starts at 0).

t = (a1, a2, ..., an)

Constructs a tuple containing the objects a1, a2, ..., an. The i-th element of t can be accessed using t[i].
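A short sketch of both containers, using arbitrary example values. It also shows a standard Python fact the table does not spell out: lists can be changed in place, while tuples are immutable, so assigning to a tuple element raises a TypeError.

```python
l = [10, 20, 30]
l.append(40)        # lists are mutable: l is now [10, 20, 30, 40]
print(l[0])         # 10 (indexing starts at 0)

t = (10, 20, 30)
print(t[2])         # 30
try:
    t[0] = 99       # tuples are immutable, so item assignment fails
except TypeError as err:
    print("TypeError:", err)
```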

Built-In Functions

Command

Description

len(iterable)

len is a function that takes an object such as a list, tuple or numpy array and returns the number of items in that object. For a numpy array, len returns the length of the outermost dimension. For example, for a list of five elements, len returns 5.

zip

Makes an iterator that aggregates elements from each of the iterables. For example, list(zip([1,2,3],[4,5,6])) returns [(1,4),(2,5),(3,6)].
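Both built-ins in a few lines, assuming NumPy is installed for the array case. The 3×4 array of zeros is an arbitrary illustration of len counting only the outermost dimension. In Python 3, zip returns a lazy iterator, so it is wrapped in list() to see the pairs; it also stops at the shortest input.

```python
import numpy as np

print(len([1, 2, 3, 4, 5]))  # 5

M = np.zeros((3, 4))
print(len(M))  # 3: only the outermost dimension is counted

pairs = list(zip([1, 2, 3], [4, 5, 6]))
print(pairs)  # [(1, 4), (2, 5), (3, 6)]

# zip stops at the shortest input
short = list(zip([1, 2, 3], "ab"))
print(short)  # [(1, 'a'), (2, 'b')]
```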

Iterating

Command

Description

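for a in iterable:

A for loop performs a sequence of commands (the indented lines under the for statement) for each element a of an iterable object such as a list, tuple, or numpy array. For example, squaring each element of [1, 2, 3] in a loop and printing the collected results prints [1,4,9].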
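One way to write such a loop, as a minimal sketch (the list name squares is our own choice): the indented body runs once per element and appends each square.

```python
squares = []
for a in [1, 2, 3]:
    # the indented body runs once per element
    squares.append(a ** 2)
print(squares)  # [1, 4, 9]
```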

Comparisons and Logical Operators

Command

Description

if condition:

Runs the indented code under it only if the condition is met. For example, code can square x if x is 5, and otherwise cube it.
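The square-or-cube example can be sketched as below. The function name square_or_cube is hypothetical, chosen for this illustration. Note that the value test uses ==, which compares values; Python's is keyword tests object identity and should not be used for this.

```python
def square_or_cube(x):
    # hypothetical helper: squares x if x is 5, otherwise cubes it
    if x == 5:
        return x ** 2
    else:
        return x ** 3

print(square_or_cube(5))  # 25
print(square_or_cube(2))  # 8
```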

User-Defined Functions

Command

Description

lambda

Used to create anonymous one-line functions of the form lambda parameters: expression.

The names after lambda but before the colon specify the parameters. The expression after the colon tells Python what object to return.

def

The def command is used to create functions of more than one line:

The name immediately following def names the function, in this example g. The variables in the parentheses are the parameters of the function. The remaining lines of the function are marked by indentation. The return statement specifies the object to be returned.
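A sketch of both forms. The function name g follows the description above, but its two parameters and its body are hypothetical, chosen only for illustration. Binding a lambda to a name, as done with f here, is only for demonstration; PEP 8 recommends def for named functions.

```python
# a lambda with one parameter x that returns x squared
f = lambda x: x ** 2
print(f(3))  # 9

# the same idea written with def; the indented lines form the body
def g(x, y):
    # hypothetical body: square the sum of the two parameters
    total = x + y
    return total ** 2

print(g(1, 2))  # 9
```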

Numpy

Command

Description

np.array(object, dtype=None)

np.array constructs a numpy array from an object, such as a list or a list of lists. dtype allows you to specify the type of object the array is holding. You will generally not need to specify the dtype.
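A few constructor calls, as a minimal sketch with arbitrary values. Without dtype, NumPy infers an integer type from integer input (the exact width can vary by platform); passing dtype=float forces 64-bit floats.

```python
import numpy as np

v = np.array([1, 2, 3])               # dtype inferred as an integer type
M = np.array([[1, 2], [3, 4]])        # a list of lists gives a 2-D array
w = np.array([1, 2, 3], dtype=float)  # explicit dtype

print(v.shape, M.shape)  # (3,) (2, 2)
print(w.dtype)           # float64
```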

A[i1, i2, ..., in]

Accesses the element of numpy array A with index i1 in dimension 1, i2 in dimension 2, etc. You can use : to access a range of indices, where imin:imax represents all i such that imin <= i < imax. The result always has minimal dimension: each integer index removes one dimension. For example,

A[:,2]

returns the 2nd column (counting from 0) of A as a 1 dimensional array and

A[0:2,:]

returns the 0th and 1st rows in a 2 dimensional array.
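The indexing rules can be checked on a small array. This sketch assumes NumPy and uses np.arange(12).reshape(3, 4) as an arbitrary 3×4 example, so each value equals its flat position.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)  # rows: [0..3], [4..7], [8..11]

col = A[:, 2]     # column 2 (counting from 0) as a 1-D array
rows = A[0:2, :]  # rows 0 and 1 as a 2-D array
elem = A[1, 3]    # a single element

print(col)         # values 2, 6, 10
print(rows.shape)  # (2, 4)
print(elem)        # 7
```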

np.zeros(shape)

Constructs a numpy array of zeros with the given shape. Here shape is an integer or a sequence of integers, such as 3, (1, 2), (2, 1), or (5, 5). Thus

np.zeros((5,5))

constructs a 5×5 array, while

np.zeros(5,5)

will throw an error, because the second positional argument of np.zeros is the dtype.

np.ones(shape)

Same as np.zeros but produces an array of ones
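Both constructors, plus the common mistake, in a minimal sketch with arbitrary shapes. Passing the dimensions as separate arguments makes NumPy read the second one as a dtype, which raises a TypeError.

```python
import numpy as np

Z = np.zeros((2, 3))  # shape given as a tuple
O = np.ones(4)        # a plain integer gives a 1-D array
print(Z.shape, O.shape)  # (2, 3) (4,)

try:
    np.zeros(2, 3)    # 3 is taken as the dtype argument, not a dimension
except TypeError as err:
    print("TypeError:", err)
```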

np.linspace(a,b,n)

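Returns a numpy array with n evenly spaced points from a to b, including both endpoints. For example

np.linspace(1,2,10)

returns 10 points spaced 1/9 apart, from 1 to 2: approximately [1., 1.111, 1.222, 1.333, 1.444, 1.556, 1.667, 1.778, 1.889, 2.].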
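The spacing can be confirmed with the retstep option of np.linspace, which also returns the step size. A minimal sketch: with both endpoints included, the step is (b - a) / (n - 1); with endpoint=False, b is excluded and the step becomes (b - a) / n.

```python
import numpy as np

pts, step = np.linspace(1, 2, 10, retstep=True)  # retstep=True also returns the spacing
print(pts[0], pts[-1])  # 1.0 2.0 (both endpoints included)
print(step)             # 0.111..., i.e. (2 - 1) / (10 - 1)

# endpoint=False excludes b, so the spacing becomes (b - a) / n
half_open = np.linspace(1, 2, 10, endpoint=False)
print(half_open[1] - half_open[0])  # about 0.1
```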

np.eye(N)

Constructs the identity matrix of size (N). For example

np.eye(3)

returns the 3×3 identity matrix:

\[
\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}
\]
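A minimal check of the defining property of the identity matrix: multiplying by it leaves a matrix unchanged. The 3×3 matrix A is an arbitrary example.

```python
import numpy as np

I = np.eye(3)
print(I)

# multiplying by the identity leaves a matrix unchanged
A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
print(np.array_equal(A @ I, A))  # True
```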

np.diag(a)

np.diag has two uses. First, if a is a 2-dimensional array, np.diag returns the principal diagonal of the matrix. Thus

np.diag([[1,3],[5,6]])

returns [1,6].

If a is a 1-dimensional array, np.diag constructs a 2-dimensional array with a as the principal diagonal. Thus,

np.diag([1,2])

returns

\[
\begin{pmatrix}1&0\\0&2\end{pmatrix}
\]
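Both uses of np.diag, reproducing the two examples above in a minimal runnable sketch.

```python
import numpy as np

d = np.diag([[1, 3], [5, 6]])  # 2-D input: extract the principal diagonal
D = np.diag([1, 2])            # 1-D input: build a diagonal matrix
print(d)  # [1 6]
print(D)  # [[1 0]
          #  [0 2]]
```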

np.random.rand(d0, d1, ..., dn)

Constructs a numpy array of shape (d0, d1, ..., dn) filled with random numbers drawn from a uniform distribution over [0, 1). For example, np.random.rand(2,3) returns a 2×3 array of such numbers.

np.random.randn(d0, d1, ..., dn)

Same as np.random.rand(d0, d1, ..., dn) except that it draws from the standard normal distribution N(0, 1) rather than the uniform distribution.
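A small sketch of both samplers. The seed value 0 and the sample size of 1000 are arbitrary choices; seeding only makes the run reproducible. The uniform samples all fall in [0, 1), while the normal samples have a mean near 0 and a standard deviation near 1.

```python
import numpy as np

np.random.seed(0)          # fix the seed so the run is reproducible
U = np.random.rand(2, 3)   # uniform on [0, 1)
N = np.random.randn(1000)  # standard normal samples

print(U.shape)                           # (2, 3)
print(U.min() >= 0.0 and U.max() < 1.0)  # True
print(N.mean(), N.std())                 # roughly 0 and 1
```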

A.T

Reverses the dimensions of an array (transpose). For example, if \(x = \begin{pmatrix}1&2\\3&4\end{pmatrix}\), then x.T returns \(\begin{pmatrix}1&3\\2&4\end{pmatrix}\).
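The transpose example as code, plus one detail worth knowing: a 1-D array has only one dimension to reverse, so .T returns it unchanged.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
print(x.T)  # [[1 3]
            #  [2 4]]

v = np.array([1, 2, 3])
print(v.T.shape)  # (3,): transposing a 1-D array has no effect
```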

np.hstack(tuple)

Takes a sequence of arrays and stacks them horizontally to make a single array. For example,

np.hstack(([1,2,3],[2,3,4]))

returns [1,2,3,2,3,4], while

np.hstack(([[1],[2],[3]],[[2],[3],[4]]))

returns \(\begin{pmatrix}1&2\\2&3\\3&4\end{pmatrix}\).

np.vstack(tuple)

Like np.hstack, but stacks the arrays vertically to make a single array. For example,

np.vstack(([1,2,3],[2,3,4]))

returns \(\begin{pmatrix}1&2&3\\2&3&4\end{pmatrix}\).
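Both stacking functions on the same pair of arrays, as a minimal sketch. Reshaping to (3, 1) turns the 1-D arrays into column vectors, which hstack then joins side by side.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
print(np.hstack((a, b)))  # [1 2 3 2 3 4]
print(np.vstack((a, b)))  # [[1 2 3]
                          #  [2 3 4]]

# column vectors (shape (3, 1)) are joined side by side by hstack
ca = a.reshape(3, 1)
cb = b.reshape(3, 1)
print(np.hstack((ca, cb)))  # [[1 2]
                            #  [2 3]
                            #  [3 4]]
```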

np.amax(a,axis=None)

By default np.amax(a) finds the maximum of all elements in the array a. You can specify maximization along a particular dimension with axis. If

a = np.array([[2,1],[3,4]])  # creates a 2-dim array

then

np.amax(a, axis=0)  # maximization along dim 0 (over the rows, one result per column)

returns array([3,4]) and

np.amax(a, axis=1)  # maximization along dim 1 (over the columns, one result per row)

returns array([2,4]).

np.amin(a,axis=None)

Same as np.amax except returns minimum element.
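The np.amax example above, extended with the no-axis case and np.amin, in a minimal runnable sketch.

```python
import numpy as np

a = np.array([[2, 1], [3, 4]])
print(np.amax(a))          # 4 (maximum over all elements)
print(np.amax(a, axis=0))  # [3 4]: one maximum per column
print(np.amax(a, axis=1))  # [2 4]: one maximum per row
print(np.amin(a, axis=0))  # [2 1]: one minimum per column
```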

np.argmax(a,axis=None)

Performs a similar function to np.amax, except it returns the index of the maximal element. By default it gives the index in the flattened array; otherwise, you can use axis to specify the dimension. From the example for np.amax,

np.argmax(a, axis=0)

returns array([1,1]) and

np.argmax(a, axis=1)

returns array([0,1]).

np.argmin(a,axis=None)

Same as np.argmax, except it returns the index of the minimal element.
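Without axis, np.argmax returns an index into the flattened array; np.unravel_index converts it back to a (row, column) pair. A minimal sketch using the same array as the np.amax example.

```python
import numpy as np

a = np.array([[2, 1], [3, 4]])
flat_idx = np.argmax(a)                         # 3: index into a.ravel(), i.e. [2, 1, 3, 4]
row, col = np.unravel_index(flat_idx, a.shape)  # convert back to (row, col)
print(flat_idx, int(row), int(col))  # 3 1 1
print(np.argmax(a, axis=0))          # [1 1]
print(np.argmax(a, axis=1))          # [0 1]
print(np.argmin(a))                  # 1: the value 1 sits at flat index 1
```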

np.dot(a,b) or a.dot(b)

Returns an array equal to the dot product of a and b. For 1- and 2-dimensional arrays, this operation requires the innermost (last) dimension of a to equal the outermost (first) dimension of b. If a is a (3, 2) array and b is a (2,) array, then np.dot(a,b) is valid. If b is a (1, 2) array, the operation raises an error.
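The two shape cases described above, as a minimal sketch using arrays of ones as arbitrary data. The (3, 2) by (2,) product works and has shape (3,); the (3, 2) by (1, 2) product raises a ValueError because 2 does not match 1.

```python
import numpy as np

a = np.ones((3, 2))
b = np.array([1.0, 2.0])   # shape (2,)
print(np.dot(a, b))        # [3. 3. 3.], shape (3,)
print(a.dot(b).shape)      # (3,): the method form gives the same result

bad = np.ones((1, 2))
try:
    np.dot(a, bad)         # 2 != 1, so the shapes are not aligned
except ValueError as err:
    print("ValueError:", err)
```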

numpy.linalg

Command

Description

np.linalg.inv(A)

For a square 2-dimensional array A, np.linalg.inv(A) returns the inverse of A, that is, the array whose product with A is the identity matrix (up to rounding error).
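A minimal sketch with an arbitrary invertible 2×2 matrix (determinant 10). Because of floating-point rounding, the product is compared with the identity using np.allclose rather than exact equality.

```python
import numpy as np

A = np.array([[4.0, 7.0], [2.0, 6.0]])
A_inv = np.linalg.inv(A)
print(A_inv)  # approximately [[0.6, -0.7], [-0.2, 0.4]]
print(np.allclose(A @ A_inv, np.eye(2)))  # True
```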

np.linalg.eig(A)

Returns a 1-dimensional array with all the eigenvalues of A, as well as a 2-dimensional array with the eigenvectors as columns. For example,

eigvals, eigvecs = np.linalg.eig(A)

returns the eigenvalues in eigvals and the eigenvectors in eigvecs. eigvecs[:,i] is the eigenvector of A with eigenvalue eigvals[i].
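The column convention can be checked against the defining relation A v = λ v. This sketch uses an arbitrary diagonal matrix, whose eigenvalues are simply its diagonal entries, 2 and 3.

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)

# check A v = lambda v for every column v of eigvecs
for i in range(len(eigvals)):
    v = eigvecs[:, i]
    print(eigvals[i], np.allclose(A @ v, eigvals[i] * v))  # each line ends with True
```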

np.linalg.solve(A,b)

Constructs the array x such that A.dot(x) is equal to b. Theoretically it should give the same answer as

np.linalg.inv(A).dot(b)

but it is numerically more stable.
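Both approaches on a small system, as a minimal sketch. The matrix and right-hand side are arbitrary values chosen so the solution is x = [2, 3]. Results are compared with np.allclose because floating-point answers may differ in the last digits.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)        # solves A x = b directly
x_inv = np.linalg.inv(A).dot(b)  # same answer in theory, via the explicit inverse

print(x)                      # approximately [2, 3]
print(np.allclose(A @ x, b))  # True
print(np.allclose(x, x_inv))  # True
```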

Pandas

Command

Description

pd.Series()

Constructs a Pandas Series Object from some specified data and/or index

pd.DataFrame()

Constructs a Pandas DataFrame object from some specified data and/or index, column names, etc.
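A minimal sketch of both constructors, assuming pandas is installed; all labels and values are arbitrary. It builds the same DataFrame two ways: from a dict of columns, and alternatively from a list of rows with the column names passed separately.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])  # 20

# a DataFrame from a dict of columns, with an explicit index
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}, index=["r1", "r2", "r3"])

# alternatively, from a list of rows plus column names
df2 = pd.DataFrame([[1, 4.0], [2, 5.0], [3, 6.0]], columns=["x", "y"], index=["r1", "r2", "r3"])
print(df.shape, df2.shape)  # (3, 2) (3, 2)
print(df.loc["r2", "y"])    # 5.0
```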

Plotting

Command

Description

plt.plot(x, y, s)

The plot command is included in matplotlib.pyplot. It plots y versus x, where x and y are iterables of the same length. By default the plot command draws a line; using the optional format string s, passed as the third positional argument, you can specify the type of line and its color. For example, '-', '--', ':', 'o', 'x', and '-o' represent a line, a dashed line, a dotted line, circles, x's, and circles with a line through them, respectively. The color can be changed by appending 'b', 'k', 'g' or 'r' to get a blue, black, green or red plot, respectively. For example,

plots the cosine function on the domain (0, 10) with a green line with circles at the points (x, v)
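A sketch of a call matching that description, assuming matplotlib and NumPy are installed. The names x and v, the 100 sample points, and the format string '-og' (solid line, circle markers, green) are our assumptions, built from the format codes listed above. The non-interactive Agg backend lets the script run without a display; call plt.show() instead when working interactively.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)  # assumed sample points on the domain [0, 10]
v = np.cos(x)

# format string: solid line ('-'), circle markers ('o'), green color ('g')
(line,) = plt.plot(x, v, "-og")
plt.xlabel("x")
plt.ylabel("cos(x)")
```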