A Quick Primer on Python for Data Science

“The only way to learn a new programming language is by writing programs in it.” - Dennis Ritchie

Python — the most popular programming language for data science

Guido van Rossum, a Dutch programmer who is currently working at Microsoft (previously at Google and Dropbox), began developing Python in 1989; the first public release followed in 1991. It was not a widely popular language until fairly recently. Its popularity rose sharply with its adoption for machine learning projects and the availability of many libraries.

Python is an essential skill every data scientist should possess in order to excel in data exploration, extraction, analysis, and visualization.

The good thing is that it is not difficult to learn. In fact, Python is one of the easiest languages to learn and use, provided that you are familiar with the basic building blocks of any programming language.

In this post, I provide an easy approach to get up to speed with Python. I assume no prior programming knowledge. Thus, it is for absolute beginners. In this tutorial, I look at the building blocks such as variables, functions, loops, conditions, and so on to get you familiarized with the basics of programming. At the end of this post, I go through some of the popular and powerful data structures in Python that help you store and process data efficiently.

Python is an interpreted language. (FYI, C and C++ are compiled languages.)
Once you write your code, the interpreter executes it statement by statement from the top of the file.

How can you write and run the programs in this tutorial? You can copy and paste the code into a free Python notebook such as Google Colab. It is a ready-made Python environment that you can use as your playground. Another option is to install Python and use the interactive shell by typing python in the terminal or command prompt.

Write your own code!

I believe in learning by doing. Therefore, I strongly encourage you to play around with the code snippets to get a good grip on them. Copy them, modify them, and add more lines; you will understand and remember more by doing this.

What are the most popular versions of Python? Python has two major versions: 2 and 3. I encourage you to use version 3: Python 2 reached its end of life in January 2020, and most Python libraries out there support version 3.

Enough beating around the bush. Let’s get started.

The first program

# Displaying "Hello World!" on the screen
# print is a Python defined default function, we are simply using
# it. It takes one input parameter - a string.
print("Hello World!")

First Python Program

Lines 1, 2, and 3 are Python comments. Python comments are written with a # sign in front of them. Comments are not executed by the interpreter; they are there so that you and others can understand your code later.

Line 4 is the real meat of this first program. It tells the computer to print “Hello World!” to your computer screen. When you run the above program, you will see the following output:

Hello World!

print is a built-in Python function. We will be using print and many other Python built-in functions to make our lives easier.

Hurray! You wrote your first Python program and executed it!

Programming Styles

Before we proceed further, I want to give you a flavor of how you can write programs in Python. There are three main ways to write Python code.

  1. Unstructured
  2. Procedural
  3. Object-oriented

In unstructured programming, you write the code as one big monolithic file. This style is discouraged for large programs because it is quite difficult to manage. However, for small code snippets, like the ones in this tutorial, it is a convenient way of writing programs.

In procedural programming, we group code into functional units called functions. There are two steps involved here:

  • Define (write) the function
  • Invoke (call) the function

You write a function once and invoke it as many times as you want. In this post, we will mainly be using unstructured and procedural coding styles.

In object-oriented programming, you identify blueprints and create what is called a class for each blueprint. We will be exploring object-oriented programming in Python in a later post.

Don’t worry if the above description sounds complicated. It is just to set the stage. You will understand it better as we proceed further.

Variables

Variables are placeholders in memory in which you can store different data.

Among the basic data types, you can have numerical (e.g. integer or float) or string variables. Depending on the type of the variable (i.e. string or numerical), Python allocates different amounts of memory.

Unlike programming languages like C, C++, or Java, you do not specify a data type when creating a variable; Python automatically infers the type for you. Therefore, Python is called a dynamically-typed language. It’s good to know some jargon like this to impress others :)

Assuming you have installed Python 3 on your computer, you can code interactively by opening a shell (the Command Prompt on Windows, the terminal on macOS/Unix) and typing python.

It will take you here. [My default version of python is 3.9. Any 3.x version should work for this guide.]

$ python
Python 3.9.12 (main, Apr 5 2022, 01:53:17)
[Clang 12.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Now let’s use this shell to explore variables. (You may use a Python notebook instead.)

#integer variable
>>> x = 25
>>> print(x)
25

Variable x stores the value 25. Since 25 is an integer, Python infers that the variable x is of type integer (int for short).

#string variable
>>> name = "Alice"
>>> print(name)
Alice

The variable called name stores the value “Alice”. Since “Alice” is a string, Python infers that the variable name is of type string (str for short).

#assigning multiple times
>>> x = 25
>>> x = 10
>>> print(x)
10

When you assign a value to a variable multiple times, the older value gets replaced by the newer one. In the above example, the value 25 gets replaced by 10. The variable x holds only one value at any time.
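To see the type Python inferred for a variable, one option is the built-in type() function. The short sketch below reuses the variable names from the examples above (x_type and name_type are names made up for illustration). It also shows that the same name can later be bound to a value of a different type, which is what dynamic typing means in practice.

```python
x = 25
x_type = type(x)        # int, because 25 is an integer
print(x_type)

name = "Alice"
name_type = type(name)  # str, because "Alice" is a string
print(name_type)

# The same variable can later hold a value of a different type.
x = "twenty-five"
print(type(x))
```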

Numbers and Strings

In addition to integers, Python supports other numerical data types such as floating point numbers. For example,

#floating point numbers
>>> z = 1.53
>>> print(z)
1.53

Numerical variables support standard math operations such as addition, subtraction, multiplication, division, and modulus.

# Numerical operations
>>> a = 10
>>> b = 20
#addition
>>> print(a+b)
30
#subtraction
>>> print(a-b)
-10
#division
>>> print(a/b)
0.5
#modulus
>>> print(b%a)
0

Q. How do you do the multiplication of a and b?

Q. Say that we have another variable c. Its value is 30. How do you add all three variable values in one statement?

Now let’s move on to strings.

#define string variable firstname and assign value "Tim"
>>> firstname = "Tim"
#print string
>>> print(firstname)
Tim
#print the first character of the string variable
>>> print(firstname[0])
T
#print the second character of the string variable
>>> print(firstname[1])
i
#print the length of the string
>>> print(len(firstname))
3
#print the first and second characters
>>> print(firstname[0:2])
Ti
#print the second and third characters
>>> print(firstname[1:3])
im

Notice that a string acts like an array of characters. An array is a data structure that can hold multiple values of the same type.

In this example, firstname is an array of characters. It holds three characters: T, i and m. Notice that the index starts from 0 (not 1!). So, the first element in the array is firstname[0].

More string operations:

#string concatenation (combining strings)
>>> lastname = "Cook"
>>> fullname = firstname + " " + lastname
>>> print(fullname)
Tim Cook

Notice that fullname variable is the concatenation of three strings — two variables firstname and lastname and one constant string “ “. Plus (+) is the string concatenation operator.

Q. What happens if we remove the constant string from the above fullname variable?

Q. Define a variable to store “Hello World!”. How would you print only “Hello” from this variable?

Q. How would you print only “World!” from the above variable?

Functions

A function is a block of organized code that can be used over and over again simply by invoking it by name. Whether you noticed it or not, we have already used a few built-in functions that come with Python.

print(): the print function takes one or more arguments and simply prints them to the screen. E.g. print(firstname) prints the value of the firstname variable.

len(): the len function takes one argument, such as a list or a string, and returns the number of elements in it. Remember that a string acts like an array of characters, so len returns the number of characters in the string. E.g. len(firstname) returns the length of the firstname string.

Syntax of a function:

def function_name(arguments):
    <function body>

def is a keyword that says that you are defining a function.

function_name: you define this; usually the name reflects what the function does. For example, the function calculate_tax is likely to have the business logic to calculate tax.

arguments: these are input parameters; a function may take zero or more arguments. E.g. len() takes 1 argument.

function body: these are regular Python statements. Notice that they must be indented relative to the def line so that Python knows the code belongs to the function. Typically, we use 4 spaces as indentation.

A function may or may not return a value. For example, the len() function returns the length of its input parameter, whereas the print() function does not return anything useful; it simply prints its input to the screen. (If you assign the output of the print function to a variable, the variable will hold the special value None.)
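The code listing that the next few paragraphs walk through is not shown here. A minimal reconstruction, laid out so that its line numbers match that walkthrough, might look like the following. The variable names result and total are assumptions.

```python
# Define a function called add that takes two arguments
def add(x, y):
    result = x + y
    return result

# Call the function with 10 and 20
total = add(10, 20)
print(total)

# Reuse the function with different arguments
print(add(5, 9))
```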

The above code defines a simple function called add. It takes two arguments x and y.

In the function body (line 3), we compute the addition of the two input arguments. Since it is a mathematical operation, the function expects numerical input parameters.

In line 4, we return the computed value. Note that return is a Python reserved word specifically for returning a value out of a function.

In line 7, we use the function by passing 10 and 20 as input arguments.

Line 8 prints the result of 30.

In line 11, we reuse the function by passing different arguments of 5 and 9. This time, it prints 14.

Q. Can you write a function called sub that takes two arguments and returns the subtraction of the two arguments?

As mentioned earlier, a function does not always return a value.

def foo():
    print("This function does not return any value")

Calling the above function on the Python interactive prompt:

>>> foo()
This function does not return any value

Notice that it simply prints something on the screen. Printing on screen is not returning. There should be a return statement in the function for it to return something out of it.
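One way to check this for yourself is to capture whatever foo() hands back. In the sketch below, the variable name result is an assumption; a function without a return statement gives back the special value None.

```python
def foo():
    print("This function does not return any value")

result = foo()         # the message is printed during the call
print(result)          # prints None, because foo() has no return statement
print(result is None)  # prints True
```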

Conditions

When you use conditions, you make a decision between two code paths.

if-else control block

As shown in the above diagram, if the condition is true, Python executes the “if code” block, otherwise it executes the “else code” block.

The above code snippet also uses another language construct we already learned — functions.

So we have:

function
|
|________ if-else block inside the function

Can you guess what would happen if you call test_condition(1)?

>>> test_condition(1)
SUCCESS

The above function has an if-else block of code. If x is equal to 1, it prints “SUCCESS”, otherwise “FAILED”.

The condition is called a Boolean expression. That is because it evaluates to either True or False.

If True, if-block is executed.

If False, else-block is executed.

Now what would happen if you call test_condition(5)?

Yes, you guessed it right: it prints FAILED.

>>> test_condition(5)
FAILED

Some example conditions (Boolean expressions) :

x == y → check if the value of x is equal to the value of y

x > y → check if x is greater than y

name == "Tim" → check if the name is equal to Tim

len(name) == 5 → check if the length of the name string variable is 5
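To see these expressions evaluate to True or False, you can print them directly. The values assigned to x, y, and name below are only sample values chosen for illustration.

```python
x = 10
y = 20
name = "Tim"

print(x == y)          # False: 10 is not equal to 20
print(x > y)           # False: 10 is not greater than 20
print(name == "Tim")   # True
print(len(name) == 5)  # False: "Tim" has 3 characters
```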

Q. Write a Boolean expression to check if x is less than y.

Loops

In a Python program, usually the code is executed sequentially from top to bottom. However, if you want to run a block of code repeatedly, you need to use a loop.

There are two main kinds of loops:

  • While loops
  • For loops

Let’s have a shot at each of them.

While loops: A while loop has the following syntax:

while (condition):
    code-block

While the condition is True, the code-block gets executed repeatedly.

The illustration of a while loop

Let’s look at an example:

x = 0
while x < 5:
    x = x + 1
    print(x)

Can you guess the output of the above code snippet?

How does the code work?

The condition is x < 5 — as long as x is less than 5, the condition is True and hence Python executes the code block inside the while loop.

When control first enters the while loop, x is 0, so the condition is satisfied. The first line in the while code block increments the value of x by 1 (so it becomes 1). Then it prints x (which is 1).

Now the control goes back to the beginning of the while loop. Python evaluates the condition. Now x is 1. Since 1 is less than 5, the condition is satisfied.

Next, it again increments the value by 1 — so now x is 2.

It then prints value 2.

It continues like this. At some point, inside the while loop, x gets incremented to 5. It prints 5 inside the loop. Then, it goes back to the top of the while loop. Now x is 5, 5 is not less than 5, the condition is not satisfied and Python does not execute the while loop anymore.

So, the output is:

1
2
3
4
5

Can you guess the output of the following code?

x = 10
while x > 0:
    print(x)
    x = x - 2

Hint: It prints 5 numbers.

For loops:

There is another popular type of loop called the “for-loop”. The syntax of a for loop is as follows:

for <a collection value> in <some collection>:
    code-block

Don’t worry if you don’t understand what’s written above; let’s understand it through an example.

Before that, I would like to introduce you to the Python built-in function range. range() creates a sequence of numbers. Let’s look at some examples.

range(1, 5)

This creates the sequence of 1, 2, 3, and 4 (notice that 5 is excluded).

Another example:

range(0, 10, 2)

The third parameter is the step size. The default step size is 1. This creates the sequence 0, 2, 4, 6, and 8.
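A range does not display its numbers when printed on its own; wrapping it in list() is one simple way to see the sequence it produces. The variable names below are assumptions for illustration.

```python
first = list(range(1, 5))
print(first)      # [1, 2, 3, 4]

evens = list(range(0, 10, 2))
print(evens)      # [0, 2, 4, 6, 8]

# A negative step counts downwards; the stop value is still excluded.
countdown = list(range(10, 0, -2))
print(countdown)  # [10, 8, 6, 4, 2]
```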

We can use range to rewrite the earlier while loop as a for loop.

for x in range(1, 6):
    print(x)

For the second while-loop example:

for x in range(10, 0, -2):
    print(x)

The for-loop is frequently used to do much cooler stuff in Python. We will explore some later in this tutorial.

Libraries

Libraries provide functionality that you can use out of the box. In fact, Python became so popular recently mainly because of the availability of a vast number of libraries. You name what you want to do, and I bet there is a Python library out there that makes your life easier.

For example, let’s say that you want to read command line arguments. There is a standard library module called sys that helps you do that.

First, you need to import the library to your program and then use its functions. Assume that you are creating a file named first.py and you want to print what is passed as the first command line argument.

import sys
first_argument = sys.argv[1]
print(first_argument)

Now you run this program as follows:

python first.py Hello

sys.argv[1] stores the value of the first argument, which is “Hello”. So, your program will print Hello on the screen.

As you become familiar with Python, you will become friends with many Python libraries such as pandas, matplotlib, numpy, scikit-learn, and so on.

Now let’s write a program, second.py, that takes two command line arguments:

import sys
first_argument = sys.argv[1]
second_argument = sys.argv[2]
print("{} {}".format(first_argument, second_argument))

Now run the code as follows:

python second.py Hello World!

Any guess about the output? Yes, you are right; it prints “Hello World!” on the screen. The first argument is “Hello” and the second argument is “World!”.

Data Structures: List

A Python list holds a sequence of values, which can be of different types (for example, integer and string values together). You can add or remove items from the list after you create it.

Examples:

# A list with values
names = ["Alice", "Bob", "Eve", "Tim", "Malory"]

You can create an empty list as follows:

names = list()

You can append to a list, which adds the new item to the end of the list:

names.append("Thomas")
names.append("Mary")
print(names)

Output is:

['Thomas', 'Mary']

Iterate through a list:

names = ["Thomas", "Mary", "Tim"]
for name in names:
    print("Student: {}".format(name))

The output is:

Student: Thomas
Student: Mary
Student: Tim

Notice how we write the for loop. The loop picks each item and assigns it to the variable called “name” and you can use that variable inside the loop to write your business logic (in this case, simply printing on the screen).

Another example:

# A list of numbers
ages = [25, 30, 22, 26, 32, 35, 31, 22, 27]

How can we compute the average age?

We need to sum up all values and divide by the number of entries. We can get the number of entries from len(ages) function.

count = len(ages)
sum = 0
for age in ages:
    sum = sum + age
avg = sum / count
print("Average age is {}".format(avg))

The output is

Average age is 27.77777777777778

Let me walk through what is happening here line by line. The line numbers below count the definition of the ages list as line 1 and the snippet above as lines 2 to 7.

Line 1: we define a list called ages and initialize with all age values.

Line 2: Count the number of entries in the list ages and store in the variable called count. len is a built-in Python function.

Line 3: Initialize a variable called sum to zero. We are going to use this variable to sum up all age values.

Line 4: For loop header code — we pick each entry from ages and assign it to the variable called age.

Line 5: Add the current value of age to the variable sum in each iteration

Line 6: Now we are outside the for loop. We simply take the average.

Line 7: Print the average to the screen.
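As a cross-check on the loop above, the same average can be computed with Python's built-in sum() and len() functions. This is an alternative sketch, not the approach used in the walkthrough. The variable name total is an assumption, chosen so that it does not hide the built-in sum().

```python
ages = [25, 30, 22, 26, 32, 35, 31, 22, 27]

total = sum(ages)          # built-in sum adds up all items: 250
avg = total / len(ages)    # 250 divided by 9 entries
print("Average age is {}".format(avg))
```

If you ran the earlier snippet in the same session, the name sum refers to a number rather than the built-in function, so start a fresh session (or run del sum) before trying this version.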

Q. Can you write a for loop to find the maximum age?

Data Structures: Set

A Python set is very similar to a list, except that a set can only hold unique values (no duplicates, unlike lists) and does not keep its items in any particular order.

# set with initial values
coin_sides = {"Head", "Tail"}
# an empty set
colors = set()

Set and List side by side:

>>> myset = set()
>>> mylist = list()
>>> myset.add("Apple")
>>> mylist.append("Apple")
>>> print(myset)
{'Apple'}
>>> print(mylist)
['Apple']

What would happen if we add “Apple” again to both? Any guesses?

>>> myset.add("Apple")
>>> mylist.append("Apple")
>>> print(myset)
{'Apple'}
>>> print(mylist)
['Apple', 'Apple']

Yes, you got the idea: a set keeps only one copy of each distinct value, whereas a list keeps all of them.

>>> myset.add("Orange")
>>> mylist.append("Orange")
>>> print(myset)
{'Orange', 'Apple'}
>>> print(mylist)
['Apple', 'Apple', 'Orange']

Looping over a set works just like looping over a list:

for fruit in myset:
    print(fruit)

The output is (the order may differ on your machine, because sets are unordered):

Orange
Apple
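Because a set keeps only one copy of each value, converting a list to a set is a common way to drop duplicates. A small sketch, with the variable name unique_fruits chosen for illustration:

```python
mylist = ["Apple", "Apple", "Orange"]
unique_fruits = set(mylist)

print(len(mylist))               # 3 items, duplicates included
print(len(unique_fruits))        # 2 items, duplicates removed
print("Apple" in unique_fruits)  # True
```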

Data Structures: Dictionary

A Python dictionary is another popular data structure that allows you to store key-value pairs.

students = {"Alice": 10, "Tim": 5}

students is a Python dictionary with two entries. Keys are “Alice” and “Tim”, and values are 10 and 5.

Another way to create the dictionary:

students = dict()
students["Alice"] = 10
students["Tim"] = 5

How to iterate a dictionary:

for key, value in students.items():
    print("Key = {} Value = {}".format(key, value))

You get the following output:

Key = Alice Value = 10
Key = Tim Value = 5

Another way to iterate a dictionary (note that this method performs an extra lookup for each key, so it is slightly less efficient than the one above and is discouraged):

for key in students:
    print("Key = {} Value = {}".format(key, students[key]))

You get the same output as above.
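Besides iterating, you will often look up a single value by its key or update it. A brief sketch using the same students dictionary follows; the key "Mary" and the new values are made up for illustration.

```python
students = {"Alice": 10, "Tim": 5}

print(students["Alice"])     # 10: look up a value by its key

students["Tim"] = 6          # assigning to an existing key replaces its value
students["Mary"] = 8         # assigning to a new key adds an entry

print("Mary" in students)    # True: check whether a key exists
print(len(students))         # 3 entries
```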

File I/O

As a data scientist, you read from and write to files pretty often. We are going to show basic file operations now.

Say that we have a file called “data.csv” and it has the following comma separated values:

Alice,10
Tim,5
Mary,8
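The code that the line-by-line walkthrough below refers to is not shown here. A minimal reconstruction follows. Two things are assumptions: the setup lines at the top, which only create the sample file so that the sketch runs on its own, and the call to strip(), which removes the trailing newline from each line so the output matches what is shown further down. The walkthrough's line numbers count from the f = open(...) line.

```python
# Setup only: create the sample file so this sketch is self-contained.
setup_file = open("data.csv", "w")
setup_file.write("Alice,10\nTim,5\nMary,8\n")
setup_file.close()

f = open("data.csv", "r")

for line in f:
    values = line.strip().split(",")
    print("Name = {} Age = {}".format(values[0], values[1]))

f.close()
```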

Line 1: open file named “data.csv” for reading (“r” is for reading)

Line 3: Iterate through each line in the file — each line gets copied to the variable called “line”

Line 4: split the value in line variable by “,” — this creates an array of two elements; we store that array in the variable called values

Line 5: We print the two values

Line 7: We close the file.

The output looks as follows:

Name = Alice Age = 10
Name = Tim Age = 5
Name = Mary Age = 8

The following is another way of writing the same code using the keyword “with”. Notice that we don’t call close() in this case.
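The with version is not shown here. A minimal sketch of what it might look like follows; as before, the setup lines and the strip() call are assumptions added so that the example runs on its own and prints clean output.

```python
# Setup only: create the sample file so this sketch is self-contained.
setup_file = open("data.csv", "w")
setup_file.write("Alice,10\nTim,5\nMary,8\n")
setup_file.close()

# The file is closed automatically when the with block ends.
with open("data.csv", "r") as f:
    for line in f:
        values = line.strip().split(",")
        print("Name = {} Age = {}".format(values[0], values[1]))

print(f.closed)  # True: no explicit close() call was needed
```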

Now let’s write to a file. Say, that we want to write the following content to a file.

Apple
Orange
Grape
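The listing for the walkthrough below is not shown here. A minimal reconstruction, laid out so that its line numbers match the walkthrough, might look like this:

```python
fruits = ["Apple", "Orange", "Grape"]

f = open("fruits.txt", "w")

for fruit in fruits:
    f.write(fruit + "\n")

f.close()
```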

Line 1: We keep the content we want to write in a list called fruits.

Line 3: Open file named “fruits.txt” for writing. “w” is for writing.

Line 5: Loop through each fruit in fruits list.

Line 6: Write each fruit. Notice that we have added a new line character “\n” to each write to make sure that each fruit is written to a new line.

Line 8: After we are done writing, outside the loop, we close the file.

That’s all for today. I hope that you enjoyed this write-up and learnt the basics of Python.

It is important for you to practice writing your own code to get a grasp of Python (or any other programming language).

With this basic knowledge, I hope that you can understand numerous code snippets available on the Internet.

If you like the post, please do follow me. It helps me reach a wider audience.

Happy Coding!
