Programming Languages#
This chapter provides a general introduction to programming languages and then focuses on a particular language: Python. The historic hero introduced in these notes is Grace Hopper. She was the first programmer of the Harvard Mark I computer. She was responsible for the development of some of the first programming languages.
Note
The slides for this chapter are available within the book content.
Historic hero: Grace Hopper#
Grace Brewster Murray Hopper (depicted in Fig. 17 was a computer scientist. She was the first programmer of the Harvard Mark I, i.e. a general-purpose electromechanical computer. The Harvard Mark I was used during the Second World War, fully-inspired by Babbage’s Analytical Engine. She pushed for the need of having machine-independent programming languages. This idea brought her in the development of COBOL, one of the first high-level programming languages, which is still used today for some applications.
COBOL (i.e. the common business-oriented language) is a programming language designed for business use. It brings a quite extensive use of English terms for describing the operations of a program. For the very first time, the idea of adopting English for commands made the programming language a bit more verbose and more readable and even self-documenting. Just for making an example, in today’s languages if we want to compare if the value assigned to a variable x
is greater than the one assigned to another variable y
we should use x > y
. In COBOL, the same comparison is made with the following instruction: x IS GREATER THAN y
.

Fig. 17 Portrait of Grace Hopper. Picture by James S. Davis, source: Wikipedia.#
A brief history of programming languages#
After the Second World War, people have developed several programming languages according to several design principles and intended usage in terms of the computational problems to be solved. While all of them, in principle, make it possible to develop solutions for any solvable computational problem, some of them are more suited for a specific domain than others. For instance, COBOL has been developed for business applications, while FORTRAN was designed to deal with scientific computing.
While an extensive analysis of all the programming languages is out of the scope of the topics of this book, it is worth mentioning, at least graphically, a timeline of their evolution, shown in Fig. 18. As highlighted in the timeline, we will introduce and use a particular programming language in this course, i.e. Python, according to its third version released in 2006.

Fig. 18 A graphic timeline summary of some of the main programming languages from 1954 to 2017. The different line colour is used only for readability reasons, and it does not have any particular meaning.#
Python#
Python is a high-level programming language for general-purpose programming. It is currently one of the most used languages for programming on the Web for Data Science and Natural Language Processing tasks. The good thing about Python is that it is one of the most simple languages for starting to learn how to program and create software.
In this course, we will use Python in its latest version, i.e. Python 3. Luckily, there are a lot of resources freely available online for learning this language from scratch, such as:
the introductory book Dive into Python 3 [Pil09];
the official documentation of the language;
an online platform for playing with Python 3 without installing any software on your computer;
an interactive online course for learning Python from scratch;
a book introducing all the basic Python features, which is particularly suited for Digital Humanities [Tag18];
another book entirely dedicated to problem solving and algorithms developed in Python [MR11];
a digital book that contains an introduction to Python for the Humanities.
The goal of this chapter is to develop our first algorithm in Python. The algorithm we produce is the one we have introduced in Chapter Algorithms, that can be described informally by the following natural language text:
Consider three different strings as input, i.e. two words and a bibliographic entry of a published paper. The algorithm must return the number 2 if the bibliographic entry contains both words; the number 1 if the bibliographic entry contains only one word; the number 0 otherwise.
First incomplete version, in Python#
In Python, we can create a new algorithm by implementing a new function. We can introduce a function using the keyword def
(which stands for define). The keyword def
must be followed by a name (e.g. the name of the algorithm) and a comma-separated list of input parameters between round brackets. For instance, def contains_word(first_word, second_word, bib_entry)
defines the function contains_word
, which takes three parameters as input.
Each function definition is followed by “:
” and all the instructions to execute must be specified in the following lines, as an indented block (preferably using four spaces), as illustrated in Listing 10. The name of a function and its parameters cannot contain space characters and must always start with a letter – e.g. this_is_my_parameter
is correct, while 1_parameter
is not.
1def contains_word(first_word, second_word, bib_entry):
2 ...
3 ...
4 ...
In this first version of the algorithm, we would like to introduce only some basic constructs of Python. To this end, we provide only a partial solution in this subsection, which we finalise in the following subsections, following the same strategy used in Chapter Algorithms. In particular, we want to say that if the bibliographic entry contains the first input word, the number 1 is returned; otherwise, a 0 is returned. Listing 11 shows this incomplete version of the algorithm in Python.
1def contains_word(first_word, second_word, bib_entry):
2 if first_word in bib_entry:
3 return 1
4 else:
5 return 0
In this incomplete version, there are already specified some important constructs of Python. The first one is the if-else conditional block. This kind of block allows one to execute a particular instruction if a condition is true (the if
statement). Otherwise, if the condition specified is false, an alternative set of instructions is executed instead (the else
statement). We can avoid specifying the else
clause if no alternative set of instructions is needed. The instructions to perform in one case or the other are within indented sub-blocks (again, four additional spaces). As already introduced in Listing 11, every time we have to add a new block of instructions, we need to use :
after the statement of interest, as shown in Listing 12.
1if <condition>:
2 ...
3 ...
4else:
5 ...
6 ...
The condition specified in the if
statement shown in Listing 11 allows one to check if a certain string is contained in another one using the command “in”. In particular, <string1> in <string2>
would be true if the "Peroni"
, "Osborne"
, and "Peroni, S., Osborne, F., Di Iorio, A., Nuzzolese, A. G., Poggi, F., Vitali, F., Motta, E. (2017). Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles. PeerJ Computer Science 3: e132. e2513. DOI: https://doi.org/10.7717/peerj-cs.132"
are all strings.
Note that "Peroni"
in "Peroni beer"
, or variables referring to strings, as shown in Listing 11. A variable is a symbolic name that contains some information referred to as a value (e.g. first_word). For instance, any input value is, in fact, a particular kind of variable. As defined previously, all the input parameters of the algorithm are expected to refer to strings.
The last construct of the partial algorithm introduced in this subsection is the return
statement. It is defined by specifying the token return
followed by the value (or the variable containing a value) that must be returned. The execution of a return
statement concludes the algorithm execution. Thus, all the instructions that follow that statement are not processed anymore. In the example in Listing 11, two different numbers are returned, depending on the execution of a particular branch of the if-else block. In particular, the algorithm returns a 1 if the condition of the if
statement is true, while it returns a 0 otherwise. Python permits to write any number as it is – e.g. 42
and -42
for positive/negative integers, 1.625
and -1.625
for positive/negative decimals. Note that strings and numbers are distinct kinds of objects – e.g. the string "42"
and the number 42
(without the quotes) do not define the same value.
Complex boolean statements#
The original text of the algorithm, introduced at the beginning of Section Python, needs to condition to be true for returning a 2. Indeed, the bibliographic entry must contain both words. This can be defined using a hierarchy of if-else blocks in Python, as shown in Listing 13.
1if first_word in bib_entry:
2 if second_word in bib_entry:
3 return 2
4 else:
5 return 1
6else:
7 if second_word in bib_entry:
8 if first_word in bib_entry:
9 return 2
10 else:
11 return 1
12 else:
13 return 0
However, the readability of the previous example is rather difficult since it repeats several times the same conditions, even if they have been specified in a different order. Thus, Python makes available some operations for assessing compositions of multiple boolean values and for deriving boolean values from number and string comparisons. A boolean value (or, directly, boolean)[1] can be only one of two distinct and disjoint values, True and False. For instance, the condition first_word in bib_entry
returns a particular boolean: True if the bibliographic entry contains the word, False otherwise. We use boolean values in algorithms (and in any programming language) to organise conditional block execution flow.
Sometimes it is useful to combine somehow two distinct boolean values to simplify the conditional blocks’ organisation. This can be done by using specific operators that apply to one (<operator> <B1>
) or two boolean values (<B1> <operator> <B2>
), and return a new boolean value. These operators are called logical not (not
in Python, which applies to one boolean value only), *logical and *(and
, between two boolean values), and logical or (or
, between two boolean values). They are logical operators since they derive from the logic Boole proposed in his works on Boolean algebra. Table 3 summarises their use and shows the truth table of the application of such operators. In particular, given two boolean input values, B1 and B2, the table shows the result of all their possible combinations according to the specific operator. Thus, in the example in Listing 13, we could return a 2 if the bibliographic reference contains both the strings, expressing this constraint in one condition only, i.e. first_word in bib_entry and second_word in bib_entry
.
B1 |
B2 |
not B1 |
B1 and B2 |
B1 or B2 |
---|---|---|---|---|
True |
True |
False |
True |
True |
True |
False |
False |
False |
True |
False |
True |
True |
False |
True |
False |
False |
True |
False |
False |
Round brackets can be used for grouping boolean operations, e.g. (True and False) or False
applies the and
operation first, and the result is used as the first value of the or
operation – given False as a result. If there are no brackets, the application order proceeds as follows. First, one must execute all the not
operations. Then, one must perform all the and
operations. Finally, one must assess the remaining or
operations. For instance, True and not False or False
returns True since it is interpreted as (True and (not False)) or False
.
S1 |
S2 |
S1 < S2 |
S1 <= S2 |
S1 > S2 |
S1 >= S2 |
S1 == S2 |
S1 != S2 |
S1 in S2 |
S1 not in S2 |
---|---|---|---|---|---|---|---|---|---|
|
|
True |
True |
False |
False |
False |
True |
False |
True |
|
|
False |
True |
False |
True |
True |
False |
True |
False |
In addition to the aforementioned boolean operations, it is also possible to use string comparisons for obtaining boolean values. Table 4 shows all the comparisons that one can apply on two strings, i.e. <S1> <operator> <S2>
. In this case, the operators are those typically used numerical comparison, i.e.:
<
, less than;
<=
, less than or equal to;
>
, greater than;>=
greater than or equal to;
==
, equal to;
!=
, different from;in
, included in;
not in
, not included in.
In the case of strings, a string S1 is less than another string S2 if the former precedes the latter according to a pure alphabetic order. Of course, Python uses the alphabetic order for assessing when a string is greater than another one.
Note that we can use similar operators (excluding in
) for comparing numbers, as shown in Table 5. In this case, the standard mathematical numeric comparisons hold.
N1 |
N2 |
N1 < N2 |
N1 <= N2 |
N1 > N2 |
N1 >= N2 |
N1 == N2 |
N1 != N2 |
---|---|---|---|---|---|---|---|
3 |
4 |
True |
True |
False |
False |
False |
True |
4 |
4 |
False |
True |
False |
True |
True |
False |
Thus, we can reuse these boolean operations to rewrite the if-else blocks shown in Listing 13 more understandably. Finally, Listing 14 shows the result.
1if first_word in bib_entry and second_word in bib_entry:
2 return 2
3else:
4 if first_word in bib_entry or second_word in bib_entry:
5 return 1
6 else:
7 return 0
Conditional statements with multiple branches#
While in the previous subsections we have improved the readability of the if-else blocks, Python allows us to do even better. First of all, in the two if statements in Listing 14, we ask Python to evaluate the same sub-conditions (i.e. first_word in bib_entry and second_word in bib_entry
) twice. This can be easily avoided by defining new variables. A variable is defined by specifying its name (without spaces), followed by an =
and the value to associate to it, i.e. <variable_name> = <variable_value>
. The value can be specified directly (e.g. a number) or indirectly by using other variables or even complex operations.
In our example, we could create two variables, called contains_first_word
and contains_second_word
, assigned to the boolean returned by the string comparisons mentioned above, i.e. first_word in bib_entry
and second_word in bib_entry
, respectively. In that way, we can reuse such variables in the two if statements, as shown in Listing 15.
if
statements are specified using two variables.#1if contains_first_word and contains_second_word:
2 return 2
3else:
4 if contains_first_word or contains_second_word:
5 return 1
6 else:
7 return 0
We can improve further the readability of the code by collapsing occurrences of else statements when these contain an if
statement. In this case, both the else-if pair can be safely replaced by an elif
(i.e. else if) statement, which specifies the same condition used in the if statement. Thus, the code in Listing 15 can be rewritten, as shown in Listing 16.
1if contains_first_word and contains_second_word:
2 return 2
3elif contains_first_word or contains_second_word:
4 return 1
5else:
6 return 0
Final algorithm#
In this chapter, we have seen some initial constructs that Python makes available for developing an algorithm. In particular, we have introduced how to define a function with input parameters, variables, conditional statements (i.e. if
, elif
, and else
), string, numeric, and boolean values, boolean operations and string and numeric comparisons. All these constructs enabled us to define our algorithm, which is finally introduced in Listing 17.
1def contains_word(first_word, second_word, bib_entry):
2 contains_first_word = first_word in bib_entry
3 contains_second_word = second_word in bib_entry
4
5
6 if contains_first_word and contains_second_word:
7 return 2
8 elif contains_first_word or contains_second_word:
9 return 1
10 else:
11 return 0
It is worth mentioning that the algorithm proposed initially in Chapter Algorithms as a flowchart does not map with the one presented in Listing 17. This misalignment has been done to explicitly show that it is entirely possible to develop two different algorithms for addressing the same computational problem.
As a final note, and in addition to using the Python interpreter installed on your machine (in any), several Web applications have been developed to test your Python code. Often, they show which kinds of objects Python creates when running. One of these tools, i.e. Python Tutor, is very helpful for people approaching Python for the first time. Indeed, it allows one to see what happens as the (electronic) computer runs each line of code.
Test-driven development#
Different development strategies can be adopted when one wants to understand whether the piece of software he/she has developed is correct or not – i.e. if it is returning the expected result. One of the most used and practical methods used by programmers is the Test-Driven Development (or TDD) [Bec03], summarised in Fig. 19.
In practice, when one has a computational problem to solve, and he/she needs to develop a piece of software to address it, the first thing to develop is a test to check if the software that eventually will be developed behaves correctly (i.e. returns the correct result) or not. Usually, such a test is software that must be designed to test the correctness of another software.
Writing a test before starting to develop software allows one to focus on the problem one has to solve and on the requirements of the software from the very beginning. This approach is also practical when one decides to extend an existing software. In this case, first, one has to develop the test for assessing the correctness of such a new extension. Second, one needs to write the extension and check if the extended software passes the new test.

Fig. 19 A diagram summarising the steps of the test-driven development approach. Picture by Xarawn, source: Wikipedia.#
Write a new test – once understood the computational problem to solve and the related requirements, a new test is written and then added to a collection of previously developed tests.
Run all the tests – we run all the tests available in the collection mentioned above. If the new test fails, there is no code available that addresses the particular computational problem described by the test. Thus, the test fails in the first iteration of the test-driven development since no code has been developed yet.
Write the new code – in this step, we develop a new code to pass the test just added to the collection.
Run all the tests again – in this passage, one checks if the addition of such new code developed to address the new test has not broken the other features already designed and tested by all the other tests available in the collection (in any). If any test fails, the new code must be corrected until all the tests are passed successfully.
Refactor the code – after several iterations of the process, the code grows naturally, and it may be necessary to refactor as to clean the code as much as possibleto guarantee its readability and maintainability in the long term. As a suggestion, every refactoring action should be checked by re-run all the tests available to be sure that a modification to the code does not break its correctness.
Following this approach to development is very useful when one has to implement a particular algorithm in Python. It enables one to check its correctness according to different kinds of input that can be used to run the algorithm itself. Listing 18 shows a plausible test to verify the accuracy of the algorithm introduced in this chapter.
contains_word
code, introduced in Listing 17.#1def test_contains_word(first_word, second_word, bib_entry, expected):
2 result = contains_word(first_word, second_word, bib_entry)
3 if expected == result:
4 return True
5 else:
6 return False
It is possible to use such a test function to test the contains_word
code with different kinds of input values and related expected results. For instance, Listing 19 shows the test code, the algorithm’s code presented in this chapter, and some checks done by running the test code (and thus the algorithm itself) with different input values. Finally, the result of the various inspections is printed on screen by using the Python function print()
.
as part of the material of the course
.# 1# Test case for the algorithm
2def test_contains_word(first_word, second_word, bib_entry, expected):
3 result = contains_word(first_word, second_word, bib_entry)
4 if expected == result:
5 return True
6 else:
7 return False
8
9# Code of the algorithm
10def contains_word(first_word, second_word, bib_entry):
11 if first_word in bib_entry and second_word in bib_entry:
12 return 2
13 elif first_word in bib_entry or second_word in bib_entry:
14 return 1
15 else:
16 return 0
17
18# Three different test runs
19print(test_contains_word("Shotton", "Open",
20 "Shotton, D. (2013). Open Citations. Nature, 502: 295–297. doi:10.1038/502295a", 2))
21print(test_contains_word("Citations", "Science",
22 "Shotton, D. (2013). Open Citations. Nature, 502: 295–297. doi:10.1038/502295a", 1))
23print(test_contains_word("References", "1983",
24 "Shotton, D. (2013). Open Citations. Nature, 502: 295–297. doi:10.1038/502295a", 0))
tdd-template-full-impl
shows the final Python implementation of the algorithm and the related test for the example mentioned above.
Succeed: check if the Python code returns the correct output#
Finally, we should test the Python implementation of the algorithm according to the tests developed in step “Fail”. If all the output values returned by running the Python tests comply with the expected results, we have finished. Otherwise, if some execution returned an unexpected output, we need to go back to the previous step and change something in the Python implementation of the algorithm.
All the tests introduced in tdd-template-full-impl
are passed as expected. It is possible to use Python Tutor to see a complete execution of such code.
References#
Kent Beck. Test-driven development: by example. The Addison-Wesley signature series. Addison-Wesley, Boston, MA, USA, 2003. ISBN 978-0-321-14653-3. URL: https://archive.org/details/est-driven-development-by-example/test-driven-development-by-example/.
Bradley N. Miller and David L. Ranum. Problem solving with algorithms and data structures using Python. Franklin, Beedle & Associates, Wilsonville, OR, 2. ed edition, 2011. ISBN 978-1-59028-257-1. URL: https://runestone.academy/ns/books/published/pythonds/index.html.
Mark Pilgrim. Dive into Python 3. The expert's voice in open source. Apress, New York, NY, USA, 2009. ISBN 978-1-4302-2415-0. URL: https://diveintopython3.net/.
Lisa Tagliaferri. How to code in Python 3. DigitalOcean, New York, NY, USA, 2018. ISBN 978-0-9997730-1-7. OCLC: 1020289950. URL: https://assets.digitalocean.com/books/python/how-to-code-in-python.pdf.