Home

Programming Fundamentals


Please note that any code on this page is pseudocode, meaning it won't work in any actual programming language.

Binary Numbers

Binary numbers are so important for the study of computer science that I put them on their own special page here. If you already know how binary numbers and binary operations work, feel free to skip that section. If not, please read that page before continuing with this one, because this one will assume that you understand them. I suggest you read it just to make sure that you haven't missed any information.

Algorithms

Computers, in their most basic form, are machines that follow sets of instructions which we give them, which we call algorithms. That's all they are! Almost anything can be a computer: for examples see this video about a computer made of a marble dropper or this video about a computer made in the card game Magic the Gathering. Algorithms form the basis for modern computer science.

There are many types of algorithms, and they all serve different purposes. For example, there are algorithms which add two numbers, or compare them to determine which is bigger, or sort lists of names alphabetically, or determine which pages should show up first in a Google search. There can be many different algorithms that achieve the same thing, but depending on the specifics of the application, we might choose one over the other. If we value speed, we might pick the fastest algorithm. If we value being able to run it on many computers, we might choose the least resource intensive algorithm. This would be the one that uses up the least amount of processing power and/or memory.

At the basis of an algorithm is the idea of the manipulation of data.

Primitive Data Types

Data is information; data can be a number, or a true or false value, or a word, or a binary number. Data has two components: the value of the data itself and the metadata of the data, which refers to information about the data, such as when it was created or what it represents or who created it. We can categorize the values of data into several datatypes.

Firstly, there is the Boolean data type, which has only two possible values: True or False. It can also be represented by a single bit, meaning a 1 or 0 value, where 1 is True and 0 is False.

In terms of integer values (...-3, -2, -1, 0, 1, 2, 3...) there are bits, nibbles, bytes, shorts, ints, and longs. The difference between all of these is the amount of data which can be stored, ie the limits of what can be represented in each type. Additionally, because each of these are defined only by their size in bits (1s or 0s in the binary representation of the number), they can hold either unsigned (only positive) or signed (negative numbers using Two's Complement or some other representation scheme) numbers.

- A bit can only hold a 0 or a 1, making it equivalent to a boolean.
- A nibble (4 bits) can hold 16 values, either 0 to 15 if unsigned or -8 to +7 if signed.
- A byte (8 bits or 2 nibbles) can hold 256 values, either 0 to 255 unsigned or -128 to +127 if signed.
- A short (16 bits or 2 bytes) can hold 65536 values, so 0 to 65,535 unsigned or −32,768 to +32,767 signed.
- A signed int (32 bits, or 4 bytes) holds a value within the bounds of −(231) to 231 − 1.
- A signed long (64 bits or 8 bytes) holds a value within the bounds of −(263) to 263 − 1.

Another datatype is a decimal number, such as 5.327, normally referred to as a floating-point number. These numbers are stored in a similar way to scientific notation, and are sometimes called doubles.

An important datatype is the 'char' datatype, which stands for a single character. Computers can only deal in binary, so relating a character to a binary representation of it is very important; this is called character encoding. Some examples of character encoding from the pre-digital age are Morse code or signal flags used in ancient times (from Wikipedia). Right now, the two most common character encodings are ASCII and Unicode, although Unicode is steadily gaining traction because of its greater ability to represent international symbols not found in the English language.

Take a look at the ASCII table on this page. ASCII uses 7 bits to represent the numbers 0 - 127, which each have an associated symbol. For example, lowercase letter 'a' is represented in ASCII by the number 97. You'll also see that at the beginning are some symbols which seem to be rather arbitrary combinations of capital letters. These are actually non-character symbols, such as a newline or a tab or an escape character, which don't have a symbol but are important to be able to represent. If you are wondering, the HEX column on the table is referring to the hexadecimal (Base 16) representation. Just like we can represent numbers in base 2 and base 10, we can represent them in any base, and base 16 happens to be very useful for computer programming.

Moving on to Unicode, you may have seen 'UTF-8' or 'UTF-16' or 'UTF-32' at some point on a computer. These are all Unicode standards, where the number after the 'UTF-' refers to the number of bits used. UTF-8 encoding uses 8 bits, UTF-16 uses 16, etc. Understanding that a character/symbol can be represented by a number according to an encoding standard is an important concept.

Data Structures

The datatypes that you just read about are often called primitive data types, because they are native or 'built-in' to many programming languages. Different programming languages may differ slightly in terms of what datatypes are native or not, but those types that are above are built-in to most languages. Now, we're going to turn to data structures, which allow us to organize data so that it can be stored and retrieved. Datastructures can have different organizational schemes, which is primarily what sets different datastructure types apart. Some data structures are primitive in some languages, while in others, they aren't. It really depends on the language. For now we'll deal with datastructures which use a key-value organizational scheme, where a value in the datastructure is correlated to a key which 'points' to where the value is stored.

The first data structure we'll look at is an array. Arrays are sometimes called lists or vectors, or matricies if they are multidimensional (we'll get to this later). To understand arrays, let's construct an analogy. An array is like a table with a placard on it. The placard is the name of the array - it's a piece of metadata about the array. The table has multiple cups in a neat, orderly row on it, each labeled with a unique number, starting from 0 and then increasing from there. Let's suppose that there are 4 cups, labeled 0, 1, 2, and 3. Inside each cup we can put a primitive data type, such as a number stored as an int, so for example the 0th cup can have the number 5, the cup labeled '1' can have the number 7, the cup labeled '2' can have the number 3, and the cup labeled 3 can have the number '-13'. Now we can say that we have an array of ints, or an int array. In some languages, you are allowed to mix what types are in each cup, but for now we will assume that a valid array must have the same data type in every filled cup.

Index: 0 1 2 3
Value: 5 7 3 -13

Let's define some terminology: firstly, an element in the array is a value that we put in a cup. Each element has an index, which basically refers to the number on the cup it resides in. Arrays in most languages are 0-indexed, which means their index numbers start at 0 rather than 1. This might seem confusing in the beginning, but if you live with it for long enough it will become second nature to refer to the first cup as the 0th index. Arrays are ordered, which means what it says: there is a 'first' element, a 'second' element, all the way up to a 'last' element. The length/size of an array is the number of cups we have; it is the number of spaces we have to put elements. Notice that because arrays are 0-indexed, the very last element has an index of the array's length minus one.

As you can see in the above image, the array index is the key, while the value which the key refers to is quite simply the value. To refer to any element in the array, we can use the index to retrieve or modify the correlated value. Thus it is a simple key-value datastructure.

At this point I would be remiss to not talk about strings. A string is a type of array, but it is so special that in some languages (such as Python) it counts as a primitive data type. A string is an array of chars, which basically means that it is a word or collection of words. For example, "Hello World" is a string and an array of characters. We would say that the length of the string is 11 elements (characters) long, and it occupies indices 0 to 10. The space character would have an index of 5. Please note that when using arrays, the term 'subscript' rather than 'index' is sometimes used; this has nothing to do with subscript text in wordprocessing.

Index: 0 1 2 3 4 5 6 7 8 9 10
Value: H e l l o W o r l d

Now, let's talk about dictionaries, also known as maps. Like the name implies, these map a key to a value. The difference between an array and a dictionary is that in an array, the keys have to be integers, and the values have a specific order; in a dictionary, however, the keys are characters or strings. For example, we could have a map/dictionary with a set of key-value pairs like this:

Key Value
"First Name" "John"
"Last Name" "Smith"
"Age" "32"
"City" "Seattle"
"State" "Washington"

They don't have a specific order; we refer to them not using an integer index, but we using the string key. Using our cups-on-a-table analogy, the cups are now settled on the table in no particular order, and they are labeled not with a number but with a word (really a string or a character).

Variables

Now that we've established what data is, we should ask how we are to manipulate it. To do this, we have to use the concept of a variable; variables allow us to store, modify, and compare the values of data in our programs. A variable normally must be 'declared', meaning we tell the computer what the name of the variable is (how we will refer to it in our code) and what datatype it will be. This lets the computer know in advance how much space to reserve for it: if we declare a byte type variable called x, it knows it only needs to allocate 8 bits of memory for it. In some languages, however, we don't even need to tell the computer what type a variable will be: it will figure it out from context. In Java, I might declare a variable which stores the number of cats like so:

int numCats;

Now, the computer knows that the variable numCats will be an int type (32 bit integer) variable, and can allocate the necessary space for it.

The next logical thing to learn to do is to set the value of a variable. We do this with the assignment operator, which in most languages is '='. We can either do variable assignment after the declaration, or at the exact same time. As another example in Java, we could have the following code:

int x;
x = 5;

or we could have:

int x = 5;

Either would be acceptable ways to declare the integer x and assign it to 5. The only difference is that the first way uses two statements, which are a single instruction or line of code, while the second way only uses one. Computer code is essentially made up of many many individual statements, which each do one small thing. The computer executes (meaning it conducts the actions the statement directs it to) the statements starting from the top of the program, going statement by statement until it reaches the end, barring special structures like conditionals and loops, which we'll learn about later. Together, these small individual actions combine as algorithms which can do powerful things. Different programming languages use different syntax, which is like the grammar and punctuation of programming, to indicate what is a statement, but the concept remains the same. Finally, note that when using the assignment operator, the value on the right is assigned to the variable on the left.

Operators

Above, we learned about the assignment operator '=', which allows us to set the value of a variable. Now, we will want to manipulate that value, and we'll do so using operators. To simplify a bit, an operator is essentially an instruction for what to do with some number of operands. A binary operator acts on two operators, while a unary operator acts on only one. In programming, there are a few common operators which fall into the following categories: arithmetic, relational, logical, unary, and bitwise/bitshift. I'm going to skip the bitwise/bitshift operators right now, because I already covered them in the binary numbers page.

Arithmetic Operators

Whether you recognize it or not, you already know the arithmetic operators. These are operators such as addition (+), subtraction (-), multiplication (*), and division (/). I will assume that you already know how these work. You may or may not know the last one, modulus (%), though, so I will explain it.

One way to think of modulus is as the 'remainder operator'. Just as the division problem 13 / 4 will yield 3.25, 13 % 4 ('thirteen modulus 4') yields 1, because the remainder was 1. If a number divides evenly into another number, the modulus will be 0: 42 % 7 = 0 because 7 * 6 = 42; in other words, because 42 divides evenly by 7 to yield 6 and 0 left over, the modulus is 0. Similarly, 12 / 7 = 1 remainder 5, which is equal to 1 and 5 sevenths. If we were to do 12 % 7, the result would be 5, because the remainder of the division problem is 5.

The other way to think of modulus is where numbers 'reset' after reaching a certain value, or 'wrap around'. For example, if I wanted to do 12 modulus 7, I would start counting up from 1 to 12, but when I run across 7, I reset to 0. My counting would look like this: 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5. Thus 12 modulus 7 equals 5. Here's what counting to 20 looks like, modulus 4:

Normal Counting: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Counting Modulus 4: 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0

A familiar example of modulus from your everyday life is a clock. A clock counts modulus 12, because when you get to 12, you actually reset to 0. When you convert from miliary time to civilian time, you are really dividing military time by 12 and taking the remainder. 14 o'clock is 14 % 12 = 2 o'clock.

Relational Operators

You probably already know the relational operators, so I'm just going to give them a quick once-over. Relational operators compare values and determine which is greater or lesser. Unlike in math, these operators don't just tell us a fact about size comparisons; these are fully fledged operators and return an answer, like any other operator. The operator will return a True (1) or a False (0) value, depending on if the statement they are in is true or false. So for example, 5 < 3 will return false or 0, because clearly 5 is actually greater than 3, not less than it. It's very important to stress that to the computer, a true/false value and a 1/0 value are both represented as a single bit of data which can be 1 or 0. They are interchangeable in most situations.

There are size relational operators: less than (<), less than or equal to (≤ or <=), greater than (>), greater than or equal to (≥ or >=), equals (==), and not equals (!=). Note that we normally write less than or equal to as '<=' and greater than or equal to as '>='. Also note that when we want to test if two things are equal (remember that we'll receive a 1 (true) or 0 (false) value back) we have to use a double equals sign. This is because the single equals is already reserved for assignment. x = 5 uses the assignment operator and will set the variable x to be 5, while x == 5 uses the equality comparison operator and will evaluate to 1 (true) if x is indeed equal to 5, and it will evaluate to 0 (false) otherwise.

Logical Operators

There are three main logical operators: AND, OR, and NOT. Like the relational operators, they return a true or false value based on their operands' values. However, they differ in that where the relational operators operate on integers, they operate on true and false values. I'll first explain what each one does, then show you a truth table for each one. A truth table is a table which shows the output of a logical operation given some certain inputs. It works similar to a times table, where the top and left columns represent the operands, while the interior represents the results.

AND (sometimes represented as &&) evaluates to true if both of its operands are 1 (true), and evaluates to false for any other case.

&& True False
True True False
False False False

As you can see, True && True = True, True && False = False, False && True = False, and False && False = False.

OR (sometimes represented as ||) evaluates to false if both of its operands are 0 (false), and evaluates to true for any other case. In this way, it is like the inverse of AND.

|| True False
True True True
False True False

As you can see, True && True = True, True && False = True, False && True = True, and False && False = False.

NOT (sometimes represented as !) is also called the logical complement. It is a unary operator, meaning it operates on only one value. NOT flips the value of whatever it is applied to, so NOT True becomes False and NOT False becomes True.

Input True False
!Input False True

Unary Operators

There are a few operators which operate on only one value which are important to talk about. They don't appear in all languages, but they are important in the ones they do appear in. They are: minus/negate (-), increment (++), and decrement(--).

Minus does exactly what it says it does. It takes the value you give it and returns -1 times that value. So if x is equal to 5, -x returns -5.

Increment and decrement are shorthand operators, meaning they take a longer operation and make it shorter to write out. Increment simply results in the value plus one, while decrement results in the value minus one. So if x is equal to 17, x++ equals 18, and x-- equals 16.

Concatenation

Concatenation is concept which very similar to the addition operator, but instead of acting on numbers, it acts on strings. Concatenation allows the joining of many strings together to create a larger string. Normally, concatenation uses the '+' operator. Let's look at some examples:

"Hello" + " " + "World" = "Hello World"
"My " + "name " + "is " + "[redacted]." = "My name is [redacted]."

The first thing to notice is that strings are contained within quotes ("). In the first example, we have a string saying "Hello", and then we add it to a string containing a single space character, and then we add "World" to that. This, predictably, results in the string "Hello World". The second example is also very self-explanatory.

But now we run into a conundrum: if strings are enclosed by quotes, how do we put a quote inside a string? We do this using escape characters. An escape character is normally written with a backslash (\) and then a specific character, which then becomes a new character. It's similar to the character encoding we discussed above. Some examples of escape characters:

"\"We should take a trip\"," + " " + " he said." = "We should take a trip", he said.
"\'X\' " + "is " + "my " + "favorite " + "letter." = 'X' is my favorite letter.

For this example, I used concatenation, escape characters, and enclosed my strings on the left side, while on the right side I wrote the result in plain text. There are many escape characters, but here is a list of the most useful ones:

- \" represents a quotation mark (").
- \' represents an apostrophe (').
- \\ represents a backslash (\).
- \t represents a tab character, which is like hitting the TAB key.
- \n represents a 'newline' character, which is like hitting the ENTER key.
- \b represents a backspace character, which is like hitting the BACKSPACE key.
- \r represents a 'carriage return', which returns the cursor to the beginning of the line.

Casting

Casting, where a value represented in one datatype is converted to another equivalent value represented by a different datatype, is an important concept to understand. In some languages, your program might require you to have a value in a certain type. This might require turning an int into a double or a double into an int (note that if we convert a double, such as 5.492, into an int, it will become just 5, and we'll lose some data precision). We might also have to turn a character into its numeric ASCII representation or vice versa, or likewise with its unicode representation. We could have to turn an int into a string (so the number 10 becomes the string "10"), or back (so the string "15" becomes the int 15). There are many possibilities, so being aware of this concept may come in useful when you are programming.

Expressions

Now that we've learned about datatypes, variables, and operators, we're ready to learn about expressions. Expressions in CS (Computer Science) are very similar to expressions in math, in that they are a mix of values and operations which need to be 'evaluated' to give a final result. Like mathematical expressions, they have a specific order of operations which is an internationally-recognized standard of which operations take precedence over which others. In math, you probably learned about PEMDAS (Parenthesis, Exponentiation, Multiplication/Division, Addition/Subtraction) and evaluating from left to right. In computer science, it's pretty much the same order, but with some additional operators. A good reference on CS order of operations is this page, particularly the section on C's order of operations (C is a programming language). Don't worry about the operations you don't know about yet. As the page says, some languages require programmers to define an order of operations, or simply go left to right; however, most languages have an order of operations similar to C's.

The best way to learn expressions is just to try a bunch of them. So here is a large list of sample expressions and explanations for you to peruse at your leisure.

De Morgan's Law

If you look at the wikipedia page for De Morgan's Law you will be likely very confused and somewhat intimidated. However, it is really quite simple. De Morgan's law essentially tells us what happens when we take an expression, such as a AND b < c, and stick a NOT in front of the whole thing, like so: NOT (a AND b < c). What we have to do is systematically go through the expression, replacing each relational or logical operator with its inverse. So NOT (a AND b < c) = (a OR b >= c). Here's a handy chart of each operator we need to replace and its inverse:

Operator Inverse
AND OR
OR AND
NOT NOT
> <=
< >=
>= <
<= >
== !=
!= ==

Some examples:

NOT(y > x OR a < b) becomes (y <= x AND a >= b)
NOT(numDogs != numCats) becomes (numDogs == numCats)
NOT(vec1 > vec2 AND vec3 >= vec4) becomes (vec1 <= vec2 OR vec3 < vec4)

Multidimensional Datastructures

The last thing I want to touch on briefly is the idea of a multidimensional datastructure. This is what results from nesting a datastructure inside another datastructure; for example, making an array of arrays of ints. If you want to visualize it with an analogy, you might have a large stage, representing the top-level array. On it are a number of tables, each labeled with their index number in the top-level array. Each table has a number of cups on it, which are also labeled with their index numbers. Inside each cup is a value.

When we wanted to point to a specific value in our single-dimensional array, we simply used the index as our key. In our 2D array, things get slightly more complicated. Each table has an index, and each cup on each table has an index. So to refer to the third cup on the second table, we should refer to the cup with index 2 on the table with index 1 (remember that arrays are 0-indexed, so the 5th element has an index of 4).

In a similar vein, we could conceptualize a dictionary with dictionaries inside it, or even a dictionary with arrays inside of it. Our stage could be a dictionary, filled with tables (arrays) which each are labeled with their string-type key, and those tables have indexed cups on them. We could even think of a 3D structure, or an array filled with arrays filled with arrays filled with something. The possibilies are pretty much endless.

In Conclusion

So today we learned:
- Binary numbers are how computers store data.
- Algorithms are sets of instructions which manipulate data.
- Datatypes are various ways of representing/storing data.
- Primitive datatypes are those which are built in to a specific programming language.
- Data structures are ways to organize data so we can store, modify, and retrieve it.
- Arrays/Vectors and Dictionaries/Maps are datastructures which use a key-value organizational scheme.
- Array keys are integers, while dictionary keys are strings.
- Strings are arrays of characters.
- Variables allow us to modify the values of data within our code.
- In CS, we use Arithmetic, Relational, Logical, Unary, and Bitwise operators.
- Concatenation is like addition, but for strings.
- Escape characters allow us to represent certain reserved characters, such as " or '.
- Casting allows us to change a value stored as one datatype into an equivalent value stored as another datatype.
- Expressions are combinations of values and operators which can be evaluated using operator precedence (order of operations).
- De Morgan's Law guides us on how to change an expression with a NOT at the front.
- We can nest datastructures inside other datastructures to make multidimensional datastructures.

Thank you for reading through this, and I hope you learned something. Next, please check out the Control Structures page.


Home    Back to Top