Notes for FSP0018 – Cryptography and Encryption
This was originally going to be about Blockchain and Bitcoin, which was going to be in preparation for a separate episode on the use of blockchain in real estate and property management and FM. While I was preparing my blockchain notes, I found that I had to spend a lot of time on explaining cryptography concepts. These cryptography concepts apply to much more than just blockchain so I decided that I better just do a cryptography and encryption episode. (So here were are)
Cryptography and encryption

Cryptography is the science and methods for secure communication. A lot of the definitions you might find use the word “secret” instead of “secure” but in modern times we use cryptography to do more than just keep secrets. Specifically, we also use cryptography for equally important things like authentication and for verification of data integrity (which I’ll cover later).

Encryption (which you might call a subset of cryptography) is the process of encoding readable data into unreadable data in a way that can only be reversed by someone who knows the relevant secret.

Historically the secret might have been the encoding method. So I could tell you how I am going to scramble up the data before sending it to you (a simple (and not very secure) example might be to move all the letters 3 places forward in the alphabet, so it’ll look like nonsense to anybody that sees it). Then when you get the data you could use the method to unscramble it. This method of transmitting secrets has 2 notable limitations. The first is that if someone were to discover the method you were using, they would be able to read all of your previous messages and also you would have to come up with a new method. The second limitation (which might be even more of a limitation) is that if you wanted to communicate secretly with more than one other party but you only wanted each of the other parties to be able to read the messages sent to them and not be able to read the message you sent to the others, you would have to come up with a different encoding method for each party which would quickly become cumbersome or impossible.

A better idea is to use an encoding method that uses another piece of data called a key. Because encoding and decoding requires a key, the encoding method doesn’t need to be kept secret. You can publish it widely for everyone to use and only keep the key secret. Using the same encoding method will produce different results when different keys are used. You then share different keys with all of the different parties you want to communicate with. If you lose your key or somebody discovers it, you can discard it and start using a different key. If you want to make sure nobody will be able to read a message if they later discover your key, you can use a key for a short time or maybe only once and then destroy it and choose another one. Most of our modern digital systems use math as the encoding methods and numbers as the keys which is a little tricky to understand. A conceptually simple illustration of the concept is to use the encoding I described above (move each letter 3 places forward) but instead of using the fixed number 3, we use the text of a book as the key. So we agree on the Great Gatsby. I’ll start at the first letter of the book, and for each letter in my message I’ll add a letter from The Great Gatsby striating at the first letter of the book and proceeding through the text. When I say “add” a letter, I mean that if we think of each letter as its number in the alphabet we can add the numbers for 2 letters together (wrapping around to the beginning when the number gets higher than there are letters in the alphabet) and then convert the number you end up with into its corresponding letter. And you can choose a different book for different parties you might want to communicate with. If you have ever read any spy or mystery books, you might have seen something like this.

The term “decryption” is used to mean undoing the encryption. So encryption turns plaintext into cypher text and decryption turns cypher text into plaintext

Terminology

Plaintext – the unencrypted data. This is the readable data you want to keep secret.

Ciphertext – the encrypted data. This is the data you get after you pass the plaintext through the cryptographic encoding. Ideally, cypher text should look like completely random nonsense to somebody that doesn’t have the key.

Key – This is the secret used in certain types of encryption methods.

Algorithm – a set of instructions. In the case of cryptography the algorithm is the exact method of encoding. (While there are many different cryptographic algorithms, I’m only going to cover the basic concepts here. The details of any particular algorithm are out of the scope of this podcast.)

Hash function – a function that takes some arbitrarily sized chunk of data and (by some algorithm) maps it into a data set of a fixed size. Usually, the arbitrarily sized data is much larger than the fixed size that it gets mapped to.

That’s definition might not be very helpful if you don’t already know what a hash function is, so how about an example. Think of arbitrarilysized data as a chapter in a book. There is no fixed length for this. Your hash (that’s the output of the hash function) is always the same size (say 128 bits…that would be a number between 1 and 2^128. 2^128 is a very large number, but not as big as the number of data chunks that can be the size of a book chapter – there are effectively infinity of these). So you have a function that will take the data from any book chapter and produce a 128 bit value.

A particular chunk of input data will always produce the same hash value which is one of the things that make the hash function so useful.

Hash functions are used for many different things in computer science. But for this discussion we are mainly concerned with hash functions as they are used in authentication (as digital signatures) and for verification of data integrity. For this we want what are called cryptographic hash functions.

Cryptographic hash function has the following properties

Computationally infeasible to create an input that maps to a particular hash value. (Collisions possible….)

Computationally infeasible to find 2 different inputs that map to the same hash value.

A small change to the input data should create a big change in the hash.


Other words we use for a hash, depending on the context are digest (message digest), digital fingerprint, checksum.


Computationally infeasible – it’s important that our encryption algorithms and our cryptographic hashes are hard enough to break that we can actually rely on them for security. One way to to break a keyed encryption mechanism would be to continually guess keys and try them on the ciphertext until you get something that looks like the message you expect. We can’t say that this is impossible in general for our encryption algorithms (in fact it is possible, yet we still rely on them for security), so we try to choose algorithms and key sizes so that it would take so long to guess keys that it would be computationally infeasible to do so. For the algorithms we are using now, “computationally infeasible” means that you could try guessing, but on average it would take you millions or trillions of years to come up with the right key even if you used all of our available computing power. We can’t say that these algorithms are impossible to break, the best we can do is computationally infeasible
Two main classes of encryption methods – Symmetric key and asymmetric key

Symmetric key encryption

the same key is used for both encryption and decryption.

Since both parties need the same key, we need to arrange to securely share the secret key. Generally we need to do this outside the channel over which we want to communicate securely.


Asymmetric key encryption (sometimes called “public key”)

In this case there are 2 keys. The keys are chosen (actually algorithmically generated through the magic of math) so that if you run the encryption algorithm with one of the keys, you need the other key to do the decryption.

Generally we call one of these keys the public key and the other the private key. Each party will generate a key pair. The public key is not a secret and is distributed to anybody. The private key must be kept totally secret.

This scheme is designed so that it is very easy to create the two keys but very difficult (computationally infeasible) to compute the private key if you only have the public key.

Cryptography can provide the following benefits: (I say “can provide” because a particular system of cryptography doesn’t necessarily provide all of these things. We choose the particular methods to get the desired functionality.)

Confidentiality (privacy) – We want to keep the content of the message secret, by which I mean only readable by the intended recipient.

When using a symmetric key system, party A and party B will share the encryption key prior to communicating. They can then encrypt and decrypt their messages using their shared key and the agreedupon algorithms. This is pretty straightforward.

When using an asymmetric key system, it’s a little different. If party A (Alice) wants to send a secret message to party B (Bob), Alice needs to get Bob’s public key (which Bob has published so anyone can use it) and use it to encrypt the message before sending it to Bob. Since a message encrypted with Bob’s public key can only be decrypted with Bob’s private key, Alice can be assured that the message can only be read by Bob. Then if Bob wants to send a secret message to Alice, Bob needs to first find Alice’s public key, and so forth.

Forward secrecy – Assurance that if the “main” key is compromised, the party that acquired the key won’t be able to read messages from the past. (not necessarily a property of a particular encryption algorithm but rather a consequence of using encryption in a certain way where (generally) randomly chosen onetime use keys are securely agreed upon and then discarded…imaging burning the page of the code book after you use it)


Integrity – We want to verify that the data hasn’t been changed in transit either through deliberate tampering or by some transmission error. We generally use cryptographic hashes for this.

Before Alice sends a message (or a file or whatever) to Bob, Alice computes a hash of the message, sends the message to Bob and posts the hash somewhere where Bob can find it but she can be reasonably sure no one can change it (for example on a web site she controls). When Bob receives the message, he can compute the hash. If Bob’s hash matches Alice’s hash, Bob can be sure he got the the data Alice sent and not a modified version.

This type of hash is often called a message digest or checksum.

This is often used when distributing software or documents. In this case it might not be important that the message is secret, but it is important the receiver receives the data that was sent (think software updates, etc).



Authentication – We want to be sure the message came from the party we think it came from (if Alice gets a message from Bob, Alice wants to be sure that Bob actually sent the message and that the message wasn’t sent by someone pretending to be Bob).

A couple of ways to do this:

Symmetric key systems: If you’re sure the shared key is secure, and you receive message from someone that decrypts with the shared key, you can be pretty sure the message came from the person you know has the key

Asymmetric key systems: when somebody sends you an encrypted message, it tells you nothing about who they are because in order to send you a secret message they use your public key, which anyone can use. We need to do something different.

If we want to prove that something came from us using an asymmetric key system, we can encrypt the data using our private key. Then anyone can verify that it came from us because only our wellknown public key will decrypt it. In this case the message won’t be secret, but maybe we want that. If we do want the message to be secret and also authenticated…let’s go back to Alice and Bob. Say Alice wants to send a secret message to Bob but also want to make sure Bob knows it came from her. Alice would first encrypt the message with Bob’s public key (now it can only be read by Bob). Next Alice would encrypt it with her private key (now Bob can verify the message came from Alice. By the way this also provides an assurance of data integrity. Bob knows that nobody could have changed the data after Alice encrypted it with her private key.

We also use asymmetric key encryption with hashes to provide assurance of authentication and integrity. Sometimes we don’t want the overhead of encrypting the whole message. Instead, we can hash the message with a cryptographic hash function and then encrypt the hash with our private key. We can send the encrypted hash along with the message. Anyone who receives the message (document or software or whatever) can decrypt the hash with our public key and also compute the hash themselves. If the two match they know they got the data we sent. This type of encrypted hash is called a digital signature.



Nonrepudiation. Authentication is usually considered important because we want to make sure we aren’t communicating with an imposter, but authentication can also provide a form of nonrepudiation (which means the sender can’t credibly deny they sent the message or did the thing that includes their digital signature or whatever). If we have a mechanism that provides some assurance that a certain thing was done by a certain party, we also have a reason to not believe them if they say they didn’t do the thing.

Some examples of cryptographic algorithms

Symmetric

AES – Advanced Encryption Standard – maybe the most common today

DES – Date Encryption Standard

Blowfish

RC4 (or 5 or 6) – Rivest Cipher


Asymmetric

RSA

DiffieHelman key exchange


Commonly used hashing algorithms

SHA – Secure hashing algorithm – several versions of this some of which are considered weak

MD5 – used to be common but no longer considered secure.
