Most digital transmissions are encrypted and decrypted symmetrically using a single shared key for both operations. But how do you share a key in the first place when eavesdroppers are lurking? Previously, we discussed one solution, namely, encrypt and transmit that key using an asymmetric public key protocol like RSA. RSA security stems from multiplication of two large primes being a one-way function-- easy to do, hard to reverse. Well today, we'll explore a different one-way function that enables an altogether different and much cooler solution to the key sharing problem. [MUSIC PLAYING] Homework Alert-- you're going to need these prior episodes to understand this one. So trust me, now is the time to pause and go watch those. In those episodes, we said that symmetric single key encryption schemes have become the workhorses of secure communication and for a good reason. They're fast and practically bulletproof once two parties, like Alice and Bob, have a single-shared key in hand, but that's the challenge. They can't use symmetric key encryption to share the original symmetric key. So how do they get started? Sharing the key offline is conceivable, but it's rarely practical or even possible. Instead, one of them could transmit a shared key after encrypting it asymmetrically using a public-private key protocol, like the well-known RSA scheme. But no matter what, to get the ball rolling on symmetric encryption, Alice and Bob seem to need some mechanism of transmitting a shared key upfront. But what if Alice and Bob could share a key without transmitting it? In other words, what if they could somehow synthesize keys separately that could somehow be guaranteed to come out identical? That sounds impossible, but it can be done via a procedure called Diffie-Hellman Key Exchange, named after the authors of the 1976 paper in which that proposal was first published. Diffie-Hellman, or DH for short, has several variants, but I'll focus on the original version just for easier exposition of the main ideas, however, some disclaimers. One, we're about to get more "mathy," so I suggest following along with pencil and paper and pausing as necessary. Two, in the interest of time and clarity, I will oversimplify and gloss over many details. The references listed below should partially fill that gap, but let us know in the comments if you'd like more details fleshed out by us. And third, Diffie-Hellman is based on modularity arithmetic. Now Kelsey once gave a nice primer on this topic that I will rely on heavily. This episode is not self-contained. So if you need a refresher on that, it's time to pause me and go watch her. I'll wait. Welcome back. Diffie-Hellman is based on some neat structural properties of the numbers less than n that are also relatively prime or coprime to n. That was a mouthful. As Kelsy previously stated, every positive integer smaller than n and coprime to n has a mod n period, some smallest power r that gets you back to 1 after repeated mod n exponentiation. For example, suppose that n is 8 to which 1, 3, 5, and 7 are all coprime. 1 has a mod 8 period of 1 since 1 to the 1 is always 1. And some quick arithmetic shows that 3, 5, and 7 each have a mod 8 period of 2. How about when n equals 5? Well, since 5 is prime, all the positive integers below 5, that's 1 through 4, are coprime to 5. Again, 1 has a period of 1, and 4 has a period of 2. But notice that this time 2 and 3 each have mod 5 period of 4. Moreover, en route to a value of 1, the mod 5 powers of 2 and 3 each cycle through all the coprimes of 5. This cycling feature is the linchpin of Diffie-Hellman. And to talk about it concisely, I need to introduce some terminology. So bear with me for a minute because we need this. The set of numbers coprime to n together with mod n multiplication is called the multiplicative group of integers mod n, which for this episode, I'll abbreviate to just the mod n group. By the way, group here doesn't just mean collection. It has a technical, mathematical meaning that we'll get into another time. Moving on. The number of elements in a group is called its order. Both the mod 8 and mod 5 groups have order 4. Elements of the group whose powers cycle through the whole group are called primitive roots mod n, or alternatively, generators of the mod n group. So 2 is a primitive root mod 5, a.k.a. a generator of the mod 5 group. Groups that have at least one generator are called cyclic. So the mod 5 group is cyclic, but the mod 8 group is not because nothing in that group cycled. Just to make sure you're following, because that was a lot, go ahead and verify for yourself that the mod 7 group is cyclic and that it has two generators-- 3 and 5. By the way, if you're wondering which mod n groups end up being cyclic and which ones don't, the cyclic ones have n equal to 2, 4, a power of an odd prime, or twice a power of an odd prime. That's a neat fact, not obvious. Anyway, with this terminology in hand, here finally, is the Diffie-Hellman protocol. Step one, in full view of Eve, Alice and Bob will agree on two things, first, a modulus n. This amounts just to choosing some multiplicative group. And second, they'll pick one of the generators g of that group. So say they pick n equals 11 for illustration. The group elements are 1 through 10, all coprime to 11, four of which are also generators-- 2, 6, 7, and 8. Let's say they pick g equals 7. Step two, Alice and Bob each pick one number at random from the group that they won't reveal even to each other. Let's say Alice picks a equals 2, and Bob picks b equals 4 arbitrarily. Step three, Alice and Bob each compute g raised to the power of their own secret number mod n. And then they transmit those results to each other with Eve still watching. As you can see, Alice and Bob end up transmitting 5 and 3 respectively. And now, the magic. Step four, Alice and Bob each raise the others publicly transmitted number to the power of their own private number mod n. And check it out. In mod 11 arithmetic, they both end up with 9. Now why that happens seems less mysterious if you represent the last maneuver in terms of the generator g equals 7 and if you invoke the fact that exponent rules still hold in modular arithmetic. But conceptually, do you see what just happened? Alice and Bob have managed to share a number, 9, without ever transmitting it publicly. So they can now use that number for symmetric encryption provided that Eve can't also determine that number from those numbers that she has intercepted. So can Eve do this? Can Eve guess Alice and Bob's shared number? In our simple example, yes, but in general, no. And even our simple example can give you insight into why. Think about it like this. If you are Eve, how would you solve these equations to find Alice's or Bob's private number? You'd probably end up listing the mod 11 powers of 7 and seeing where 5 and 3 fall in that list. Now even with numbers this small, I think you can see that that's much slower and less procedural than what Alice and Bob have to do to synthesize the 9 in the first place, namely, just raise 7 to a single power. More generally, when the modulus and the exponents are enormous, computers can still do modular exponentiation extremely quickly. But there simply is no fast way to undo modular exponentiation. That inverse problem is called the Discrete Logarithm Problem, or the DLP, and it's much harder than the ordinary logs you do on your calculator. There are better algorithms than just listing powers of the generator, but even those rely on pre-computing some sort of lookup tables. And here's the thing. Eve's best known method for discovering Alice and Bob's shared key is to solve the DLP for at least one of their secret numbers. Now when n is 11, Eve can make the necessary tables by hand. But if n is a huge prime, Eve's in trouble. That's not because prime numbers are inherently necessary for Diffie-Hellman, but rather because from a practical standpoint, a group with a prime modulus n will automatically have n minus 1 members, which is also a huge number, that means a crazy number of generators and powers of generators to tabulate. Once n has several hundred decimal digits, then cracking the DLP become computationally infeasible within a human lifetime, even for the NSA. So there you have it. That's my admittedly oversimplified, but still fairly faithful synopsis, of Diffie-Hellman key exchange. It's a good reminder that, contrary to much popular belief, encryption is not inherently about primes or factoring but rather about one-way functions. And here's a fun fact. One-way functions needn't be numerical in nature. In fact, in a later episode, we'll showcase the one-way function with the fastest growing footprint in contemporary cryptography-- a one-way function whose flavor is much more geometric. You're super curious now, huh? Patience grasshopper, we will get there. I'll see you soon.