Hi, I'm Carrie Anne, and welcome to CrashCourse Computer Science!
Over the last three episodes, we've talked about how computers have become interconnected, allowing us to communicate near-instantly across the globe.
But, not everyone who uses these networks is going to play by the rules, or have our best interests at heart.
Just as how we have physical security like locks, fences and police officers to minimize crime in the real world, we need cybersecurity to minimize crime and harm in the virtual world.
Computers don't have ethics.
Give them a formally specified problem and they'll happily pump out an answer at lightning speed.
Running code that takes down a hospital's computer systems until a ransom is paid is no different to a computer than code that keeps a patient's heart beating.
Like the Force, computers can be pulled to the light side or the dark side.
Cybersecurity is like the Jedi Order, trying to bring peace and justice to the cyber-verse.
INTRO The scope of cybersecurity evolves as fast as the capabilities of computing, but we can think of it as a set of techniques to protect the secrecy, integrity and availability of computer systems and data against threats.
Let's unpack those three goals: Secrecy, or confidentiality, means that only authorized people should be able to access or read specific computer systems and data.
Data breaches, where hackers reveal people's credit card information, is an attack on secrecy.
Integrity means that only authorized people should have the ability to use or modify systems and data.
Hackers who learn your password and send e-mails masquerading as you, is an integrity attack.
And availability means that authorized people should always have access to their systems and data.
Think of Denial of Service Attacks, where hackers overload a website with fake requests to make it slow or unreachable for others.
That's attacking the service's availability.
To achieve these three general goals, security experts start with a specification of who your "enemy" is, at an abstract level, called a threat model.
This profiles attackers: their capabilities, goals, and probable means of attack - what's called, awesomely enough, an attack vector.
Threat models let you prepare against specific threats, rather than being overwhelmed by all the ways hackers could get to your systems and data.
And there are many, many ways.
Let's say you want to "secure" physical access to your laptop.
Your threat model is a nosy roommate.
To preserve the secrecy, integrity and availability of your laptop, you could keep it hidden in your dirty laundry hamper.
But, if your threat model is a mischievous younger sibling who knows your hiding spots, then you'll need to do more: maybe lock it in a safe.
In other words, how a system is secured depends heavily on who it's being secured against.
Of course, threat models are typically a bit more formally defined than just "nosy roommate".
Often you'll see threat models specified in terms of technical capabilities.
For example, "someone who has physical access to your laptop along with unlimited time".
With a given threat model, security architects need to come up with a solution that keeps a system secure - as long as certain assumptions are met, like no one reveals their password to the attacker.
There are many methods for protecting computer systems, networks and data.
A lot of security boils down to two questions: who are you, and what should you have access to?
Clearly, access should be given to the right people, but refused to the wrong people.
Like, bank employees should be able to open ATMs to restock them, but not me... because I'd take it all... all of it!
That ceramic cat collection doesn't buy itself!
So, to differentiate between right and wrong people, we use authentication - the process by which a computer understands who it's interacting with.
Generally, there are three types, each with their own pros and cons: What you know.
What you have.
And what you are.
What you know authentication is based on knowledge of a secret that should be known only by the real user and the computer, for example, a username and password.
This is the most widely used today because it's the easiest to implement.
But, it can be compromised if hackers guess or otherwise come to know your secret.
Some passwords are easy for humans to figure out, like 12356 or q-w-e-r-t-y.
But, there are also ones that are easy for computers.
Consider the PIN: 2580.
This seems pretty difficult to guess - and it is - for a human.
But there are only ten thousand possible combinations of 4-digit PINs.
A computer can try entering 0000, then try 0001, and then 0002, all the way up to 9999... in a fraction of a second.
This is called a brute force attack, because it just tries everything.
There's nothing clever to the algorithm.
Some computer systems lock you out, or have you wait a little, after say three wrong attempts.
That's a common and reasonable strategy, and it does make it harder for less sophisticated attackers.
But think about what happens if hackers have already taken over tens of thousands of computers, forming a botnet.
Using all these computers, the same pin - 2580 - can be tried on many tens of thousands of bank accounts simultaneously.
Even with just a single attempt per account, they'll very likely get into one or more that just happen to use that PIN.
In fact, we've probably guessed the pin of someone watching this video!
Increasing the length of PINs and passwords can help, but even 8 digit PINs are pretty easily cracked.
This is why so many websites now require you to use a mix of upper and lowercase letters, special symbols, and so on - it explodes the number of possible password combinations.
An 8-digit numerical PIN only has a hundred million combinations - computers eat that for breakfast!
But an 8-character password with all those funky things mixed in has more than 600 trillion combinations.
Of course, these passwords are hard for us mere humans to remember, so a better approach is for websites to let us pick something more memorable, like three words joined together: "green brothers rock" or "pizza tasty yum".
English has around 100,000 words in use, so putting three together would give you roughly 1 quadrillion possible passwords.
Good luck trying to guess that!
I should also note here that using non-dictionary words is even better against more sophisticated kinds of attacks, but we don't have time to get into that here.
Computerphile has a great video on choosing a password - link in the dooblydoo.
What you have authentication, on the other hand, is based on possession of a secret token that only the real user has.
An example is a physical key and lock.
You can only unlock the door if you have the key.
This escapes this problem of being "guessable".
And they typically require physical presence, so it's much harder for remote attackers to gain access.
Someone in another country can't gain access to your front door in Florida without getting to Florida first.
But, what you have authentication can be compromised if an attacker is physically close.
Keys can be copied, smartphones stolen, and locks picked.
Finally, what you are authentication is based on... you!
You authenticate by presenting yourself to the computer.
Biometric authenticators, like fingerprint readers and iris scanners are classic examples.
These can be very secure, but the best technologies are still quite expensive.
Furthermore, data from sensors varies over time.
What you know and what you have authentication have the nice property of being deterministic - either correct or incorrect.
If you know the secret, or have the key, you're granted access 100% of the time.
If you don't, you get access zero percent of the time.
Biometric authentication, however, is probabilistic.There's some chance the system won't recognize you... maybe you're wearing a hat or the lighting is bad.
Worse, there's some chance the system will recognize the wrong person as you - like your evil twin!
Of course, in production systems, these chances are low, but not zero.
Another issue with biometric authentication is it can't be reset.
You only have so many fingers, so what happens if an attacker compromises your fingerprint data?
This could be a big problem for life.
And, recently, researchers showed it's possible to forge your iris just by capturing a photo of you, so that's not promising either.
Basically, all forms of authentication have strengths and weaknesses, and all can be compromised in one way or another.
So, security experts suggest using two or more forms of authentication for important accounts.
This is known as two-factor or multi-factor authentication.
An attacker may be able to guess your password or steal your phone: but it's much harder to do both.
After authentication comes Access Control.
Once a system knows who you are, it needs to know what you should be able to access, and for that there's a specification of who should be able to see, modify and use what.
This is done through Permissions or Access Control Lists (ACL), which describe what access each user has for every file, folder and program on a computer.
"Read" permission allows a user to see the contents of a file, "write" permission allows a user to modify the contents, and "execute" permission allows a user to run a file, like a program.
For organizations with users at different levels of access privilege - like a spy agency - it's especially important for Access Control Lists to be configured correctly to ensure secrecy, integrity and availability.
Let's say we have three levels of access: public, secret and top secret.
The first general rule of thumb is that people shouldn't be able to "read up".
If a user is only cleared to read secret files, they shouldn't be able to read top secret files, but should be able to access secret and public ones.
The second general rule of thumb is that people shouldn't be able to "write down".
If a member has top secret clearance, then they should be able to write or modify top secret files, but not secret or public files.
It may seem weird that even with the highest clearance, you can't modify less secret files.
But, it guarantees that there's no accidental leakage of top secret information into secret or public files.
This "no read up, no write down" approach is called the Bell-LaPadula model.
It was formulated for the U.S. Department of Defense's Multi-Level Security policy.
There are many other models for access control - like the Chinese Wall model and Biba model.
Which model is best depends on your use-case.
Authentication and access control help a computer determine who you are and what you should access, but depend on being able to trust the hardware and software that run the authentication and access control programs.
That's a big dependence.
If an attacker installs malicious software - called malware - compromising the host computer's operating system, how can we be sure security programs don't have a backdoor that let attackers in?
The short answer is... we can't.
We still have no way to guarantee the security of a program or computing system.
That's because even while security software might be "secure" in theory, implementation bugs can still result in vulnerabilities.
But, we do have techniques to reduce the likelihood of bugs, quickly find and patch bugs when they do occur, and mitigate damage when a program is compromised.
Most security errors come from implementation error.
To reduce implementation error, reduce implementation.
One of the holy grails of system level security is a "security kernel" or a "trusted computing base": a minimal set of operating system software that's close to provably secure.
A challenge in constructing these security kernels is deciding what should go into it.
Remember, the less code, the better!
Even after minimizing code bloat, it would be great to "guarantee" that code as written is secure.
Formally verifying the security of code is an active area of research.
The best we have right now is a process called Independent Verification and Validation.
This works by having code audited by a crowd of security-minded developers.
This is why security code is almost always open-sourced.
It's often difficult for people who wrote the original code to find bugs, but external developers, with fresh eyes and different expertise, can spot problems.
There are also conferences where like-minded hackers and security experts can mingle and share ideas, the biggest of which is DEF CON, held annually in Las Vegas.
Finally, even after reducing code and auditing it, clever attackers are bound to find tricks that let them in.
With this in mind, good developers should take the approach that, not if, but when their programs are compromised, the damage should be limited and contained, and not let it compromise other things running on the computer.
This principle is called isolation.
To achieve isolation, we can "sandbox" applications.
This is like placing an angry kid in a sandbox; when the kid goes ballistic, they only destroy the sandcastle in their own box, but other kids in the playground continue having fun.
Operating Systems attempt to sandbox applications by giving each their own block of memory that others programs can't touch.
It's also possible for a single computer to run multiple Virtual Machines, essentially simulated computers, that each live in their own sandbox.
If a program goes awry, worst case is that it crashes or compromises only the virtual machine on which it's running.
All other Virtual Machines running on the computer are isolated and unaffected.
Ok, that's a broad overview of some key computer security topics.
And I didn't even get to network security, like firewalls.
Next episode, we'll discuss some specific example methods hackers use to get into computer systems.
After that, we'll touch on encryption.
Until then, make your passwords stronger, turn on 2-factor authentication, and NEVER click links in unsolicited emails!
I'll see you next week.