Sunday, 20 February 2011

Just a little bit more on P vs. NP: Computational Complexity

So, after having a conversation with my friend Daniel (who will at some time help me redesign this blog), I realised that I had made one very big, and fairly erroneous, assumption with regards to P vs. NP: that people know about computational complexity analysis. So now I will correct that and explain computational complexity.

So, every computer program is basically a set of steps, which is called an algorithm. When designing any software, you first design the algorithm and then implement it. There are several reasons for this; the one of interest to us is complexity analysis. This is basically counting how many steps the program will take and how many CPU cycles are needed. The basic rule of thumb is that each addition, subtraction, multiplication, division, memory lookup and assignment takes 1 CPU cycle.

Of course, this is just a heuristic measure and the actual time taken will depend on the actual hardware, but it gives us a fair idea. Another advantage is that it allows us to compare two algorithms for solving the same problem and find out which one is more efficient. To illustrate this, I will demonstrate one of the most classic examples: adding up the numbers from 1 to n.

The simplest way to do that is to have a counter and add each number to it in sequence. The algorithm is below, written as runnable Python; anything after a # is a comment and simply explains what is happening.

counter = 0                # create a counter, with an initial value of 0
n = int(input())           # ask the user to enter a number
for i in range(1, n + 1):  # repeat the following step with i running from 1 to n
    counter += i           # add i to the counter
print(counter)             # output the result

If we count the number of steps, we see we need 2 steps for initialisation (lines 1 and 2), n additions and 1 output. This gives a total of n+3 steps. Now we look at another way of doing this, using the formula n(n+1)/2 for the sum of the first n numbers. In code, it would look like this:

n = int(input())              # ask the user to enter a number
counter = (n * (n + 1)) // 2  # create a counter, and assign it the value of the sum
print(counter)                # output the result


Now, if we count the steps, we get 1 for initialisation, 1 addition, 1 multiplication, 1 division and 1 step for output. This gives us a grand total of 5 steps. Comparing that with the previous algorithm, we see that this one is more efficient and that its running time does not depend on the actual input.
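
Just to make the gap concrete, here is a quick sketch (the input sizes are picked arbitrarily by me) comparing the two step counts:

for n in (10, 1_000, 1_000_000):
    print(f"n = {n}: algorithm 1 takes {n + 3} steps, algorithm 2 takes 5 steps")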

Most programs are more complex than this and have more complicated expressions for their running time. For these we use big-O notation, which gives us an approximate estimate of the effective running time. It also gives us very nice and neat complexity classes. Looking at the example above, algorithm 1 has a running time linear in the input size, O(n), while algorithm 2 has constant running time, O(1). We also have logarithmic time, O(log n), polynomial time, O(n^c), exponential time, O(c^n), and so on and so forth. Which quite neatly brings us to the main problem: P vs. NP.
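
To get a feel for how quickly these classes pull apart, here is a tiny sketch (values of n again chosen arbitrarily) printing each class for a few input sizes:

import math

for n in (10, 20, 30):
    print(f"n = {n}: log n ~ {math.log2(n):.1f}, n = {n}, n^2 = {n ** 2}, 2^n = {2 ** n}")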

All problems in computing have a solution, but not all solutions are efficient. Some would take minutes, others hours, and some would take hundreds of years to compute. Again, this all depends on your hardware, so actual time measures are not exactly useful. And this is why we have P and NP.

All the problems that can be solved by an algorithm with polynomial, or lower, time complexity are classed as P type problems. These can be solved in an acceptable amount of time, e.g. a couple of days. NP, strictly speaking, is the class of problems whose solutions can be checked in polynomial time; the hard ones among them are those for which nobody knows how to find a solution in polynomial time. The trick is that even though you can solve small enough instances of these problems by brute force, it is the scaling that matters.

So, armed with the above, I hope this post now makes a little bit more sense.

Tuesday, 15 February 2011

A note on passphrase guessing time calculations

Alrighty then, I had a few thoughts about the figures that I mentioned here and decided to do some math. It all falls apart very quickly. If we actually crunch the numbers, the titular hacker's computing power changes for every equation and produces some interesting values. I will assume there is some rounding off being done, and that we thus lose accuracy in the answer, explaining the fluctuations.

Now, don't just take my word for it, you can join in at home. Grab a pen, paper and a calculator, because we are dealing with HUGE numbers. Before we can begin, we need to do some housekeeping and define some variables. If you recall from my last post, I stated the 3 characteristics a good password should have, but we only consider the two under attack here: length and complexity.

We define the complexity by the size of the alphabet, a, that is, the number of possible characters the password may contain. For lower case only, a = 26; for lower and upper case, a = 52. With symbols it depends on how many symbols are considered valid, but in general we have a = 52 + the number of valid symbols. The length, l, is fairly straightforward and self-explanatory. We define the total complexity of our password as
c = a^l (where ^ denotes exponentiation.)

Now, to calculate the time it would take a hacker to guess your password, we need to know how many guesses they can make per second (or other appropriate time unit), which we denote by g. We can see that the time required is
t = c/g (in the worst case)
You may want to consider the average case instead, which is obtained by dividing by 2g rather than g.
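
If you would rather let the machine do the arithmetic, here is a minimal Python sketch of the same calculation. The guesses-per-second figure is an assumption of mine for illustration, not one of the values from the figures:

def crack_time_seconds(a, l, g, average=False):
    c = a ** l                            # total complexity: c = a^l
    return c / (2 * g if average else g)  # t = c/g, or c/2g on average

# lower case only (a = 26), length 6, an assumed 10 million guesses per second
print(crack_time_seconds(26, 6, 10_000_000), "seconds in the worst case")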

If you compute g for the lower case passwords of lengths 6 and 7, you see that there is a discrepancy between the two g values. However, if you take the g from length 6 and plug it into the length-7 equation, you get a t of about 4.33 hours, which is close enough to the 4 hours they have stated. This lends credence to my rounding theory, but does not prove it.

The rest is left as an exercise for the reader. So, go on and give it a whirl. Try this with different combinations and see how long it would take somebody to crack your password.

Sunday, 13 February 2011

Passwords/phrases and client side storage

So, after we discussed this, we now move on to the promised post on where and how you should be storing your passwords. But before we get into that, we need to define the importance of passwords and password strength. First, though, we need to discuss the term password, mainly the word part. People think, quite intuitively, that a password should be a single word, which is not the best idea. I prefer the term passphrase, implying multiple words and/or numbers and/or symbols, and I will use it from now on. With that out of the way, the next logical step is to discuss password strength.

Password strength is defined by 3 characteristics. The simplest is length, which is fairly obvious: the longer the password, the harder it is to guess. I recently stumbled onto these figures, but take them with a grain of salt; they do not specify what kind of hardware was used to produce them, so it's all a bit iffy. Next is complexity, which is illustrated in the aforementioned figures (but just the general trend, the actual figures are still questionable). Simply put, if you have a more complex password, with a combination of lower case, upper case, numbers and symbols, you increase the search space hugely. The third is memorability. There is no point in having "ASddeu43548&^&^ßß" as your password, because you will never remember it. A good password should be easily recalled. The higher each of these characteristics, the stronger the password.

Now we move on to the classification of passwords. People have varying opinions on the exact classifications, but I use a 4-tier system, detailed below. For each tier we define the suggested password strength with respect to the 3 characteristics, using the terms High (H), Medium (M) and Low (L). We express these as a 3-tuple of the form {Length, Complexity, Memorability}, e.g. high length, medium to high complexity and low memorability is written as {H,M-H,L}. There is also a small code summary after the list below.

TIER I: The big guns; these are the passwords for your financial accounts: online banking, online shopping, or any account which holds your financial details, plus the PINs for your ATM/debit/credit cards. These must be very memorable, hard to guess and very strong. You lose one of these, you lose all your money.
Strength: {H,H,H}

TIER II: These are next in line in terms of importance: login credentials. This is your school/university/office login name and password, i.e. how you log in to systems at work (wlog), either on the premises or remotely. If you lose these, then you can kiss your professional life goodbye.
Strength: {H,M-H,H}

TIER III: The mid-level, identity-theft type passwords: email and social networking. So your Facebook, Hi5, LinkedIn, MySpace, GMail, Y!Mail, Hotmail, thismail, thatmail, and so on and so forth. Depending on how many e-mail addresses you have and what kind of emails you receive on them, the effects of a loss vary. If you lose your primary account's password, then the likelihood of identity theft is significant.
Strength: {M-H,M-H,M}

TIER IV: The throwaways. This is all the stuff you couldn't care less about: logins for sites that you created just so you could read a certain article, for example. These do carry some risk, but it is a very small one, depending on several factors, which we will get into in a moment.
Strength: {M,M,L-M}
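
For the programmatically inclined, the whole classification boils down to something like this toy table in Python (purely my own notation from above, nothing official):

# suggested strength per tier, as {Length, Complexity, Memorability}
tiers = {
    "I (financial)": ("H", "H", "H"),
    "II (login credentials)": ("H", "M-H", "H"),
    "III (email/social)": ("M-H", "M-H", "M"),
    "IV (throwaways)": ("M", "M", "L-M"),
}
for name, strength in tiers.items():
    print(f"Tier {name}: {{{','.join(strength)}}}")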

Now, we need to lay down some ground rules. First, no password should be shared across tiers; this is based on the principle of least privilege. Secondly, passwords from one tier are never stored with passwords from a lower tier. Thirdly, reusability limits: tier 1 passwords should be unique, tier 2 & 3 passwords should be reused sparingly, and tier 4 passwords can be reused indefinitely. Finally, each tier's passwords should be of similar strength, as explained above. These rules, and the system in general, are a guideline which I try to follow, but there are grey areas. When in doubt, go for the safest option.

Now that we know how to classify our passwords, we move on to the storage of said passphrases. For tier 1 passwords, we need highly secure storage, i.e. a password locker. These are programs that store all your passwords for you in encrypted form. To decrypt them, you need a master passphrase, which you define when you set up the locker. This passphrase can be thought of as a tier 0 passphrase, with a mild abuse of notation. This master passphrase has to live in your head and must be of high length, complexity and memorability.

Next we go on to tier 2. These can be stored in programs, but need to be encrypted or locked with another master passphrase. As a rule, these should not be stored with the tier 1 passwords, but that rule tends to be broken for practicality's sake: it is quite annoying having multiple password lockers, and it wouldn't be the worst thing if your most important passwords were kept together. If we do have a second password locker, we should treat its master passphrase as a tier 1 password.

Now for tier 3: you can save these in your browser, BUT they must be encrypted with a master passphrase under all circumstances. You should also avoid having the session stored in cookies, caused by ticking the "Remember me" check box. This is a big no-no; it's convenient, but it's not really secure. Alternatively, you could have a third password locker and store them there, encrypted with a tier 2 passphrase (I'm sure you can see the general trend here). If you have your tier 1 and 2 passphrases in a single locker, then this would be a second locker.

And now, tier 4. We all have a billion passwords for a billion sites that we use once a month or even less frequently. Theoretically, you could create a new password for each, but you would never be able to remember them all. These you can have your browser remember for you. This way you can have a unique, arbitrary password for every single account. There is no real need for master passphrase encryption here, but it is recommended. As you may have guessed, this master passphrase would be a tier 3 passphrase.

These rules are quite rigid, but they are designed from a security point of view rather than a usability point of view. What counts as an acceptable loss of usability is very much a personal preference, and that is up to you. You are welcome to bend and even break some of the rules to make life easier for yourself. But remember, you sacrifice security for usability, and only you can strike the right balance for yourself. (I avoided saying "you have the power" because it sounds really cheesy.) And so, there you have my guidelines for password storage. If in any way this makes the web just that much more secure, then I will have done my job.

Saturday, 12 February 2011

Passwords and server side storage.

OK, so some background reading here. I know I said I would post something about password lockers, but this is not it. Instead of considering where you store your passwords, we consider where they store your password, "they" being any website you have signed up for.

Now, it's obvious that they need to store your password; if not, the whole exercise would be pointless. The initial idea was to store the passwords in plaintext in a file on the server. BAD IDEA! If an attacker were to somehow get hold of that file, you are screwed. Royally! The attacker now knows all your users' login credentials and can run amok doing all sorts of bad things. So, there were 2 bright ideas to fix this: encryption and hashing.

We note that both work in a similar way at a high level, which we will abstract as a function f. When a user registers, we take the plaintext password, pass, and compute f(pass). We then store f(pass) instead of pass. When a user attempts to log in, we take the password they entered, login_pass, and compute f(login_pass). We then check whether f(login_pass) == f(pass). If they are the same, we conclude the user has entered the right password and let them in. If not, they get an annoying error message.
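
As a concrete illustration, here is a minimal Python sketch of this scheme. SHA-256 is my stand-in for the abstract function f; nothing above commits us to a particular choice:

import hashlib

def f(password):
    # one-way function: easy to compute, hard to invert
    return hashlib.sha256(password.encode()).hexdigest()

stored = f("correct horse")         # at registration we keep f(pass), never pass itself

def check(login_pass):
    return f(login_pass) == stored  # compare f(login_pass) against the stored f(pass)

print(check("correct horse"))       # True: let them in
print(check("wrong guess"))         # False: annoying error message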

Now, we straight away notice a problem: collisions! If person A and person B both have the same password, the stored value f(pass) will be the same. This means that if the attacker knows person A's password, they also know person B's. (There is a very, very small possibility that f(passA) = f(passB) while passA != passB, i.e. the passwords are not equal, but we will ignore it.) So, now what?

Well, the answer is salt. Not exactly that kind of salt, but what we call a "random salt": a random bit-string, which may or may not have some structure. The modified password storage protocol is as follows: a user registers with password pass. We compute f(pass+salt) and store the pair (f(pass+salt), salt). When a user attempts to log in, we take the password they entered, login_pass, and the stored salt, and compute f(login_pass+salt). We then check whether f(login_pass+salt) == f(pass+salt). Thus, even if 2 users have the same password, the collision is not apparent, thanks to the salt. Now we are secure, right?
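
Here is the same sketch extended with a per-user random salt (still using SHA-256 as the stand-in for f):

import hashlib, os

def f_salted(password, salt):
    return hashlib.sha256(salt + password.encode()).hexdigest()

salt_a, salt_b = os.urandom(16), os.urandom(16)   # a fresh random salt per user
stored_a = (f_salted("hunter2", salt_a), salt_a)  # store the pair (f(pass+salt), salt)
stored_b = (f_salted("hunter2", salt_b), salt_b)

print(stored_a[0] == stored_b[0])  # False: same password, but no visible collision

def check(login_pass, stored):
    digest, salt = stored
    return f_salted(login_pass, salt) == digest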

Not quite yet. This depends on your choice of the function f. Let's start with the symmetric key crypto option. You need to use a secure crypto scheme, such as AES, with a large enough key size. Then we need to consider key storage. If we store the key with the password file, there is no security: if the attacker can access the password file, it's safe to assume they can access the encryption key, and once they have the key, it's game over! But if you store the key safely in another location, even if the attacker gets the password file, there is still some security.
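
For the encryption route, here is a minimal sketch using the third-party Python "cryptography" package (my choice purely for illustration; nothing above names a library). Fernet is a symmetric scheme built on AES, and the key is generated in-line here only for demonstration, whereas the whole point above is that it must live somewhere other than the password file:

from cryptography.fernet import Fernet  # third-party package, assumed installed

key = Fernet.generate_key()  # in real life: store this far away from the password file
locker = Fernet(key)

token = locker.encrypt(b"pass+salt")          # this ciphertext goes in the password file
assert locker.decrypt(token) == b"pass+salt"  # recoverable only with the key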

Now we look at hash functions. You would need to use a collision-resistant hash function that is not broken. Since hash functions are not really keyed, there is no concern about key storage, which makes things a bit easier. However, the trade-off is that, because hash functions are designed to be computed quickly, the attacker has quite an advantage, as detailed here.

Of course, increased computational power can also make it faster to break the symmetric crypto algorithm; how quickly depends on the known attacks against the algorithm. So there is still a threat if your encrypted password file is compromised. Ask Gawker, they will tell you. That particular attack revealed some interesting statistics about passwords, shown here. More about these later.

For now, I have but one humble request for all admins of all servers out there: please store my, and everybody else's, passwords securely. Although a lot of them do it right, some don't. This needs to change.

Tuesday, 8 February 2011

Computer Security Experts: The Doctors of the Digital Age?

So, there's a lot going on and I really need time to compose my thoughts. And by that time, more will have happened. Loop infinitely. But until the end of time or death, whichever comes first, I will try to keep up. While I do that, here is a not-so-significant post on a random train of thought from my brain.

So, this idea struck me while watching an episode of House MD. If you are not familiar with it, here's the overview: Dr. Gregory House is a genius and an ass of the highest order. He diagnoses and cures patients with conditions that have baffled and/or escaped other physicians, using a method known as "differential diagnosis" (DDx), which is basically saying "the patient has the following symptoms, therefore they must have the following condition." While watching one of these, I realised that there is only one kind of human body. Yes, there are differences from person to person, such as eye colour, height, weight, allergies, etc., but the basic abstract framework, if you will, is the same. Arguably there are two, one for each gender, but there really aren't more than that.

Then I wondered if any parallels could be drawn between computer security and medicine. Here's where I drew a blank. There were some superficial comparisons, but those were a stretch of the imagination at best. It dawned on me that the level of complexity in the systems we deal with is so high that the human body looks like a wind-up toy in comparison. In no way do I mean to trivialise medicine, which is a very complex field in its own right, but all that complexity is constrained to at most two basic frameworks. In computer science in general, there is a near-infinite number of potential frameworks.

Let's begin at the most basic level and just examine the hardware. Right there you have so many components to consider, several of them with potential security issues. First off, the components have to be compatible. Next, we need to ensure that none of these components, on their own or in combination, will cause a security threat. This is easier said than done, as components from different manufacturers can behave differently and have side-effects that others don't. At this point we have so many ways to fail, and yet we only have a box that does nothing. Zilch! Without any software to run the hardware, you just have an expensive and oversized paperweight. Which brings us to the next point: software.

Even in software, we have two basic groups: operating systems (OSs) and application software. First you need an OS to run your computer. There is HUGE potential for security holes here. Every OS available has security holes, every one of them. Yes, every single one, and that especially includes MacOS. I am sick and tired of Mac users sanctimoniously claiming that there are no viruses for Macs! This is often swiftly followed by a comment on how Macs are more secure than Windows. I have one word for that:
NONSENSE.
Seriously, every operating system has security issues. Some have more than others, some have more critical ones than others. Now another concern is: which operating system are you using? Which version of it? Which patches and updates are installed? Are there any issues arising from the hardware/software combination? These are just some of the questions you have to ask. At this point we have a computer that can switch on and let you log on, and not much more.

I know you must be thinking: but when I installed my operating system, it had all these programs installed already. I could play games, connect to the Internet, and so on. Yes, that is true, but the software that enabled you to do so was, generally speaking, not part of your OS. It was bundled in and included with your installation media, but it is technically not part of the OS. Now we come to application programs. This is anything you install on your computer; no matter how small or large, it all matters.

The thing with most software is that it does a lot of stuff you never see. Most of the time it's stuff you want it to do, but you really have no way of knowing. There are two scenarios here: one where the software is doing what you asked but, as a side-effect, has made you vulnerable to certain attacks, and one where the software is deliberately making you vulnerable. In either case, you are vulnerable. And this is assuming just one program; it gets even more fun with multiple programs. Some applications connect to each other, such as your PDF reader and your web browser. Here it becomes really fun!

It may turn out that on their own the programs pose no threat, but combined they are potentially lethal. A sort of reverse of table salt, whose components are lethal but whose combination is not. I think you can see where this is leading. If your head is spinning trying to imagine the countless possible interactions between the programs on your computer, well then, my job is done.

Now, I would like to point out that the same applies to smart phones. Have fun mulling that one over. Then consider what happens when you connect your smart phone to your computer. This whole path leads to a really messed-up place where you are building a house of cards using cards of different shapes and sizes. It's almost as if you want it to fall down, just so you can stop building.

But enough gloom and doom; silver lining time. Here you go! Seriously, although there is a plethora of threats to your computer and its safety, if you are smart and keep your wits about you, then you should be fine.

Sunday, 6 February 2011

IPv4 and 6

Right, last post for today, making it a record 3 in a day. I have a couple of other topics to post on, but seeing as how it's 2am, I'm going with the easy one. So what is IPv4? Glad you asked!

If you are reading this, you are connected to the Internet. But you may never have given any thought to how this is possible. You may know that you plug the LAN cable from the router into your Ethernet port, or connect to your wireless network, but what really goes on?

The Internet (capital 'I') came out of the concept of internets (small 'i'), an internet being a network of networks. So how do you know who is who and who is on which network? Simple: assign everybody a unique identifier. Then the question arises of how you do that, because everybody needs to understand how these identifiers work, so we need a common language, if you will.

And thus was born the Internet Protocol version 4, aka IPv4. Defined in RFC 791, the Internet Protocol is how all devices connected to the Internet identify and communicate with each other. So, what's the problem?

IPv4 addresses are 32-bit addresses broken into 4 8-bit groups called octets. This means there is a finite number of them, approximately 4 billion, and we are now out. Certain ranges of addresses are reserved for specific purposes, but they are just a small portion. As of February 3rd, all IPv4 addresses had been assigned by the Internet Assigned Numbers Authority (IANA) to the Regional Internet Registries (RIRs), when it handed out the last 5 remaining /8 blocks (a /8 being a set of addresses with the first octet fixed).

This is a real problem, as at some point new devices will not be able to connect to the Internet. There does exist a solution: IPv6, defined in RFC 2460 & RFC 2373. Version 6 addresses are 128 bits long, broken into 8 hextets, or 16-bit groups, as compared to the 32-bit version 4 addresses. This gives us a hugely greater number of addresses and would solve the problem of address exhaustion. Well, there is a small catch.
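
To put the difference in perspective, a quick back-of-the-envelope calculation (Python, just for the big-number arithmetic):

print(2 ** 32)   # IPv4: 4,294,967,296 addresses
print(2 ** 128)  # IPv6: 340,282,366,920,938,463,463,374,607,431,768,211,456 addresses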

Despite IPv4 essentially having been exhausted, IPv6 is still not fully implemented. So we are currently in a weird transition period where things are a bit muddled. Almost everybody has implemented support for both IPv4 and IPv6, but there is no strict adherence to IPv6.

My issue with this is that the existence of a parallel legacy system has almost always created some kind of security threat that the new system cannot deal with. To my knowledge there is no such security loophole in IP yet, but I'm sure somebody, somewhere will find something and start exploiting it. It may not be a big hole, but it will probably be there. I would truly be happy to be proven wrong, and hopefully we will transition over to IPv6 without incident.

On that note, that is all from me today. I will try to get to as many of the latest stories as I can, but they seem to be cropping up faster than I can handle.

*******EDIT*********
So, I may have said I would be catching up on stuff, which I totally am! But this also came to my attention.

Saturday, 5 February 2011

Even more Wikileaks

So, there's a lot happening right now. Looks like I'm going to have to blog in overdrive, which probably means these posts won't be great, so apologies in advance.

But first, the latest nomination for the Nobel Peace Prize: Wikileaks! I really thought that nobody could top this. Well done on surprising me, World. I disagreed with last year's award, but that's more personal opinion than anything else; this, however, is ludicrous! I mean, at this rate I will soon be nominated for the Nobel Prize in Literature for writing this blog!

Now, first order of business: Egypt. Let's start there. Unless you live in a bubble, or perhaps The Bubble (totally should have gotten a second series), you will know of the problems in Egypt. The politics of the situation is irrelevant to this blog post, so I'm not even going to go there. Right now, the part of the situation of interest: the Internet!

So, amidst protests and unrest and, in general, everything not being so hunky-dory, the Egyptian government flipped the switch on the Internet. The whole country was sans Internet. There was massive outrage, both inside and outside Egypt. And here's my issue with that: deal with it!

People everywhere have become too accustomed to the Internet being there and available. Everybody is so hooked on the Internet that they just assume it will always be there, that it is a right they have. Access to the Internet is a privilege, the most abused privilege in the world. At some level, I guess, I sort of believe it to be a right as well, but I am aware that it could be taken away at any point.

So, now a bit of background from recent history: the Iranian "Twitter revolution." The basic gist is that protests, opposition and so on were coordinated via Twitter, Facebook and other social media sites. Although the Iranian government was able to take down classical telecommunications, the Internet provided a channel for communication. It seems the Egyptian government took this lesson to heart and flipped that switch as well.

The problem with allowing people to broadcast their agenda and/or grievances over Facebook and Twitter is that it tends to skew public opinion one way or the other, generally not in favour of the government. It seems that anti-government sentiments are adopted more quickly than pro-government ones, at least from my observation, regardless of how valid the grievances are. So, in a bid to avoid this, the Egyptian government shut off the Internet.

However, the Internet is a resilient little bastard. Some very novel ways of accessing it were conjured up, many of them fun and creative, and people carried on getting online and rallying support for their cause. This further compounded the outrage against the "oppressive regime."

Note the quotation marks, as I would like to disagree. The circumstances were not exactly normal: there was civil unrest, there were protests and there was a curfew in place. The government acted in a way which they believed would not cause the opinion of the rest of the world to be slanted too far against them, and they failed.

There have been other countries that have considered having an Internet "kill-switch" (I am oddly undecided whether I love or hate that term), most notably the USA. One of my favourite things to come out of this is this post, where an anonymous poster says Australia has a backbone for stating they will not have an IKS (just wanted to try out that acronym; I don't think it will stick). I would strongly disagree and say it's the opposite.

As I have stated before, here and here, the Internet has too little regulation, and something like this would bring in some level of regulation. It is not all that's needed, and it is not necessarily the best way to do it, but it's a start. I would not like it if my Internet were shut off by the government, but I would understand. What we need to understand is that the government has allowed you to access the Internet, and they can take that away. They could just as easily ban cars or motorbikes or potatoes, which would probably be met with the same response. Like I said, somewhere down the line the government said "Yes, we should allow the Internet", then they gave licenses to ISPs, and so on and so forth, and you were connected. They can revoke that at any point in time, if they choose to.

Looking at this without the political dimension just points out another case where technology has grown too fast for legislators to keep up, through no fault of their own. Even experts are constantly learning new things and having to keep up. At some point, I hope, there will be some catching up done and we will all be better off for it. Until then, well, things will go on as before.