Sunday 20 February 2011

Just a litlle bit more on N vs. NP: Computational Complexity

So, after having a conversation with my friend Daniel (who will at some time help me redesign this blog), I realised that I had made one very big, and fairly erroneous,  assumption with regards to P vs. NP: People know about computational complexity analysis. So, now I will correct that and explain computational complexity.

So, every computer program is basically a set of steps, which is called an algorithm. When designing any software, you first design the algorithm and then implement it. There are several reasons for this, the one of interest to us is complexity analysis. This is basically counting how many steps the program will take and how many CPU cycles are needed. The basic rule of thumb is that each addition, subtraction, multiplication, division, memory lookup and assignment take 1 CPU cycle.

Of course, this is just a heuristic measure and the actual time taken will depend on the actual hardware, but it gives us a fair idea. Another advantage is that it allows us to compare two algorithms for solving the same and find out which one is more efficient. Just to illustrate this, I will demonstrate one of the most classic examples: Adding up numbers from 1 to n.

The simplest way to do that is to have a counter and add each number to it in sequence. The pseudocode algorithm is below. Anything in green is a comment and is simply to explain what is happening.

int counter = 0; //Creates a counter, with an intial value of 0
int n = getUserInput(); //Asks the user to enter a number
for i = 1 to n //Repeat the following steps with i having a value from 1 to n
counter += i; //Add i to the counter
next i //Increase the value of i by 1 and repeat above
output counter //Output the result

If we count the number of steps, we see we need 2 steps for initialisation (lines 1&2) and n additions and 1output. This gives a total of n+3 steps. Now we look at another way of doing this, by using this formula. In pseudocode, it would look like this:

int n = getUserInput(); //Asks the user to enter a number
int counter = (n*(n+1))/2; //Creates a counter, and assigns it the value of the sum
output counter //Output the result 


Now, if we count the steps, we get 1 for initialisation, 1addition, 1 multiplication, 1 division and 1 step for output. This gives us a grand total of 5 steps. Comparing that with the previous algorithm, we see that this is more efficient and does not depend on the actual input.

Most programs are more complex than this and have more complicated expressions for their running time. For these we use big-O notation. This gives us an approximate estimate of the effective running time. It also give us very nice and neat complexity classes. Looking at the example above, algorithm 1 has a running time linear in the input size, O(n), while algorithm 2 has constant running time, O(1). We also have logarithmic time O(log n), polynomial time O(n^c), exponentital time, O(c^n) and so on and so forth. Which, quite neatly brings us to the main problem: P vs. NP.

All problems in computing have a solution,  but not all are efficient. Some solutions would takes minutes, others hours and some would take 100s of years to compute. Again, this all depends on your hardware, so actual time measures are not exactly useful. And, this is why we have P and NP.

All the problems that have a solution using an algorithm with time complexity that is polynomial, or less, are classed as P type problems. These can be solved in an acceptable amount of time, e.g. a couple of days. Any thing that takes more than polynomial time is classed as NP. The trick is that even though for small enough instances of NP problems, you can find a solution efficiently, it is the scaling that matters.

So, thus we can see how the above means this post makes a little bit more sense, I hope.

No comments:

Post a Comment