Introduction

One of the - confusing - topics in Algorithm design is the concept of NP completeness. If you search the topic on the internet you will probably find tons of articles and lectures on the subject however in this short article I will summarize it so that it is easy to remember by the average student or software engineer.

It is all about Running Time

Computational complexity analysis refers to the study of computer algorithms in terms of efficiency. Given a problem of some input size, we need to know how fast or slow it runs as the input size grows to large values. Consider the following examples:

Looking an item up in a perfect hash table takes a constant time, theoretically speaking, no matter how big the hash table is, the item can be found instantly (assuming it exists), the algorithm is said to have a running time of O(1). This is indeed very fast.
The running time to look up an item in a sorted binary search tree of (n) elements is O(Log(n)) for example if the number of items is (32) then the number of comparisons needed to find an element is (5) in the worst case. As you can see this is a fast algorithm
Searching for an element in an unsorted array runs in linear time O(n). If the array contains 100 elements then you need to make 100 comparisons to find the element in the worst case. This means, the algorithm will slow down linearly as the input size grows. To be exact, the running time linearly grows as the input size becomes larger.

Polynomial versus Exponential

Algorithms with a running time function of the form O(n^k) where (k) is a constant are said to run (or solved) in polynomial time. Theoretically speaking, they are fast even if (k) is large. On the other hand, algorithms with a running time of the form O(k^n) where (k) is a constant are said to run in exponential time. These algorithms are very slow. Actually it may take a computer years to finish running the algorithm.

P, NP, NP-Complete, NP-Hard

Now we know what running time means, based on that we can classify problems into classes depending on how complex they are. I will be using plain English as opposed to mathematical terms in order to make it easy to understand.

P Problem

P means that there exists an algorithm to solve the problem which runs in polynomial time. If you are still not sure what running time or polynomial means then please read the introduction one more time.

NP Problem

NP problem means that there exists an algorithm to verify a given solution is correct in polynomial time. Note that we are talking about solution verification. We are not talking about the solution for the problem because no one has ever yet discovered a fast (polynomial) solution for NP problems nor proved the solution does not exist in the first place. NP does not mean None Polynomial, please do not say that in front of people because it is embarrassing, NP stands for non deterministic which means the problem can be solved in polynomial time BUT using none deterministic machine (of course it is not the regular computer we use every day, this computer is in the mind of some weird computer scientists. If you are interested you can research it on your own. Try to search for Turing Machine)

NP-Hard Problem

Now we know what NP means, NP-Hard problem is AT LEAST as hard as any NP problem. It could be harder, who knows. Again, when thinking about problem difficulty we are still referring to the running time of the algorithm. A harder problem takes more time to finish running.

Reduction

It feels bad when you try to solve a problem and it turns to be very hard to the extent that many people have tried it without success. In order to claim that the problem cannot be solved by many people then you need to prove that it is equivalent to some known hard problem. Reduction refers to transforming a problem to another well known (hard) problem so that people won’t dare to call you stupid. This conversion process should run in polynomial time and can be used to prove a problem is NP-Complete as we will indicate later.

NP-Complete Problem

NP-Complete problem is both NP and NP-Hard. You know what NP and NP-Hard means, so NP-Complete means a problem that is easy to verify a given solution and every problem in NP can be reduced to our problem. The hard part is proving the first example of an NP-complete problem but our friend Steve Cook did that in the 1970s. Thanks to him because he did a great job so that we are not called stupid. In few words, in order to prove that a given problem is NP-Complete then

Show it is NP: given a solution, show that it can be verified in polynomial time
Show it is NP-Hard: pick an already known NP-Complete problem and show that you can reduce it back to our problem in polynomial time

What is the big deal about the question P = NP?

If P is the same as NP then this is good news, there are many interesting real life problems that could be solved very quickly.
It is going to be really embarrassing for the folks working in computer science field because it is believed (I am assuming someone got the $1000000 prize) for long time by many people that they are not the same.
In reality many believe they are not the same because many people knocked their heads against the wall for so long without any success and because crazy scientists need a proof a solution does not exist, it is going to be an open question.

Class NP

The class of decision problems for which there is a polynomially bounded nondeterministic algorithm.

The class of problems that are "slightly harder" than P.

In general, an exponential number of subproblems, each of which can be solved or verified in polynomial time.

Being in class NP constitutes an upper bound on problem complexity.

Nondeterministic Turing Machine (NDTM)

Model a problem as a tree in which the nodes represent states, with time flowing from the root to the children. A leaf represents a solution.

In an NDTM, the computation splits whenever a guess is needed.

The tree has polynomial depth but is exponentially bushy.

"Guess" nodes are like OR nodes: if either of the children returns T, the "guess" node returns T.

NP-hard

A problem Q is NP-hard if every problem in NP is reducible to Q.

A lower bound on a problem: "at least as hard as any problem in NP."

NP-Complete

The hardest decision problems in NP.

If there were a polynomially bounded algorithm for an NP-complete problem, then there would be a polynomially bounded algorithm for each problem in NP.

Establishes both a lower and an upper bound (although it may be that P=NP).

If any NP-complete problem is in P, then P=NP.

Satisfiability (SAT)

Instance: Given a set U = {u₁, ..., u_m} of variables and a collection C = {c₁, ..., c_n} of clauses over U.

Question: Is there a truth assignment to U that satisfies C?

A logical formula in conjunctive normal form (CNF): a conjunction of disjunctions
e.g., (x₁ + x₂ + ¬x₃) · (x₁ + x₃) · (¬x₂)

easy to falsify: make all terms in one clause fail.

hard to satisfy: terms in clauses interact.

Cook's Theorem: SAT is NP-complete.

uses SAT to simulate a machine

Once the primordial NP-complete problem is found, we can prove other problems NP-complete without reference to machines:

To prove that problem Q is NP-complete, reduce SAT (or any other NP-complete problem) to Q:
SAT <=_P Q

E.g., 3SAT: like SAT, but | c_i | = 3 for 1 <= i <= n.

Need to show that 3SAT is NP-hard:
Since X <=_P SAT for any problem X in NP, we need only show SAT <=_P 3SAT.

Proof:
General form of clauses in SAT is c_j = (z₁ + z₂ + ... + z_k), where each z_i is either an input or the negation of an input.

k = 1

{z₁ + z₁ + z₁}

k = 2

{z₁ + z₂ + z₂}

k = 3

{z₁ + z₂ + z₃}

k >= 4

Add auxiliary variables that transmit information between pieces.
E.g., {z₁ + z₂ + z₃ + z₄ + z₅} becomes {(z₁ + z₂ + y₁) (¬y₁ + z₃ + y₂) (¬y₂ + z₄ + z₅)}
The transformation is polynomial in the length of the SAT instance.

Class NP

The class of decision problems for which there is a polynomially bounded nondeterministic algorithm.

The class of problems that are "slightly harder" than P.

In general, an exponential number of subproblems, each of which can be solved or verified in polynomial time.

Being in class NP constitutes an upper bound on problem complexity.

Nondeterministic Turing Machine (NDTM)

Model a problem as a tree in which the nodes represent states, with time flowing from the root to the children. A leaf represents a solution.

In an NDTM, the computation splits whenever a guess is needed.

The tree has polynomial depth but is exponentially bushy.

"Guess" nodes are like OR nodes: if either of the children returns T, the "guess" node returns T.

NP-hard

A problem Q is NP-hard if every problem in NP is reducible to Q.

A lower bound on a problem: "at least as hard as any problem in NP."

NP-Complete

The hardest decision problems in NP.

If there were a polynomially bounded algorithm for an NP-complete problem, then there would be a polynomially bounded algorithm for each problem in NP.

Establishes both a lower and an upper bound (although it may be that P=NP).

If any NP-complete problem is in P, then P=NP.

Satisfiability (SAT)

Instance: Given a set U = {u₁, ..., u_m} of variables and a collection C = {c₁, ..., c_n} of clauses over U.

Question: Is there a truth assignment to U that satisfies C?

A logical formula in conjunctive normal form (CNF): a conjunction of disjunctions
e.g., (x₁ + x₂ + ¬x₃) · (x₁ + x₃) · (¬x₂)

easy to falsify: make all terms in one clause fail.

hard to satisfy: terms in clauses interact.

Cook's Theorem: SAT is NP-complete.

uses SAT to simulate a machine

Once the primordial NP-complete problem is found, we can prove other problems NP-complete without reference to machines:

To prove that problem Q is NP-complete, reduce SAT (or any other NP-complete problem) to Q:
SAT <=_P Q

E.g., 3SAT: like SAT, but | c_i | = 3 for 1 <= i <= n.

Need to show that 3SAT is NP-hard:
Since X <=_P SAT for any problem X in NP, we need only show SAT <=_P 3SAT.

Proof:
General form of clauses in SAT is c_j = (z₁ + z₂ + ... + z_k), where each z_i is either an input or the negation of an input.

k = 1

{z₁ + z₁ + z₁}

k = 2

{z₁ + z₂ + z₂}

k = 3

{z₁ + z₂ + z₃}

k >= 4

Add auxiliary variables that transmit information between pieces.
E.g., {z₁ + z₂ + z₃ + z₄ + z₅} becomes {(z₁ + z₂ + y₁) (¬y₁ + z₃ + y₂) (¬y₂ + z₄ + z₅)}
The transformation is polynomial in the length of the SAT instance.

Nondeterministic Turing Machine (NDTM)

Model a problem as a tree in which the nodes represent states, with time flowing from the root to the children. A leaf represents a solution.

In an NDTM, the computation splits whenever a guess is needed.

The tree has polynomial depth but is exponentially bushy.

"Guess" nodes are like OR nodes: if either of the children returns T, the "guess" node returns T.

NP-hard

A problem Q is NP-hard if every problem in NP is reducible to Q.

A lower bound on a problem: "at least as hard as any problem in NP."

NP-Complete

The hardest decision problems in NP.

If there were a polynomially bounded algorithm for an NP-complete problem, then there would be a polynomially bounded algorithm for each problem in NP.

Establishes both a lower and an upper bound (although it may be that P=NP).

If any NP-complete problem is in P, then P=NP.

Satisfiability (SAT)

Instance: Given a set U = {u₁, ..., u_m} of variables and a collection C = {c₁, ..., c_n} of clauses over U.

Question: Is there a truth assignment to U that satisfies C?

A logical formula in conjunctive normal form (CNF): a conjunction of disjunctions
e.g., (x₁ + x₂ + ¬x₃) · (x₁ + x₃) · (¬x₂)

easy to falsify: make all terms in one clause fail.

hard to satisfy: terms in clauses interact.

Cook's Theorem: SAT is NP-complete.

uses SAT to simulate a machine

Once the primordial NP-complete problem is found, we can prove other problems NP-complete without reference to machines:

To prove that problem Q is NP-complete, reduce SAT (or any other NP-complete problem) to Q:
SAT <=_P Q

E.g., 3SAT: like SAT, but | c_i | = 3 for 1 <= i <= n.

Need to show that 3SAT is NP-hard:
Since X <=_P SAT for any problem X in NP, we need only show SAT <=_P 3SAT.

Proof:
General form of clauses in SAT is c_j = (z₁ + z₂ + ... + z_k), where each z_i is either an input or the negation of an input.

k = 1

{z₁ + z₁ + z₁}

k = 2

{z₁ + z₂ + z₂}

k = 3

{z₁ + z₂ + z₃}

k >= 4

Add auxiliary variables that transmit information between pieces.
E.g., {z₁ + z₂ + z₃ + z₄ + z₅} becomes {(z₁ + z₂ + y₁) (¬y₁ + z₃ + y₂) (¬y₂ + z₄ + z₅)}
The transformation is polynomial in the length of the SAT instance.

Class NP

The class of decision problems for which there is a polynomially bounded nondeterministic algorithm.

The class of problems that are "slightly harder" than P.

In general, an exponential number of subproblems, each of which can be solved or verified in polynomial time.

Being in class NP constitutes an upper bound on problem complexity.

Nondeterministic Turing Machine (NDTM)

Model a problem as a tree in which the nodes represent states, with time flowing from the root to the children. A leaf represents a solution.

In an NDTM, the computation splits whenever a guess is needed.

The tree has polynomial depth but is exponentially bushy.

"Guess" nodes are like OR nodes: if either of the children returns T, the "guess" node returns T.

NP-hard

A problem Q is NP-hard if every problem in NP is reducible to Q.

A lower bound on a problem: "at least as hard as any problem in NP."

NP-Complete

The hardest decision problems in NP.

If there were a polynomially bounded algorithm for an NP-complete problem, then there would be a polynomially bounded algorithm for each problem in NP.

Establishes both a lower and an upper bound (although it may be that P=NP).

If any NP-complete problem is in P, then P=NP.

Satisfiability (SAT)

Instance: Given a set U = {u₁, ..., u_m} of variables and a collection C = {c₁, ..., c_n} of clauses over U.

Question: Is there a truth assignment to U that satisfies C?

A logical formula in conjunctive normal form (CNF): a conjunction of disjunctions
e.g., (x₁ + x₂ + ¬x₃) · (x₁ + x₃) · (¬x₂)

easy to falsify: make all terms in one clause fail.

hard to satisfy: terms in clauses interact.

Cook's Theorem: SAT is NP-complete.

uses SAT to simulate a machine

Once the primordial NP-complete problem is found, we can prove other problems NP-complete without reference to machines:

To prove that problem Q is NP-complete, reduce SAT (or any other NP-complete problem) to Q:
SAT <=_P Q

E.g., 3SAT: like SAT, but | c_i | = 3 for 1 <= i <= n.

Need to show that 3SAT is NP-hard:
Since X <=_P SAT for any problem X in NP, we need only show SAT <=_P 3SAT.

Proof:
General form of clauses in SAT is c_j = (z₁ + z₂ + ... + z_k), where each z_i is either an input or the negation of an input.

k = 1

{z₁ + z₁ + z₁}

k = 2

{z₁ + z₂ + z₂}

k = 3

{z₁ + z₂ + z₃}

k >= 4

Add auxiliary variables that transmit information between pieces.
E.g., {z₁ + z₂ + z₃ + z₄ + z₅} becomes {(z₁ + z₂ + y₁) (¬y₁ + z₃ + y₂) (¬y₂ + z₄ + z₅)}
The transformation is polynomial in the length of the SAT instance.

Foundations of Computer Science

Monday, 14 April 2014

NP Complete Class

Class NP

Nondeterministic Turing Machine (NDTM)

NP-hard

NP-Complete

Satisfiability (SAT)

Other NP-Complete Problems

Class NP

Nondeterministic Turing Machine (NDTM)

NP-hard

NP-Complete

Satisfiability (SAT)

Other NP-Complete Problems

Class NP

Nondeterministic Turing Machine (NDTM)

NP-hard

NP-Complete

Satisfiability (SAT)

Other NP-Complete Problems

Class NP

Nondeterministic Turing Machine (NDTM)

NP-hard

NP-Complete

Satisfiability (SAT)

Other NP-Complete Problems

No comments:

Post a Comment