Analyzing Algorithms Growth of Functions

CSE 5350/7350
Introduction to Algorithms
Analyzing Algorithms
Growth of Functions
(Chapters 1 & 2)
Mihaela Iridon, Ph.D.
CSE 5350 - Fall 2007
Analyzing Algorithms
Growth of Functions
Slide 1
What does analyzing an algorithm mean?
• Predicting the resources the algorithm requires
• What do we measure?
– Computational time
– Memory
– Communication bandwidth
– Other
• Model constructs:
– Technology
– Resources (hardware & software)
– Associated costs
– Assumptions (one processor; RAM model: instructions are executed sequentially)
Tools used for algorithm analysis
• Mathematical tools:
– Discrete combinatorics
– Probability theory
– Ability to single out the predominant
operations (most significant terms in a
formula)
• Modeling and simulation tools
• Software utilities, benchmarking
programs
Terminology
• Input Size
– In general, the time needed to execute a set of operations depends on the size of the input
– Its precise definition depends on the problem
– Often it is the number of items in the input (e.g., the number n of elements to sort)
– It may be described by more than one number (e.g., for a graph: the number of vertices and the number of edges)
• Running Time
– The number of primitive operations (steps) executed
– A machine-independent measure
Example (1) – Insertion Sort (In Place)
/// <summary>
/// Sorts the input list of integers by using the Insertion Sort algorithm
/// (see Cormen textbook, Chapter 1.1)
/// </summary>
/// <param name="input">Input list of integers (to be sorted); the list is modified in place</param>
public static void InsertionSort(List<int> input)
{
    if (input == null) return;       // null input
    if (input.Count < 2) return;     // empty or one-element list: nothing to sort
    int i = 0, key = 0;
    //                                                 Cost   # of times executed
    for (int j = 1; j < input.Count; j++)           // c1     n
    {
        key = input[j];                             // c2     n-1
        // insert input[j] into the sorted sequence input[0..j-1]    (cost 0, n-1 times)
        i = j - 1;                                  // c4     n-1
        while (i > -1 && input[i] > key)            // c5     Σ(j=1..n-1) t_j
        {
            input[i + 1] = input[i];                // c6     Σ(j=1..n-1) (t_j - 1)
            i--;                                    // c7     Σ(j=1..n-1) (t_j - 1)
        }
        input[i + 1] = key;                         // c8     n-1
    }
}
Here n = input.Count, and t_j is the number of times the while-loop test is executed for that value of j.
Example (1) – Insertion Sort (In Place)
• The running time depends on the input values, not only on the input size n
– If the input is already sorted, the while-loop test fails at once for every j (t_j = 1), the loop body never executes, and the best-case running time for insertion sort is:
$$T(n) = c_1 n + c_2(n-1) + c_4(n-1) + c_5(n-1) + c_8(n-1) = (c_1 + c_2 + c_4 + c_5 + c_8)\,n - (c_2 + c_4 + c_5 + c_8) = a n + b$$
→ a linear function of n
– If the input is in reverse sorted order (worst case), key is smaller than every element of input[0..j-1], so the test succeeds j times and then fails once with i = -1, i.e. t_j = j + 1, and:
$$T(n) = c_1 n + c_2(n-1) + c_4(n-1) + c_5\left(\frac{n(n+1)}{2} - 1\right) + c_6\,\frac{n(n-1)}{2} + c_7\,\frac{n(n-1)}{2} + c_8(n-1) = a n^2 + b n + c$$
→ a quadratic function of n
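To check the two closed forms, here is a Python transcription of the slide's C# InsertionSort (Python is used only to make the sketch runnable; the name `insertion_sort_counting` is hypothetical). It records t_j, the number of times the while-loop test is evaluated for each j, and compares Σ t_j with n − 1 for sorted input and with n(n+1)/2 − 1 for reverse-sorted input.

```python
def insertion_sort_counting(values):
    """Insertion sort on a copy of `values`.

    Returns (sorted_list, tests), where tests[j - 1] is t_j: how many times
    the while-loop test was evaluated while inserting element j.
    """
    a = list(values)
    tests = []
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        t = 1                         # counts the final (failing) test
        while i > -1 and a[i] > key:
            a[i + 1] = a[i]           # shift the larger element one slot right
            i -= 1
            t += 1                    # one more test that succeeded
        a[i + 1] = key
        tests.append(t)
    return a, tests


n = 6
_, best = insertion_sort_counting(range(n))           # already sorted
_, worst = insertion_sort_counting(range(n, 0, -1))   # reverse sorted
print(sum(best), n - 1)                               # 5 5
print(sum(worst), n * (n + 1) // 2 - 1)               # 20 20
```

In the sorted case every t_j is 1; in the reverse-sorted case t_j = j + 1, exactly the values used in the two formulas.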
Worst-case & Average-case Analysis
• Worst-case running time:
– The longest running time for any input of size n (i.e., the longest execution path)
– An upper bound on the running time for any input
– For some algorithms, the worst case occurs fairly often
– The average case is often roughly as bad as the worst case
• Average-case running time:
– It is difficult to define what an "average" input means
– Example for Insertion Sort: on average, half the elements in A1 … Aj-1 are less than Aj and half are greater, so each insertion shifts about half of the sorted prefix (t_j ≈ j/2). The average-case running time is therefore still a quadratic function of n, like the worst case.
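Assuming "average input" means that all n! orderings of n distinct keys are equally likely (one common choice, not the only one), the average can be computed exactly for small n by enumeration. In this Python sketch (function names are hypothetical), every while-loop body execution shifts one element, and the mean number of shifts comes out as n(n − 1)/4: half of the worst case n(n − 1)/2, still quadratic.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def shifts(values):
    """How many times insertion sort executes the while-loop body
    (one element shift per execution) when sorting `values`."""
    a = list(values)
    count = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i > -1 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
            count += 1
        a[i + 1] = key
    return count


def average_shifts(n):
    """Exact mean of shifts() over all n! orderings of n distinct keys."""
    total = sum(shifts(p) for p in permutations(range(n)))
    return Fraction(total, factorial(n))


for n in range(2, 8):
    # second and third columns agree: the mean equals n(n-1)/4
    print(n, average_shifts(n), Fraction(n * (n - 1), 4))
```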
Order/Rate of Growth
• Simplifying abstraction
• Consider only the leading term of the running-time formula
• Ignore the leading term's constant coefficient
• The worst-case running time for Insertion Sort is Θ(n²) ("theta of n-squared")
• For large enough inputs, an algorithm with a Θ(n²) worst-case running time runs faster than one with a Θ(n³) worst-case running time
Divide-and-Conquer Algorithms
• Incremental approach (e.g. Insertion Sort)
• Divide-and-conquer approach (e.g. recursive algorithms such as Merge Sort)
– [Divide] Break the problem into similar sub-problems of smaller size
– [Conquer] Solve the sub-problems recursively (directly, if they are small enough)
– [Combine] Combine the sub-problem solutions into a solution for the original problem
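The three steps can be seen in a deliberately small Python example that is not from the slides: finding the maximum of a non-empty list by divide and conquer (`dc_max` is a hypothetical name). The Combine step is a single comparison, so the running time satisfies T(n) = 2T(n/2) + Θ(1), which is Θ(n).

```python
def dc_max(a, lo, hi):
    """Maximum of a[lo..hi] (inclusive bounds, at least one element)."""
    if lo == hi:                      # small enough: solve directly
        return a[lo]
    mid = (lo + hi) // 2              # [Divide] split into two halves
    left = dc_max(a, lo, mid)         # [Conquer] solve each half recursively
    right = dc_max(a, mid + 1, hi)
    return max(left, right)           # [Combine] one comparison


data = [3, 9, 2, 7, 5]
print(dc_max(data, 0, len(data) - 1))   # 9
```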
Analyzing divide-and-conquer algorithms
• Recursive approach → use a recurrence equation (recurrence) to describe the running time
• T(n) = running time on a problem of size n
• If the problem is small (n ≤ c for some constant c), it is solved directly in Θ(1) time
• Otherwise, divide the problem into a sub-problems, each of size n/b
• D(n) = time to divide the problem into sub-problems
• C(n) = time to combine the sub-problem solutions
• T(n) = a*T(n/b) + D(n) + C(n) (when n > c)
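The general recurrence can be evaluated numerically. This Python sketch works under stated assumptions: `make_recurrence` and its parameters are hypothetical names, n/b is rounded down by integer division (so values are exact only when n is a power of b), and the Θ-terms are replaced by concrete cost functions. With a = b = 2, D(n) = 1 and C(n) = n (a choice resembling Merge Sort), the values match the closed form n·log₂ n + 2n − 1 for powers of 2.

```python
from functools import lru_cache


def make_recurrence(a, b, c, base_cost, divide_cost, combine_cost):
    """Return T with T(n) = base_cost for n <= c,
    otherwise T(n) = a*T(n // b) + D(n) + C(n)."""
    @lru_cache(maxsize=None)
    def T(n):
        if n <= c:
            return base_cost
        return a * T(n // b) + divide_cost(n) + combine_cost(n)
    return T


T = make_recurrence(a=2, b=2, c=1, base_cost=1,
                    divide_cost=lambda n: 1,      # D(n) = Θ(1), taken as 1
                    combine_cost=lambda n: n)     # C(n) = Θ(n), taken as n

for k in range(6):
    n = 2 ** k
    print(n, T(n), n * k + 2 * n - 1)   # the last two columns agree
```

Changing the parameters covers other cases, e.g. a = 1, b = 2, D(n) = 1, C(n) = 0 gives T(n) = log₂ n + 1 for powers of 2.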
Merge Sort (1)
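/// Sorts input[startIx..endIx] (inclusive bounds); call as MergeSort2(list, 0, list.Count - 1)
public static void MergeSort2(List<int> input, int startIx, int endIx)
{
    if (input == null) return;
    if (startIx >= endIx) return;                    // stop condition: at most one element (also handles an empty list)
    int middle = startIx + (endIx - startIx) / 2;    // [Divide] floor of the midpoint
    MergeSort2(input, startIx, middle);              // [Conquer] sort the left half
    MergeSort2(input, middle + 1, endIx);            // [Conquer] sort the right half
    Combine2(input, startIx, middle, endIx);         // [Combine] merge the two sorted halves
}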
Merge Sort (2)
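/// Merges the sorted runs input[i1..i2] and input[i2+1..i3] (inclusive bounds)
public static void Combine2(List<int> input, int i1, int i2, int i3)
{
    if (input == null || input.Count == 0) return;
    List<int> result = new List<int>(i3 - i1 + 1);   // auxiliary buffer: not 100% in place
    int ix1 = i1, ix2 = i2 + 1;
    while (result.Count < i3 - i1 + 1)
    {
        // take from the left run while its head is <= the right run's head
        // ('<=' keeps the sort stable and guarantees progress on equal keys)
        while (ix1 <= i2 && (ix2 > i3 || input[ix1] <= input[ix2]))
            result.Add(input[ix1++]);
        // take from the right run while its head is strictly smaller
        while (ix2 <= i3 && (ix1 > i2 || input[ix1] > input[ix2]))
            result.Add(input[ix2++]);
    }
    for (int j = i1; j <= i3; j++)                   // copy the merged run back
        input[j] = result[j - i1];
}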
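For readers who want to run the algorithm, here is a Python transcription of MergeSort2 and Combine2 (`merge_sort` and `combine` are hypothetical names). The merge is restructured into the usual single two-pointer loop, a design choice rather than the slides' nested loops; taking from the left run when the heads are equal keeps the sort stable.

```python
def merge_sort(a, start, end):
    """Sort a[start..end] (inclusive bounds) in place, mirroring MergeSort2."""
    if start >= end:                       # 0 or 1 element: already sorted
        return
    middle = (start + end) // 2            # [Divide]
    merge_sort(a, start, middle)           # [Conquer] left half
    merge_sort(a, middle + 1, end)         # [Conquer] right half
    combine(a, start, middle, end)         # [Combine]


def combine(a, i1, i2, i3):
    """Merge the sorted runs a[i1..i2] and a[i2+1..i3] via a temporary list."""
    result = []
    ix1, ix2 = i1, i2 + 1
    while ix1 <= i2 and ix2 <= i3:
        if a[ix1] <= a[ix2]:               # '<=' keeps equal keys in input order
            result.append(a[ix1])
            ix1 += 1
        else:
            result.append(a[ix2])
            ix2 += 1
    result.extend(a[ix1:i2 + 1])           # whatever is left of the left run
    result.extend(a[ix2:i3 + 1])           # whatever is left of the right run
    a[i1:i3 + 1] = result                  # copy the merged run back


data = [5, 2, 4, 6, 1, 3, 2, 6]
merge_sort(data, 0, len(data) - 1)
print(data)   # [1, 2, 2, 3, 4, 5, 6, 6]
```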
Analyzing Merge Sort
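• [Divide] Computing the middle index takes constant time: D(n) = Θ(1)
• [Conquer] Recursively sorting two sub-lists of size n/2 takes 2T(n/2)
• [Combine] Merging n elements takes linear time: C(n) = Θ(n)
$$T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 \\ 2T(n/2) + \Theta(n) & \text{if } n > 1 \end{cases}$$
• T(n) = Θ(n log₂ n) → better than Insertion Sort's Θ(n²) worst case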
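A recursion-tree style sketch of why the recurrence gives Θ(n log₂ n), assuming n is an exact power of 2 and writing both the Θ(1) base cost and the Θ(n) divide-plus-combine cost with a single constant c (a simplification):

$$
\begin{aligned}
T(n) &= 2T(n/2) + cn \\
     &= 4T(n/4) + 2cn \\
     &= 2^k\,T(n/2^k) + k\,cn \\
     &= n\,T(1) + cn\log_2 n \qquad (k = \log_2 n) \\
     &= cn\log_2 n + cn = \Theta(n \log_2 n)
\end{aligned}
$$

Each of the log₂ n + 1 levels of the recursion tree contributes cn in total.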
Growth of Functions
• Algorithm efficiency
• Compare the relative performance of alternative algorithms
• Analysis for large input sizes, i.e., n → ∞
• Asymptotic efficiency of algorithms:
– Input size is large enough to make only the
order of growth of the running time relevant
– How the running time increases with the size
of the input in the limit, as the size of the input
increases without bound
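One way to see how the running time increases with the size of the input is to ask what happens when n doubles. This Python sketch (the table `GROWTH` and the helper `doubling_ratio` are hypothetical names) computes f(2n)/f(n) at n = 64 for several common functions: 2 for n, 4 for n², 8 for n³, a little over 2 for n lg n, and 2⁶⁴ for 2ⁿ.

```python
import math

GROWTH = {
    "n": lambda n: n,
    "n lg n": lambda n: n * math.log2(n),
    "n^2": lambda n: n ** 2,
    "n^3": lambda n: n ** 3,
    "2^n": lambda n: 2 ** n,
}


def doubling_ratio(f, n):
    """Factor by which f grows when the input size doubles: f(2n) / f(n)."""
    return f(2 * n) / f(n)


for name, f in GROWTH.items():
    print(f"{name:7} {doubling_ratio(f, 64):.4g}")
```

For polynomial growth the ratio is a constant fixed by the degree, while for 2ⁿ it itself grows with n, which is why the leading term dominates for large inputs.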
Asymptotic Notation
• Notations used to describe the asymptotic running time of an algorithm
• Defined in terms of functions whose domain is the set of natural numbers N = {0, 1, 2, …}
• Convenient for describing the worst-case running-time function T(n)
• Notation: often abused for convenience, but it must not be misused (its precise meaning must be understood)
Θ-notation
• Θ: asymptotically bounds a function from above and below
$$\Theta(g(n)) = \{ f(n) : \exists\, c_1, c_2, n_0 > 0 \text{ such that } 0 \le c_1 g(n) \le f(n) \le c_2 g(n) \text{ for all } n \ge n_0 \}$$
f(n) = Θ(g(n)) indicates f(n) ∈ Θ(g(n)); equivalently, g(n) is an asymptotically tight bound for f(n)
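The definition asks for concrete witnesses c₁, c₂, n₀. For the standard textbook example f(n) = n²/2 − 3n = Θ(n²), dividing through by n² gives c₁ ≤ 1/2 − 3/n ≤ c₂, which holds with c₁ = 1/14, c₂ = 1/2, n₀ = 7. The Python helper `holds_theta` (a hypothetical name) checks those inequalities with exact fractions over a finite range; such a check can expose bad constants but is not a proof.

```python
from fractions import Fraction


def holds_theta(f, g, c1, c2, n0, n_max):
    """Check 0 <= c1*g(n) <= f(n) <= c2*g(n) for every n0 <= n <= n_max."""
    return all(0 <= c1 * g(n) <= f(n) <= c2 * g(n)
               for n in range(n0, n_max + 1))


def f(n):
    return Fraction(n * n, 2) - 3 * n     # f(n) = n^2/2 - 3n, computed exactly


def g(n):
    return n * n                          # g(n) = n^2


c1, c2 = Fraction(1, 14), Fraction(1, 2)
print(holds_theta(f, g, c1, c2, n0=7, n_max=10_000))   # True
print(holds_theta(f, g, c1, c2, n0=6, n_max=10_000))   # False: f(6) = 0 < c1*g(6)
```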
O-notation
• O: asymptotic upper bound
$$O(g(n)) = \{ f(n) : \exists\, c, n_0 > 0 \text{ such that } 0 \le f(n) \le c\,g(n) \text{ for all } n \ge n_0 \}$$
f(n) = Θ(g(n)) ⇒ f(n) = O(g(n))
• The Θ-notation is stronger than the O-notation
• Example: n = O(n²)
• E.g.: the worst-case running time of insertion sort is O(n²)
• Notation abuse: "The running time of insertion sort is O(n²)."
– Strictly, the running time depends on the particular input of size n.
– The statement is justified because the worst-case running time is O(n²): for every n, no matter which particular input of size n is chosen, the running time on that input is bounded above by the same O(n²) function.
Ω-notation
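• Ω: asymptotic lower bound
$$\Omega(g(n)) = \{ f(n) : \exists\, c, n_0 > 0 \text{ such that } 0 \le c\,g(n) \le f(n) \text{ for all } n \ge n_0 \}$$
• Theorem: for any two functions f(n) and g(n), f(n) = Θ(g(n)) iff f(n) = O(g(n)) and f(n) = Ω(g(n))
• Used to bound the best-case running time from below: a running time of Ω(g(n)) means that, for large enough n, every input of size n takes at least a constant times g(n)
• E.g.: the running time of insertion sort is Ω(n) (its best case is linear)
• The worst-case running time of insertion sort is Ω(n²), and therefore Θ(n²)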
Graphical Comparison
o-notation
• Upper bound that is not asymptotically tight
• 2n² = O(n²) is asymptotically tight
• 2n = O(n²) is not asymptotically tight
$$o(g(n)) = \{ f(n) : \forall\, c > 0,\ \exists\, n_0 > 0 \text{ such that } 0 \le f(n) < c\,g(n) \text{ for all } n \ge n_0 \}$$
• 2n = o(n²), but 2n² ≠ o(n²)
• f(n) = o(g(n)) implies $\lim_{n \to \infty} f(n)/g(n) = 0$
ω-notation
• Lower bound that is not asymptotically tight
• f(n) ∈ ω(g(n)) iff g(n) ∈ o(f(n))
$$\omega(g(n)) = \{ f(n) : \forall\, c > 0,\ \exists\, n_0 > 0 \text{ such that } 0 \le c\,g(n) < f(n) \text{ for all } n \ge n_0 \}$$
• E.g.: n²/2 = ω(n), but n²/2 ≠ ω(n²)
• f(n) = ω(g(n)) implies $\lim_{n \to \infty} f(n)/g(n) = \infty$ (if the limit exists)
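The limit characterizations of o and ω can be seen numerically. This Python sketch (the helper `ratios` is a hypothetical name) prints f(n)/g(n) for growing n: for 2n versus n² the ratio shrinks toward 0 (2n = o(n²)); for 2n² versus n² it stays at 2 (so 2n² is not o(n²)); for n²/2 versus n it grows without bound (n²/2 = ω(n)). A finite table suggests, but does not prove, the limit.

```python
def ratios(f, g, ns):
    """f(n) / g(n) for each n in ns."""
    return [f(n) / g(n) for n in ns]


ns = [10, 100, 1000, 10_000]
print(ratios(lambda n: 2 * n, lambda n: n * n, ns))       # [0.2, 0.02, 0.002, 0.0002]
print(ratios(lambda n: 2 * n * n, lambda n: n * n, ns))   # [2.0, 2.0, 2.0, 2.0]
print(ratios(lambda n: n * n / 2, lambda n: n, ns))       # [5.0, 50.0, 500.0, 5000.0]
```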
Comparison of Functions
• Transitivity (for all five notations Θ, O, Ω, o, ω):
If f(n) = X(g(n)) and g(n) = X(h(n)), then f(n) = X(h(n))
• Reflexivity (for Θ, O, and Ω):
f(n) = X(f(n))
• Symmetry:
f(n) = Θ(g(n)) iff g(n) = Θ(f(n))
• Transpose symmetry:
f(n) = O(g(n)) iff g(n) = Ω(f(n))
f(n) = o(g(n)) iff g(n) = ω(f(n))
Analyzing Algorithms Addendum
•Finding the largest clique in a graph
•Parsing an object model
Finding the largest clique
• Graph: G = (V, E)
• A graph (undirected graph) G is an ordered pair G = (V, E) subject to the following conditions:
– V is a set whose elements are called vertices or nodes;
– E is a set of unordered pairs of distinct vertices, called edges or lines.
• A clique in an undirected graph G = (V, E) is a subset C ⊆ V such that every two distinct vertices in C are connected by an edge (i.e., C induces a complete sub-graph).
The Clique Problem
• Determine whether a graph contains a clique of at least a given size k.
• Verifying that a given set of vertices is a clique is easy: check every pair of vertices (polynomial time).
• The clique problem is therefore in NP (nondeterministic polynomial time).
• It is NP-complete.
• The corresponding optimization problem, the maximum clique problem, is to find the largest clique in a graph.
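Verification is easy because checking a proposed clique needs one adjacency test per pair of vertices, k(k − 1)/2 tests for k vertices. A minimal Python sketch, assuming the graph is stored as a dict mapping each vertex to the set of its neighbours (a hypothetical representation, not taken from the slides):

```python
from itertools import combinations


def is_clique(adj, vertices):
    """True if every two distinct vertices in `vertices` are adjacent.
    adj maps each vertex to the set of its neighbours (undirected graph)."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


# Example graph: triangle {1, 2, 3} plus vertex 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(is_clique(adj, [1, 2, 3]))   # True
print(is_clique(adj, [2, 3, 4]))   # False: 2 and 4 are not adjacent
```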
Brute force approach to the clique problem
• Examine each sub-graph with at least k vertices and check whether it forms a clique
• Number of cases to inspect: $\sum_{i=k}^{n} \binom{n}{i}$, where n = |V|
• A clique C = (v_i1, v_i2, …, v_im) of size m exists only when all m of its sub-cliques of size m − 1 exist.
• Event-raising mechanism to increment a counter of sub-cliques using the threshold graph
• Space required to build all cliques until G is completely connected:
$$\binom{n}{2} + \binom{n}{3} + \binom{n}{4} + \dots + \binom{n}{n} = 2^n - n - 1$$
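A plain brute-force Python sketch of the first bullet, using the same hypothetical dict-of-neighbour-sets representation; it does not reproduce the event-raising / threshold-graph mechanism, which the slides do not detail. Since every subset of a clique is itself a clique, a clique of size at least k exists iff one of size exactly k does, so `has_clique` only inspects the C(n, k) subsets of size k. `max_clique_size` can inspect nearly all 2ⁿ subsets in the worst case, in line with the 2ⁿ − n − 1 count on this slide. Both function names are hypothetical.

```python
from itertools import combinations
from math import comb


def has_clique(adj, k):
    """True if the graph has a clique of at least k vertices (brute force)."""
    for subset in combinations(adj, k):            # all C(n, k) vertex subsets
        if all(v in adj[u] for u, v in combinations(subset, 2)):
            return True
    return False


def max_clique_size(adj):
    """Largest k with has_clique(adj, k); stops at the first k that fails,
    since no clique of size k implies none of size k + 1."""
    best = 0
    for k in range(1, len(adj) + 1):
        if not has_clique(adj, k):
            break
        best = k
    return best


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(max_clique_size(adj))                                        # 3
n = len(adj)
print(sum(comb(n, i) for i in range(2, n + 1)), 2 ** n - n - 1)    # 11 11
```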
Sorting an object model
• Input:
– A collection of property paths
• Output:
– A sorted collection of property paths
Input Sample (excerpt)
Company.Principals[0].References[0].ResidentialInfos[0].Address.AddressType
Company.Principals[0].References[0].ResidentialInfos[1].Address.AddressType
Company.Principals[0].References[1].ResidentialInfos[0].Address.AddressType
Company.Principals[0].References[1].ResidentialInfos[1].Address.AddressType
Company.Principals[1].References[0].ResidentialInfos[0].Address.AddressType
Company.Principals[1].References[0].ResidentialInfos[1].Address.AddressType
Company.Principals[1].References[1].ResidentialInfos[0].Address.AddressType
Company.Principals[1].References[1].ResidentialInfos[1].Address.AddressType
Company.Principals[0].References[0].ResidentialInfos[0].Address.StreetAddressLine1
Company.Principals[0].References[0].ResidentialInfos[1].Address.StreetAddressLine1
Company.Principals[0].References[1].ResidentialInfos[0].Address.StreetAddressLine1
Company.Principals[0].References[1].ResidentialInfos[1].Address.StreetAddressLine1
Company.Principals[1].References[0].ResidentialInfos[0].Address.StreetAddressLine1
Company.Principals[1].References[0].ResidentialInfos[1].Address.StreetAddressLine1
Company.Principals[1].References[1].ResidentialInfos[0].Address.StreetAddressLine1
Company.Principals[1].References[1].ResidentialInfos[1].Address.StreetAddressLine1
Company.Principals[0].References[0].ResidentialInfos[0].Address.StreetAddressLine2
Company.Principals[0].References[0].ResidentialInfos[1].Address.StreetAddressLine2
Company.Principals[0].References[1].ResidentialInfos[0].Address.StreetAddressLine2
Company.Principals[0].References[1].ResidentialInfos[1].Address.StreetAddressLine2
Company.Principals[1].References[0].ResidentialInfos[0].Address.StreetAddressLine2
Company.Principals[1].References[0].ResidentialInfos[1].Address.StreetAddressLine2
Company.Principals[1].References[1].ResidentialInfos[0].Address.StreetAddressLine2
Company.Principals[1].References[1].ResidentialInfos[1].Address.StreetAddressLine2
Object Model
• Company
– Principals[], FinancialStatements[], TradeLines[], Contacts[], Declarations[], Documents[], Addresses[]
– CompanyInfo, BusinessAddress, MailingAddress, RelationshipSummary
• Principal
– EmploymentInfos[], IncomeInfos[], TradeLines[], References[], Assets[], IdentificationInfos[], ResidentialInfos[]
– ContactInfo, PersonalInfo, PrincipalType, EmployedByCompany, YearsAsOwner, IndividualOrJointType, PersonType
• Address
– AddressType, StreetAddressLine1, StreetAddressLine2, City, County, State, Zipcode, Country
• Association multiplicities shown in the diagram: 1..*, 1, 1, 1..*
Output sample (sorted object model)
Company.Principals[0].EmploymentInfos[0].EmployerAddress.AddressType
Company.Principals[0].EmploymentInfos[0].EmployerAddress.StreetAddressLine1
Company.Principals[0].EmploymentInfos[0].EmployerAddress.StreetAddressLine2
Company.Principals[0].EmploymentInfos[0].EmployerAddress.City
Company.Principals[0].EmploymentInfos[0].EmployerAddress.County
Company.Principals[0].EmploymentInfos[0].EmployerAddress.State
Company.Principals[0].EmploymentInfos[0].EmployerAddress.ZipCode
Company.Principals[0].EmploymentInfos[0].EmployerAddress.Country
Company.Principals[0].EmploymentInfos[0].EmployerName
Company.Principals[0].EmploymentInfos[0].EmploymentType
Company.Principals[0].EmploymentInfos[0].OccupationType
Company.Principals[0].EmploymentInfos[0].YearsOfEmployment
Company.Principals[0].EmploymentInfos[0].MonthsOfEmployment
Company.Principals[0].EmploymentInfos[0].Title
Company.Principals[0].EmploymentInfos[0].Department
Company.Principals[0].EmploymentInfos[1].EmployerAddress.AddressType
Company.Principals[0].EmploymentInfos[1].EmployerAddress.StreetAddressLine1
Company.Principals[0].EmploymentInfos[1].EmployerAddress.StreetAddressLine2
Company.Principals[0].EmploymentInfos[1].EmployerAddress.City
Company.Principals[0].EmploymentInfos[1].EmployerAddress.County
Company.Principals[0].EmploymentInfos[1].EmployerAddress.State
Company.Principals[0].EmploymentInfos[1].EmployerAddress.ZipCode
Company.Principals[0].EmploymentInfos[1].EmployerAddress.Country
Company.Principals[0].EmploymentInfos[1].EmployerName
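The output groups paths by collection index and lists properties in the order the object model declares them, which plain string sorting would not reproduce ("City" would sort before "StreetAddressLine1", and "Principals[10]" before "Principals[2]"). A minimal Python sketch, assuming a hypothetical `rank` map that gives each property name its declaration position; names missing from the map sort alphabetically after the ranked ones. `path_key` and `sort_paths` are hypothetical names.

```python
import re

# A segment is a property name, optionally followed by a collection index: Name or Name[3]
SEGMENT = re.compile(r"(\w+)(?:\[(\d+)\])?")


def path_key(path, rank):
    """Sort key: per segment, (declared position, name, index); -1 means 'no index'."""
    key = []
    for segment in path.split("."):
        m = SEGMENT.fullmatch(segment)
        if m is None:
            raise ValueError(f"malformed segment {segment!r} in {path!r}")
        name, index = m.group(1), m.group(2)
        key.append((rank.get(name, float("inf")), name,
                    int(index) if index is not None else -1))
    return key


def sort_paths(paths, rank):
    return sorted(paths, key=lambda p: path_key(p, rank))


rank = {"AddressType": 0, "StreetAddressLine1": 1, "StreetAddressLine2": 2, "City": 3}
paths = [
    "Company.Principals[1].Address.AddressType",
    "Company.Principals[10].Address.City",
    "Company.Principals[0].Address.City",
    "Company.Principals[2].Address.AddressType",
    "Company.Principals[0].Address.AddressType",
]
for p in sort_paths(paths, rank):
    print(p)
```

With m paths this is an O(m log m)-comparison sort, where each comparison walks the path segments.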
Data Graph
(Figure: tree-shaped data graph with root ROOT; nodes shown include ApplicationNumber, ConversationLogs, Comment, DateTimeStamp, IncludeInUDR, Company, Principals, FinancialStatements, EmploymentInfos, IncomeInfos, TradeLines)
Data Structures
Node
Text : string
Count : int
CrtIndex : int
IsLast : bool
NextNode : Node
RightNode : Node
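One way these Node fields could support the sorting is a left-child/right-sibling tree, with NextNode pointing to the first child (next path segment) and RightNode to the next sibling. This interpretation and the helper names below are assumptions; the slide only lists the fields. In this Python sketch, Text holds the property name, CrtIndex the collection index (−1 if none), Count how many paths pass through the node, and IsLast marks a node where a complete path ends. New siblings are appended at the end, so a pre-order traversal emits paths grouped by object and index, in first-seen property order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    text: str                            # property name, e.g. "Principals"
    count: int = 0                       # number of paths through this node
    crt_index: int = -1                  # collection index, -1 if not indexed
    is_last: bool = False                # a complete path ends here
    next_node: Optional["Node"] = None   # first child (assumed meaning)
    right_node: Optional["Node"] = None  # next sibling (assumed meaning)


def split_segment(segment):
    """'Principals[0]' -> ('Principals', 0); 'Address' -> ('Address', -1)."""
    if segment.endswith("]"):
        name, index = segment[:-1].split("[")
        return name, int(index)
    return segment, -1


def child(parent, text, index):
    """Find the child with this text/index, appending a new last sibling if absent."""
    prev, cur = None, parent.next_node
    while cur is not None:
        if cur.text == text and cur.crt_index == index:
            return cur
        prev, cur = cur, cur.right_node
    node = Node(text, crt_index=index)
    if prev is None:
        parent.next_node = node
    else:
        prev.right_node = node
    return node


def insert(root, path):
    node = root
    for segment in path.split("."):
        node = child(node, *split_segment(segment))
        node.count += 1
    node.is_last = True


def emit(node, prefix, out):
    """Pre-order walk: children in sibling order, writing out complete paths."""
    cur = node.next_node
    while cur is not None:
        label = cur.text if cur.crt_index < 0 else f"{cur.text}[{cur.crt_index}]"
        full = f"{prefix}.{label}" if prefix else label
        if cur.is_last:
            out.append(full)
        emit(cur, full, out)
        cur = cur.right_node


def group_paths(paths):
    root = Node("ROOT")
    for p in paths:
        insert(root, p)
    out: List[str] = []
    emit(root, "", out)
    return out


print(group_paths([
    "Company.Principals[0].Address.AddressType",
    "Company.Principals[1].Address.AddressType",
    "Company.Principals[0].Address.StreetAddressLine1",
    "Company.Principals[1].Address.StreetAddressLine1",
]))
```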