1.Introduction to Data Structures:-

UNIT – I
Introduction to Data Structures:-
Ex. :- 1942
By itself 1942 is only data. It may be a year, a number, a vehicle number or a book number.
Information + context = knowledge
Context 1 :- 1942 "A freedom struggle movement"
Context 2 :- 1942 "A Love Story"
Short Definition :- Data structure means organization of data.
Ex. :- Arranging books on a bookshelf.
Objective :- To access a book in an efficient way.
Definition 1 :- A data structure is a way of organizing data that considers not only the items stored
but also the relationships between them.
Objective :- Data should be retrieved in an efficient & convenient way.
Definition 2 :- Data may be organized in many different ways. The logical or mathematical model
of a particular organization of data is called a data structure.
Magic square problem :- Arrange the numbers 1 to 9 in a 3 x 3 matrix such that the sums of all rows, columns & diagonals
are the same.
6    1    8   | 15
7    5    3   | 15
2    9    4   | 15
--------------
15   15   15
Both diagonals also sum to 15 (6 + 5 + 4 and 8 + 5 + 2).
Logic :- Top – left .
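The magic square condition can be checked mechanically. The following is a minimal Python sketch, assuming the square from the example is stored row by row as 6 1 8 / 7 5 3 / 2 9 4; the names SQUARE, line_sums and is_magic are illustrative and do not come from the notes.

```python
# The 3 x 3 square from the example, stored row by row (assumed layout).
SQUARE = [
    [6, 1, 8],
    [7, 5, 3],
    [2, 9, 4],
]


def line_sums(square):
    """Return the sums of all rows, all columns and both diagonals."""
    n = len(square)
    rows = [sum(row) for row in square]
    cols = [sum(square[r][c] for r in range(n)) for c in range(n)]
    diagonals = [
        sum(square[i][i] for i in range(n)),          # top-left to bottom-right
        sum(square[i][n - 1 - i] for i in range(n)),  # top-right to bottom-left
    ]
    return rows + cols + diagonals


def is_magic(square):
    """True if every row, column and diagonal has the same sum."""
    return len(set(line_sums(square))) == 1


print(line_sums(SQUARE))   # eight sums, all equal to 15
print(is_magic(SQUARE))    # True
```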
Types of Data Structures :-
Data structure operations :-
1) Traversing :- Accessing & processing each record exactly once.
2) Inserting :- Adding a new record to the data structure.
3) Deleting :- Removing a particular record from the data structure.
4) Sorting :- Arranging the data in some given order.
5) Searching :- Finding the location of a given record.
6) Merging :- Combining the records in two different sorted files into a single sorted file.
Basic Terminology :- Elementary data organization
Data item :- Refers to a single unit of values.
Ex. :- Age
Ex. :- Name, which may be divided into the sub-items Fname, Mname & Lname.
Collections of data are frequently organized into a hierarchy of fields, records & files.
Additional Terminology :-
1) Entity :- An entity is something that has certain attributes or properties.
2) Entity set :- Entities with similar attributes form an entity set.
Ex. :- All employees of an organization.
3) Field :- A field is a single elementary unit of information representing an attribute of an
entity.
4) Record :- It is the collection of fields of a given entity.
5) File :- It is the collection of records of the entities in a given entity set.
6) Primary key :- A field K which uniquely identifies each record in a file is called the
primary key, & the values K1, K2, ... in such a field are called keys or key values.
The above organization of data into fields, records & files may not be complex
enough to maintain & efficiently process certain collections of data.
The study of data structures includes the following three steps.
1) Logical or mathematical description of the structure.
2) Implementation of the structure on a computer.
3) Quantitative analysis of the structure, which includes determining the amount of
memory needed to store the structure & the time required to process the structure.
Data Structures :- Data may be organized in many different ways. The logical or
mathematical model of a particular organization of data is called a data structure.
The choice of a particular data model depends on two considerations.
1) It must be rich enough in structure to mirror the actual relationships of the data
in the real world.
2) It should be simple enough that one can effectively process the data
when necessary.
Different Data Structures :-
1) Array :- It is a linear data structure. An array is a collection of data elements of the same
data type [This structure uses contiguous memory locations.].
Let A be an array of N elements. Then, using bracket notation, the elements are
A[1], A[2], ..., A[N]
The number K in A[K] is called the subscript & A[K] is called a
subscripted variable.
[Elements of an array are stored in successive memory locations.]
Ex:- STUDENT MARKS
Rollno.   Sub 1   Sub 2   Sub 3   Sub 4
1         50      60      70      55
2         40      20      90      75
Ex:- STUDENT (a linear array of names)
Ram, Amit, Rohan, Rita, Sunil
Advantages :-
1) Structure is simple.
2) Arrays are easy to traverse, search & sort.
Disadvantages :- Insertion & deletion are difficult, since they involve data movement.
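2) Linked list :-
[Figure: a linked list of four nodes holding Amit, Chetan, Gita & Nitin. Each node has an Info field & a Link field pointing to the next node.]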
3) Stack :- A stack is also called a LIFO, i.e. Last-in-First-out, structure. It is a
linear list in which the item inserted last is deleted first.
Ex. :- A stack of books.
4) Queue :- A queue is also called a FIFO, i.e. First-in-First-out, structure. It is a linear list
in which the item inserted first is deleted first.
Ex. :- A queue of people waiting for a bus.
5) Trees :- Data frequently contain a hierarchical relationship between various
elements. The data structure which reflects this relationship is called a tree.
Ex:- An Employee record
[Figure: tree with root Employee & children Soc Sec No., Name, Address, Age, Salary & Dependents. Name has children Lname, Fname & Mname; Address has children Street & Area; Area has children City, State & Zip.]
The same hierarchy written with level numbers:
01 Employee
    02 Soc Sec No.
    02 Name
        03 Lname
        03 Fname
        03 Mname
    02 Address
        03 Street
        03 Area
            04 City
            04 State
            04 Zip
    02 Age
    02 Salary
    02 Dependents
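Ex. :- Consider the algebraic expression
(2x + y)(a - 7b)^3
[Figure: expression tree. The root is *. Its left subtree is + with children * (over 2 & x) and y. Its right subtree is the exponential operator with children - (over a and * over 7 & b) and 3.]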
6) Graph :- Data sometimes contain a relationship between pairs of elements
which is not necessarily hierarchical in nature. For example, suppose an airline flies
only between the cities connected by lines in the following figure. The data structure
which reflects this type of relationship is called a graph.
[Figure: graph whose nodes are the cities Shrinagar, Delhi, Mumbai, Chennai & Nagpur, with lines joining the pairs of cities served by the airline.]
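The LIFO and FIFO behaviour described for stacks and queues above can be tried out directly. This is a minimal Python sketch, assuming a plain list is used as the stack and collections.deque as the queue; the item names are illustrative only.

```python
from collections import deque

# Stack (LIFO): a Python list, pushing and popping at the same end.
stack = []
for book in ["Book 1", "Book 2", "Book 3"]:
    stack.append(book)          # push onto the top of the stack
last_in = stack.pop()           # removes the item that was pushed last

# Queue (FIFO): a deque, adding at the rear and removing at the front.
queue = deque()
for person in ["Amit", "Chetan", "Gita"]:
    queue.append(person)        # join the rear of the queue
first_in = queue.popleft()      # removes the item that arrived first

print(last_in)    # Book 3
print(first_in)   # Amit
```

A deque is used for the queue because removing from the front of a plain list shifts every remaining element, which is the data movement cost noted for arrays.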
Data structure operations :-
The following four operations play a major role.
1) Inserting :- Adding a new record to the structure.
2) Deleting :- Removing a record from the structure.
3) Traversing :- Accessing each record exactly once so that certain items in the
record may be processed.
4) Searching :- Finding the location of the records which satisfy one or more
conditions.
The following two operations are used in special situations.
1) Sorting :- Arranging the records in some order.
2) Merging :- Combining the records in two different sorted files into a single sorted
file.
The major objective is to develop efficient algorithms for processing data.
The two major measures of the efficiency of an algorithm are
time & space.
Complexity :- The complexity of an algorithm is a function f(n) which measures
the time and/or space used by the algorithm in terms of the input size n.
Space-time tradeoff :- The space-time tradeoff refers to a choice between algorithmic
solutions of a data processing problem that allows one to decrease the running
time of an algorithmic solution by increasing the space used to store the data, & vice
versa.
Algorithm Notation :-
Algorithm :- A step-by-step procedure to solve a particular problem.
Format :-
Algorithm 1: (Find sum) [A paragraph which tells the purpose of the algorithm.]
Step 1.
Step 2.
  ...        [The list of steps to be executed.]
Step n. Exit.
Steps, Control, Exit :-
* The steps of the algorithm are executed one after the other, beginning with step 1
(unless indicated otherwise).
* Control may be transferred to step n of the algorithm by the statement
"Go to step n."
* The algorithm is completed with the statement Exit.
Comments :- Each step may contain a comment in brackets which indicates the
main purpose of the step. The comment will usually appear at the beginning or
at the end of the step.
Variable names :- Variable names will use capital letters.
For Ex. :- DATA
Assignment statement :- An assignment statement will use the dots-equal
notation ":=".
For Ex. :- Max := DATA[1]
The above statement assigns the value in DATA[1] to Max.
Input :- Data may be input & assigned to variables by means of a Read statement
with the following form.
Read : Variable names.
Output :- Messages placed in quotation marks, & data in variables, may be output
by means of a Write statement with the following form.
Write : Messages and/or variable names.
Procedure :- The term 'procedure' will be used for an independent algorithmic
module which solves a particular problem. Such a module is labelled with the word Procedure
instead of Algorithm.
Control Structures
There are three types of logic or flow of control.
1) Sequence logic or sequential flow
2) Selection logic or conditional flow
3) Iteration logic or repetitive flow (loops)
1) Sequential flow :-
[Figure: the algorithm listing & its flowchart, each showing Module A, Module B & Module C executed one after the other.]
2) Conditional flow :- The conditional structures fall into three types.
2.1> Single Alternative :-
Syntax :-
If Condition, then :
    [Module A]
[End of If Structure]
Flowchart :- [Figure: if the condition holds (Yes), Module A is executed; otherwise (No) it is skipped.]
2.2> Double Alternative :-
Syntax :-
If Condition, then :
    [Module A]
Else :
    [Module B]
[End of If Structure]
Flowchart :- [Figure: Module A is executed on Yes, Module B on No.]
2.3> Multiple Alternative :-
Syntax :-
If Condition (1), then :
    [Module A1]
Else if Condition (2), then :
    [Module A2]
    ...
Else if Condition (M), then :
    [Module AM]
Else :
    [Module B]
[End of If Structure]
3) Repetitive flow [Loops] :-
3.1> Repeat-For Loop :-
Syntax :-
Repeat for K = R to S by T :
    [Module]
[End of loop]
Where,
R is the initial value,
S is the end value or test value,
T is the increment.
Flowchart :- [Figure: set K ← R; test "Is K > S ?"; on Yes leave the loop; on No execute the Module (body of loop), set K ← K + T & test again.]
3.2> Repeat-While Loop :-
Syntax :-
Repeat while Condition :
    [Module]
[End of loop]
Flowchart :- [Figure: test the condition; on Yes execute the Module (body of loop) & test again; on No leave the loop.]
Complexity of Algorithms
Suppose M is an algorithm & n is the size of the input data. The time
& space used by the algorithm M are the two main measures of the efficiency
of M.
The complexity of an algorithm M is the function f(n) which gives the running
time and/or storage space requirement of the algorithm in terms of the size n of the
input data.
There are three cases.
1) Worst case :- The maximum value of f(n) for any possible input.
2) Average case :- The expected value of f(n). The average case uses probability
theory. Suppose the numbers n1, n2, ..., nk occur with respective probabilities
p1, p2, ..., pk. Then the expected value E is
E = n1 p1 + n2 p2 + ... + nk pk
3) Best case :- The minimum possible value of f(n).
Subalgorithms :-
A subalgorithm is a complete & independently defined algorithmic module which is
used (or called) by some main algorithm or by some other subalgorithm. A subalgorithm receives values, called
arguments, from an originating (calling) algorithm, performs computations, & then sends
back the result to the calling algorithm.
The subalgorithm is defined independently so that it may be called by many
different algorithms or called at different times in the same algorithm.
[The relationship between an algorithm & a subalgorithm is similar to the
relationship between a main program & a subprogram in a programming language.]

Write a function subalgorithm MEAN to find the average of 3 numbers A, B & C.
Function 2.5 : MEAN(A, B, C)
1. Set AVE := (A + B + C)/3.
2. Return (AVE).

Write a procedure SWITCH to interchange the values of AAA & BBB.
Procedure 2.6 : SWITCH(AAA, BBB)
1. Set TEMP := AAA, AAA := BBB and BBB := TEMP.
2. Return.

Write an algorithm to find the roots of the quadratic equation ax² + bx + c = 0.
Algorithm 2.2 : (Quadratic Equation) This algorithm inputs the coefficients
A, B, C of a quadratic equation & outputs the real solutions, if any.
Step 1. Read : A, B, C.
Step 2. Set D := B² - 4AC.
Step 3. If D > 0, then :
            (a) Set X1 := (-B + √D)/2A and
                Set X2 := (-B - √D)/2A.
            (b) Write : X1, X2.
        Else if D = 0, then :
            (a) Set X := -B/2A.
            (b) Write : 'Unique solution', X.
        Else :
            Write : 'No Real Solutions'.
        [End of If Structure.]
Step 4. Exit.
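The quadratic equation algorithm translates almost line for line into code. The following is a minimal Python sketch, assuming A is non-zero (the notes do not treat that case); the function name quadratic_roots and the choice to return a tuple rather than write the results are illustrative.

```python
import math


def quadratic_roots(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0 as a tuple.

    Assumes a != 0. The tuple has two entries, one entry or none,
    matching the three branches of the algorithm.
    """
    d = b * b - 4 * a * c                  # Step 2: the discriminant D
    if d > 0:                              # two distinct real solutions
        root = math.sqrt(d)
        return ((-b + root) / (2 * a), (-b - root) / (2 * a))
    elif d == 0:                           # unique solution
        return (-b / (2 * a),)
    else:                                  # no real solutions
        return ()


print(quadratic_roots(1, -3, 2))   # (2.0, 1.0)
print(quadratic_roots(1, 2, 1))    # (-1.0,)
print(quadratic_roots(1, 0, 1))    # ()
```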

Write an algorithm to find the largest element in an array.
Algorithm 2.3 : (largest element in array) Given a non empty array DATA with
N numeric values ,this algorithm finds the location LOC & the value Max of the Largest
element of DATA.
1 .[Initialize] Set K:=1,LOC:= 1 and Max:= DATA[1].
2 . Repeat step 3 & 4 while K ≤ N :
3 . If Max < DATA [K] ,then :
Set LOC := K and Max := DATA[K].
[End of If Structure.]
4 . Set K := K+1 .
[End of step 2 loop.]
5. Write :LOC , Max.
6. Exit.
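A minimal Python sketch of the largest-element algorithm follows. It assumes the array is a Python list, and it reports LOC counted from 1 as in the notes even though Python lists are indexed from 0; the name find_largest is illustrative.

```python
def find_largest(data):
    """Return (loc, maximum), with loc counted from 1 as in the algorithm."""
    if not data:
        raise ValueError("DATA must be non-empty")
    loc, maximum = 1, data[0]               # [Initialize]
    # K starts at 2 here; comparing DATA[1] with itself changes nothing.
    for k in range(2, len(data) + 1):
        if maximum < data[k - 1]:           # DATA[K] is data[k - 1] in Python
            loc, maximum = k, data[k - 1]
    return loc, maximum


print(find_largest([3, 5, 9, 2]))   # (3, 9)
```

Because the comparison is a strict "<", the location of the first occurrence is reported when the largest value appears more than once.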
Linear search :-
Algorithm 2.4 : (Linear search) A linear array DATA with N elements & a specific
ITEM of information are given. This algorithm finds the location LOC of ITEM in
the array DATA, or sets LOC = 0 if ITEM is not found.
1. [ Initialize] set K:=1 and LOC:= 0.
2. Repeat steps 3 & 4 while LOC =0 and K≤ N .
3. If ITEM=DATA [K] , Then: Set LOC:= K.
4. [Increment Counter] Set K:=K+1 .
[End of step 2 loop.]
5. [ Successful ?]
If LOC=0,then :
Write : ITEM is not in the array DATA.
Else :
Write :LOC is the location of ITEM.
[End of If structure]
6. Exit.
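The linear search steps can be sketched in Python as follows. The sketch keeps the 1-based LOC convention of the notes, so 0 still means "not found"; the function name linear_search is illustrative.

```python
def linear_search(data, item):
    """Return the 1-based location of item in data, or 0 if it is absent."""
    k, loc = 1, 0                           # [Initialize]
    while loc == 0 and k <= len(data):      # Repeat while LOC = 0 and K <= N
        if item == data[k - 1]:             # DATA[K] is data[k - 1] in Python
            loc = k
        k += 1                              # [Increment counter]
    return loc


print(linear_search([10, 20, 30], 20))   # 2
print(linear_search([10, 20, 30], 40))   # 0
```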
Complexity of Linear search :-
The complexity of the search algorithm is given by the number C of
comparisons between ITEM & DATA[K].
* Worst case :- The worst case occurs when ITEM is the last element in the array
DATA or is not there at all. Accordingly,
$C(n) = n$
is the worst case complexity of the linear search algorithm.
* Average case :- Here we assume that ITEM does appear in DATA, and that it is equally
likely to occur at any position in the array. Accordingly the number of comparisons can
be any of the numbers 1, 2, 3, ..., n, and each number occurs with probability p = 1/n. Then
$C(n) = 1\cdot\frac{1}{n} + 2\cdot\frac{1}{n} + \cdots + n\cdot\frac{1}{n}$
$= (1 + 2 + \cdots + n)\cdot\frac{1}{n}$
$= \frac{n(n+1)}{2}\cdot\frac{1}{n}$
$= \frac{n+1}{2} \approx \frac{n}{2}$
(The average number of comparisons is approximately equal to half the number
of elements in the data list.)
Data Types :-
1) Character
2) Real (or floating point)
3) Integer (or fixed point)
4) Logical
Variables :-
Global variables :- Variables that can be accessed by all program modules.
Local variables :- Each program module contains its own list of variables, called
local variables, which can be accessed only within that module.
Sieve Method :- To find all prime numbers less than m.
Prime no. :- An integer n > 1 is called a prime number if its only positive divisors are 1
& n.
If n > 1 is not prime (a composite no.), then n must have a divisor k ≠ 1
such that k ≤ √n, or in other words k² ≤ n.
Problem :- Find all prime numbers less than 30.
Step 1 :- List the 30 numbers.
 1  2  3  4  5  6  7  8  9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
Step 2 :- Cross out 1 & the multiples of 2 (other than 2 itself). The numbers left are
2 3 5 7 9 11 13 15 17 19 21 23 25 27 29
Step 3 :- Since 3 is the first number following 2 that has not been eliminated, cross out
the multiples of 3 (other than 3 itself) from the list. The numbers left are
2 3 5 7 11 13 17 19 23 25 29
Step 4 :- Since 5 is the first number following 3 that has not been eliminated, cross
out the multiples of 5 (other than 5 itself) from the list. The numbers left are
2 3 5 7 11 13 17 19 23 29
Step 5 :- Now 7 is the first number following 5 that has not been eliminated, but
7² > 30. This means the algorithm is finished & the numbers left in the list
are the prime numbers less than 30.
2 3 5 7 11 13 17 19 23 29
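The sieve steps above can be sketched in Python. This is a minimal illustration, not the notes' own formulation: it uses a list of flags in place of the crossed-out numbers, and it starts crossing out at k² (the smaller multiples of k have already been crossed out by smaller primes). The name primes_below is illustrative.

```python
def primes_below(m):
    """Return all prime numbers less than m using the sieve method."""
    if m < 3:
        return []
    crossed_out = [False] * m              # index i stands for the number i
    crossed_out[0] = crossed_out[1] = True # 0 and 1 are not prime
    k = 2
    while k * k < m:                       # stop once k squared reaches m
        if not crossed_out[k]:             # k has not been eliminated
            for multiple in range(k * k, m, k):
                crossed_out[multiple] = True
        k += 1
    return [n for n in range(m) if not crossed_out[n]]


print(primes_below(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```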
String processing :-
Each programming language contains a character set that is used to
communicate with the computer. The set usually includes the following.
Alphabet :- A, B, ..., X, Y, Z
Digits :- 0, 1, 2, ..., 8, 9
Special characters :- +, -, /, *, ( ), $, =, ' (quote) & the blank space
String :- A finite sequence S of zero or more characters is called a string.
Length :- The number of characters in a string is called its length.
The string with zero characters is called the empty string or the null string.
Ex. :-
String                    Length
'The End'                 7
'To be or not to be'      18
''                        0
'  ' (two blanks)         2
Concatenation of Strings :-
Let S1 & S2 be strings. The string consisting of the characters of S1
followed by the characters of S2 is called the concatenation of S1 & S2. It will
be denoted by S1 // S2.
For Ex. :- 'THE' // 'END' = 'THEEND'
'THE' // ' ' // 'END' = 'THE END'
The length of S1 // S2 is equal to the sum of the lengths of the strings S1 & S2.
Substring :-
A string Y is called a substring of a string S if there exist strings X & Z such that
S = X // Y // Z
If X is an empty string then Y is called an initial substring of S, & if Z is an empty
string then Y is called a terminal substring of S.
For Ex. :- 'THE' is an initial substring of 'THE END'.
'BE OR NOT' is a substring of 'TO BE OR NOT TO BE'.
Storing Strings :-
Strings are stored in three types of structures.
1) Fixed-length structures
2) Variable-length structures
3) Linked structures
1) Record-oriented, fixed-length storage :-
In this type of storage method, records of fixed length are used to store
strings.
a) Records stored sequentially in memory :-
Suppose the records have a fixed length of 10 & are used to store the
names of students.
Address   Record (10 characters)
200       AMIT
210       FARAKHAN
220       PRIYA
Advantages :-
1) The ease of accessing data from any given record.
2) The ease of updating data in any given record (as long as the length
of the new data does not exceed the record length).
Disadvantages :-
1) Time is wasted reading an entire record if most of the storage
consists of inessential blank spaces.
2) Certain records may require more space than is available.
3) When the correction consists of more or fewer characters than the
original text, changing a misspelled word requires the entire record to
be changed.
b) Records stored using pointers :-
In the above method (1 a), suppose we want to insert a new record.
This would require all succeeding records to be moved to new memory
locations. This disadvantage can be removed by the following method,
in which an array POINT gives the address of each successive
record, so that the records need not be stored in consecutive locations in memory.
Accordingly, inserting a new record requires only an update of the array POINT.
[Figure: array POINT with entries 1, 2 & 3 pointing to the records AMIT, FARAKHAN & PRIYA.]
2) Variable-length storage with fixed maximum :-
Although strings may be stored in fixed-length memory locations, there are
advantages in knowing the actual length of each string. For example, one then does
not have to read the entire record when the string occupies only the beginning
part of the memory location.
The storage of variable-length strings in memory cells with fixed lengths can
be done in two general ways.
A) One can use a marker, such as two dollar signs ($$), to signal the end of the
string.
B) One can list the length of the string as an additional item in the pointer
array.
Records with sentinels :-
AMIT$$
FARAKHAN$$
PRIYA$$
Records whose lengths are listed :-
POINT   Length   Record
1       4        AMIT
2       8        FARAKHAN
3       5        PRIYA
Remarks :- Instead of using separate records, one might be tempted to store the strings one
after another, either by using a separation marker such as $$ (Fig. a) or by using a
pointer array giving the location of each string (Fig. b).
Fig. a :- AMIT$$FARAKHAN$$PRIYA$$ ...
Fig. b :- AMITFARAKHANPRIYA ... with a pointer array (entries 1, 2, 3, ..., END) giving the location of each string.
Advantages :- This method of storing strings saves space.
Disadvantages :- This method is usually inefficient when the strings & their
lengths are frequently being changed.
3) Linked storage :-
In this method strings are stored in a linked list. A linked list is a linearly
ordered sequence of memory cells, called nodes, where each node
contains an item called a link, which points to the next node in the list.
[Figure: schematic diagram of a linked list.]
Each memory cell is assigned one character or a fixed number of
characters, & the link contains the address of the cell / node containing the next
character or group of characters in the string.
Ex:- String S = 'TO BE OR NOT TO BE'
(a) One character per node : [Figure: nodes holding T, O, blank, B, E, ... in order.]
(b) Four characters per node : [Figure: nodes holding 'TO B', 'E OR', ' NOT', ' TO ', 'BE' in order.]
Character data type :-
Constants :- Many programming languages denote string constants by placing
the string in either single or double quotation marks.
For Ex. :- 'THE END' & 'TO BE OR NOT TO BE' are string constants of
length 7 & 18 characters respectively.
Variables :- A character variable is of one of three types: static, semistatic or dynamic.
Static :- A static character variable is a variable whose length is
defined before the program is executed & cannot change
throughout the program.
Semistatic :- A semistatic character variable is a variable whose length
may vary during the execution of the program as long as the length
does not exceed a maximum value determined by the program
before the program is executed.
Dynamic :- A dynamic character variable is a variable whose length can
change during the execution of the program.
String operations :-
A) Substring :- A group of consecutive elements in a string is called a
substring. To access a substring of a given string, the following
operation is used.
SUBSTRING(string, initial, length)
where initial is the position of the first character of the substring in
the given string & length is the length of the substring.
Ex. :- SUBSTRING('TO BE OR NOT TO BE', 4, 7) = 'BE OR N'
SUBSTRING('THE END', 4, 4) = ' END'
B) Indexing :- Indexing, also called pattern matching, refers to finding the position
where a string pattern P first appears in a given string text T.
INDEX(text, pattern)
If the pattern P does not appear in the text T, then INDEX is
assigned the value 0.
Ex. :- Let T = 'HIS FATHER IS THE PROFESSOR'. Then,
INDEX(T, 'THE') = 7
INDEX(T, 'THEN') = 0
INDEX(T, ' THE ') = 14
C) Concatenation :- Let S1 & S2 be strings. The concatenation of S1 & S2,
denoted by S1 // S2, is the string consisting of the
characters of S1 followed by the characters of S2.
Ex. :- Let S1 = 'MARK' & S2 = 'TWAIN'. Then,
S1 // S2 = 'MARKTWAIN'
S1 // ' ' // S2 = 'MARK TWAIN'
D) Length :- The number of characters in a string is called its length.
LENGTH(string)
Ex. :-
LENGTH('Computer') = 8
LENGTH('0') = 1
LENGTH(' ') = 1
LENGTH('') = 0
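The SUBSTRING, INDEX and LENGTH operations map onto Python slicing, str.find and len. The following is a minimal sketch, assuming positions are counted from 1 as in the notes; the lower-case function names are illustrative wrappers, not part of any library.

```python
def substring(string, initial, length):
    """SUBSTRING(string, initial, length) with positions counted from 1."""
    return string[initial - 1 : initial - 1 + length]


def index(text, pattern):
    """INDEX(text, pattern): 1-based position of the first match, else 0."""
    return text.find(pattern) + 1      # str.find returns -1 when absent


def length(string):
    """LENGTH(string): the number of characters in the string."""
    return len(string)


T = "HIS FATHER IS THE PROFESSOR"
print(substring("TO BE OR NOT TO BE", 4, 7))   # BE OR N
print(index(T, "THE"), index(T, "THEN"), index(T, " THE "))   # 7 0 14
print(length("Computer"), length(""))          # 8 0
```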
Word Processing :-
The following are the word processing operations.
A) Insertion :- Inserting a string in the middle of the text.
Suppose in a given text T we want to insert a string S so that S
begins in position K.
INSERT(text, position, string)
Ex. :- INSERT('ABCDEFG', 3, 'XYZ') = 'ABXYZCDEFG'
INSERT('ABCDEFG', 6, 'XYZ') = 'ABCDEXYZFG'
B) Deletion :- Deleting a string from the text.
Suppose in a given text T we want to delete the substring which
begins in position K & has length L.
DELETE(text, position, length)
Ex. :- DELETE('ABCDEFG', 4, 2) = 'ABCFG'
DELETE('ABCDEFG', 2, 4) = 'AFG'
DELETE('ABCDEFG', 0, 2) = 'ABCDEFG'
C) Replacement :- Replacing one string in the text by another.
Suppose in a given text T we want to replace the first
occurrence of pattern P1 by pattern P2.
REPLACE(text, pattern1, pattern2)
Ex. :-
REPLACE('XABYABZ', 'AB', 'C') = 'XCYABZ'
REPLACE('XABYABZ', 'BA', 'C') = 'XABYABZ'
In the second case the pattern BA does not occur, & hence there
is no change.
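The three word processing operations can be sketched with Python slicing. This is a minimal illustration, assuming 1-based positions and treating position 0 in DELETE as "nothing to delete", as in the third DELETE example; the lower-case function names are illustrative.

```python
def insert(text, position, string):
    """INSERT(text, position, string): string begins at the 1-based position."""
    return text[: position - 1] + string + text[position - 1 :]


def delete(text, position, length):
    """DELETE(text, position, length); position 0 leaves the text unchanged."""
    if position == 0:
        return text
    return text[: position - 1] + text[position - 1 + length :]


def replace(text, pattern1, pattern2):
    """REPLACE(text, p1, p2): replace only the first occurrence of p1."""
    return text.replace(pattern1, pattern2, 1)


print(insert("ABCDEFG", 3, "XYZ"))       # ABXYZCDEFG
print(delete("ABCDEFG", 4, 2))           # ABCFG
print(replace("XABYABZ", "AB", "C"))     # XCYABZ
```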
Algorithm 3.1 :- A text T & a pattern P are in memory. This algorithm deletes every
occurrence of P in T.
1. [Find index of P.] Set K := INDEX(T, P).
2. Repeat while K ≠ 0 :
       (a) [Delete P from T.]
           Set T := DELETE(T, INDEX(T, P), LENGTH(P)).
       (b) [Update index.] Set K := INDEX(T, P).
   [End of loop.]
3. Write : T.
4. Exit.
Algorithm 3.2 :- A text T & patterns P & Q are in memory. This algorithm replaces
every occurrence of P in T by Q.
1. [Find index of P.] Set K := INDEX(T, P).
2. Repeat while K ≠ 0 :
       (a) [Replace P by Q.] Set T := REPLACE(T, P, Q).
       (b) [Update index.] Set K := INDEX(T, P).
   [End of loop.]
3. Write : T.
4. Exit.
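Algorithms 3.1 and 3.2 can be sketched in Python as below. Two guards are additions that the notes do not discuss: an empty pattern, and a replacement Q that itself contains P, are rejected because in those cases the loop would never finish. The function names delete_all and replace_all are illustrative.

```python
def delete_all(text, pattern):
    """Algorithm 3.1: delete every occurrence of pattern in text."""
    if not pattern:
        return text                         # guard: empty pattern would loop forever
    k = text.find(pattern)                  # 0-based; -1 plays the role of INDEX = 0
    while k != -1:
        text = text[:k] + text[k + len(pattern):]   # delete P from T
        k = text.find(pattern)              # update index
    return text


def replace_all(text, p, q):
    """Algorithm 3.2: repeatedly replace the first occurrence of p by q."""
    if not p or p in q:
        # One obvious non-terminating case: each replacement would
        # reintroduce p, so the loop could never finish.
        raise ValueError("pattern is empty or occurs inside its replacement")
    while p in text:
        text = text.replace(p, q, 1)        # replace the first occurrence only
    return text


print(delete_all("XABYABZ", "AB"))          # XYZ
print(replace_all("XABYABZ", "AB", "C"))    # XCYCZ
```

Note that deleting one occurrence can create a new one: deleting AB from AABB leaves AB, which the loop then deletes as well.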
Pattern Matching Algorithms :-
There are two algorithms given in the book.
1) First pattern matching algorithm :-
In this algorithm we compare a given pattern P with each of the
substrings of T, moving from left to right, until we get a match.
Suppose that P is a 4-character string & T is a 20-character string,
and that P & T appear in memory as linear arrays with one character per element,
i.e.
P = P[1] P[2] P[3] P[4]
T = T[1] T[2] T[3] ... T[19] T[20]
Then P is compared with each of the following 4-character substrings of T.
W1 = T[1] T[2] T[3] T[4], W2 = T[2] T[3] T[4] T[5], ..., W17 = T[17] T[18] T[19] T[20]
NOTE :- The number of such substrings of T is
Max = LENGTH(T) - LENGTH(P) + 1 = 20 - 4 + 1 = 17
This algorithm contains two loops, one inside the other. The outer
loop runs through each successive R-character substring WK of T, where R = LENGTH(P).
The inner loop compares P with WK, character by character. If any
character does not match, control transfers to the next substring. If
P does not appear in T, then INDEX = 0.
Algorithm 3.3 : (Pattern Matching) P & T are strings with lengths R & S
respectively, & are stored as arrays with one character per element. This
algorithm finds the INDEX of P in T.
1. [Initialize] Set K := 1 & Max := S - R + 1.
2. Repeat steps 3 to 5 while K ≤ Max :
3.     Repeat for L = 1 to R : [Tests each character of P]
           If P[L] ≠ T[K + L - 1], then : Go to step 5.
       [End of inner loop]
4.     [Success] Set INDEX := K & Exit.
5.     Set K := K + 1.
   [End of step 2 outer loop]
6. [Failure] Set INDEX := 0.
7. Exit.
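Algorithm 3.3 can be sketched in Python as two nested loops. The sketch keeps the 1-based K and L of the notes and shifts to Python's 0-based indexing only when reading characters; the "Go to step 5" becomes a break, and the for/else clause plays the role of the [Success] step. The function name first_pattern_match is illustrative.

```python
def first_pattern_match(p, t):
    """Return the 1-based INDEX of pattern p in text t, or 0 if absent."""
    r, s = len(p), len(t)
    max_k = s - r + 1                      # number of R-character substrings of T
    for k in range(1, max_k + 1):          # outer loop over the substrings W_K
        for l in range(1, r + 1):          # inner loop over the characters of P
            if p[l - 1] != t[k + l - 2]:   # P[L] vs T[K + L - 1], shifted to 0-based
                break                      # mismatch: go on to the next substring
        else:
            return k                       # [Success] every character matched
    return 0                               # [Failure]


print(first_pattern_match("aaba", "ababaaba"))   # 5
print(first_pattern_match("aaba", "cd" * 10))    # 0
```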
Complexity of the first pattern matching algorithm :-
It is measured by the number C of comparisons between characters
in the pattern P & characters of the text T. Let N_K denote the number of
comparisons made when P is compared with W_K. Then
$C = N_1 + N_2 + \cdots + N_L$
where L is the position in T where P first appears, or
L = Max if P does not appear in T.
Ex. 1 :- Suppose P = aaba & T = cdcd...cd = (cd)^10.
Here Max = 20 - 4 + 1 = 17, & the first character of P fails to match in each of W1, W2, ..., W17.
Hence C = 1 + 1 + ... + 1 = 17.
Ex. 2 :- Suppose P = aaba & T = ababaaba.
W1 = abab, W2 = baba, W3 = abaa, W4 = baab, W5 = aaba
Hence C = 2 + 1 + 2 + 1 + 4 = 10.
Ex. 3 :- Suppose P = aaab & T = aa...a = a^20.
Here Max = 20 - 4 + 1 = 17 & W1 = W2 = ... = W17 = aaaa.
Hence C = 4 + 4 + ... + 4 = 17 x 4 = 68.
In general, suppose
P is an r-character string,
T is an s-character string,
so that the data size for the algorithm is
n = r + s
In the worst case every character of P except the last matches every
substring W_K. In this case, since
Max = s - r + 1
we have
$C(n) = r(s - r + 1)$ [See Ex. 3]
For fixed n we have s = n - r, so that
$C(n) = r(n - 2r + 1)$
The maximum value of C(n) occurs when r = (n + 1)/4.
Accordingly,
$C(n) = \frac{(n+1)^2}{8} = O(n^2)$
The complexity of this pattern matching algorithm is O(n²).
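The comparison counts in the three examples above can be reproduced by instrumenting the first algorithm with a counter. This is a minimal Python sketch; the function name count_comparisons and the (index, count) return value are illustrative choices.

```python
def count_comparisons(p, t):
    """Return (index, c): the 1-based INDEX of p in t (0 if absent) and the
    number c of character comparisons made by the first algorithm."""
    r, s = len(p), len(t)
    c = 0
    for k in range(s - r + 1):             # 0-based start of each substring W
        matched = True
        for l in range(r):
            c += 1                         # one comparison of P[l] with T[k + l]
            if p[l] != t[k + l]:
                matched = False
                break
        if matched:
            return k + 1, c
    return 0, c


print(count_comparisons("aaba", "cd" * 10))    # (0, 17)
print(count_comparisons("aaba", "ababaaba"))   # (5, 10)
print(count_comparisons("aaab", "a" * 20))     # (0, 68)
```

The last call is the worst case r(s - r + 1) with r = 4 and s = 20.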
2) Second pattern matching algorithm :-
Algorithm 3.4 : (Pattern matching) The pattern matching
table F(Q_i, t) of a pattern P is in memory, & the input is an N-character
string T = T_1 T_2 ... T_N. This algorithm finds the INDEX of P in T.
1. [Initialize] Set K := 1 & S_1 := Q_0.
2. Repeat steps 3 to 5 while S_K ≠ P & K ≤ N :
3.     Read T_K.
4.     Set S_(K+1) := F(S_K, T_K). [Finds next state]
5.     Set K := K + 1. [Updates counter]
   [End of step 2 loop]
6. [Successful ?]
   If S_K = P, then :
       INDEX := K - LENGTH(P).
   Else :
       INDEX := 0.
   [End of If Structure]
7. Exit.
Complexity of the second pattern matching algorithm :-
The running time of the above algorithm is proportional to the number of
times the step 2 loop is executed. The worst case occurs when all of the text T
is read, i.e. the loop is executed n = LENGTH(T) times.
Accordingly, the complexity of this pattern
matching algorithm is O(n).
The second pattern matching algorithm is therefore faster than the
first pattern matching algorithm.
Problems
Ex. 1 :- Let P = aaba.
The initial substrings of P are
Q0 = Λ, Q1 = a, Q2 = a², Q3 = a²b, Q4 = a²ba = P
[Here Q0 = Λ is the empty string.]
For each character t, the entry ƒ(Qi, t) in the table is the largest Q which
appears as a terminal substring of the string Qi t. We compute
ƒ(Λ, a) = a,       i.e. ƒ(Q0, a) = Q1
ƒ(Λ, b) = Λ,       i.e. ƒ(Q0, b) = Q0
ƒ(a, a) = a²,      i.e. ƒ(Q1, a) = Q2
ƒ(a, b) = Λ,       i.e. ƒ(Q1, b) = Q0
ƒ(a², a) = a²,     i.e. ƒ(Q2, a) = Q2
ƒ(a², b) = a²b,    i.e. ƒ(Q2, b) = Q3
ƒ(a²b, a) = P,     i.e. ƒ(Q3, a) = Q4 = P
ƒ(a²b, b) = Λ,     i.e. ƒ(Q3, b) = Q0
Pattern matching table :-
       a     b
Q0     Q1    Q0
Q1     Q2    Q0
Q2     Q2    Q3
Q3     P     Q0
Pattern matching graph :- [Figure: directed graph with nodes Q0, Q1, Q2, Q3 & P, with an edge labelled a or b for each entry of the table.]
Ex. 2 :- Consider the pattern P = aaabb.
First list the initial substrings of P :
Q0 = Λ, Q1 = a, Q2 = a², Q3 = a³, Q4 = a³b, Q5 = a³b² = P
For each character t, the entry ƒ(Qi, t) in the table is the largest Q which
appears as a terminal substring of the string Qi t. We compute
ƒ(Λ, a) = a = Q1
ƒ(Λ, b) = Λ = Q0
ƒ(a, a) = a² = Q2
ƒ(a, b) = Λ = Q0
ƒ(a², a) = a³ = Q3
ƒ(a², b) = Λ = Q0
ƒ(a³, a) = a³ = Q3
ƒ(a³, b) = a³b = Q4
ƒ(a³b, a) = a = Q1
ƒ(a³b, b) = a³b² = P
Hence we obtain the following pattern matching table :
       a     b
Q0     Q1    Q0
Q1     Q2    Q0
Q2     Q3    Q0
Q3     Q3    Q4
Q4     Q1    P
Pattern matching graph :- [Figure: directed graph with nodes Q0, Q1, Q2, Q3, Q4 & P, with an edge labelled a or b for each entry of the table.]
Ex. 3 :- Let the pattern be P = ababab.
The initial substrings of P are :
Q0 = Λ, Q1 = a, Q2 = ab, Q3 = aba, Q4 = abab, Q5 = ababa, Q6 = ababab = P
The function ƒ gives the entries in the table as follows :
ƒ(Λ, a) = a = Q1
ƒ(Λ, b) = Λ = Q0
ƒ(a, a) = a = Q1
ƒ(a, b) = ab = Q2
ƒ(ab, a) = aba = Q3
ƒ(ab, b) = Λ = Q0
ƒ(aba, a) = a = Q1
ƒ(aba, b) = abab = Q4
ƒ(abab, a) = ababa = Q5
ƒ(abab, b) = Λ = Q0
ƒ(ababa, a) = a = Q1
ƒ(ababa, b) = ababab = P
Hence we obtain the following pattern matching table :
       a     b
Q0     Q1    Q0
Q1     Q1    Q2
Q2     Q3    Q0
Q3     Q1    Q4
Q4     Q5    Q0
Q5     Q1    P
Pattern matching graph :- [Figure: directed graph with nodes Q0, Q1, Q2, Q3, Q4, Q5 & P, with an edge labelled a or b for each entry of the table.]
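The table construction used in these examples, together with Algorithm 3.4, can be sketched in Python. The sketch makes two assumptions that are not in the notes: each state Q_i is represented by its length i (so the state len(p) stands for P itself), and the alphabet passed in contains every character of P, so that any other text character sends the machine back to Q_0. The names build_table and second_pattern_match are illustrative.

```python
def build_table(p, alphabet):
    """Return a dict f where f[(i, ch)] = j means f(Q_i, ch) = Q_j.

    Q_i is the initial substring of p of length i.
    """
    table = {}
    for i in range(len(p)):                 # states Q_0 .. Q_(R-1)
        for ch in alphabet:
            s = p[:i] + ch                  # the string Q_i t
            j = min(len(s), len(p))
            while not s.endswith(p[:j]):    # largest Q_j that ends the string
                j -= 1                      # stops at j = 0 (the empty string)
            table[(i, ch)] = j
    return table


def second_pattern_match(p, t, alphabet):
    """Algorithm 3.4: return the 1-based INDEX of p in t, or 0 if absent."""
    table = build_table(p, alphabet)
    state, k = 0, 1                          # S_1 = Q_0
    while state != len(p) and k <= len(t):
        # S_(K+1) = F(S_K, T_K); characters outside the alphabet lead to Q_0
        state = table.get((state, t[k - 1]), 0)
        k += 1
    return k - len(p) if state == len(p) else 0


print(build_table("aaba", "ab")[(2, "a")])             # 2, i.e. f(Q2, a) = Q2
print(second_pattern_match("aaba", "ababaaba", "ab"))  # 5
```

Each text character is read exactly once, which is why the running time is proportional to the length of T once the table is available.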