Database Design: A Point in Time Architecture - CSCI 6442

Transcription

Database Design: A Point in Time Architecture
22 February 2007
by Arthur Fuller
Point in Time Architecture (PTA) is a database design that guarantees support for two related but different concepts –
History and Audit Trail.
History – all information, both current and historical, that as of this moment, we believe to be true.
Audit Trail – all information believed to be true at some previous point in time.
The distinction is that the Audit Trail shows the history of corrections made to the database. Support for History and Audit
Trail facilities is notably absent from typical OLTP databases. By "typical", we mean databases that support the
traditional Select, Insert, Delete and Update operations. In many cases, typical OLTP databases are perfectly fine for
their requirements, but some databases demand the ability to track History and Audit Trail as core requirements. Without
these abilities, the database will fail.
Typical OLTP databases destroy data. This is most obvious with the Delete command, but a moment's thought reveals
that the Update command is equally destructive. When you update a row in a table, you lose the values that were there a
moment ago. The core concept in PTA is this: no information is ever physically deleted from or updated in the
database.
However, some updates are deemed important while others are not. In all likelihood, the data modeler, DBA, or SQL
programmer will not know which updates are important and which unimportant without consultation with the principal
stakeholders. A mere spelling error in a person's surname may be deemed unimportant. Unfortunately, there is no way to
distinguish a spelling error from a change in surname. A correction to a telephone number may be deemed trivial, but
again there is no way to distinguish it from a changed number. What changes are worth documenting, and what other
changes are deemed trivial? There is no pat answer.
The Insert statement can be almost as destructive. Suppose you insert ten rows into some table today. Unless you've got
a column called DateInserted, or similar, then you have no way to present the table as it existed yesterday.
What is Point In Time Architecture (PTA)?
PTA is a database design that works around these problems. As its name implies, PTA attempts to deliver a transactional
database that can be rolled back to any previous point in time. I use the term "rolled back" metaphorically: traditional
restores are unacceptable for this purpose, and traditional rollbacks apply only to points declared within a transaction.
A better way to describe the goal of a PTA system is to say that it must be able to present an image of the database as it
existed at any previous point in time, without destroying the current image. Think of it this way: a dozen users are
simultaneously interrogating the database, each interested in a different point in time. UserA wants the current database
image; UserB wants the image as it existed on the last day of the previous month; UserC is interested in the image of
the last day of the previous business quarter; and so on.
Requirements of PTA
Most obviously, physical Deletes are forbidden. Also, Inserts must be flagged in such a way that we know when the
Insert occurred. Physical Updates are also forbidden; otherwise we lose the image of the rows of interest prior to the
Update.
What do we need to know?
Who inserted a row, and when.
Who replaced a row, and when.
What did the replaced row look like prior to its replacement?
We can track which rows were changed when in our PTA system by adding some standard PTA columns to all tables of
PTA interest. I suggest the following:
DateCreated – the actual date on which the given row was inserted.
DateEffective – the date on which the given row became effective.
DateEnd – the date on which the given row ceased to be effective.
DateReplaced – the date on which the given row was replaced by another row.
OperatorCode – the unique identifier of the person (or system) that created the row.
Notice that we have both a DateCreated column and a DateEffective column, which could be different. This could
happen, for example, when a settlement is achieved between a company and a union, which guarantees specific wage
increases effective on a series of dates. We might know a year or two in advance that certain wage increases will kick in
on specific dates. Therefore we might add the row some time in advance of its DateEffective. By distinguishing
DateCreated from DateEffective, we circumvent this problem.
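For instance, a negotiated increase can be entered now with a future DateEffective. A minimal sketch; the WageRates table and its values are invented for illustration and are not part of the article's schema:

-- Entered today, but not effective until next January (illustrative values).
INSERT INTO dbo.WageRates
(EmployeeID, HourlyRate, DateCreated, DateEffective, OperatorCode)
VALUES (42, 28.50, GETDATE(), '20080101', 'jlarue')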
Dealing with inserts
The easiest command to deal with is Insert. Here, we simply make use of our DateCreated column, using either a
Default value or an Insert trigger to populate it. Thus, to view the data as it stood at a given point in time, you would
perform the Select using the following syntax:
SELECT * FROM AdventureWorks.Sales.SalesOrderHeader
WHERE DateCreated <= [some PTA date of interest]
This scenario is all fine and dandy assuming that you are creating the table in question. But you may be called upon to
backfill some existing tables.
If you are retrofitting a database to support PTA, then you won't be able to use a Default value to populate the existing
rows. Instead you will have to update the existing rows to supply some value for them, perhaps the date on which you
execute the Update command. To that extent, all these values will be false. But at least it gives you a starting point. Once
the DateCreated column has been populated for all existing rows, you can then alter the table and either supply a
Default value for the column, or use an Insert trigger instead, so that all new rows acquire their DateCreated values
automatically.
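As a rough sketch of that retrofit sequence (assuming an existing table named dbo.Customers; the names are illustrative, not from the article):

-- 1. Add the column as NULLable so existing rows are not rejected.
ALTER TABLE dbo.Customers ADD DateCreated datetime NULL
GO
-- 2. Backfill existing rows with the retrofit date: admittedly "false"
-- values, but a starting point, as noted above.
UPDATE dbo.Customers SET DateCreated = GETDATE() WHERE DateCreated IS NULL
GO
-- 3. Enforce the column and give all new rows an automatic value.
ALTER TABLE dbo.Customers ALTER COLUMN DateCreated datetime NOT NULL
ALTER TABLE dbo.Customers ADD CONSTRAINT DF_Customers_DateCreated
DEFAULT (GETDATE()) FOR DateCreated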
Dealing with deletes
In a PTA architecture, no rows are physically deleted. We introduce the concept of a "logical delete". We visit the existing
row and flag it as "deleted on date z." We do this by updating its DateEnd column with the date on which the row was
"deleted". We do not delete the actual row, but merely identify it as having been deleted on a particular date. All Select
statements interrogating the table must then observe the value in this column.
SELECT * FROM AdventureWorks.Sales.SalesOrderHeader
WHERE DateEnd IS NULL OR DateEnd > [PTA_date]
Any row logically deleted after our PTA date of interest is therefore assumed to have logically existed up to our date of
interest, and ought to be included in our result set.
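Concretely, the logical delete itself is nothing more than an update of DateEnd. A minimal sketch (the Persons table and the @key variable are assumptions for illustration):

DECLARE @key int
SET @key = 1234 -- the row to logically delete (illustrative)

-- Logical delete: flag the row as "deleted" rather than removing it.
UPDATE dbo.Persons
SET DateEnd = GETDATE() -- the date on which the row was "deleted"
WHERE PersonID = @key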
Dealing with updates
In PTA, updates are the trickiest operation. No rows are actually updated (in the traditional sense of replacing the current
data with new data). Instead, we perform three actions:
1. Flag the existing row as "irrelevant after date x".
2. Copy the values of the existing row to a temporary buffer.
3. Insert a new row, copying most of its values from the old row (those that were not changed), and using the
new values for those columns that were changed. We also supply a new value for the column DateEffective
(typically GetDate(), but not always, as described previously).
There are several ways to implement this functionality. I chose the Instead-Of Update trigger. Before investigating the
code, let's describe the requirements:
1. We must update the existing row so that its DateReplaced value reflects GetDate() or GetUTCDate(). Its DateEnd
value might be equal to GetDate(), or not. Business logic will decide this question.
2. The Deleted and Inserted tables give us the values of the old and new rows, enabling us to manipulate the
values.
Here is the code to create a test table and the Instead-Of trigger we need. Create a test database first, and then run this
SQL:
CREATE TABLE [dbo].[Test_PTA_Table](
    [TestTablePK] [int] IDENTITY(1,1) NOT NULL,
    [TestTableText] [varchar](50)
        COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [DateCreated] [datetime] NOT NULL
        CONSTRAINT [DF_Test_PTA_Table_DateCreated] DEFAULT (getdate()),
    [DateEffective] [datetime] NOT NULL,
    [DateEnd] [datetime] NULL,
    [OperatorCode] [varchar](50)
        COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    -- Note: the Instead-Of trigger below supplies NULL explicitly for
    -- DateReplaced on replacement rows, overriding this default.
    [DateReplaced] [datetime] NULL
        CONSTRAINT [DF_Test_PTA_Table_DateReplaced] DEFAULT (getdate()),
    CONSTRAINT [PK_Test_PTA_Table] PRIMARY KEY CLUSTERED
    (
        [TestTablePK] ASC
    ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Here is the trigger:
CREATE TRIGGER [dbo].[Test_PTA_Table_Update_trg]
-- ALTER TRIGGER [dbo].[Test_PTA_Table_Update_trg]
ON [dbo].[Test_PTA_Table]
INSTEAD OF UPDATE
AS
SET NOCOUNT ON

-- Caveat: the scalar variable assumes a single-row Update; a multi-row
-- Update would need a set-based join against the Inserted table instead.
DECLARE @key int
SET @key = (SELECT TestTablePK FROM Inserted)

-- Flag the existing row as ended and replaced as of now.
UPDATE Test_PTA_Table
SET DateEnd = GetDate(), DateReplaced = GetDate()
WHERE TestTablePK = @key

-- Insert the replacement row; DateReplaced is NULL because it is current.
INSERT INTO dbo.Test_PTA_Table
(TestTableText, DateCreated, DateEffective, OperatorCode, DateReplaced)
(SELECT TestTableText, GetDate(), GetDate(), OperatorCode, NULL
FROM Inserted)
A real-world example would involve more columns, but I kept it simple so the operations would be clear. With our
underpinnings in place, open the table and insert a few rows. Then go back and update one or two of those rows.
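A minimal test script might look like this (the literal values are illustrative assumptions):

-- Insert a row; DateCreated takes its default, DateEffective is supplied.
INSERT INTO dbo.Test_PTA_Table (TestTableText, DateEffective, OperatorCode)
VALUES ('first version', GETDATE(), 'jlarue')

-- "Update" the row: the Instead-Of trigger end-dates the original and
-- inserts a replacement row instead of modifying the original in place.
UPDATE dbo.Test_PTA_Table
SET TestTableText = 'second version'
WHERE TestTableText = 'first version'

-- Both versions are now present: the original with DateEnd and
-- DateReplaced set, and the replacement with DateReplaced NULL.
SELECT * FROM dbo.Test_PTA_Table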
Dealing with selects
Every Select statement must take into account the dates just described, so that a query which is interested in, say, the
state of the database as it appeared on December 24, 2006, would:
Exclude all data inserted or updated since that day.
Include only data as it appeared on that day. Deletes that occurred prior to that date would be excluded.
In the case of updated rows, we would be interested only in the last update that occurred prior to the date of
interest.
This may be trickier than it at first appears. Suppose that a given row in a given table has been updated three times prior
to the point in time of interest. We'll need to examine all the remaining rows to determine if any of them have been
updated or deleted during this time frame, and if so, exclude the logical deletes, and include the logical updates.
With our standard PTA columns in place this may not be as tricky as it at first sounds. Remember that at any particular
point in time, the rows of interest share the following characteristics.
DateCreated is less than or equal to the PTA date of interest.
DateEffective is less than or equal to the PTA date.
DateEnd is either null or greater than the PTA date.
DateReplaced is either null or greater than the PTA date.
So for our row that has been updated three times prior to the PTA date:
The first and second rows will have a DateEnd and a DateReplaced that are not null, and both will be less than
the PTA date.
The third row will have a DateEffective less than the PTA date, and a DateReplaced that is either null or greater
than the PTA date.
So we can always query out the rows of interest using the same column names and the same semantics in every PTA
table, rather than having to examine columns of different names.
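Putting those four conditions together gives a single query template that works for any PTA table. A sketch, using the test table created earlier (@PTADate is an assumed parameter):

DECLARE @PTADate datetime
SET @PTADate = '20061224'

-- Rows in effect as of the PTA date of interest.
SELECT *
FROM dbo.Test_PTA_Table
WHERE DateCreated <= @PTADate
AND DateEffective <= @PTADate
AND (DateEnd IS NULL OR DateEnd > @PTADate)
AND (DateReplaced IS NULL OR DateReplaced > @PTADate)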
PTA implementation details
The most important thing to realize is that it may not be necessary to trace the history of every column in a table. First of
all, some columns, such as surrogate IDs, assigned dates (e.g. OrderDate), and other columns such as BirthDate will
never be changed (other than for corrections). Another example is TelephoneNumber. In most applications, it is not
significant that your telephone number changed twice in the past year: what we care about is your current number.
Admittedly, some organizations may attach significance to these changes of telephone number. That is why we can only
suggest a rule of thumb rather than an iron-clad rule. The stakeholders in the organization will help you decide the
columns that are deemed "unimportant".
What then qualifies as an important column? The rule of thumb is that important columns are those that have
changeable attributes, and whose changes have significance.
Columns with changeable attributes are often called Slowly Changing Dimensions (SCDs). However, just because an
attribute value changes, that doesn't imply that the change is significant to the business. There are two types of SCD:
Type 1 – columns where changes are of little or no interest to the organization
Type 2 – columns where changes must be tracked and history recorded.
An obvious example of a Type 2 SCD is EmployeeDepartmentID. Typically, we would want to be able to trace the
departments for which an employee has worked. But again, this may or may not be important to a given organization.
What we can say is this: it is rarely the case that all columns within a table are considered Type 2.
Once you have defined the Type 1 and Type 2 columns, you can then devise the programmatic objects required to
handle both types. The Type 1 code won't bother with logical updates; it will perform a simple update, replacing the old
value with a new one and not documenting this change in detail. The Type 2 code will follow the rules for logical updates
and deletes.
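As a sketch of how the two code paths might diverge inside a single Instead-Of trigger: the Persons table and its columns are assumptions, Surname is treated as Type 2 and TelephoneNumber as Type 1, and this version is set-based so it also handles multi-row updates:

CREATE TRIGGER dbo.Persons_Update_trg
ON dbo.Persons
INSTEAD OF UPDATE
AS
SET NOCOUNT ON
IF UPDATE(Surname) -- a Type 2 column is changing: follow the logical-update rules
BEGIN
    -- End-date the rows being replaced.
    UPDATE p
    SET DateEnd = GETDATE(), DateReplaced = GETDATE()
    FROM dbo.Persons p JOIN Inserted i ON p.PersonID = i.PersonID

    -- Insert the replacement versions, current as of now.
    INSERT INTO dbo.Persons
    (Given, Surname, TelephoneNumber, DateCreated, DateEffective, OperatorCode)
    SELECT i.Given, i.Surname, i.TelephoneNumber, GETDATE(), GETDATE(), i.OperatorCode
    FROM Inserted i
END
ELSE
BEGIN
    -- Only Type 1 columns changed: a plain in-place update, no history kept.
    UPDATE p
    SET p.TelephoneNumber = i.TelephoneNumber
    FROM dbo.Persons p JOIN Inserted i ON p.PersonID = i.PersonID
END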
Using domains
Depending on the development tools you use, you may or may not be able to take advantage of domains. (I am a big fan
of ERwin and PowerDesigner, and almost never develop a data model without using them, except for the most trivial
problems.)
In terms of data-modeling, a domain is like a column definition, except that it is not related to a table. You create a
collection of domains, specifying their default values, description, check constraints, nullability and so on, without
reference to any given table. Then, when you create individual tables, instead of supplying a built-in data type for a
column, you specify its domain, thus "inheriting" all its attributes that you defined earlier. The less obvious gain is that
should you need to change the domain definition (for example from int to bigint, or smalldatetime to datetime, or
varchar(10) to char(10)), you make the change in exactly one place, and then forward-engineer the database. All
instances of the domain in all tables will be updated to correspond to the new domain definition. In a database
comprising hundreds of tables, this approach can be a huge time-saver.
Although I love domains, I have found one problem with them. In my opinion, there ought to be two kinds of domains, or
rather a double-edged domain. Consider a domain called CustomerID. Clearly, its use in the Customers table as a PK
is different than its use in various related tables, as an FK. In the Customer table it might be an int, Identity(1,1),
whereas in the related tables, it will still be an int, but not an identity key. To circumvent this problem, I typically create a
pair of domains, one for the PK and another for all instances of the domain as an FK.
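SQL Server's nearest built-in analogue to a modeling-tool domain is the alias type. It cannot carry the IDENTITY property, so this only approximates the FK half of the pair; the names below are invented for illustration:

-- A "domain" for customer identifiers, reused wherever the FK appears.
CREATE TYPE dbo.CustomerID FROM int NOT NULL
GO
CREATE TABLE dbo.Orders (
    OrderID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerID dbo.CustomerID -- FK use of the "domain"
)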
Sample transactions
There is no panacea for creating a PTA database. However, using the examples provided here, plus some clear thinking
and a standard approach, you can solve the problems and deliver a fully compliant PTA system.
Assume a Hollywood actress who marries frequently, and who always changes her surname to match her husband's. In
a PTA, her transaction record might look like this:
Table 1: Persons table with PTA columns and comment.
PersonID | Given | Surname | DateCreated | DateEffective | DateEnd   | DateReplaced | OperatorCode | Description
1234     | Mary  | O'Hara  | 01-Jan-04   | 01-Jan-04     | NULL      | NULL         | jlarue       | Prior to Wedding
1234     | Mary  | O'Hara  | 01-Jan-04   | 01-Jan-04     | 01-Jun-04 | 01-Jun-04    | jlarue       | Ends Maiden Name
2345     | Mary  | Roberts | 01-Jun-04   | 02-Jun-04     | NULL      | NULL         | bhoskins     | Adopts New Surname
2345     | Mary  | Roberts | 01-Jun-04   | 02-Jun-04     | 12-Dec-05 | 12-Dec-05    | cwebb        | Ends Marriage
3456     | Mary  | Kent    | 12-Dec-05   | 12-Dec-05     | NULL      | NULL         | cwebb        | Remarries and adopts new Surname
3456     | Mary  | Kent    | 12-Dec-05   | 12-Dec-05     | 06-Jun-06 | 06-Jun-06    | bhoskins     | Ends Marriage
4567     | Mary  | Clark   | 06-Jun-06   | 07-Jun-06     | NULL      | NULL         | bhoskins     | Remarries and adopts new Surname
Here we have the history of Mary's three marriages. Mary O'Hara entered the database on 01-Jan-04. In June of the
same year she adopted, through marriage, the surname Roberts. This is reflected in our PTA database with the
appropriate value inserted into the DateEnd and DateReplaced columns of Mary's row. We then insert a new row into
the Persons table, with a new PersonID value, the updated surname and the correct DateCreated and DateEffective
values. This process is repeated for each of Mary's subsequent marriages, so we end up with four rows in the Persons
table, all referring to the same "Mary".
These four Primary Keys all point to the same woman. Her surname has changed at various points in time. To this
point, we have considered History as referring to the history of changes within the tables. However, this example
illustrates another concept of history: the history of a given object (in this case, a person) within the database. Some
applications may not need to know this history, while others may consider this critical. Medical and police databases
come immediately to mind. If all a criminal had to do to evade his history was change his surname, we would have
problems in the administration of justice.
One might handle this problem by adding a column to the table called PreviousPK, and insert in each new row the PK of
the row it replaces. This approach complicates queries unnecessarily, in my opinion. It would force us to walk the chain
of PreviousPKs to obtain the history of the person of interest. A better approach, I think, would be to add a column
called OriginalPK, which may be NULL. A brand-new row would contain a null in this column, while all subsequent rows
relating to this person would contain the original PK. This makes it trivial to tie together all instances. We can then order
them using our other PTA columns, creating a history of changes to the information on our person of interest.
Table 2: Persons Table with PTA and Original PK tracking column.
PersonID | Given | Surname | DateCreated | DateEffective | DateEnd   | DateReplaced | OperatorCode | OriginalPK
1234     | Mary  | O'Hara  | 01-Jan-04   | 01-Jan-04     | NULL      | NULL         | jlarue       | NULL
1234     | Mary  | O'Hara  | 01-Jan-04   | 01-Jan-04     | 01-Jun-04 | 01-Jun-04    | jlarue       | NULL
2345     | Mary  | Roberts | 01-Jun-04   | 02-Jun-04     | NULL      | NULL         | bhoskins     | 1234
2345     | Mary  | Roberts | 01-Jun-04   | 02-Jun-04     | 12-Dec-05 | 12-Dec-05    | cwebb        | 1234
3456     | Mary  | Kent    | 12-Dec-05   | 12-Dec-05     | NULL      | NULL         | cwebb        | 1234
3456     | Mary  | Kent    | 12-Dec-05   | 12-Dec-05     | 06-Jun-06 | 06-Jun-06    | bhoskins     | 1234
4567     | Mary  | Clark   | 06-Jun-06   | 07-Jun-06     | NULL      | NULL         | bhoskins     | 1234
Given a Point-In-Time of 21-Dec-2005, the row of interest is the penultimate row: the row whose DateEffective value is
12-Dec-2005 and whose DateEnd is 06-Jun-2006. How do we identify this row?

SELECT * FROM Persons
WHERE OriginalPK = 1234
AND DateEffective <= '21-Dec-2005'
AND ((DateEnd IS NULL) OR (DateEnd > '21-Dec-2005'))

The DateEffective value must be less than or equal to 21-Dec-2005, and the DateEnd must be either NULL or greater
than 21-Dec-2005.
Dealing with cascading updates
Let us now suppose that during the course of our history of Mary O'Hara, she changed addresses several times. Her
simple changes of address are not in themselves problematic; we just follow the principles outlined above for the
PersonAddresses table. If her changes of address correspond to her marriages, however, the waters muddy slightly,
because this implies that she has changed both her name and her address. But let's take it one step at a time.
Mary moves from one flat to another, with no other dramatic life changes. We stamp her current row with a DateEnd and
DateReplaced (which, again, might differ). We insert a new row in PersonAddresses, marking it with her current PK
from the Persons table, and adding the new address data. We mark it with a DateEffective corresponding to the lease
date, and leave the DateEnd and DateReplaced null. Should her surname change within the scope of this update then
we mark her row in Persons with a DateEnd and a DateReplaced, then insert a new row reflecting her new surname.
Then we add a new row to PersonAddresses, identifying it with Mary's new PK from Persons, and filling in the rest of
the data.
Each time Mary's Persons row is logically updated, requiring a new row with a new PK, we must also logically update
the dependent row(s) in PersonAddresses, and in every other related table, inserting new rows that reference the new
PK in the Persons table. Fortunately, we can trace the history of Mary's addresses using the Persons table.
In more general terms, the point to realize here is that every time a Type 2 update occurs in our parent table (Persons, in
this case), a corresponding Type 2 update must occur in every related table. How complex these operations will be
clearly depends on the particular database and its requirements. Again, there is no hard-and-fast rule to decide this.
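A sketch of one such cascade, under the assumed Persons/PersonAddresses schema used above (the column names and the surrogate-key mechanics are illustrative assumptions):

DECLARE @OldPK int, @NewPK int
SET @OldPK = 3456 -- Mary's current Persons PK (illustrative)

-- 1. Logically update the parent: end-date the current Persons row...
UPDATE dbo.Persons
SET DateEnd = GETDATE(), DateReplaced = GETDATE()
WHERE PersonID = @OldPK

-- ...and insert its replacement, carrying OriginalPK forward.
INSERT INTO dbo.Persons
(Given, Surname, DateCreated, DateEffective, OperatorCode, OriginalPK)
SELECT Given, 'Clark', GETDATE(), GETDATE(), 'bhoskins', COALESCE(OriginalPK, PersonID)
FROM dbo.Persons
WHERE PersonID = @OldPK
SET @NewPK = SCOPE_IDENTITY()

-- 2. Cascade to the child: re-insert the currently active address row(s)
-- against the new parent PK...
INSERT INTO dbo.PersonAddresses
(PersonID, AddressLine, DateCreated, DateEffective, OperatorCode)
SELECT @NewPK, AddressLine, GETDATE(), GETDATE(), OperatorCode
FROM dbo.PersonAddresses
WHERE PersonID = @OldPK AND DateReplaced IS NULL

-- ...and end-date the old child rows.
UPDATE dbo.PersonAddresses
SET DateEnd = GETDATE(), DateReplaced = GETDATE()
WHERE PersonID = @OldPK AND DateReplaced IS NULL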
Dealing with cascading deletes
A logical delete is represented in PTA as a row containing a not-null DateEnd and a null DateReplaced. Suppose we
have a table called Employees. As we know, employees come and go. At the same time, their IDs are probably FKs into
one or more tables. For example, we might track SalesOrders by EmployeeID, so that we can pay commissions. A
given employee departs the organization. That certainly does not mean that we can delete the row. So we logically
delete the row in the Employees table, giving it a DateEnd that will exclude this employee from any lists or reports
whose PTA date is greater than said date – and thus preserving the accuracy of lists and reports whose PTA date is prior
to the employee's departure.
On the other hand, suppose that our firm sells products from several vendors, one of whom goes out of business. We
logically delete the vendor as described above, and perhaps we logically delete all the products we previously purchased
from said vendor.
NOTE:
There is a tiny glitch here, beyond the scope of this article, but I mention it because you may have to consider what to do
in this event. Suppose that you still have several units on hand that were purchased from this vendor. You may want to
postpone those related deletes until the inventory has been sold. That may require code to logically delete those rows
whose QuantityOnHand is zero, and later on to revisit the Products table occasionally until all this vendor's products
have been sold. Then you can safely logically delete those Products rows.
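The revisit logic the note describes might be sketched like this (VendorID and QuantityOnHand are assumed columns; the statement would be re-run periodically until no active rows remain):

DECLARE @DepartedVendorID int
SET @DepartedVendorID = 17 -- the vendor that went out of business (illustrative)

-- Logically delete only those of the vendor's products with no stock
-- left to sell; rows with inventory on hand survive until they sell out.
UPDATE dbo.Products
SET DateEnd = GETDATE()
WHERE VendorID = @DepartedVendorID
AND QuantityOnHand = 0
AND DateEnd IS NULL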
Summary
The first time you confront the challenge of implementing Point in Time Architecture, the experience can be quite
daunting. But it is not rocket science. I hope that this article has illuminated the steps required to accomplish PTA. As
pointed out above, some applications may require the extra step of tracking the history of individual objects (such as
Persons), while others may not need this. PTA is a general concept. Domain-specific implementations will necessarily
vary in the details. This article, I hope, will serve as a practical guideline. I emphasize that there are rarely hard-and-fast
rules for implementing PTA. Different applications demand different rules, and some of those rules will only be
discovered through careful interrogation of the stakeholders. You can do it!
Author profile: Arthur Fuller
Arthur Fuller has been developing database applications for more than 20 years. He frequently works with
Access ADPs, Microsoft SQL 2000 and 2005, MySQL, and .NET.
Have Your Say
Subject: RE: Audit mechanism...
Posted by: Anonymous (not signed in)
Posted on: Thursday, February 22, 2007 at 7:01 PM
Message: Excellent article. This is straightforward technical advice. There are a lot of professional programmers out there who don't follow this. Hence I had to clean out the pieces and re-design the database with the use of triggers.
Subject: Errors?
Posted by: Anonymous (not signed in)
Posted on: Friday, February 23, 2007 at 5:09 PM
Message: Please double-check your comparison operators in the sections "Dealing with deletes" and "Dealing with selects." Some of them seem backward to me.
Subject: Alternate designs
Posted by: Anonymous (not signed in)
Posted on: Friday, February 23, 2007 at 5:16 PM
Message: Reasonable approach. Good detail. Agree that database designers often miss this crucial need.
Personally I prefer using two tables: one to store the logical entity (e.g., Person) that only contains Type 1 columns, and a 1:many table (e.g., PersonVersion) that stores each of its versions and has all the Type 2 columns. The logical entity retains the same PK through its lifetime. Requires fewer date comparisons, too, especially for the typical case of wanting the current version.
I've seen other successful approaches, too. Depends on your particular needs, of course.
Thanks again for the nice article.
Subject: New ID's
Posted by: Anonymous (not signed in)
Posted on: Tuesday, February 27, 2007 at 7:13 PM
Message: This is an interesting approach. I am in the midst of a PTA-type project. In my case, I am using "history tables" for the Type 2 values. But queries are complicated because I have to join to each of 4 history tables (each record in the parent table has 4 values that must be tracked for history) and check the dates on each of them. Anyway, I don't understand why in your examples you assign new ID values when the Type 2 data changes. Why can't the PersonID stay the same? That really complicates things.
Thanks.
Subject: Addition to this approach
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 28, 2007 at 10:30 AM
Message: I've used a PTA approach very similar to this for several projects. I find it useful to also set up a few standard views for each entity in the db:
1. a view that shows all logical records with the valid start and end dates,
2. a view that shows the "current" values,
3. one or more views for default time intervals as needed for the application (1 hour ago, yesterday at midnight, last month end, last quarter end, etc.).
These views can sometimes help speed up some select statements as well as providing a place to relate record-level security.
Subject: amended article title
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 28, 2007 at 10:43 AM
Message: PiTA!!
Subject: re: Addition to this Approach
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 28, 2007 at 10:46 AM
Message: "These views can sometimes help speed up some select statements as well as providing a place to relate record level security."
I wanted to do something like this, but it only sped up the development - views negatively affect performance. Writing queries is a struggle, but you'll get good at using aliases and max functions.
Subject: Trigger Update?
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 28, 2007 at 10:58 AM
Message: This is really good timing for me. I just had a discussion with one of our developers about this. I forwarded the link. Thanks.
One minor detail - unless I've missed something - is to use a join with the inserted table to do the update in the trigger. The other option is to raise an error in the trigger if there is more than one record being updated. The use of a local variable in a trigger normally is a red flag for me.
Subject: Normalization
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 28, 2007 at 11:26 AM
Message: In one of our applications, we have implemented something similar to this. The difference being (for right or wrong) we have an Archive table with the history and an Actual table with the current value. I like your approach better and am a true believer in not deleting data. I cannot tell you how many times I have been asked who made a change to a record and when it was changed. With the current processes, these questions can be answered about 50% of the time.
Upon developing our next application, however, we gave consideration to the issue of normalization, or in this case, denormalization. In theory, an attribute value should occur once and only once in its object's table, regardless of time. If a table has 30 columns and only one of the columns is updated (assuming it is worthy of archiving), 29 columns will be redundant. The issue of disk space, once a major consideration in data storage (i.e. 6-character dates), is now a non-issue. Disk space is cheap, so just add another hard drive. Unfortunately, that overlooks the larger issue of normalization, which should be the major determining factor in data storage.
The other major factor to weigh in the archiving process is the data overhead (storage and retrieval processes). The overhead increases tremendously when a normal path is chosen. Instead of row (or record) archiving, history is now being done at the field level. In order to get a record for an object at any given point in time, each field's state is determined by locating the field value as of that date/time and joining it with the other object's fields. Tables of metadata can be used to determine which fields will be archived.
Admittedly, this creates much more database management overhead, but, in some cases, data retrieval may be increased due to normalization.
That is my two cents worth.
Subject: PK?
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 28, 2007 at 5:20 PM
Message: I can't spot a consistent PK in the example. That leads me to surmise that this approach is theoretical and hasn't been implemented in a real world db yet. Is that right?
Subject: Further to the trigger update
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 28, 2007 at 5:24 PM
Message: Good article. I can see the benefits of this design.
Regarding the trigger update however, if you use a variable as you have done, then when a multi-row update occurs, you won't capture the full extent of the changes.
It is important to remember that triggers in some databases behave differently from others. For instance, SQL Server triggers run once for a multi-row update statement, so you have to deal with the data as a set.
Subject: Nice Job
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 28, 2007 at 7:41 PM
Message: This was definitely worth the read... nice job, Arthur.
Subject: Normalized solution
Posted by: Anonymous (not signed in)
Posted on: Thursday, March 1, 2007 at 1:58 AM
Message: If you do the normalization correctly, then you will see that you will have one table for Type 1 columns and each Type 2 column will be in its own table (unless multiple Type 2 columns must change at the same time, in which case they should naturally be in the same table).
If you can create a table for each group of users showing the start and end dates they currently use, then a view can be created for each entity that contains all the many joins needed. Don't open this view. :-) Most RDBMSs will then be able to optimize the view by compilation, caching and even the addition of indexes to the view.
Updates (or triggered inserts) will need to understand which tables to operate on.
This properly normalized logical system will be fast and efficient because its physical representation will be smaller (no repeated data), with smaller indexes and, in general, optimal usage of pages.
One additional comment I would make is that the user should be able to decide whether an update is one that affects the validity date or not on Type 2 columns. What could be termed 'data refactoring', where spelling, grammar and clarity are improved without changing the meaning of the data, might not change the validity date.
Actually, this solution looks so good, I think I will do away with my snapshot databases and frozen tables and implement it myself. :-)
Subject: Your comments
Posted by: Anonymous (not signed in)
Posted on: Thursday, March 1, 2007 at 8:40 AM
Message: Thank you all for your intelligent and thorough comments. I haven't got much time at the moment (I'm a witness at a trial today and have to leave in a moment), but when I return I shall respond in detail to the questions you've posed.
Thanks again. It's very gratifying to know that so many people read the piece, and to know that the piece was useful.
Subject: Cascading Updates/Deletes
Posted by: Anonymous (not signed in)
Posted on: Thursday, March 1, 2007 at 10:06 AM
Message: A very good article. I hadn't considered the cascading updates and deletes. It seems to me that many applications may require an ability to relate a current vendor (with a new gold status) to an older transaction and the historical state of the vendor at the time of the transaction (only silver status at the time). It seems very costly in terms of space to duplicate all related records with every change to the parent table, especially if there are many related rows and frequent changes to the parent.
Subject: Tracking "used" records
Posted by: danjs (view profile)
Posted on: Thursday, March 1, 2007 at 10:12 AM
Message: We have the need for a PTA in our systems, but I would like to achieve these use cases and specifications. For example with storing a person's home address:
- User can enter address information; simple Insert into table with current date as effective date.
- User can change address information;
-- If address information is "unused" then just update data and don’t change the effective date. This allows a user to correct a simple data entry error without having to create a new record.
-- If existing address information is "used" then insert a new record with the current date as effective date. This keeps the audit/history of the change when the information was used for some purpose.
-- A "used" flag in the address table indicates if the information was used for some purpose or not. For example if a product was shipped to an address then the address information was "used" and the history must be preserved. Any subsequent change to the information demands the system insert a new record instead of updating the existing record.
- User can enter an address change to take effect on a given date (ex. on May 1st 2007, John Doe will be moving from New York to New Jersey);
-- If an effective date is set with the address change then insert a new record using the provided effective date.
Any experience with such a system?
Subject: Commercial Solutions?
Posted by: Anonymous (not signed in)
Posted on: Thursday, March 1, 2007 at 12:15 PM
Message: Are there any commercially available solutions that can achieve some level of auditing?
Subject: PTA
Posted by: Anonymous (not signed in)
Posted on: Thursday, March 1, 2007 at 2:22 PM
Message: Excellent article. Describes a complex PTA architecture in simple words with good examples.
Subject: Time-oriented SQL databases
Posted by: Anonymous (not signed in)
Posted on: Friday, March 2, 2007 at 6:11 AM
Message: What we're talking about is a temporal database. There are two time dimensions to consider:
Valid time - the time period during which a fact is true with respect to the real world ("effective dates").
Transaction time - the time period during which a fact is stored in the database.
A quick intro here: http://en.wikipedia.org/wiki/Temporal_database
An entire book on Time-Oriented SQL: http://www.cs.arizona.edu/people/rts/tdbbook.pdf
A prototype that considers how to implement transaction time in SQL Server: http://www-users.cs.umn.edu/~mokbel/ImmortalDBDemo.pdf
But at the end of the day, current SQL databases don't provide in-built support for this, so we end up having to implement it all manually using techniques described in this article. My life would be *so* much easier if DBMSs provided time-support out of the box!
In the real world, one big issue with storing loads of versions of the "same" row of data is performance. Will it run like a dog due to vastly more data, when most of the time you still only want the "current" view?
Subject: poor
Posted by: Anonymous (not signed in)
Posted on: Friday, March 2, 2007 at 6:16 AM
Message: Not even a mention of temporal DBs...
Subject: Bitemporal databases
Posted by: Anonymous (not signed in)
Posted on: Friday, March 2, 2007 at 7:40 AM
Message: Look at the extensive work of Snodgrass, Jensen et al. on bitemporal databases: http://www.cs.auc.dk/TimeCenter/pub.htm
Subject: Temporal databases
Posted by: Anonymous (not signed in)
Posted on: Friday, March 2, 2007 at 6:10 PM
Message: Those who mention temporal databases are not being exactly fair. These things only exist in the minds and papers of academics. No production-quality DBMS handles temporal databases.
Subject: Time oriented SQL database
Posted by: Anonymous (not signed in)
Posted on: Friday, March 2, 2007 at 6:16 PM
Message: "In the real world, one big issue with storing loads of versions of the 'same' row of data is performance. Will it run like a dog due to vastly more data, when most of the time you still only want the 'current' view?"
In a normalized model, the current data is an entity. Each Type 2 (in Arthur's terminology) attribute, however, is not part of the same entity and should be in a table on its own, as proposed in the "Normalized solution" comment above.
This means most of the time the database can ignore the history, and it only takes up the space needed to store changed data. The database will obviously have a lot more tables, but these will just be ignored unless history is requested.
Subject: Consistent PK
Posted by: Anonymous (not signed in)
Posted on: Saturday, March 3, 2007 at 9:03 AM
Message: "I can't spot a consistent PK in the example. That leads me to surmise that this approach is theoretical and hasn't been implemented in a real world db yet. Is that right?"
The listing in Table 1 was intended to illustrate a halfway-there approach. It lacks the ability to trace the ancestry of an original PK. That was the point I attempted to make in subsequent paragraphs. Adding the column OriginalPK and "inheriting" it in subsequent updated rows eliminates the problem of walking the chain recursively. Apparently I didn't make this sufficiently clear. Table 1 was intended as a point in time, so to speak -- an initial design that we later discover cannot deal with the second meaning of history -- tracking Mary O'Hara through her several residences, marriages and name changes.
Table 2 illustrates the revised version of Table 1, and its ultimate column shows a consistent PK that travels through all the updates.
Arthur
Subject: History Tables
Posted by: Anonymous (not signed in)
Posted on: Saturday, March 3, 2007 at 9:21 AM
Message: Several readers have commented that my examples do not include History tables, and they are quite correct in pointing this out. In this piece, I attempted to describe the concept of PTA and illustrate some of its wrinkles (cascades). Going into every single detail of the problem would require a book, not an article, and I'm not prepared to write a whole book on the topic yet (though it's a thought).
At any rate, the concept of History tables is quite sound, and was in fact used on the last PTA project I did. In brief, every PTA table had a mirror table called X_History, where X = Client or Product or whatever. When a row was updated, the original was moved to History, leaving only the current row in the active table.
This approach increases performance in some and perhaps most operations, but when you need to interrogate both the current and the history tables then you are forced into a UNION or some other construct to assemble the whole picture. I'm not suggesting this is a bad thing, but merely pointing it out.
In the actual case that gave rise to this article, the History tables were an entire database and resided on another server, further complicating matters. But what is one to do, given that the initial footprint was 500MB and the expected growth was 1TB per year? Add to this the fact that large segments of the data were highly confidential, so there were multiple servers and firewalls between them, and even the programmers were not allowed to see the actual data. (It was a medical application. You can see the implications -- I might want to check if the woman I met last night had been tested for AIDS, for example.)
But overall, I definitely subscribe to the notion of History tables, despite the performance hit when assembling the entire history (current + past). Further, these History tables are not to be confused with the OLAP version of the data. History is identical in structure to Current. The OLAP version uses dimensions to aggregate the data in meaningful ways (i.e. how many women who have taken drug xyz subsequently developed kidney ailments? etc.).
So, the full picture of a large database that must adhere to PTA probably consists of 3 databases: Current, History and OLAP. The logic of moving the data from C to H to O is application-specific.
Arthur
Subject: Tracking "Used" Records
Posted by: Anonymous (not signed in)
Posted on: Saturday, March 3, 2007 at 9:30 AM
Message: Danjs,
It sounds like you have an interesting problem to solve. I am available to assist, for vast amounts of currency. :)
More seriously, your requirements outstrip what I attempted to present in the article. I tried to present a simple case, to illuminate the issues involved. If you want assistance on this, email me privately ([email protected]). The wrinkles are too wrinkly to explore here.
Subject: Thanks
Posted by: Anonymous (not signed in)
Posted on: Saturday, March 3, 2007 at 9:33 AM
Message: It is most gratifying to receive so many comments on this article, and not only for quantity but also for quality. Clearly, you read it and thought about it and responded with intelligent consideration. Thank you, all.
Arthur
Subject: 4 possibilities
Posted by: Anonymous (not signed in)
Posted on: Sunday, March 4, 2007 at 8:31 AM
Message: We are a small software company and we create a lot of information systems for different users (every year at least 2 or 3 new). PTA is a really painful problem and we have tried a lot of options (all are from real applications, and each has several GB and more than 80 tables):
1. Archive table. Every table has an archive table (mytable_A, mytable2_A) and every update is an insert into the archive table too. Cons: a lot of triggers and tables.
2. Archive data in table. Every PTA table has a compound PK (ID and IDN). The row with IDN=0 is the actual record. When a record is updated, a new archive row is created (where IDN is the next number in sequence).
3. One archive table for all tables, where every column is a new row. We create a table with columns: TABLE_NAME, TABLE_ID, VERSION, COLUMN_NAME, USERNAME, DATE, intvalue, numbervalue, datevalue, blobvalue, stringvalue. After an update of any table, the new value of every column is inserted into this table. Cons: a lot of rows in this table.
4. One archive table for all tables with an XML record. The table has columns: TABLE_NAME, TABLE_ID, VERSION, USERNAME, DATE, XML-DATA. The XML record is XML serialization in Java.
Subject: Re: possibilities
Posted by: Anonymous (not signed in)
Posted on: Monday, March 5, 2007 at 10:15 AM
Message: So which of the 4 approaches works best for your clients?
Subject: Consistent PK
Posted by: Anonymous (not signed in)
Posted on: Monday, March 5, 2007 at 12:38 PM
Message: "Table 2 illustrates the revised version of Table 1, and its ultimate column shows a consistent PK that travels through all the updates."
Ah, I see it. It is either in the PK column or in the OriginalPK column depending on which row you look at. How would you create a relationship with another table having PK or OriginalPK as the FK?
Subject: Domains and FK's
Posted by: ASdanz (view profile)
Posted on: Tuesday, March 6, 2007 at 2:07 PM
Message: I use domains extensively with ERStudio and do not have any of the issues you describe with certain attributes like IDENTITY being suitable only for PK's and not FK's. ERStudio handles it just fine. I'm surprised that you can't get the same result using ERwin.
I would think that maintaining referential integrity in your PTA design is going to be problematic at best. The whole notion of choosing a good primary key is to choose a value that will never (or very rarely) change. (The main reason so many databases have surrogate keys instead of natural keys for most primary keys.)
Subject: More unsaid than said
Posted by: Anonymous (not signed in)
Posted on: Tuesday, March 6, 2007 at 6:03 PM
Message: On the one hand this article doesn't refer to the significant body of works that deal with temporal database solutions: Date, Darwen and Lorentzos or Snodgrass, for example. On the other hand the "Type 1" and "Type 2" SCD terms are appropriated from Kimball (without any attribution) and given incomplete explanations. I would encourage anyone who is seriously interested to study the standard works on this topic.
This field is already overburdened with jargon (see Kimball again). The last thing we need is a new term - "PTA" - for an old concept!
Subject: Interesting - Other Approaches Exist Too
Posted by: Anonymous (not signed in)
Posted on: Wednesday, March 7, 2007 at 12:54 PM
Message: Have extensively used what was available through the former PeopleSoft organization, and their solution is significantly simpler. They kept the idea of the key that never changes and tacked on the effective date and an "Active/Inactive" indicator. If you had an employee, with an employee ID, then when I am first recorded, I have that ID, effective dated and "Active". When I change my name, I KEEP the same ID, have a new effective date, and am "Active" - with a new name. When I eventually retire (hopefully) or am fired (please, no - wait, consultant and big bucks???), I KEEP the same ID, have a new effective date, and become "Inactive". This deals with the "what was the state" of each individual real-world object being modeled within the database. And, separately, they tacked on what I have over many years lovingly referred to as an "Audit Stamp" - when it was done, who did it, and, if you need it that badly, what function was being used.
In this concept, related tables use my employee ID as the foreign key (which can be a PK-type value assigned, and should NOT be any type of social insurance number or other "natural looking" key).
Respecting what I learned about normalization rules way back, the idea of (in a logical design; physical implementation might be different for performance reasons) storing the end date in this record that matches the begin date of the next record in the chain makes me double take. This may help with performance and be legitimate - but I saw the PeopleSoft applications perform well with large data loads and they did NOT use the approach - they stuck with the simpler approach: that the effective date of a record implied that the previous record along the timeline ended at the time this one came into being.
Now - having said all that, I will say that this does appear to be reasonably well thought out from the perspective of writing an article, and I would encourage your hinted-at desire to write an entire book - assuming you have the necessary time available to make such a commitment. Also, thanks for the information - amazing thinking.
Subject: Referential Integrity
Posted by: Anonymous (not signed in)
Posted on: Tuesday, March 27, 2007 at 6:05 AM
Message: This article does seem to gloss over a problem which I believe is central to the physical design of a temporal database, and that is how referential integrity is enforced. In a non-temporal relational database (I use Oracle), referential integrity is enforced by the imposition of FK constraints. If the database becomes temporal then FK constraints can clearly not be used in the same way, as an FK constraint can only point to one parent record at a time.
For instance, take the typical employee-in-a-department scenario, with an employee and his department both having a DateEffective of 01-Jan-2007 and a DateEnd of 31-Dec-2099, and with the employee table having an FK constraint to the PK of the department table. Were the department record to be updated today, there would then be two department records, one with a DateEffective of 01-Jan-2007 and a DateEnd of 26-Mar-2007, the other with a DateEffective of 27-Mar-2007 and a DateEnd of 31-Dec-2099. Whether the PK of the department table is a surrogate key or the business key + DateEffective, the employee must now point to two parent records, so clearly the FK constraint cannot be used.
One solution would be, as implied in Arthur's article by his mention of the PersonAddresses table, to include many-to-many resolver tables between each of the main business entities. Another would be to do away with FK constraints altogether and instead use triggers to check the existence of parents on insert and update. Either way, the design of the database is greatly complicated, either by a proliferation of extra tables and the code to maintain them, or by the large amount of trigger code required. (One could, of course, trust (ha!) the application to maintain referential integrity, but, in my experience, this always ends in tears!)
I would be very interested to hear how other people resolve this conundrum.
Subject: Updating foreign key
Posted by: Anonymous (not signed in)
Posted on: Thursday, April 12, 2007 at 7:10 AM
Message: I was working on implementing support for historical records in an existing DB when I read this post.
My thoughts and experiences about historical data are that a separate log/history table causes too much administrative work. E.g. table changes must be done in two places and retrieving historical data will require more work.
So this time I decided to try to keep historical data in the "main" table. Although, this gave me the trouble of changing all FKs, since all PKs will have to be combined with a "start time" for the current value.
Each table that keeps historical data has two extra columns: ST (start time) and ET (end time). ST will have "current timestamp" as default value and ET will have NULL as default. The active values will always have NULL as ET.
Add a trigger to the table (this is T-SQL for Microsoft SQL Server):

ALTER TRIGGER myTrigger
ON myTable
FOR UPDATE
AS
BEGIN
    DECLARE @now DATETIME
    SET @now = {fn NOW()}

    -- Update start time on current row
    UPDATE myTable
    SET ST = @now
    FROM inserted INNER JOIN myTable ON myTable.id = inserted.id AND myTable.ET IS NULL

    -- Create history
    INSERT INTO myTable
    SELECT * FROM deleted

    -- Update end time on historical data
    UPDATE myTable
    SET ET = @now
    FROM deleted INNER JOIN myTable ON myTable.id = deleted.id AND myTable.ST = deleted.ST
END

It should be easy to port this to other flavors of SQL.
Subject: Been thinking about this
Posted by: Anonymous (not signed in)
Posted on: Thursday, April 26, 2007 at 5:23 PM
Message: Hmm, at first I was not happy about the primary keys changing, due to having to create new records for all foreign key references to it.
However, the suggestion made to use a many-to-many mapping table that stores the parent primary key (the one that changes) and the child's foreign key is a good one, as it prevents unnecessary row duplication when only the foreign key reference is changed. This means that if you change the parent's primary key, you only need a trigger to create a new reference in the many-to-many table between the new parent's primary key and the old child's foreign key.
If you are worried about space then this will save a huge amount.
Negatives:
(a) Queries are a bit more complex (although consistent).
(b) Each parent and child table requires a date lookup, whereas the space-filling alternative in the article has a direct relationship.
Subject: Gotta keep 'em separated
Posted by: Anonymous (not signed in)
Posted on: Monday, May 7, 2007 at 9:28 PM
Message: The concept is a great one, and for the most part I agree (as this is the approach I use myself).
The problem I have is that you want to store 'live' data with deleted data... Good luck with that. That will not only make your data muddy, but it will make your stored procedures more complex (or increase exponentially in number).
You should really consider keeping your archived data in separate tables.
Subject: Wow good article
Posted by: Anonymous (not signed in)
Posted on: Tuesday, May 15, 2007 at 8:02 AM
Message: Nicely explained in clear and concise text, great job.
I am doing something similar in a project now and am using most of this.
Thanks
Subject: History Tables
Posted by: Kurt V (not signed in)
Posted on: Thursday, June 28, 2007 at 3:30 PM
Message: From a previous posting...
"When a row was updated, the original was moved to History, leaving only the current row in the active table. This approach increases performance in some and perhaps most operations, but when you need to interrogate both the current and the history tables then you are forced into a UNION or some other..."
Rather than waiting till a record is updated before moving the original to the history table, why not add the new record to the history table and update the current table all in one transaction? Then your history table also includes current and there is never a need to UNION.
Subject: Anonymous Comments Disabled
Posted by: Sarah Grady (view profile)
Posted on: Wednesday, December 19, 2007 at 9:57 AM
Message: Due to a high volume of spam, anonymous comments have been disabled.
Subject: Correct use of RTS in PITA
Posted by: D_George (view profile)
Posted on: Tuesday, March 15, 2011 at 12:27 PM
Message: This article continues a common, but serious error in the use of the Replaced Timestamp field. While the article is otherwise a good introduction to the subject, Table 1: Persons table with PTA columns and comment is incorrect. Table 1 introduces us to Mary O’Hara, a frequently married actress, and her marriage history:

PersonID | Given | Surname | DateCreated | DateEffective | DateEnd   | DateReplaced | OperatorCode | Description
1234     | Mary  | O'Hara  | 01-Jan-04   | 01-Jan-04     | NULL      | NULL         | jlarue       | Prior to Wedding
1234     | Mary  | O'Hara  | 01-Jan-04   | 01-Jan-04     | 01-Jun-04 | 01-Jun-04    | jlarue       | Ends Maiden Name
2345     | Mary  | Roberts | 01-Jun-04   | 02-Jun-04     | NULL      | NULL         | bhoskins     | Adopts New Surname

The RTS field is labelled DateReplaced. In PITA, when this field is null on a record, the record possesses three characteristics:
1. There was never any point in time in the past when belief in the correctness of its data ceased;
2. As a corollary, the record is believed at the present time;
3. For the context it represents, the record is the currently active, i.e. most up-to-date, statement of facts as we know them.
'Context' can refer to some object such as real estate property or a piece of equipment, or to an attribute such as a person’s surname as in the present case.
When RTS is non-null on a record, its value is the point in time, expressed as a date and time value, when we stop believing that the values in the record are correct. Some of the values may be correct, but we believe the value of at least one of the fields is not. This extends to fields with null values. For example, we may have come to believe that the value of DateEnd in the first record is no longer null as of June 1st, 2004. That means the information contained in the record is no longer in effect.
As the second record shows, the date the record ceases to be in effect is the same date on which we stopped believing that there was no end, namely June 1st, 2004. However the two dates could be different, and the End Date can even be later than the RTS.
The problem with the way the author has structured Table 1 is that, in accordance with the three characteristics just cited for a record whose RTS is null, it appears that the first record is still in effect, and that we no longer believe the second record is true. Record 1 does not have a non-null RTS value, so our belief in it continues. Record 2, on the other hand, has an RTS value of June 1st, 2004, so that is the date on which we stopped believing that the information was effective only between January 1st and June 1st of that year. This is exactly the reverse of the author’s intention.
However, if we re-write the first two records as follows,

PersonID | Given | Surname | DateCreated | DateEffective | DateEnd   | DateReplaced | OperatorCode | Description
1234     | Mary  | O'Hara  | 01-Jan-04   | 01-Jan-04     | NULL      | 01-Jun-04    | jlarue       | Prior to Wedding
1234     | Mary  | O'Hara  | 01-Jun-04   | 01-Jan-04     | 01-Jun-04 | NULL         | jlarue       | Ends Maiden Name
2345     | Mary  | Roberts | 01-Jun-04   | 02-Jun-04     | NULL      | NULL         | bhoskins     | Adopts New Surname
Table 1.1

the intention becomes obvious: prior to June 1st, 2004 Mary’s last name was believed to remain O’Hara forever. A day later, however, she married and changed her last name to Roberts. However, prior to that and since January 1st, her last name was O’Hara, and we continue to believe that to be the case.
I have a somewhat lengthier document giving further background and examples on the use of end dates and the RTS field. If anyone wishes a copy, please email me at my public address.
Subject: Bitemporal database
Posted by: D_George (view profile)
Posted on: Wednesday, April 13, 2011 at 8:31 AM
Message: There's a useful, if incomplete, discussion and history of the topic in Wikipedia: http://en.wikipedia.org/wiki/Temporal_database.
Be sure to read the discussion page too, as it points out some of the deficiencies of the main article. There is a considerable body of academic writing on the subject of temporal data, led as usual by Chris Date and Richard Snodgrass. Your favourite search engine will present a plethora of hits when queried with 'temporal database'.