Let’s say we’re a company that provides regular services to other companies, and we periodically bill them for them. We have a system which calculates the billable amounts. We a simple reference system to track our customers. For now, the only attribute we care about is their name. We also assign each company an internal ID. We set up our system on 01-15 (MM-DD), and want to put a company in our database. One of our clients is called ‘Old Name’. It was incorporated on 01-01 (note: we’re not actually tracking incorporation dates). They want to change their name to ‘New name’. They do it on 02-01, but we only find out about it on on 02-15. So we have the following events:
Our first table, when we set it up on 01-15, might look like this:
id name
1 Old Name
Then when we find out they’ve changed their name on 02-15, we update the table (note the ID remains the same, because it’s the same company, just with a different name):
id name
1 New Name
Even on the surface level there are problems with this: Your system now doesn’t ‘remember’ that the company with ID 1 used to be called ‘Old Name’. If you get correspondence which erroneously refers to ‘Old Name’. Your system won’t be able to help you out, because it doesn’t remember.
Another scenario: say it’s currently 2-20. We want to issue an invoice to the company, but the date of the invoice is 1-30 (the day before the name change happened). For tax reasons, the invoice has to reference the old name of the company. But a) your system doesn’t even ‘remember’ the old name, and b) it couldn’t know that on 1-30 the name was ‘Old Name’.
What we need to maintain, instead of the current state of the entity, is a the history of the entity: An unbroken series, from ‘birth’ to ‘now’ of the attributes of an entity. A common way to implement this is to replace our entity table with a ‘history’ table.
COMPANY_TABLE
// state on 1-15
uid eid from to name
1 1 1-1 Null Old Name
// state on 2-15
uid eid from to name
1 1 1-1 2-1 Old Name
2 1 2-1 Null New Name
Querying the table for the value as it was on 1-30
(get company 1 1-30
) would look something like this:
select * from COMPANY_TABLE
where eid = 1
and from_date > '01-30'
and (to_date is null or to_date <= '01-30')
Changing the name to ‘New Name 2’ on 3-1 would look like this
UPDATE COMPANY_TABLE
SET to = '03-01'
WHERE to is Null;
INSERT INTO COMPANY_TABLE (eid, from_date, to_date)
VALUES (1, '03-01', Null, 'New Name 2');
COMPANY_TABLE
uid eid from_date to_date name
1 1 1-1 2-1 Old Name
2 1 2-1 3-1 New Name
3 1 3-1 Null New Name 2
We have had to change record 2 so the ‘to’ date is no longer
Null
, and inserted the line with uid=3
. To
generalise this, we can say that what we are creating for each ‘entity’
we care about is a timeline. A timeline-entry has the
structure {FROM TO <DATA>}
. A timeline is a
collection of timeline-entries. [{FROM TO <DATA>}]
.
There are certain invariants for a timeline:
to
to the last
from
which is not covered by a record.from
to
The update model proposed above is compatible with immutable history.
The timeline can only move forward. The only acceptable update is to add
a new record onto the ‘end’ of the timeline, with a from date that is
after the from data of the previous end. For example, you could
not go back and insert a new name for the period 1-15
to
2-15
.
But what if you do want to do that? There are certainly real-world circumstances when you might need to do that. And there’s nothing stopping you from doing it. But it is not trivial. Say you want to insert into the timeline ‘Tween Name’, from 1-15 to 2-15:
COMPANY_TABLE
uid eid from to name
1 1 1-1 1-15 Old Name
3 1 1-15 2-15 Tween Name
2 1 2-15 Null New Name
Notice that both existing records had to be modified: the from and to date respectively. Also, the rid is now ‘out of order’ - i.e. doesn’t follow the actual timeline. If we now insert another record, ‘OW name’, valid from 1-14 to 2-16, we get this
COMPANY_TABLE
uid eid from to name
1 1 1-1 1-14 Old Name
4 1 1-14 2-16 OW Name
2 1 2-16 Null New Name
Here, we not only modified records 1 and 2 again, but we actually had to delete record 3. All this deleting and updating-in-place should be starting to make your skin crawl. Writing SQL to do this safely and reliably would not be a fun thing.
There is another problem here as well: The whole point of introducing the history table was to remember the changes that were made, and when. But now we want to change the changes, and we will run into the same problem, a level down: Our system won’t remember that, at some point, we thought the name of the company was ‘Tween Name’ (even though it wasn’t).
Say you’re giving a presentation, and one of the slides (prepared a while ago) shows the company ‘Tween Name’. Someone asks “who is that?” You look in the system, and you see no reference to a company called ‘Tween Name’ - only Old Name, OW Name, and New Name. You have forgotten, again.1 1 Of course this example is quite unrealistic - the company name itself is unlikely to change that frequently. But it could easily be the case for other attributes. What we need is to record the history of the history.
Recall the timeline of events:
Up to now we’ve focused on when things ‘actually’ happened: on 1-1 the company was incorporated. On 2-1 the name was changed. What we have ignored is that, in both cases, there is a gap between when the event actually happened and when we recorded it in our system. In the first case, we recorded the incorporation in our system on 1-15, because that’s when they became a customer. In the second case, we recorded the name change on 2-15 because that’s when we (and our system) found out about it.
What we have, in effect, are two separate timelines: One representing events that actually happened, and one which represents when we recorded the effect of those events. Call these two timelines the ‘actual timeline’, and the ‘knowledge timeline’. On a knowledge date basis, you could represent the situation like this.
COMPANY_TABLE_KNOWLEDGE
uid eid from to name
1 1 1-15 NULL Old Name
COMPANY_TABLE_KNOWLEDGE
uid eid from to name
1 1 1-15 2-15 Old Name
2 1 2-15 Null New Name
I called the first timeline the ‘actual’ timeline. But there’s a bit of subtly there. Consider these two timelines as being observations of the same events from the perspective of two different participants. The first (where the incorporation happened on 1-1 and the name change on 2-1) is the perception of the Company itself, and importantly also of the ‘Company Registry’, who is legally responsible for maintaining company records.
We can refer to these as being the ‘real world’ or ‘actual’ events, but really this is just convention: The group who will agree on this perception includes a recognised ‘authority’ (the Registry, having the force of law), which gives the perception of the Registry a special significance. Most neutral participants would concede that the authority’s perception is the correct one. So we call this the ‘actual’ timeline, for convenience.
The second perception is your system’s. Your system observed the events on 1-15 and 2-15 respectively. This viewpoint can often be relevant to you, because ultimately your business is constantly making decisions based on what your system knows at the time the decision is made.
Ultimately there is nothing fundamentally special or privileged about each of these viewpoints: they are just different perceptions of the same events. We are lucky here that there is an authority whom we can rely on to provide an authoritative ordering of events. Sometimes we won’t be so lucky, and then things can get hard. We are also simplifying things significantly, in that the two perspectives differ in when things are changing, but do not have different perceptions of what is happening, which would add another layer of complexity.
We saw that you can represent the two timelines (the actual and knowledge) separately pretty easily. Things get more complicated when you try to represent both at the same time - which is what we need to do to solve the ‘history of the history’ problem posed above. This is the primary problem of bitemporality that the rest of this post will explore.
We have two timelines for our entities: 1. The ‘actual’ timeline (the perception of the authority) 2. The ‘system’ timeline (the perception of the system, really a proxy for the knowledge state of our business)
Things get more complicated when you try to record both timelines at once, because they become a product. You can represent them as a partially-filled matrix:
AD=Actual Date
KD=Knowledge Date
X=not in system
O=old name
N=new name
AD→ 12 1 1 2 2 2
| | | | | |
KD 1 1 1 1 1 1
↓ 5 4 5
12-1 X
1-1 X X
1-15 X O O
2-1 X O O O
2-14 X O O O O
2-15 X O O N N N
Thinking in multi-dimensional time is hard, so the semantics of this
can be tough to unpack. You can read the ‘rows’ as views of history. So
at KD=2-14
we thought the company started existing on 1-1,
and from that point up to ‘now’ (2-14) the name was Old Name. But b
KD=2-15
we learned that this was incorrect: the company did
start existing on 1-1, and was initially named ‘Old Name’, but from 2-1
it was called ‘New Name’.
You can read the columns as history of the our knowledge of the state
of the system at a particular date: What did we think the name was at
2-1? Reading down the relevant column, we first admit our ignorance:
when KD=12-1 and 1-1 and 1-15
, the knowledge dates are
before the actual dates. Our system won’t guess the future, so
it just says “I don’t know”. At KD=2-1
we thought the name
was ‘Old Name’. Note that this is KD=AD
. This has special
significance, because it represents ‘what we knew at the time’. At
KD=2-14
we still thought the name at AD=2-1
was ‘Old Name’. But at KD=2-15
we finally realize that the
name at AD=2-1
was actually ‘New Name’. So to coin a few
terms, we broadly have three ‘queries’ we can do over two time
dimensions:
Here are some more examples:
AD KD
get 1-1 1-03 => nothing
get 1-1 1-16 => Old Name
get 2-1 2-01 => Old Name (A=K: cotemporal)
get 2-1 2-16 => New Name (A<K: retrospective)
get 2-1 1-16 => Old Name OR Error (A>K: predictive)
get 3-1 2-15 => New Name OR Error
The problem we set out to solve with bitemporality was that we wanted our timeline to be mutable, but also to be able to remember historical changes we made to the actual timeline. We did that by adding a second time dimension. Thinking in general terms, there is no reason the second timeline can’t also be immutable. That is, you could change when your system found out about things. However this would bring the same problem of ‘forgetting’ changes, and the solution would be the same: Add a third time dimension.
In practice, however, it’s hard to build a real use-case for ‘tritemporal’ time. If reasoning in bitemporal time is hard, reasoning in three time dimensions doesn’t sound attractive. The reality is that, if your two time dimensions can reasonably be called the ‘actual’ and ‘knowledge’ timelines, then you’re probably OK, because your knowledge can be said to be immutable. You don’t suddenly discover that you knew something a week ago. If you knew it a week ago, you knew it a week ago.
So while it’s not absurd to think about higher dimensions of time, we will constrain ourselves to two dimensions.
Matrix representation has the benefits that it doesn’t privilege one timeline over another: you can easily read it either way. It’s also completely immutable: you don’t lose information, and can fully recreate a previous state from the current state.
The downsides are that it can be tough to understand, and that it’s really hard to represent as a data structure. A matrix that is growing in both dimensions all the time isn’t easy to model either in structures (i.e. nested name-values) or tables.
We’ve already seen the timelines structure:
TIMELINE = [{FROM TO <DATA>}]
. There’s no reason this
can’t be extended to a timeline of timelines:
TOT = [{FROM TO <TIMELINE>}]
. The only real
requirement here is that you have to order the dimensions: Do
you want a KD-Timeline of AD-Timelines, or an AD-Timeline of
KD-Timelines? This impacts access patterns and such. In most cases, the
answer is pretty simple: you want a KD-timeline of AD-Timelines. You
want to have immutable timeline on the ‘outside’, and the
mutable on the inside. A couple of representations:
k-from k-to
1-15 2-15
Name a-from a-to
Old Name 1-1 Null
K-from k-to
2-15 Null
Name a-from a-to
Old Name 1-1 2-1
New Name 2-1 Null
EID=1
[{1-15 2-15 [{1-1 Null Old-Name}]},
{2-15 Null [{1-1 2-1 Old-Name},
{2-1 Null New-Name}]}]
The datastructure for timelines in clojure are basically as above:
;; timeline
:from-date "01-01", :to-date "02-01", :data "Old Name"}
[{:from-date "02-01", :to-date nil, :data "New Name"}]
{
;; timelime of timelines
:from-date "01-15", :to-date "02-15",
[{:data [{:from-date "01-01", :to-date nil, :data "Old Name"}]}
:from-date "02-15", :to-date nil,
{:data [{:from-date "01-01", :to-date "02-01", :data "Old Name"}
:from-date "02-01", :to-date nil, :data "New Name"}]}] {
This implementation of this would be:
;; tot = timeline of timelines
defn update-timeline-bitemporal [tot new-data from to know-date]
(
(update-timeline tot:data (last tot)) new-data from to)
(update-timeline (nil))
know-date
defn update-timeline [timeline new-data from-date to-date]
(->> timeline
(mapcat #(eliminate-overlap from-date to-date %))
(cons {:from-date from-date, :to-date to-date :data new-data})
(sort-by :from-date)
(vec))
defn eliminate-overlap [from to entry]
(case (overlap from to (:from-date entry) (:to-date entry))
(:full []
:contain [(assoc entry :to-date from) (assoc entry :from-date to)]
:left [(assoc entry :to-date from)]
:right [(assoc entry :from-date to)]
:none [entry]))
defn overlap [new-from new-to old-from old-to]
(cond
(:contain
(d< old-from new-from new-to old-to) :full
(d<= new-from old-from old-to new-to) :left
(d<= old-from new-from old-to) :right
(d<= old-from new-to old-to) :else :none))
The bitemporal update is just updating the outer timeline
with the update of the inner timeline. The update itself places
the new timeline entry in the timeline, and detects and resolves any
overlaps, adjusting dates and deleting entries as required. Note the
d<
functions are just ordering predicates for whatever
representation of dates you’re using.
(update-timeline-bitemporal;; first timeline: when we only knew the old name from 1-15 -> 2-15
:from-date "01-15", :to-date "02-15",
[{:data [{:from-date "01-01", :to-date nil, :data "Old Name"}]}
;; second timeline: the current one, after we found out the
;; new name on 2-15
:from-date "02-15", :to-date nil,
{:data [{:from-date "01-01", :to-date "02-01", :data "Old Name"}
:from-date "02-01", :to-date nil, :data "New Name"}]}]
{;; adding a third name, valid from 3-1, known on 3-15
"Third Name" "03-01" nil "03-15")
;;=>
;; unchanged
[:from-date "01-15", :to-date "02-15",
{:data [{:from-date "01-01", :to-date nil, :data "Old Name"}]}
;; updated to date. This is now the state of system knowledge until
;; 3-15
:from-date "02-15",:to-date "03-15",
{:data [{:from-date "01-01", :to-date "02-01", :data "Old Name"}
:from-date "02-01", :to-date nil, :data "New Name"}]}
{;; new timeline-of-timeline, valid from 3-15
:from-date "03-15",:to-date nil,
{:data [{:from-date "01-01", :to-date "02-01", :data "Old Name"}
;; inner update has put a 'to date' to the actual change of name
:from-date "02-01", :to-date "03-01", :data "New Name"}
{;; and added the 3rd entry for the third name
:from-date "03-01", :to-date nil, :data "Third Name"}]}]
{
(update-timeline-bitemporal;; initial tot is the same
:from-date "01-15", :to-date "02-15",
[{:data [{:from-date "01-01", :to-date nil, :data "Old Name"}]}
:from-date "02-15", :to-date nil,
{:data [{:from-date "01-01", :to-date "02-01", :data "Old Name"}
:from-date "02-01", :to-date nil, :data "New Name"}]}]
{;; but now we are inserting a name into the history
"Tween Name" "01-15" "02-15" "03-15")
;;=>
:from-date "01-15", :to-date "02-15",
[{:data [{:from-date "01-01", :to-date nil, :data "Old Name"}]}
:from-date "02-15" :to-date "03-15",
{:data [{:from-date "01-01", :to-date "02-01", :data "Old Name"}
:from-date "02-01", :to-date nil, :data "New Name"}]}
{:from-date "03-15", :to-date nil,
{:data [{:from-date "01-01", :to-date "01-15", :data "Old Name"}
:from-date "01-15", :to-date "02-15", :data "Tween Name"}
{:from-date "02-15", :to-date nil, :data "New Name"}]}] {
Persisting bitemporal timelines in a SQL Database