2022_12_31_bitemporality

Let’s say we’re a company that provides regular services to other companies, and we periodically bill them for them. We have a system which calculates the billable amounts. We a simple reference system to track our customers. For now, the only attribute we care about is their name. We also assign each company an internal ID. We set up our system on 01-15 (MM-DD), and want to put a company in our database. One of our clients is called ‘Old Name’. It was incorporated on 01-01 (note: we’re not actually tracking incorporation dates). They want to change their name to ‘New name’. They do it on 02-01, but we only find out about it on on 02-15. So we have the following events:

Then when we find out they’ve changed their name on 02-15, we update the table (note the ID remains the same, because it’s the same company, just with a different name):

Even on the surface level there are problems with this: Your system now doesn’t ‘remember’ that the company with ID 1 used to be called ‘Old Name’. If you get correspondence which erroneously refers to ‘Old Name’. Your system won’t be able to help you out, because it doesn’t remember.

Another scenario: say it’s currently 2-20. We want to issue an invoice to the company, but the date of the invoice is 1-30 (the day before the name change happened). For tax reasons, the invoice has to reference the old name of the company. But a) your system doesn’t even ‘remember’ the old name, and b) it couldn’t know that on 1-30 the name was ‘Old Name’.

History tables and Timelines

What we need to maintain, instead of the current state of the entity, is a the history of the entity: An unbroken series, from ‘birth’ to ‘now’ of the attributes of an entity. A common way to implement this is to replace our entity table with a ‘history’ table.

Querying the table for the value as it was on 1-30 (get company 1 1-30) would look something like this:

select * from COMPANY_TABLE
where eid = 1
  and from_date > '01-30'
  and (to_date is null or to_date <= '01-30')

UPDATE COMPANY_TABLE
SET to = '03-01'
WHERE to is Null;

INSERT INTO COMPANY_TABLE (eid, from_date, to_date)
VALUES (1, '03-01', Null, 'New Name 2');

We have had to change record 2 so the ‘to’ date is no longer Null, and inserted the line with uid=3. To generalise this, we can say that what we are creating for each ‘entity’ we care about is a timeline. A timeline-entry has the structure {FROM TO <DATA>}. A timeline is a collection of timeline-entries. [{FROM TO <DATA>}]. There are certain invariants for a timeline:

Mutable History

The update model proposed above is compatible with immutable history. The timeline can only move forward. The only acceptable update is to add a new record onto the ‘end’ of the timeline, with a from date that is after the from data of the previous end. For example, you could not go back and insert a new name for the period 1-15 to 2-15.

But what if you do want to do that? There are certainly real-world circumstances when you might need to do that. And there’s nothing stopping you from doing it. But it is not trivial. Say you want to insert into the timeline ‘Tween Name’, from 1-15 to 2-15:

Notice that both existing records had to be modified: the from and to date respectively. Also, the rid is now ‘out of order’ - i.e. doesn’t follow the actual timeline. If we now insert another record, ‘OW name’, valid from 1-14 to 2-16, we get this

Here, we not only modified records 1 and 2 again, but we actually had to delete record 3. All this deleting and updating-in-place should be starting to make your skin crawl. Writing SQL to do this safely and reliably would not be a fun thing.

There is another problem here as well: The whole point of introducing the history table was to remember the changes that were made, and when. But now we want to change the changes, and we will run into the same problem, a level down: Our system won’t remember that, at some point, we thought the name of the company was ‘Tween Name’ (even though it wasn’t).

Say you’re giving a presentation, and one of the slides (prepared a while ago) shows the company ‘Tween Name’. Someone asks “who is that?” You look in the system, and you see no reference to a company called ‘Tween Name’ - only Old Name, OW Name, and New Name. You have forgotten, again.¹ 1 Of course this example is quite unrealistic - the company name itself is unlikely to change that frequently. But it could easily be the case for other attributes. What we need is to record the history of the history.

Perception and Bitemporality

Up to now we’ve focused on when things ‘actually’ happened: on 1-1 the company was incorporated. On 2-1 the name was changed. What we have ignored is that, in both cases, there is a gap between when the event actually happened and when we recorded it in our system. In the first case, we recorded the incorporation in our system on 1-15, because that’s when they became a customer. In the second case, we recorded the name change on 2-15 because that’s when we (and our system) found out about it.

What we have, in effect, are two separate timelines: One representing events that actually happened, and one which represents when we recorded the effect of those events. Call these two timelines the ‘actual timeline’, and the ‘knowledge timeline’. On a knowledge date basis, you could represent the situation like this.

Aside: What is ‘actual’ date something happened?

I called the first timeline the ‘actual’ timeline. But there’s a bit of subtly there. Consider these two timelines as being observations of the same events from the perspective of two different participants. The first (where the incorporation happened on 1-1 and the name change on 2-1) is the perception of the Company itself, and importantly also of the ‘Company Registry’, who is legally responsible for maintaining company records.

We can refer to these as being the ‘real world’ or ‘actual’ events, but really this is just convention: The group who will agree on this perception includes a recognised ‘authority’ (the Registry, having the force of law), which gives the perception of the Registry a special significance. Most neutral participants would concede that the authority’s perception is the correct one. So we call this the ‘actual’ timeline, for convenience.

The second perception is your system’s. Your system observed the events on 1-15 and 2-15 respectively. This viewpoint can often be relevant to you, because ultimately your business is constantly making decisions based on what your system knows at the time the decision is made.

Ultimately there is nothing fundamentally special or privileged about each of these viewpoints: they are just different perceptions of the same events. We are lucky here that there is an authority whom we can rely on to provide an authoritative ordering of events. Sometimes we won’t be so lucky, and then things can get hard. We are also simplifying things significantly, in that the two perspectives differ in when things are changing, but do not have different perceptions of what is happening, which would add another layer of complexity.

Bitemporality: a product of timelines

We saw that you can represent the two timelines (the actual and knowledge) separately pretty easily. Things get more complicated when you try to represent both at the same time - which is what we need to do to solve the ‘history of the history’ problem posed above. This is the primary problem of bitemporality that the rest of this post will explore.

A matrix representation of two-dimensional time

We have two timelines for our entities: 1. The ‘actual’ timeline (the perception of the authority) 2. The ‘system’ timeline (the perception of the system, really a proxy for the knowledge state of our business)

Things get more complicated when you try to record both timelines at once, because they become a product. You can represent them as a partially-filled matrix:

Thinking in multi-dimensional time is hard, so the semantics of this can be tough to unpack. You can read the ‘rows’ as views of history. So at KD=2-14 we thought the company started existing on 1-1, and from that point up to ‘now’ (2-14) the name was Old Name. But b KD=2-15 we learned that this was incorrect: the company did start existing on 1-1, and was initially named ‘Old Name’, but from 2-1 it was called ‘New Name’.

You can read the columns as history of the our knowledge of the state of the system at a particular date: What did we think the name was at 2-1? Reading down the relevant column, we first admit our ignorance: when KD=12-1 and 1-1 and 1-15, the knowledge dates are before the actual dates. Our system won’t guess the future, so it just says “I don’t know”. At KD=2-1 we thought the name was ‘Old Name’. Note that this is KD=AD. This has special significance, because it represents ‘what we knew at the time’. At KD=2-14 we still thought the name at AD=2-1 was ‘Old Name’. But at KD=2-15 we finally realize that the name at AD=2-1 was actually ‘New Name’. So to coin a few terms, we broadly have three ‘queries’ we can do over two time dimensions:

Histories of histories of histories? No thanks

The problem we set out to solve with bitemporality was that we wanted our timeline to be mutable, but also to be able to remember historical changes we made to the actual timeline. We did that by adding a second time dimension. Thinking in general terms, there is no reason the second timeline can’t also be immutable. That is, you could change when your system found out about things. However this would bring the same problem of ‘forgetting’ changes, and the solution would be the same: Add a third time dimension.

In practice, however, it’s hard to build a real use-case for ‘tritemporal’ time. If reasoning in bitemporal time is hard, reasoning in three time dimensions doesn’t sound attractive. The reality is that, if your two time dimensions can reasonably be called the ‘actual’ and ‘knowledge’ timelines, then you’re probably OK, because your knowledge can be said to be immutable. You don’t suddenly discover that you knew something a week ago. If you knew it a week ago, you knew it a week ago.

So while it’s not absurd to think about higher dimensions of time, we will constrain ourselves to two dimensions.

Alternative to matrix representation: Timeline of Timelines

Matrix representation has the benefits that it doesn’t privilege one timeline over another: you can easily read it either way. It’s also completely immutable: you don’t lose information, and can fully recreate a previous state from the current state.

The downsides are that it can be tough to understand, and that it’s really hard to represent as a data structure. A matrix that is growing in both dimensions all the time isn’t easy to model either in structures (i.e. nested name-values) or tables.

We’ve already seen the timelines structure: TIMELINE = [{FROM TO <DATA>}]. There’s no reason this can’t be extended to a timeline of timelines: TOT = [{FROM TO <TIMELINE>}]. The only real requirement here is that you have to order the dimensions: Do you want a KD-Timeline of AD-Timelines, or an AD-Timeline of KD-Timelines? This impacts access patterns and such. In most cases, the answer is pretty simple: you want a KD-timeline of AD-Timelines. You want to have immutable timeline on the ‘outside’, and the mutable on the inside. A couple of representations:

Timeline of timelines in code

;; timeline
[{:from-date "01-01", :to-date "02-01", :data "Old Name"} 
 {:from-date "02-01", :to-date nil,     :data "New Name"}]
 
;; timelime of timelines
[{:from-date "01-15", :to-date "02-15",
  :data [{:from-date "01-01", :to-date nil, :data "Old Name"}]} 
 {:from-date "02-15", :to-date nil,
  :data [{:from-date "01-01", :to-date "02-01", :data "Old Name"}
         {:from-date "02-01", :to-date nil, :data "New Name"}]}]

;; tot = timeline of timelines
(defn update-timeline-bitemporal [tot new-data from to know-date]
  (update-timeline tot
   (update-timeline (:data (last tot)) new-data from to)
   know-date nil))
   
(defn update-timeline [timeline new-data from-date to-date] 
  (->> timeline
       (mapcat #(eliminate-overlap from-date to-date %))
       (cons {:from-date from-date, :to-date to-date :data new-data})
       (sort-by :from-date)
       vec))

(defn eliminate-overlap [from to entry]
  (case (overlap from to (:from-date entry) (:to-date entry))
    :full    []
    :contain [(assoc entry :to-date from) (assoc entry :from-date to)]
    :left    [(assoc entry :to-date from)]
    :right   [(assoc entry :from-date to)]
    :none    [entry]))
    
(defn overlap [new-from new-to old-from old-to]
  (cond
    (d< old-from new-from new-to old-to)  :contain
    (d<= new-from old-from old-to new-to) :full
    (d<= old-from new-from old-to)        :left
    (d<= old-from new-to old-to)          :right
    :else                                 :none))

The bitemporal update is just updating the outer timeline with the update of the inner timeline. The update itself places the new timeline entry in the timeline, and detects and resolves any overlaps, adjusting dates and deleting entries as required. Note the d< functions are just ordering predicates for whatever representation of dates you’re using.

(update-timeline-bitemporal
  ;; first timeline: when we only knew the old name from 1-15 -> 2-15
 [{:from-date "01-15", :to-date "02-15",
   :data [{:from-date "01-01", :to-date nil, :data "Old Name"}]} 
  ;; second timeline: the current one, after we found out the 
  ;; new name on 2-15
  {:from-date "02-15", :to-date nil,
   :data [{:from-date "01-01", :to-date "02-01", :data "Old Name"}
          {:from-date "02-01", :to-date nil, :data "New Name"}]}]
   ;; adding a third name, valid from 3-1, known on 3-15
   "Third Name" "03-01" nil "03-15")
;;=>
[;; unchanged
 {:from-date "01-15", :to-date "02-15", 
  :data [{:from-date "01-01", :to-date nil, :data "Old Name"}]}
 ;; updated to date. This is now the state of system knowledge until
 ;; 3-15
 {:from-date "02-15",:to-date "03-15",
 :data [{:from-date "01-01", :to-date "02-01", :data "Old Name"} 
        {:from-date "02-01", :to-date nil, :data "New Name"}]}
 ;; new timeline-of-timeline, valid from 3-15
 {:from-date "03-15",:to-date nil,
 :data [{:from-date "01-01", :to-date "02-01", :data "Old Name"}
        ;; inner update has put a 'to date' to the actual change of name
        {:from-date "02-01", :to-date "03-01", :data "New Name"}
        ;; and added the 3rd entry for the third name
        {:from-date "03-01", :to-date nil, :data "Third Name"}]}]
        
(update-timeline-bitemporal
 ;; initial tot is the same
 [{:from-date "01-15", :to-date "02-15",
   :data [{:from-date "01-01", :to-date nil, :data "Old Name"}]}
  {:from-date "02-15", :to-date nil,
   :data [{:from-date "01-01", :to-date "02-01", :data "Old Name"}
          {:from-date "02-01", :to-date nil, :data "New Name"}]}]
 ;; but now we are inserting a name into the history
 "Tween Name" "01-15" "02-15" "03-15")
;;=>
[{:from-date "01-15", :to-date "02-15",
  :data [{:from-date "01-01", :to-date nil, :data "Old Name"}]}
 {:from-date "02-15" :to-date "03-15",
  :data [{:from-date "01-01", :to-date "02-01", :data "Old Name"}
         {:from-date "02-01", :to-date nil, :data "New Name"}]}
 {:from-date "03-15", :to-date nil,
  :data [{:from-date "01-01", :to-date "01-15", :data "Old Name"}
         {:from-date "01-15", :to-date "02-15", :data "Tween Name"}
         {:from-date "02-15", :to-date nil, :data "New Name"}]}]

Adventures in Bitemporality