develooper Front page | perl.datetime | Postings from January 2003

Working time zone code!

Thread Next
From:
Dave Rolsky
Date:
January 27, 2003 13:56
Subject:
Working time zone code!
Message ID:
Pine.LNX.4.51.0301271522400.23120@urth.org
I just checked in a lot of new/changed code under
modules/DateTime-TimeZone.  This code actually seems to work, and provides
access to the complete Olson historical data, plus ongoing time zone
changes in the future.

I'd realy appreciate some code review on this, even though it's still a
bit messy.

Here's how it works:

DateTime::TimeZone::OlsonDB contains code to parse the Olson TZ DB files
and represent them internally as objects.  For those who aren't familiar
with these files, here's some background.

These files define several types of data.  The first type is a zone
definition.  This defines the name for a zone and a set of observances for
that zone.  A given zone has 1+ observances, each of which is a time
_span_ during which the time zone has a base GMT offset, a "format" (which
is really a name like EST, sort of), and a start and end time.  However,
every (I think) zone has at least one observance that starts at the
beginning of time, and one that ends at the end of time (-inf & +inf).
This _may_ be the same observance.

Observances may have one rule.  A rule is a set of date times _or_ time
spans, during which an additional offset may be applied (like during DST)
to the base GMT offset.  Rules also affect a zone's short name.  For
example, the most recent observance for the "America/Chicago" zone defines
its "format" as "C%sT", which as you can see is intended to be fed into
sprintf.  Rules may have an optional "letter" assigned to them.  So for
example, the rule that defines DST for the US (part of the US set of
rules) has the letter "D", which when combined with the "C%sT" format
produces "CDT".

However, some rules may have no letter, and some observances may have no
rules.  And some rules may only happen once (like the peace time and war
time rules for the US).

These various types of data are represented via the following classes,

  DateTime::TimeZone::OlsonDB::Zone
  DateTime::TimeZone::OlsonDB::Observance
  DateTime::TimeZone::OlsonDB::Rule

The whole lot is a DateTime::TimeZone::OlsonDB object.  So an OlsonDB
object contains many Zone and Rule objects.  Zone objects in turn contain
1+ Observance objects.

But none of these things are particular useful, since they're just raw
data, and the rules and observances interact in complicated ways.

So what you really want to know is "when do changes happen in a zone".
Changes may be a change from one rule to another inside a single
observance, or a change from one observance to another, which may or may
not change rules at the same time!

Because both observances and rules may be spans, what we really want to
produce out of all of this is a set of time spans that combine _both_
observances and rules.

So there is a DateTime::TimeZone::OlsonDB::Change object.  This can only
be created after an entire Olson DB file has been read, since rules and
observances can be defined in any order.  A change is basically the
datetime of the change (UTC), a resolved short name (the observance format
+ optional rule's letter(s)), plus an observance and optional rule.

So here's a somewhat typical usage (for something that has only been used
by me ;):

  my $db = DateTime::TimeZone::OlsonDB->new;
  $db->parse_file( ... );

  foreach my $name ( $db->zone_names )
  {
      my $zone = $db->expanded_zone($name);

      foreach my $change ( $zone->sorted_changes )
      {
          # do something interesting with this
      }
  }

Now, for the purposes of calculating a given datetime's offset, we don't
want to just know when changes happen, we really need to know the time
span for a given set of data (GMT offset and the sort name).

So the script in tools/parse_olson takes pairs of changes and generates
data about time spans.

This script is used to create the module files for all the time zones.
The module files are pre-populated with a set of known spans.  If the last
span does not end with infinity (and most don't), then there is also a
subroutine that generates more spans based on the "infinite" rules (rules
with no specified ending date).

A span looks like this (this one is for Dar es Salaam):

   {
    'short_name' => 'BEAUT',
    'end_date' => '19601201024445',
    'start_date' => '19471201030000',
    'offset' => 9885
   },

The start and end dates have been converted to numbers in the hopes that
speeds up comparisons.  Some start or end dates may be negative or
positive infinity.  The end time for a given span is always the start time
time for the next span.

The data is all stored in a red-black tree (basically a binary tree with
built-in balancing).  The keys for the nodes are spans [ start, end ] and
the values are the whole span hash reference (seen above).

The red-black tree has a custom comparison function which returns a match
if the given datetime is >= start and < end (note that it is inclusive on
the left side _only_).  If the datetime is < start then it's considered
less than the node.  If the datetime is >= end, it is greater than.

So if we search the tree and find a span for a given datetime, we have our
offset and short name.  If we don't find one, we call the generator method
to create more spans until there is a match.  Those new spans are added to
the tree to cache them.  We could also generate extra spans after the
match, which might be a smart optimization.


Anyway, my limited testing shows that this all works, which amazed me.
I'd welcome code review, suggestions, optimization ideas, bug reports,
etc.


There's also the sticky wicket that this code requires DateTime.pm to
work, while DateTime.pm will require DateTime::TimeZone.  This will make
the first release tricky.


-dave

/*=======================
House Absolute Consulting
www.houseabsolute.com
=======================*/

Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About