Those were the days
04, MayIn this article I talk about dates that sometimes do and sometimes don't exist.
Discover the difference between SQL Server and Excel, and PowerShell.
Two missing days
Day 0
In MS Excel, days are saved as a number. Just type in a number in any cell, and set the type to date. The integer part of the number becomes a date, the decimal part becomes the time. The number 0 will be transformed into january 0 1900!
Day 1, to MS Excel, is january 1 1900.
To SQL Server, day 0 is january 1 1900, or 1900-01-01.
SELECT CAST(0 AS Datetime) AS 'SQL Server Day 0'
SELECT CAST(CAST('1900-1-1' AS Datetime) AS INT) AS 'SQL Server jan 1 1900 number'
A leap year that wasn't
If you remember the criteria to find leap years, you know that 1900 was not a leap year, because it is evely divided by 100 and not by 400.
TimeAndDate
Wikipedia
The developers who created Lotus123 took the decision to leave an error in the calculation of dates. Because there was no more than 640KB memory available, a completely faultless calculation would become too resource expensive. The choice they made was to treat the year 1900 as a leap year, so february 29 1900, or 1900-02-29, was considered a valid date. The developers who created MS Excel then held on to this decision, mainly because their concern was to make sure Lotus spreadsheets could be imported in MS Excel.
The developers who created SQL Server ( or its precursors ) chose from the beginning to deliver a correct calculation, meaning february 29 1900, or 1900-02-29, is not a valid date.
SELECT CAST(CAST('1900-2-29' AS Datetime) AS INT) AS 'SQL Server feb 29 1900 number'
Excel vs. SQL
Those two differences make for a one day shift for any date from january 1 1900 to february 29 1900, and a two day shift from march 1 1900 onwards.
INT | Excel | SQL Server |
---|---|---|
0 | 0 jan 1900 | 1 jan 1900 |
1 | 1 jan 1900 | 2 jan 1900 |
32 | 1 feb 1900 | 2 feb 1900 |
59 | 28 feb 1900 | 1 mar 1900 |
60 | 29 feb 1900 | 2 mar 1900 |
61 | 1 mar 1900 | 3 mar 1900 |
42641 | 28 sep 2016 | 30 sep 2016 |
Datekey
Presenting a date as an integer has its advantages. In a Data Warehouse it's considered a best practice. We usually create a date table, which offers endless possibilities by JOINing to sort, filter or group on all contained date properties. Examples are: year, month, week, day of week, weekend, holiday, ISO year, fiscal year, quarter, season,...
But all kinds of databases can profit from this practice. We see date tables in OLTP databases just as well.
On the www we find all kinds of methods to calculate DateKey columns:
- Some use the numbers in a date to concatenate them to an integer, like december 25 2015 becomes 12252015,
- That becomes 25122015 European Style,
- or 20151225 if they prefer ISO style.
- UNIX adepts may prefer to choose jan 1 1970 as a starting date.
- Some people choose their own starting date, from a not so far past if that suits their needs, like jan 1 2000.
The Power DBA's DateKey
I've always used the Excel numbers. The reason was I often exported results to MS Excel, and in older versions it was easier to work with numbers masked as dates. Importing dates to MS Excel and then creating pivot charts on them, for instance, was more error prone.
Since I don't need any dates older than 20 years, I can always create a date table with the DateKey being:
SELECT DATEDIFF(day, '1899-12-30', <mydate>)
By choosing a date two days before 1900-01-01, the DATEDIFF results in the MS Excel number for the date.
As explained above, this is one day to compensate for the difference in day 0, and another day for the shift because of the non-existing leap day in 1900.
If you do need dates from the year 1900, you will have to find another expression, or update a selection of dates to fulfill your wishes.
Be careful
Warning: Because of the existing multiple methods, it's wise to never let an application handle the conversions by its defaults!
Who knows what method any new version of MS Word, MS Access, Power BI or any new software will use?
Maybe Regional Settings or future updates with other defaults will eventually cause errors.
The important part is to use the appropriate method to convert back and forth. That means choosing the method that is used by the application you get your data from or submit it to, or reverting the conversion you made with the exact pendant method.
Whenever we need to convert a DateKey back to a date, we use:
SELECT DATEADD(day, <datekey>, '1899-12-30')
That is, of course, only correct if we used 1899-30-12 as base date for calculating the datekey before!
The dangerous part is that conversions will not fail ( unless you have 1900-02-29 in your set ), but just be off by 1 or 2 days. When mixing ETL, views or applications, it may result in having some correct dates, and some off. In many cases that is difficult to spot, but your data can be wrong! So double check all your date calculations and make sure you always use the same base!
By the way, a date table contains also the date column, which is just as suitable as a JOIN column. The use of a datekey is not mandatory.
And PowerShell?
Working with dates in PowerShell deserves a whole book on its own.
We can format, transform, disassemble and reconstruct dates.
As SQL Server, PowerShell knows 1900-02-29 never existed, and throws an error if you want to work with this non-existing date:
If we want to create a DateKey in PowerShell, we can do so. Subtract one date from another.
Disclaimer: The following examples are just for illustration. They're not best practice and need error handling and more. They're not suited for production use.
Use 1900-01-01 if you want the SQL Server numbers:
or, simpler:
Beware, those methods are dependent on your locale settings.
If you really need to use this, you'll have to write something a bit more elaborate, but that's for another article.
We can also calculate the date from the integer we know: