Recently I needed to write a query in SQL to retrieve a bunch of data from a big table with a specific interval. And I don’t know, is there any easier way in SQL to do this but I found date_trunc function in PostgreSQL that is an excellent solution for my problem.
The problem with an example
Consider the following table.
| id | datetime | flight | price |
|---|---|---|---|
| 1 | 2023-03-07 12:45:12.231069+00 | AB12 | 300 |
| 2 | 2023-03-07 12:50:23.341069+00 | AB12 | 295 |
| 3 | 2023-03-07 13:10:19.241069+00 | AB12 | 295 |
| 4 | 2023-03-07 13:33:45.321069+00 | AB12 | 299 |
| 6 | 2023-03-07 13:51:58.311069+00 | AB12 | 299 |
| 7 | 2023-03-07 13:58:47.141069+00 | AB12 | 298 |
| 8 | 2023-03-07 14:15:19.151069+00 | AB12 | 286 |
| 9 | 2023-03-07 14:39:55.221069+00 | AB12 | 289 |
| 10 | 2023-03-07 15:48:29.111069+00 | AB12 | 289 |
| .. | .. | .. | .. |
And, we want to fetch one row per hour(or the last price for each hour) for flight AB12.
The output should be something like this.
| datetime | flight | price |
|---|---|---|
| 2023-03-07 12:00:00.000000+00 | AB12 | 295 |
| 2023-03-07 13:00:00.000000+00 | AB12 | 298 |
| 2023-03-07 14:00:00.000000+00 | AB12 | 289 |
| 2023-03-07 15:00:00.000000+00 | AB12 | 289 |
date_trunc function
date_trunc function is simply truncating a datetime object based on an input.
Usage:
date_trunc(_`field`_, _`source`_ [, _`time_zone`_ ])
field can be any value of the following list:
microsecondsmillisecondssecondminutehourdayweekmonthquarteryeardecadecenturymillennium
And, source can be a column name or a value.
Example from PostgreSQL documentation:
SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40');
-- Result: 2001-02-16 20:00:00
SELECT date_trunc('year', TIMESTAMP '2001-02-16 20:38:40');
-- Result: 2001-01-01 00:00:00
SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00');
-- Result: 2001-02-16 00:00:00-05
SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00', 'Australia/Sydney');
-- Result: 2001-02-16 08:00:00-05
SELECT date_trunc('hour', INTERVAL '3 days 02:47:33');
-- Result: 3 days 02:00:00
Solution to my problem
date_trunc is great for working with datetime, but it does not end up here. We can use it in our queries to fetch data with a specific interval based on field argument to this function.
The following SQL query gives us the output that we expect.
| |
Query explanation:
Line 2: We use date_trunc in the select statement to get a truncated datetime.
Line 3: For each hour we have multiple rows, for example at 13:00 we have four rows of data and we need to choose one of them for output, or use an aggregate function like count or avg, but here we want the last row and we can use to_json and array_agg to get the item we want.
Line 4: Same as the previous line, but we used sorting based on datetime to get the last price of each hour.
And finally, in the last line, we use date_trunc in our group statement to group data based on the hour.
Our output should be something like this:
| datetime | flight | price |
|---|---|---|
| 2023-03-07 12:00:00.000000+00 | AB12 | 295 |
| 2023-03-07 13:00:00.000000+00 | AB12 | 298 |
| 2023-03-07 14:00:00.000000+00 | AB12 | 289 |
| 2023-03-07 15:00:00.000000+00 | AB12 | 289 |