date_trunc function in PostgreSQL.

Recently I needed to write a query in SQL to retrieve a bunch of data from a big table with a specific interval. And I don’t know, is there any easier way in SQL to do this but I found date_trunc function in PostgreSQL that is an excellent solution for my problem.

The problem with an example

Consider the following table.

iddatetimeflightprice
12023-03-07 12:45:12.231069+00AB12300
22023-03-07 12:50:23.341069+00AB12295
32023-03-07 13:10:19.241069+00AB12295
42023-03-07 13:33:45.321069+00AB12299
62023-03-07 13:51:58.311069+00AB12299
72023-03-07 13:58:47.141069+00AB12298
82023-03-07 14:15:19.151069+00AB12286
92023-03-07 14:39:55.221069+00AB12289
102023-03-07 15:48:29.111069+00AB12289
........

And, we want to fetch one row per hour(or the last price for each hour) for flight AB12. The output should be something like this.

datetimeflightprice
2023-03-07 12:00:00.000000+00AB12295
2023-03-07 13:00:00.000000+00AB12298
2023-03-07 14:00:00.000000+00AB12289
2023-03-07 15:00:00.000000+00AB12289

date_trunc function

date_trunc function is simply truncating a datetime object based on an input.
Usage:

date_trunc(_`field`_, _`source`_ [, _`time_zone`_ ])

field can be any value of the following list:

  • microseconds
  • milliseconds
  • second
  • minute
  • hour
  • day
  • week
  • month
  • quarter
  • year
  • decade
  • century
  • millennium

And, source can be a column name or a value.
Example from PostgreSQL documentation:

SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40');
-- Result: 2001-02-16 20:00:00

SELECT date_trunc('year', TIMESTAMP '2001-02-16 20:38:40');
-- Result: 2001-01-01 00:00:00

SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00');
-- Result: 2001-02-16 00:00:00-05

SELECT date_trunc('day', TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00', 'Australia/Sydney');
-- Result: 2001-02-16 08:00:00-05

SELECT date_trunc('hour', INTERVAL '3 days 02:47:33');
-- Result: 3 days 02:00:00

Solution to my problem

date_trunc is great for working with datetime, but it does not end up here. We can use it in our queries to fetch data with a specific interval based on field argument to this function. The following SQL query gives us the output that we expect.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
SELECT
  DATE_TRUNC('hour', datetime) AS datetime,
  TO_JSON(ARRAY_AGG(flight))->-1 AS flight,
  TO_JSON(ARRAY_AGG(price ORDER BY datetime ASC))->-1,
FROM
    flights
WHERE
    flight = 'AB12'
GROUP BY
    DATE_TRUNC('hour', datetime);

Query explanation:
Line 2: We use date_trunc in the select statement to get a truncated datetime.
Line 3: For each hour we have multiple rows, for example at 13:00 we have four rows of data and we need to choose one of them for output, or use an aggregate function like count or avg, but here we want the last row and we can use to_json and array_agg to get the item we want.
Line 4: Same as the previous line, but we used sorting based on datetime to get the last price of each hour.
And finally, in the last line, we use date_trunc in our group statement to group data based on the hour.
Our output should be something like this:

datetimeflightprice
2023-03-07 12:00:00.000000+00AB12295
2023-03-07 13:00:00.000000+00AB12298
2023-03-07 14:00:00.000000+00AB12289
2023-03-07 15:00:00.000000+00AB12289
comments powered by Disqus