
Understanding Hyperfunctions in TimescaleDB

By DSAITrends editors
August 11, 2021

Last week, the team at Timescale posted a detailed primer on the design principles and decisions behind the new hyperfunctions in its popular TimescaleDB database.

TimescaleDB is an open-source relational database designed for time-series data, which are essentially data points indexed in time order. Whether for massive data collection from IoT devices with millions of data points per second or specialized data analytics, a time-series database (TSDB) can scale into the petabytes.

Under the hood, TimescaleDB is engineered on top of PostgreSQL. Data is manipulated using SQL, which keeps it approachable for both new and experienced SQL users, while the database adds powerful multi-node scalability.

You can read more about how a TSDB can be used to store and retrieve data for financial trading, tracking cryptocurrency prices, or determining the virality of social media posts here. The Timescale team says there are currently three million active TimescaleDB databases across different industries.

TimescaleDB hyperfunctions

So where do hyperfunctions come in? Launched just last month, hyperfunctions are a series of functions within TimescaleDB that can be called using SQL to manipulate and analyze time-series data in PostgreSQL with fewer lines of code.

Some of the new hyperfunctions were outlined in this blog post. They offer advanced capabilities such as calculating time-weighted averages, performing percentile analyses, tallying distinct counts, computing various aggregated statistics, and smoothing datasets.

Inspired by how PostgreSQL implements aggregation, the Timescale team decided to implement a two-step aggregate design pattern. According to Timescale developer David Kohn, the pattern offers greater efficiency, distinguishes more clearly between parameters, enables easier rollups with logically consistent results, and makes retrospective analysis easier.

In a nutshell, the Timescale team designed a system in which users are unlikely to write low-performance code unintentionally.
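To make the two-step idea concrete, here is a minimal conceptual sketch in Python. It is not TimescaleDB code: the names StatsSummary, build_summary, average and variance are hypothetical and chosen only for illustration. The first step scans the raw values once and produces an intermediate summary; the second step consists of cheap accessor functions that read a result out of that summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatsSummary:
    """Intermediate state produced by the aggregate step (hypothetical name)."""
    count: int
    total: float
    total_sq: float


def build_summary(values):
    """Step 1 (aggregate): scan the raw values once and build the summary."""
    count, total, total_sq = 0, 0.0, 0.0
    for v in values:
        count += 1
        total += v
        total_sq += v * v
    return StatsSummary(count, total, total_sq)


def average(summary):
    """Step 2 (accessor): read the mean out of an existing summary."""
    return summary.total / summary.count


def variance(summary):
    """Step 2 (accessor): population variance from the same summary."""
    mean = summary.total / summary.count
    return summary.total_sq / summary.count - mean * mean


readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = build_summary(readings)   # the only pass over the raw data
mean_value = average(summary)       # no second scan needed
spread = variance(summary)          # reuses the same summary
```

Because both accessors read from one summary, the raw data is scanned only once however many results are requested, which is one plausible reading of the efficiency benefit described above.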

“[Our] goal is to provide a high level of developer productivity that enhances other PostgreSQL and TimescaleDB features, like aggregate deduplication and continuous aggregates,” wrote Kohn.

In the blog post, he walked through examples of how the two-step aggregate design enhanced the implementation of hyperfunctions such as the time_weight() aggregate, the average() accessor, and the rollup() function.
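The interplay of an aggregate, an accessor and a rollup can be sketched in plain Python. The function names below mirror the hyperfunction names mentioned in the article, but the bodies are an illustrative assumption, not Timescale's implementation. The sketch assumes that each value holds until the next reading arrives (last observation carried forward) and that each summary covers at least two points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeWeightSummary:
    """Intermediate state for a time-weighted average (hypothetical layout)."""
    first_t: float
    first_v: float
    last_t: float
    last_v: float
    weighted_sum: float  # sum of value * duration between first and last point


def time_weight(points):
    """Aggregate step: points is a non-empty list of (time, value) sorted by time."""
    weighted_sum = 0.0
    for (t0, v0), (t1, _v1) in zip(points, points[1:]):
        weighted_sum += v0 * (t1 - t0)  # v0 is carried forward until t1
    first_t, first_v = points[0]
    last_t, last_v = points[-1]
    return TimeWeightSummary(first_t, first_v, last_t, last_v, weighted_sum)


def rollup(a, b):
    """Combine summaries of two adjacent, non-overlapping ranges (a before b)."""
    gap = a.last_v * (b.first_t - a.last_t)  # a's last value fills the gap
    return TimeWeightSummary(a.first_t, a.first_v, b.last_t, b.last_v,
                             a.weighted_sum + gap + b.weighted_sum)


def average(summary):
    """Accessor step: time-weighted mean (needs a span longer than zero)."""
    return summary.weighted_sum / (summary.last_t - summary.first_t)


points = [(0, 10.0), (10, 20.0), (15, 10.0), (20, 30.0)]
whole = time_weight(points)
first_half = time_weight(points[:2])
second_half = time_weight(points[2:])
combined = rollup(first_half, second_half)

# Averaging the two partial averages ignores the gap between the ranges.
naive = (average(first_half) + average(second_half)) / 2
```

In this sketch, rolling up the two partial summaries gives the same result as aggregating all the points at once, while naively averaging the two partial averages does not. That is the kind of logically consistent rollup the design is meant to make easy.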

For now, the two-step aggregate approach is here to stay. It is worth noting that it is used not only in the “stabilized” hyperfunctions but also in various experimental features.
