Rate limiting bursty workloads

If a system is required to handle short bursts of requests from the same users, the strategy from the previous post will not work. One solution in this case is the so-called Token Bucket: each user starts with n tokens available, and each request consumes a token. Upon reaching 0 tokens, requests are denied. Tokens are replenished at a steady rate.

This post shows an implementation of rate limiting aimed at averaging 1 request per second, with bursts of up to 10 requests (refilling at a rate of 1 token per second).

create unlogged table rate_limit(       -- 1
  id     int      not null,
  tokens smallint not null
                  default 9             -- 2
                  check(tokens >= 0 and
                        tokens <= 10),
  primary key (id)
);

create index on rate_limit
 using btree(tokens)
 where tokens < 10;                     -- 3

1: Unlogged table for faster writes; by all means, change all your tables to unlogged without understanding what you lose =);

2: A new row is inserted when a user's first request is processed; tokens defaults to 9 (allowing 10 requests in total: the current one plus the 9 remaining);

3: Partial index: only rows that still need refilling are kept in the index.
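As a quick sanity check, you can ask the planner whether an update shaped like the refill (touching only rows with tokens < 10) can use this partial index; on a tiny table the planner may well prefer a sequential scan anyway:

explain (costs off)
 update rate_limit
    set tokens = tokens + 1
  where tokens < 10;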

Some SQL-fu, run for each request:

\set id random(0, 10000)                -- 1

with ins_or_upt as
(
     insert into rate_limit as x (id)
     values (:id)                       -- 2
         on conflict(id) do             -- 3
     update set tokens = x.tokens - 1
      where x.id = excluded.id
        and x.tokens > 0
  returning tokens                      -- 4
)
select tokens
  from ins_or_upt
 union all                              -- 5
select 0
 limit 1

1: The above snippet is used with pgbench later, hence the use of the \set command;

2: Using the variable set at line 1;

3: If a row already exists for :id, update instead;

4: Returns the number of remaining tokens, useful for the user;

5: Always return exactly one row containing the remaining tokens for :id.
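For illustration, running the statement above repeatedly for the same fresh id yields 9 on the first request (insert path), then 8, 7, ..., 1, 0 over the next nine requests (update path), and 0 on every request after that until the refill kicks in. At any point you can peek at the stored bucket with a plain select:

select id, tokens
  from rate_limit
 where id = :id;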

We can consume tokens, but they are never refilled (yet). The system's backend could, of course, periodically issue a statement to refill tokens one by one, but that's not this blog's theme. Enter pg_cron, an extension that does just that: periodically executes a statement. After installing it, we just schedule the following:

select cron.schedule(
  '1 second',
  $$ update rate_limit
        set tokens = tokens + 1
      where tokens < 10 $$
);

Done. This refills 1 token per second for rows with fewer than 10 tokens, capping at 10 (a full bucket after 10 seconds of inactivity). All of this with no backend code.
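If you want to confirm the job is registered, check that it is actually firing, or remove it later, pg_cron keeps its bookkeeping in regular tables in the cron schema; something along these lines should do (the like filter is just a lazy way to find the job):

-- registered jobs
select jobid, schedule, command
  from cron.job;

-- recent runs, to confirm the refill is firing
select jobid, status, start_time
  from cron.job_run_details
 order by start_time desc
 limit 5;

-- remove the job when done
select cron.unschedule(jobid)
  from cron.job
 where command like '%rate_limit%';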

Just for fun, I ran the insert script via pgbench using a single connection (one client, one thread), simulating 10k distinct users.

docker exec -i pg pgbench \
       --time=30 \
       --client=1 \
       --jobs=1 \
       --protocol=prepared \
       --no-vacuum \
       --file=insert.sql

You're right, that's not a proper benchmark. Still, achieving ~20k transactions per second (tps) with my (okayish) setup and stock PostgreSQL configuration makes me wonder why some (NoSQL) database systems have such widespread adoption.

Figure 1: me changing a table to unlogged
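If you want to see how it behaves with some actual concurrency, bumping the client and thread counts is the obvious next step (the numbers below are arbitrary, pick whatever your machine tolerates):

docker exec -i pg pgbench \
       --time=30 \
       --client=10 \
       --jobs=4 \
       --protocol=prepared \
       --no-vacuum \
       --file=insert.sql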

If you are wondering how to run this yourself, download these files:

Place everything in the same directory, run docker compose up, and then execute the pgbench command above.

2026-01-02