Rate limits

Requests are rate-limited per token. Every response carries headers describing your current budget so you can pace yourself instead of guessing.

Headers

Header                  Meaning
X-RateLimit-Limit       Max requests allowed in the current window.
X-RateLimit-Remaining   Requests left in the current window.
X-RateLimit-Reset       When the window resets (epoch seconds).

When you exceed the limit, you get a 429 response carrying the same headers; X-RateLimit-Reset tells you when to try again.
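
As a rough illustration of reading these headers, the sketch below turns them into a small record. The names RateBudget and parse_rate_budget are hypothetical helpers, not part of any client library, and the sketch assumes all three headers hold plain integers as described in the table above.

```python
import time
from typing import Mapping, NamedTuple, Optional


class RateBudget(NamedTuple):
    """Hypothetical container for the three rate-limit headers."""
    limit: int
    remaining: int
    reset_epoch: int  # epoch seconds, as documented for X-RateLimit-Reset

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        # Clamp at zero in case the reset time has already passed.
        now = time.time() if now is None else now
        return max(0.0, self.reset_epoch - now)


def parse_rate_budget(headers: Mapping[str, str]) -> Optional[RateBudget]:
    """Return the budget from the headers, or None if one is missing or malformed."""
    try:
        return RateBudget(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_epoch=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
```

With the requests library you could pass resp.headers directly, since its header mapping is case-insensitive. A plain dict needs the exact header spelling shown here.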

Backing off

On a 429, wait as long as Retry-After or X-RateLimit-Reset tells you to. When neither hint is available, or on a 5xx, use exponential backoff with jitter:

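import random
import time

def call_with_backoff(do_request, max_attempts=5):
    """Call do_request() until it succeeds or max_attempts is used up."""
    retryable = (429, 500, 502, 503, 504)
    for attempt in range(max_attempts):
        resp = do_request()
        if resp.status_code not in retryable:
            return resp
        if attempt == max_attempts - 1:
            break  # out of attempts; don't sleep before giving up
        # Prefer the server's hint; otherwise exponential backoff with jitter.
        retry_after = resp.headers.get("Retry-After", "")
        reset = resp.headers.get("X-RateLimit-Reset", "")
        if resp.status_code == 429 and retry_after.isdigit():
            wait = int(retry_after)  # delay in seconds (HTTP-date form falls through)
        elif resp.status_code == 429 and reset.isdigit():
            wait = max(0, int(reset) - int(time.time()))  # reset is epoch seconds
        else:
            wait = (2 ** attempt) + random.random()
        time.sleep(wait)
    return resp  # caller handles the final failure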

Staying under the limit

  • Page larger, call less — use limit: 100 on searches instead of many small pages.
  • Project fields — request only the fields you need to keep responses (and your processing) lean.
  • Cache read-mostly data (pipelines, stage metadata) rather than refetching per operation.
  • Spread bulk work — for imports/exports, watch X-RateLimit-Remaining and slow down as it approaches zero rather than sprinting into a 429.
  • One token per workload — limits are per token, so isolate a heavy batch job behind its own token to avoid starving your interactive traffic.
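
One way to "slow down as it approaches zero" during bulk work is to spread the remaining budget evenly over what is left of the window. The sketch below is only an illustration: pacing_delay and its reserve parameter are hypothetical names, and it assumes X-RateLimit-Remaining and X-RateLimit-Reset are integers as described in the Headers section.

```python
from typing import Mapping


def pacing_delay(headers: Mapping[str, str], now: float, reserve: int = 5) -> float:
    """Seconds to wait before the next request.

    Spreads the remaining budget (minus a small reserve kept back for other
    callers on the same token) evenly across the rest of the window.
    """
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset = int(headers["X-RateLimit-Reset"])
    except (KeyError, ValueError):
        return 0.0  # no usable hint, so do not pace
    window_left = max(0.0, reset - now)
    usable = remaining - reserve
    if usable <= 0:
        return window_left  # budget is spent apart from the reserve: wait for the reset
    return window_left / usable
```

In an import or export loop you might call time.sleep(pacing_delay(resp.headers, time.time())) after each response. The delay grows as the remaining count shrinks and falls back to zero once the window resets.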