Rate limits
Requests are rate-limited per token. Every response carries headers describing your current budget so you can pace yourself instead of guessing.
Headers
| Header | Meaning |
|---|---|
| `X-RateLimit-Limit` | Max requests allowed in the current window. |
| `X-RateLimit-Remaining` | Requests left in the current window. |
| `X-RateLimit-Reset` | When the window resets (epoch seconds). |

When you exceed the limit, you get a `429` response with the same headers; `X-RateLimit-Reset` tells you when to try again.
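
As a rough illustration, the three headers can be read into a small budget object. This is a minimal sketch, not part of the API: `RateBudget` and `parse_rate_headers` are hypothetical names, and it assumes all three headers carry integer values as described in the table.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateBudget:
    limit: int        # X-RateLimit-Limit
    remaining: int    # X-RateLimit-Remaining
    reset_epoch: int  # X-RateLimit-Reset (epoch seconds)

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds until the window resets; never negative."""
        if now is None:
            now = time.time()
        return max(0.0, self.reset_epoch - now)


def parse_rate_headers(headers: Mapping[str, str]) -> Optional[RateBudget]:
    """Return a RateBudget, or None if a header is missing or not an integer."""
    try:
        return RateBudget(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_epoch=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
```

HTTP header names are case-insensitive. A plain `dict` is not, so pass a case-insensitive mapping (such as the `headers` attribute of a `requests` response) or normalize the keys first.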
Backing off
On a `429`, wait until the time given by `Retry-After` or `X-RateLimit-Reset`. For `5xx` responses, or a `429` without a usable hint, retry with exponential backoff and jitter:
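```python
import random
import time


def call_with_backoff(do_request, max_attempts=5):
    """Call do_request() until it returns a non-retryable status or attempts run out.

    do_request is a zero-argument callable returning a response object
    with .status_code and .headers.
    """
    for attempt in range(max_attempts):
        resp = do_request()
        if resp.status_code not in (429, 500, 502, 503, 504):
            return resp
        if attempt == max_attempts - 1:
            break  # out of attempts; don't sleep before giving up
        # Prefer the server's hint; otherwise exponential backoff with jitter.
        reset = resp.headers.get("X-RateLimit-Reset")
        if resp.status_code == 429 and reset:
            wait = max(0.0, int(reset) - time.time())
        else:
            wait = (2 ** attempt) + random.random()
        time.sleep(wait)
    return resp  # caller handles the final failure
```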
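
The function above reads only `X-RateLimit-Reset`. If you also want to honor `Retry-After`, the sketch below shows one way to turn its value into a delay. It relies on the standard HTTP definition of that header (either a number of seconds or an HTTP-date). Which form this API sends is not stated here, so both are handled. `retry_after_seconds` is a hypothetical helper name.

```python
import time
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[float] = None) -> Optional[float]:
    """Convert a Retry-After header value to a non-negative delay in seconds.

    Returns None when the header is absent or cannot be parsed.
    """
    if not value:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat a zone-less date as UTC
    if now is None:
        now = time.time()
    return max(0.0, when.timestamp() - now)
```

A caller could try this first on a `429` and fall back to `X-RateLimit-Reset`, then to exponential backoff, when it returns `None`.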
Staying under the limit
- Page larger, call less: use `limit: 100` on searches instead of many small pages.
- Project fields: request only the `fields` you need to keep responses (and your processing) lean.
- Cache read-mostly data (pipelines, stage metadata) rather than refetching per operation.
- Spread bulk work: for imports/exports, watch `X-RateLimit-Remaining` and slow down as it approaches zero rather than sprinting into a `429`.
- One token per workload: limits are per token, so isolate a heavy batch job behind its own token to avoid starving your interactive traffic.
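
One possible way to spread bulk work is to divide the time left in the window by the requests left, so the delay grows as the budget shrinks. This is only a sketch under stated assumptions: `pacing_delay` and its `reserve` parameter are hypothetical, and the inputs are expected to come from `X-RateLimit-Remaining` and `X-RateLimit-Reset`.

```python
def pacing_delay(remaining: int, seconds_until_reset: float, reserve: int = 5) -> float:
    """Seconds to wait before the next request so the budget lasts the window.

    `reserve` holds back a few requests for other traffic on the same token.
    """
    if seconds_until_reset <= 0:
        return 0.0  # window has already reset
    usable = remaining - reserve
    if usable <= 0:
        return seconds_until_reset  # budget spent: wait for the reset
    return seconds_until_reset / usable
```

In an import loop, you might call `time.sleep(pacing_delay(...))` after each response, using the values from that response's headers. Even pacing trades throughput for safety. If the job is small relative to the limit, a simpler rule (sleep only when `remaining` drops below a threshold) may be enough.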