In the world of APIs, nobody grants unrestricted direct access to their resources, because you never know how heavily your services will be used. Once you start thinking about limiting access to your APIs, several concepts come to mind. So what are API throttling, API quota, API rate limiting, and API burst?
API rate limiting/API throttling
If you look at an end-to-end scenario, you first have an overall limit on the number of calls your backend can process per time unit, often measured in TPS (Transactions per Second). Sometimes systems also have a physical limit on the amount of data that can be transferred, measured in bytes. For example, your backend might be able to process 2,000 TPS. This is called Backend Rate Limiting.
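A backend rate limit of this kind can be sketched as a simple counter over one-second windows. This is a minimal, illustrative example, not a production gateway; the class and method names are made up, and the 2,000 TPS figure mirrors the example above:

```python
import time

class BackendRateLimiter:
    """Rejects requests once the per-second transaction limit is hit.

    Minimal fixed-window sketch (hypothetical names); real systems
    typically use sliding windows or token buckets for smoothness.
    """

    def __init__(self, max_tps=2000):
        self.max_tps = max_tps
        self.window_start = 0.0
        self.count = 0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if now - self.window_start >= 1.0:  # a new one-second window begins
            self.window_start = now
            self.count = 0
        if self.count < self.max_tps:
            self.count += 1
            return True
        return False
```

A fixed window is the simplest scheme, but it allows up to twice the limit across a window boundary, which is one reason smoother algorithms exist.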
Often, each client gets an overall limit on the rate at which it is allowed to send requests, called an Application Rate Limit. If a client starts sending too many requests, its connection gets throttled: processing slows down, but the client is not disconnected. This keeps the connection open and helps keep errors down. However, connections risk timing out, and because connections are held open longer, throttling can open a vector for Denial of Service attacks.
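The difference between throttling and outright rejection can be shown as a gateway that delays requests instead of dropping them. This is an illustrative sketch with made-up names, assuming a 20 requests/second application limit (i.e., a 50 ms minimum spacing between requests):

```python
import time

class ThrottlingGateway:
    """Slows a client down instead of dropping its connection.

    Hypothetical sketch: requests that arrive faster than the allowed
    rate are delayed, which keeps the connection open at the cost of
    holding it longer (the DoS risk mentioned in the text).
    """

    def __init__(self, min_interval=0.05):  # 0.05 s -> 20 requests/second
        self.min_interval = min_interval
        self.next_slot = 0.0

    def delay_for(self, now):
        """Return how long this request must wait before processing."""
        start = max(now, self.next_slot)
        self.next_slot = start + self.min_interval
        return start - now

    def handle(self, request):
        wait = self.delay_for(time.monotonic())
        if wait > 0:
            time.sleep(wait)  # throttle: slow down, do not disconnect
        return f"processed {request}"
```

Note how each delayed request pushes the next available slot further out, which is exactly why held-open connections accumulate under sustained overload.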
Sometimes you want to allow a single client to send more than its nominal limit because your system has spare bandwidth or is idle. This is called the API peak. And sometimes clients cannot control the rate at which their API calls are emitted. This is where API burst helps: it allows a client to send a certain amount of traffic above its usual limit. For example, you allow a client 20 TPS, but it sends 30 transactions that process very quickly, and your systems may well be able to absorb the load. From an implementation perspective, the Leaky Bucket is a well-known algorithm for this.
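The leaky-bucket idea mentioned above can be sketched in a few lines: the bucket drains at the steady rate, each request adds one unit, and the bucket's capacity is the allowed burst. The rate=20 / capacity=30 values map to the 20 TPS / 30-transaction example in the text; the class name is illustrative:

```python
class LeakyBucket:
    """Leaky-bucket meter allowing short bursts above the steady rate.

    Sketch only: the bucket drains at `rate` units per second and each
    request adds one unit. Requests pass while the bucket is not full,
    so a burst of up to `capacity` can get through before throttling.
    """

    def __init__(self, rate=20.0, capacity=30.0):
        self.rate = rate
        self.capacity = capacity
        self.level = 0.0
        self.last = 0.0

    def allow(self, now):
        # Drain the bucket for the time elapsed since the last request.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```

Once the burst capacity is used up, the client is effectively limited to the steady drain rate until the bucket empties again.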
Looking at the commercial side and the long-term consumption of calls and data, API quotas are widely used. An API quota usually describes a certain number of calls over a longer interval. For example, your API quota might be 5,000 calls per month. Remember that this can be combined with a rate limit or throttling setup, e.g. 20 TPS (Transactions per Second).
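Conceptually, a quota is just a long-lived counter per consumer and billing period, sitting on top of any short-term rate limit. A minimal sketch, assuming an in-memory store and the 5,000 calls/month figure from the text (real API gateways persist these counters per consumer):

```python
class ApiQuota:
    """Tracks long-term consumption per consumer and month.

    Illustrative sketch with hypothetical names; a gateway would
    combine this check with a short-term rate limit such as 20 TPS.
    """

    def __init__(self, monthly_limit=5000):
        self.monthly_limit = monthly_limit
        self.used = {}  # (consumer, "YYYY-MM") -> calls used so far

    def consume(self, consumer, month):
        key = (consumer, month)
        if self.used.get(key, 0) >= self.monthly_limit:
            return False  # quota exceeded, e.g. answer 429 Too Many Requests
        self.used[key] = self.used.get(key, 0) + 1
        return True
```

Keying the counter on the billing period means consumption resets naturally at the start of each month.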
To enforce an API quota, you need to identify the client or consumer; this is why the term user quota (also known as organization quota) is used. This is typically where API Management solutions help: consumers sign up and select a plan that has a quota attached. Quite often an SLA is attached as well, defining the response times and availability of the service. This matters from the consumer's side, but it is equally important for the provider to keep an eye on when the API is your product.
Looking at API quotas in more detail, you can not only set a limit for a client/consumer overall, but also per consuming application (e.g., per API key); this is what we call an application quota. You could go even further and limit specific methods or calls, for example because those calls consume more compute power on your backend.
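These nested granularities (consumer, application, method) can be modeled as counters keyed by increasingly specific scopes, where a call must fit within every applicable limit. A hypothetical sketch; the scope keys and limit values are made up for illustration:

```python
class TieredQuota:
    """Quota counters at consumer, application, and method granularity.

    Illustrative sketch: a call is allowed only if it stays within
    every limit that applies to it, from the broad consumer quota
    down to a per-method quota for expensive calls.
    """

    def __init__(self, limits):
        # limits maps a scope key to its allowance, e.g.
        # {("acme",): 5000,                          # user/organization quota
        #  ("acme", "key-1"): 1000,                  # application quota
        #  ("acme", "key-1", "POST /reports"): 100}  # per-method quota
        self.limits = limits
        self.used = {k: 0 for k in limits}

    def consume(self, consumer, api_key, method):
        scopes = [(consumer,), (consumer, api_key),
                  (consumer, api_key, method)]
        tracked = [s for s in scopes if s in self.limits]
        if any(self.used[s] >= self.limits[s] for s in tracked):
            return False
        for s in tracked:
            self.used[s] += 1
        return True
```

Tightening only the scope for an expensive call (here the hypothetical `POST /reports`) caps its backend cost without restricting the consumer's cheaper traffic.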
Besides throttling and quota management, there are other ways to modernize your API strategy. Read the white paper to learn how.