In the world of APIs, nobody gives unrestricted access to their resources, because you never know how heavily your services are going to be used. Once you start thinking about limiting access to your APIs, a lot of terms come up. So what are API throttling, API quotas, API rate limiting and API bursts?
API Rate Limiting / API Throttling
If you look at an end-to-end scenario, you first have an overall limit on the number of calls your backend can process per unit of time. This is often measured in TPS (Transactions per Second). In some cases, systems also have a physical limit on the amount of data that can be transferred, measured in bytes. For example, your backend might be able to process 2000 TPS. This is called Backend Rate Limiting.
Very often, the clients calling your API are each given an overall rate limit for the requests they are allowed to send. This is called Application Rate Limiting. If a client starts to send too many requests, its connection gets throttled, which means processing slows down but the client is not disconnected. This keeps the connection open and helps keep errors down. The trade-off is that throttled connections may time out, and holding connections open longer can open a vector for Denial of Service attacks.
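The slowing-down behaviour described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Throttle` class, its `delay_for_next` method and the injectable `clock` parameter are hypothetical names, not part of any particular API gateway. Instead of rejecting a request, it tells the caller how long to wait so that requests are spaced evenly at the allowed rate.

```python
import time


class Throttle:
    """Hypothetical throttle: spaces requests out instead of rejecting them."""

    def __init__(self, rate_per_sec, clock=time.monotonic):
        self.interval = 1.0 / rate_per_sec  # minimum gap between two requests
        self.clock = clock                  # injectable so the logic can be tested
        self.next_free = clock()            # earliest time the next request may run

    def delay_for_next(self):
        """Return how many seconds this request should wait before being processed."""
        now = self.clock()
        start = max(now, self.next_free)    # run now, or queue behind earlier requests
        self.next_free = start + self.interval
        return start - now
```

A server would sleep for the returned delay before handling the request. In practice you would probably also cap the delay and reject requests beyond that cap, since long waits are exactly what causes the timeouts and long-lived connections mentioned above.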
Sometimes you want to enable a single client to send more than its actual limit because your system has spare capacity or is idle. This is called API peak. Sometimes clients cannot control the rate at which their API calls are emitted. This is where API burst can help: it allows a client to send a certain amount of traffic above its usual limit for a short time. For example, you allow a client to send 20 TPS, but it sends 30 transactions that are processed very quickly, and your systems may be able to absorb that load. From an implementation perspective, Leaky Bucket is a well-known algorithm for this.
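To make the burst example more concrete, here is a minimal sketch of a leaky bucket used as a meter. The class name `LeakyBucket`, its parameters and the injectable `clock` are assumptions chosen for illustration; real gateways may implement bursts differently. Each accepted request adds one unit to the bucket, the bucket drains at the sustained rate, and a request is refused only when the bucket would overflow.

```python
import time


class LeakyBucket:
    """Hypothetical leaky bucket meter: drains at a fixed rate, absorbs short bursts."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec   # sustained rate, e.g. 20 TPS
        self.capacity = capacity   # bucket size, i.e. how large a burst is tolerated
        self.clock = clock         # injectable so the logic can be tested
        self.level = 0.0           # current fill level of the bucket
        self.last = clock()        # time of the previous call

    def allow(self):
        now = self.clock()
        # Drain the bucket for the time that has passed since the last call.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1        # the request fits, so it is accepted
            return True
        return False               # the bucket would overflow, so it is refused
```

With `LeakyBucket(rate_per_sec=20, capacity=30)`, a client that has been idle can send 30 requests at once, while its long-term average stays bounded by 20 TPS.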
Looking at the commercial aspect and the long-term consumption of calls and data, API quotas are widely used. An API quota usually describes a certain number of calls over a longer interval. For example, your API quota might be 5,000 calls per month. A quota can be combined with a rate limit or throttling setup, e.g. 20 TPS.
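A monthly quota like the one above could be tracked roughly as follows. This is a simplified in-memory sketch: the `MonthlyQuota` class and its `consume` method are hypothetical, and a real deployment would keep the counter in shared storage rather than in a single process.

```python
from datetime import datetime, timezone


class MonthlyQuota:
    """Hypothetical quota counter that resets at the start of each calendar month."""

    def __init__(self, limit):
        self.limit = limit    # e.g. 5000 calls per month
        self.period = None    # (year, month) the counter currently belongs to
        self.used = 0

    def consume(self, now=None):
        now = now or datetime.now(timezone.utc)
        period = (now.year, now.month)
        if period != self.period:
            # A new month has started, so the counter starts over.
            self.period, self.used = period, 0
        if self.used >= self.limit:
            return False      # quota exhausted for this month
        self.used += 1
        return True
```

To combine it with a rate limit, a request would have to pass both checks: the short-term limit (e.g. 20 TPS) and the long-term quota.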
To enforce an API quota you need to identify the client or consumer, which is why the term user quota (aka organization quota) is used. This is usually where API Management solutions help: consumers come in and select a plan that has a quota attached. Quite often an SLA is attached as well, defining the response times and availability of the service. This matters to the consumer, but the provider also needs to keep an eye on it when the API is the product.
Looking at API quotas in more detail, you can set a limit not only for a client or consumer as a whole but also per consuming application (e.g. per API key); this is called an application quota. You can go even further and limit individual methods or calls, for instance because those calls consume more compute power on your backend.
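The different quota levels can be layered on top of each other. The sketch below is one possible way to do it and rests on assumptions: the `LayeredQuota` class, the scope tuples and the example names (`acme`, `key-1`, `POST /reports`) are invented for illustration. A call is accepted only if the organization, the application and the method are all still within their limits.

```python
from collections import Counter


class LayeredQuota:
    """Hypothetical layered quota: organization, application (API key) and method."""

    def __init__(self, limits):
        self.limits = limits   # maps a scope tuple to its maximum number of calls
        self.used = Counter()  # calls counted so far per scope

    def consume(self, org, api_key, method):
        scopes = [
            ("org", org),                  # user / organization quota
            ("app", api_key),              # application quota
            ("method", api_key, method),   # quota for one expensive call
        ]
        # Only scopes that have a configured limit are checked.
        limited = [s for s in scopes if s in self.limits]
        if any(self.used[s] >= self.limits[s] for s in limited):
            return False
        for s in limited:
            self.used[s] += 1
        return True


# Example configuration with made-up names and numbers.
quota = LayeredQuota({
    ("org", "acme"): 5000,
    ("app", "key-1"): 1000,
    ("method", "key-1", "POST /reports"): 50,
})
```

Checking all scopes before incrementing any counter means a rejected call does not use up quota on the levels it would have passed.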