Something not as obvious is the relationship between service time, utilisation, and response time. Even rough estimations of that can be very helpful, but most people I speak to don't even know there is a useful relationship -- some even think service time and response time are the same.
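To make that relationship concrete, here is a rough sketch. It assumes the simplest textbook model, an M/M/1 queue (one server, Poisson arrivals, exponentially distributed service times), where mean response time is service time divided by (1 - utilisation). Real systems rarely match those assumptions exactly, so treat the numbers as ballpark figures; the function name is made up for illustration.

```python
def mm1_response_time(service_time: float, utilisation: float) -> float:
    """Mean response time (queueing + service) under an assumed M/M/1 model.

    service_time: mean time the server spends on one task, in seconds.
    utilisation:  fraction of time the server is busy, in [0, 1).
    """
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return service_time / (1.0 - utilisation)


# Same 10 ms service time, very different response times as load rises.
for rho in (0.5, 0.8, 0.9, 0.95):
    ms = mm1_response_time(0.010, rho) * 1000
    print(f"utilisation {rho:.0%}: response time ~{ms:.0f} ms")
```

Under this model, a task that takes 10 ms to serve sees a response time of about 100 ms at 90% utilisation, which is why service time and response time should not be treated as the same thing.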
Come to think of it, the obvious general principles of queueing theory have several consequences that aren't immediately intuitive to people:
- A system where concurrency is limited (practically all systems) often bottlenecks on its slowest component, meaning almost any upgrade will do nothing to improve its performance.
- The average time a task is stuck waiting is longer than the average waiting time, once there's significant variation (greater than Poisson) in arrivals.
- Based on only local measurements in an auxiliary, asynchronous component you can determine global throughput for the whole system.
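One way to read the first and third bullets is through the standard operational laws: the bottleneck bound (throughput cannot exceed 1 divided by the largest per-task service demand) and the forced flow law (system throughput equals a component's throughput divided by how many times each task visits it). The sketch below assumes steady state; the component names, demands, and visit counts are hypothetical numbers chosen for illustration.

```python
from typing import Dict, Tuple


def max_throughput(service_demands: Dict[str, float]) -> Tuple[str, float]:
    """Bottleneck bound: X <= 1 / max(D_k).

    service_demands maps a component name to the seconds of that
    component's time consumed by one task.
    """
    bottleneck = max(service_demands, key=lambda name: service_demands[name])
    return bottleneck, 1.0 / service_demands[bottleneck]


def system_throughput(local_throughput: float, visits_per_task: float) -> float:
    """Forced flow law: X = X_k / V_k, assuming steady state."""
    return local_throughput / visits_per_task


# Hypothetical per-task demands in seconds.
demands = {"web": 0.002, "db": 0.010, "cache": 0.001}
name, bound = max_throughput(demands)
print(f"bottleneck: {name}, at most {bound:.0f} tasks/s")

# Doubling the speed of a non-bottleneck component leaves the bound unchanged.
upgraded = dict(demands, web=0.001)
print(f"after web upgrade: at most {max_throughput(upgraded)[1]:.0f} tasks/s")

# An auxiliary component handling 300 ops/s, visited 3 times per task,
# implies the whole system completes about 100 tasks/s.
print(f"inferred system throughput: {system_throughput(300.0, 3.0):.0f} tasks/s")
```

The point of the second print is the "almost any upgrade will do nothing" bullet: only reducing the demand on the bottleneck component moves the bound.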
If we're intellectually honest, how many hours of studying Markov chains does any one insight there really justify? And what are the odds that any one insight is useful even while dealing with an honest-to-goodness queue? We're not exactly talking e^{i\pi} levels of "wow!" which almost justify teaching complex numbers just to hit people with the one equation.
The power is in the sheer number of semi-trivial observations that a queue theorist can start making after seeing only a small part of the system. And that is impressive - but mostly because you don't need to consider all those individual things as variables once the basic theory is understood. So the theorist can start ignoring all those variables really quickly and move on to dealing with the problem at hand.
You should really check out the recent Aperture[1] project on GitHub that applies all these ideas in practice to protect services from cascading failures.
1. Detects queue buildup automatically, based on metrics such as latency.
2. Adjusts the concurrency on a service.
3. Applies weighted fair scheduling to workloads (i.e. APIs) based on their labels.