Expand description
§SCX Load Calculator
A crate providing abstractions for calculating load between scheduling domains.
§Load Balancing Primer
Let’s begin by doing a brief overview of load balancing. In order to understand load balancing, we’ll introduce the relevant terms, and explain what load balancing is from there:
Weight
A positive integer value representing a task’s priority in the system. The meaning of weight is ultimately up to the caller to determine and apply, but conceptually speaking weight is a linear scaling factor of a task’s perceived load (defined below) in the system.
Duty Cycle
A task’s duty cycle is the proportion of time that it could have used a CPU in some time window. For example, say that a task does nothing but spin in a tight loop for 10ms, and then sleep for 10ms. Then in a 20ms time window, its duty cycle would be 0.5.
Note that this is not the same thing as the proportion of time it’s runnable. For example, if you had two such tasks sharing a CPU, one of them may wait for 10ms to use the core while the other one runs, and then it would run for its 10ms duty cycle. It was runnable for 20ms, but could only actually use the CPU for 10ms, so its duty cycle was 10ms / 20ms = 0.5.
Load
A scheduling entity’s load l_x is simply the product of its weight and duty cycle:
l_x = w_x * d_x
Infeasible Weights
At a conceptual level, the goal of a load balancer is of course to balance load across the system. If a scheduling entity has load l_x, and the total sum of all entities’ loads on the system is L, then entity x should receive l_x / L proportion of the available CPU capacity on the system.
This formulation works fine in many cases, but can break down when an entity’s proportion of load (and thus allotted CPU capacity) exceeds the amount of CPU it can consume.
For example, say you were on a system with 2 sets of 2-core groups (for a total of 4 cores on the system), eight tasks with the following duty cycle and weight:
ID Weight Duty Cycle Load
0 1 1.0 1 1 1 1.0 1 2 1 1.0 1 3 1 1.0 1 4 1 1.0 1 5 1 1.0 1 6 1 1.0 1 7 10000 1.0 10000
The load sum L of the system is 1 * 7 + 1 * 10000 = 10007. This means that tasks 0-6 have a load proportion of 1 / 10007 ~= 0.0001, and task 7 has a load proportion of 0.9993. In other words, tasks 0-6 are entitled to 0.0001 * 4 = 0.0004 CPUs worth of capacity, and task 7 is entitled to 0.9993 * 4 = 3.9972 CPUs worth of capacity.
Task 7 can only consume at most 1.0 CPU due to its duty cycle being 1.0, so its weight is “infeasible” as the amount of CPU capacity it would be afforded exceeds what it can actually use.
What we instead want is to find an adjusted weight w’ that we can assign to Task 7 such that it gets its full duty cycle of 1.0, but allows the remaining 3 cores worth of compute to be equally spread amongst the remaining tasks. The task of finding this weight w’ is known as the “infeasible weights problem”, and is solved by this crate.
More details on load balancing and the infeasible weights problem are provided in the following Google Drive document:
https://drive.google.com/file/d/1fAoWUlmW-HTp6akuATVpMxpUpvWcGSAv
§Using the Crate
This crate has two primary sets of APIs:
(1) APIs for aggregating domain loads specified by the caller (2) APIs for querying those loads, after being adjusted for infeasibility
§LoadAggregator
The object that addresses (1) is the LoadAggregator. This object receives load passed by the user, and once all load has been received, runs the numbers to create load sums and weights that are adjusted for infeasibility as described above. LoadAggregator objects can be created and used as follows:
Assume we’re on a 16-core (32-CPU) host with two core complexes:
use scx_utils::LoadAggregator;
use log::info;
let mut aggregator = LoadAggregator::new(32, false);
// Create a LoadAggregator object, specifying the number of CPUs on the
// system, and whether it should only aggregate duty cycle.
let mut aggregator = LoadAggregator::new(32, false);
// Domain 0, weight 1, has duty cycle 1.0.
aggregator.record_dom_load(0, 1, 1.0);
// Domain 1, weight 1, has duty cycle 1.0.
aggregator.record_dom_load(1, 1, 1.0);
// ...
aggregator.record_dom_load(31, 1, 1.0);
// ...
aggregator.record_dom_load(63, 1, 1.0);
// Domain 64, weight 10000, has duty cycle 1.0.
aggregator.record_dom_load(64, 10000, 1.0);
// Note that it is allowed to record load for a domain more than once.
// For a given domain you may only record load for a specific weight
// once, but you may record loads for multiple different weights for a
// single domain.
// Create the LoadLedger object
let ledger = aggregator.calculate();
// Outputs: 66.06451612903226
info!("{}", ledger.global_load_sum());
In the above example, we have 65 domains, all with duty cycle 1.0. 64 of them have weight 1, and one of them has weight 10000. If there were multiple tasks per domain, we would record different or additional values. For example, if we had two tasks with weight 1 in domain 0, and an additional task with weight 100 in domain 1, we would record their loads as follows:
// Assume the same aggregator as above.
// In this version, domain 0 has 2 tasks with weight 1.0 and duty cycle
// 1.0.
aggregator.record_dom_load(0, 1, 2.0);
// In this version, domain 1 also has a task with weight 100 and duty
// cycle 1.0.
aggregator.record_dom_load(1, 100, 1.0);
Note that the abstractions here are meant to be generic across schedulers. LoadAggregator assumes nothing about the scheduling domains being passed to it. For example, they may span layers defined in a scheduler, domains specified by the user via cpumask strings, or domains that span L3 caches.
§LoadLedger
Once you have fed all load values to the LoadAggregator, you can use it to calculate load sums (including adjusting for weight infeasibility), and create a read-only LoadLedger object that can be used to query the values as described in (2).
There are various values of interest that can be queried from a LoadLedger object. For example, you may ask for the sum of load (adjusted for infeasibility) for the whole system, or the sum of duty cycle for the whole system, or the sum of load for each domain (adjusted for infeasibility):
use scx_utils::LoadAggregator;
use log::info;
let mut aggregator = LoadAggregator::new(32, false);
aggregator.record_dom_load(0, 1, 1.0);
// ...
aggregator.record_dom_load(63, 1, 1.0);
aggregator.record_dom_load(64, 10000, 1.0);
let ledger = aggregator.calculate();
info!("Global load sum: {}", ledger.global_load_sum());
info!("Global duty cycle sum: {}", ledger.global_dcycle_sum());
let dom_dcycles = ledger.dom_dcycle_sums();
let dom_loads = ledger.dom_dcycle_sums();
let effective_max_weight = ledger.effective_max_weight();
// ...