§Round-Robin Linux kernel scheduler that runs in user-space
§Overview
This is a fully functional Round-Robin scheduler for the Linux kernel that operates in user-space and is implemented entirely in Rust.
It dequeues tasks in FIFO order and assigns dynamic time slices, preempting and re-enqueuing tasks to achieve basic Round-Robin behavior.
The scheduler is designed to serve as a simple template for developers looking to implement more advanced scheduling policies.
It is based on scx_rustland_core, a framework specifically designed to simplify the creation of user-space schedulers, leveraging the Linux kernel’s sched_ext feature (a technology that allows schedulers to be implemented in BPF).
The scx_rustland_core crate offers an abstraction layer over sched_ext, enabling developers to write schedulers in Rust without needing to interact directly with low-level kernel or BPF internal details.
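The Round-Robin behavior described in this overview (FIFO dequeue, a dynamic time slice, preemption and re-enqueue) can be illustrated with a small single-CPU simulation. This is only a sketch under stated assumptions: `ToyTask`, `round_robin`, and the `BASE_SLICE_NS` value are hypothetical names introduced here, and the slice formula (base slice divided by the number of waiting tasks plus one) is one plausible way to make the slice dynamic, not necessarily what this scheduler does. It does not use sched_ext or scx_rustland_core at all.

```rust
use std::collections::VecDeque;

/// Hypothetical base time slice in nanoseconds (assumed value, for illustration only).
const BASE_SLICE_NS: u64 = 5_000_000;

/// A toy task: an id plus the CPU time it still needs, in nanoseconds.
#[derive(Debug, Clone)]
struct ToyTask {
    pid: i32,
    remaining_ns: u64,
}

/// Simulate Round-Robin on a single CPU and return the pids in completion order.
fn round_robin(tasks: Vec<ToyTask>) -> Vec<i32> {
    let mut queue: VecDeque<ToyTask> = tasks.into();
    let mut finished = Vec::new();

    // Consume tasks in FIFO order until the queue is empty.
    while let Some(mut task) = queue.pop_front() {
        // Dynamic slice: the more tasks are waiting, the shorter the slice.
        let nr_waiting = queue.len() as u64;
        let slice_ns = BASE_SLICE_NS / (nr_waiting + 1);

        if task.remaining_ns <= slice_ns {
            // The task completes within its slice.
            finished.push(task.pid);
        } else {
            // The slice expires: preempt the task and re-enqueue it at the tail.
            task.remaining_ns -= slice_ns;
            queue.push_back(task);
        }
    }
    finished
}

fn main() {
    let tasks = vec![
        ToyTask { pid: 1, remaining_ns: 6_000_000 },
        ToyTask { pid: 2, remaining_ns: 1_000_000 },
        ToyTask { pid: 3, remaining_ns: 3_000_000 },
    ];
    println!("completion order: {:?}", round_robin(tasks));
}
```

Short tasks finish early even when they arrive behind a long one, because the long task is preempted when its slice expires and goes back to the tail of the queue.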
§scx_rustland_core API
§struct BpfScheduler
The BpfScheduler struct is the core interface for interacting with sched_ext via BPF.
- Initialization:
  - BpfScheduler::init() registers the scheduler and initializes the BPF component.
- Task Management:
  - dequeue_task(): Consume a task that wants to run, returns a QueuedTask object
  - select_cpu(pid: i32, prev_cpu: i32, flags: u64): Select an idle CPU for a task
  - dispatch_task(task: &DispatchedTask): Dispatch a task
- Completion Notification:
  - notify_complete(nr_pending: u64): Give control to the BPF component and report the number of tasks that are still pending (this function can sleep)
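To show how the calls listed above might fit together in one scheduling round, here is a minimal sketch built on an in-memory mock. `MockScheduler` and `schedule_round` are hypothetical, and the return types are simplified assumptions (for example, `dequeue_task()` returns a plain `Option` here and `select_cpu()` returns -1 when no CPU is idle); consult the scx_rustland_core documentation for the real signatures. The task structs carry only the fields this sketch needs.

```rust
use std::collections::VecDeque;

// Simplified stand-ins for the crate's task types: only the fields used here.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct QueuedTask {
    pid: i32,
    cpu: i32,
    flags: u64,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct DispatchedTask {
    pid: i32,
    cpu: i32,
    flags: u64,
    slice_ns: u64,
}

/// Hypothetical in-memory mock mirroring the method names listed above.
/// Return types are simplified assumptions, not the crate's real signatures.
struct MockScheduler {
    queued: VecDeque<QueuedTask>,
    idle_cpus: Vec<i32>,
    dispatched: Vec<DispatchedTask>,
    last_nr_pending: Option<u64>,
}

impl MockScheduler {
    fn new(queued: Vec<QueuedTask>, idle_cpus: Vec<i32>) -> Self {
        Self { queued: queued.into(), idle_cpus, dispatched: Vec::new(), last_nr_pending: None }
    }

    /// Consume the next task that wants to run, in FIFO order.
    fn dequeue_task(&mut self) -> Option<QueuedTask> {
        self.queued.pop_front()
    }

    /// Prefer the previous CPU if it is idle, else any idle CPU, else -1 (assumed convention).
    fn select_cpu(&mut self, _pid: i32, prev_cpu: i32, _flags: u64) -> i32 {
        if let Some(pos) = self.idle_cpus.iter().position(|&c| c == prev_cpu) {
            return self.idle_cpus.remove(pos);
        }
        self.idle_cpus.pop().unwrap_or(-1)
    }

    /// Record the dispatched task instead of handing it to BPF.
    fn dispatch_task(&mut self, task: &DispatchedTask) {
        self.dispatched.push(task.clone());
    }

    /// Record the reported number of pending tasks (the real call can sleep).
    fn notify_complete(&mut self, nr_pending: u64) {
        self.last_nr_pending = Some(nr_pending);
    }
}

/// One scheduling round: drain the queue, pick a CPU per task, dispatch, then notify.
fn schedule_round(sched: &mut MockScheduler) {
    while let Some(task) = sched.dequeue_task() {
        let cpu = sched.select_cpu(task.pid, task.cpu, task.flags);
        // Fall back to the task's previous CPU when no idle CPU was found.
        let target = if cpu >= 0 { cpu } else { task.cpu };
        // slice_ns = 0 asks for the default time slice.
        let out = DispatchedTask { pid: task.pid, cpu: target, flags: task.flags, slice_ns: 0 };
        sched.dispatch_task(&out);
    }
    // Every task was dispatched, so nothing is pending.
    sched.notify_complete(0);
}

fn main() {
    let tasks = vec![
        QueuedTask { pid: 10, cpu: 0, flags: 0 },
        QueuedTask { pid: 11, cpu: 1, flags: 0 },
        QueuedTask { pid: 12, cpu: 0, flags: 0 },
    ];
    let mut sched = MockScheduler::new(tasks, vec![0, 2]);
    schedule_round(&mut sched);
    for t in &sched.dispatched {
        println!("pid {} -> cpu {}", t.pid, t.cpu);
    }
    println!("reported pending: {:?}", sched.last_nr_pending);
}
```

Keeping the loop behind a small mock like this makes the dequeue, select, dispatch, notify ordering testable without loading a BPF program.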
Each task received from dequeue_task() contains the following:
    struct QueuedTask {
        pub pid: i32,              // pid that uniquely identifies a task
        pub cpu: i32,              // CPU previously used by the task
        pub flags: u64,            // task’s enqueue flags
        pub sum_exec_runtime: u64, // Total cpu time in nanoseconds
        pub weight: u64,           // Task priority in the range [1..10000] (default is 100)
        pub nvcsw: u64,            // Total amount of voluntary context switches
        pub slice: u64,            // Remaining time slice budget
        pub vtime: u64,            // Current task vruntime / deadline (set by the scheduler)
    }
Each task dispatched using dispatch_task() contains the following:
    struct DispatchedTask {
        pub pid: i32,      // pid that uniquely identifies a task
        pub cpu: i32,      // target CPU selected by the scheduler
                           // (RL_CPU_ANY = dispatch on the first CPU available)
        pub flags: u64,    // task’s enqueue flags
        pub slice_ns: u64, // time slice in nanoseconds assigned to the task
                           // (0 = use default time slice)
        pub vtime: u64,    // this value can be used to send the task’s vruntime or deadline
                           // directly to the underlying BPF dispatcher
    }
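As an illustration of how the two structures relate, the sketch below builds a DispatchedTask from a QueuedTask and scales the time slice by the task's weight. Everything here is an assumption made for the example: the structs are local copies of the field lists above, `to_dispatched`, `BASE_SLICE_NS`, and the weight-proportional policy are hypothetical, and `ANY_CPU_PLACEHOLDER` is a stand-in for the crate's RL_CPU_ANY constant, whose real value is not reproduced here.

```rust
/// Local copy of the QueuedTask fields listed above.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct QueuedTask {
    pid: i32,
    cpu: i32,
    flags: u64,
    sum_exec_runtime: u64,
    weight: u64,
    nvcsw: u64,
    slice: u64,
    vtime: u64,
}

/// Local copy of the DispatchedTask fields listed above.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct DispatchedTask {
    pid: i32,
    cpu: i32,
    flags: u64,
    slice_ns: u64,
    vtime: u64,
}

/// Hypothetical base time slice in nanoseconds (assumed value).
const BASE_SLICE_NS: u64 = 5_000_000;
/// Default task weight, as stated in the QueuedTask listing.
const DEFAULT_WEIGHT: u64 = 100;
/// Placeholder for "any CPU"; the real crate exports RL_CPU_ANY instead.
const ANY_CPU_PLACEHOLDER: i32 = -1;

/// Build a dispatch request, giving heavier tasks a proportionally longer slice.
fn to_dispatched(task: &QueuedTask, idle_cpu: Option<i32>) -> DispatchedTask {
    // Keep the weight inside the documented [1..10000] range before scaling.
    let weight = task.weight.clamp(1, 10_000);
    // Weight 100 yields the base slice, weight 200 yields twice the base slice.
    let slice_ns = BASE_SLICE_NS * weight / DEFAULT_WEIGHT;
    DispatchedTask {
        pid: task.pid,
        // With no idle CPU selected, let the dispatcher use any available CPU.
        cpu: idle_cpu.unwrap_or(ANY_CPU_PLACEHOLDER),
        flags: task.flags,
        slice_ns,
        // Pass the task's vtime through unchanged.
        vtime: task.vtime,
    }
}

fn main() {
    let task = QueuedTask {
        pid: 42,
        cpu: 1,
        flags: 0,
        sum_exec_runtime: 0,
        weight: 200,
        nvcsw: 0,
        slice: 0,
        vtime: 7,
    };
    println!("{:?}", to_dispatched(&task, Some(3)));
}
```

A plain Round-Robin policy would ignore the weight entirely; the scaling is shown only to make the role of the `weight` and `slice_ns` fields concrete.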
Other internal statistics that can be used to implement better scheduling policies:
    let n: u64 = *self.bpf.nr_online_cpus_mut();       // amount of online CPUs
    let n: u64 = *self.bpf.nr_running_mut();           // amount of currently running tasks
    let n: u64 = *self.bpf.nr_queued_mut();            // amount of tasks queued to be scheduled
    let n: u64 = *self.bpf.nr_scheduled_mut();         // amount of tasks managed by the user-space scheduler
    let n: u64 = *self.bpf.nr_user_dispatches_mut();   // amount of user-space dispatches
    let n: u64 = *self.bpf.nr_kernel_dispatches_mut(); // amount of kernel dispatches
    let n: u64 = *self.bpf.nr_cancel_dispatches_mut(); // amount of cancelled dispatches
    let n: u64 = *self.bpf.nr_bounce_dispatches_mut(); // amount of bounced dispatches
    let n: u64 = *self.bpf.nr_failed_dispatches_mut(); // amount of failed dispatches
    let n: u64 = *self.bpf.nr_sched_congested_mut();   // amount of scheduler congestion events
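One way such counters could feed a policy decision is sketched below: it derives the number of pending tasks per online CPU. `MockStats` and `pending_per_cpu` are hypothetical, and the mock assumes the accessors return `&mut u64`, as the dereferences in the listing above suggest; the real counters live in the BPF component and are not modelled here.

```rust
/// Hypothetical stand-in exposing three of the counters listed above.
#[derive(Default)]
struct MockStats {
    nr_online_cpus: u64,
    nr_queued: u64,
    nr_scheduled: u64,
}

impl MockStats {
    // Accessors are assumed to return mutable references, hence the `*` at call sites.
    fn nr_online_cpus_mut(&mut self) -> &mut u64 {
        &mut self.nr_online_cpus
    }
    fn nr_queued_mut(&mut self) -> &mut u64 {
        &mut self.nr_queued
    }
    fn nr_scheduled_mut(&mut self) -> &mut u64 {
        &mut self.nr_scheduled
    }
}

/// Pending tasks (queued plus scheduled) per online CPU, rounded up.
fn pending_per_cpu(stats: &mut MockStats) -> u64 {
    // Guard against a zero CPU count to avoid dividing by zero.
    let cpus = (*stats.nr_online_cpus_mut()).max(1);
    let pending = *stats.nr_queued_mut() + *stats.nr_scheduled_mut();
    // Ceiling division without floating point.
    (pending + cpus - 1) / cpus
}

fn main() {
    let mut stats = MockStats { nr_online_cpus: 4, nr_queued: 6, nr_scheduled: 3 };
    println!("pending tasks per CPU: {}", pending_per_cpu(&mut stats));
}
```

A scheduler could, for example, shorten time slices as this ratio grows, though that choice is a design decision and not something the listing above prescribes.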
Modules§
Structs§
Constants§
- SLICE_NS 🔒
Functions§
- main 🔒
- print_warning 🔒