[LITMUS^RT] prop/sched-domains

Wed Feb 5 03:05:52 CET 2014

Hi Everyone,

I’ve always found working with clusters in Litmus to be a little clunky.  Here’s an example of how it manifests: rtspin takes two parameters, cluster ID and cluster size.  liblitmus uses this information to automatically set up CPU affinity masks and set rt_task::cpu.  Unfortunately, liblitmus’s algorithm breaks when the CPUs within a cluster are NOT enumerated adjacently.  For example, this happens on UNC’s Ludwig system, where CPUs 0 and 4 share an L2 cache.  liblitmus incorrectly assumes that it is CPUs 0 and 1 that share the L2.  (Generally speaking, I think the algorithm is broken on systems where there are multiple levels of shared caches.)
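
To make the failure concrete, here is a small Python sketch. It assumes liblitmus derives a cluster's CPUs as the contiguous range starting at (cluster ID x cluster size); that matches the behavior described above but is not copied from the liblitmus source. The 8-CPU topology, in which CPU i shares an L2 with CPU i+4, is made up for illustration (only the CPU 0/4 pairing comes from Ludwig), and all names are hypothetical.

```python
def contiguous_cluster(cluster_id, cluster_size):
    """CPUs a cluster gets if its members are assumed to be numbered adjacently."""
    first = cluster_id * cluster_size
    return set(range(first, first + cluster_size))


# Hypothetical topology: each set lists the CPUs that share one L2 cache.
L2_GROUPS = [{0, 4}, {1, 5}, {2, 6}, {3, 7}]


def matches_cache_topology(cpus, groups):
    """True if the CPU set is exactly one of the real cache-sharing groups."""
    return any(cpus == group for group in groups)


if __name__ == "__main__":
    guess = contiguous_cluster(0, 2)
    # The contiguous guess is {0, 1}, which is not an L2 group in this topology.
    print(sorted(guess), matches_cache_topology(guess, L2_GROUPS))
```

With cluster ID 0 and cluster size 2, the contiguous guess pairs CPUs 0 and 1, while the CPUs that really share a cache are 0 and 4.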

liblitmus could be made more intelligent by examining cache information in /sys.  However, working directly with /sys is a royal pain-in-the-sys when using C’s file APIs.  Further, liblitmus would still have to rely on the user to supply both the cluster ID and the cluster size.  It would be much nicer, and cleaner, if a user just had to specify a cluster ID (sched domain ID).

With this in mind, I have a few patches that I’d like to contribute to make working with clusters a bit easier.  I’ve pushed a branch to GitHub called “prop/sched-domains”.  What do these patches do?  They update the plugins to export information about their clusters (aka scheduling domains) to /proc/litmus.

New directories:
* /proc/litmus/cpus
* /proc/litmus/domains

New files:
* /proc/litmus/cpus/<CPU #>: There is one file for each CPU (except the release master).  Each file is named after the CPU it describes.  That is, the file for CPU 0 is just “0”.  The file contains a hexadecimal bitmask of the domains that schedule that CPU.  Suppose you are using C-EDF to make the following CPU clusters: {CPU 0, 1, 2, 3} and {4, 5, 6, 7}.  Files /proc/litmus/cpus/0..3 would contain the mask “00001”, indicating that these CPUs are managed by the first cluster (cluster 0).  /proc/litmus/cpus/4..7 would contain the mask “00002”, indicating that they are managed by the second cluster (cluster 1).
* /proc/litmus/domains/<DOMAIN #>: There is one file for each scheduling domain.  The file name is the domain ID.  The file contains a bitmask of the CPUs scheduled by that domain.  For example, for cluster 0 from earlier, its mask would be “0000f”, indicating that it manages CPUs 0 - 3.  For cluster 1, its mask would be “000f0”.  Note, these bitmasks do NOT include the release master.  Further, when cluster size is 1 and release master is active, no file gets set up for the empty-set domain, and the domains are enumerated transparently (there’s no gap in the cluster numbering scheme).
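
As a rough illustration of the mask format, the following Python sketch converts between the hex strings shown above and lists of CPU or domain IDs. The function names are hypothetical, the 5-digit width is copied from the examples rather than from the patches, and the comma handling is an assumption (the kernel prints wide cpumasks in comma-separated groups; I have not checked whether these files ever do).

```python
def mask_to_ids(mask_text):
    """Return the sorted bit positions set in a hex mask such as "0000f"."""
    # Assumption: wide masks may contain commas between groups; drop them.
    value = int(mask_text.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def ids_to_mask(ids, width=5):
    """Return a zero-padded hex mask with one bit set per ID (width is illustrative)."""
    value = 0
    for i in ids:
        value |= 1 << i
    return format(value, "0{}x".format(width))


if __name__ == "__main__":
    print(mask_to_ids("0000f"))       # CPUs in cluster 0 from the example
    print(mask_to_ids("000f0"))       # CPUs in cluster 1 from the example
    print(ids_to_mask([4, 5, 6, 7]))  # back to the mask for cluster 1
```

Reading /proc/litmus/cpus/5 in the C-EDF example would give “00002”, which decodes to domain 1; reading /proc/litmus/domains/1 would give “000f0”, which decodes to CPUs 4 through 7.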

I like this interface because it:
1) Is clean.
2) Supports arbitrary clusters.
3) Provides a path forward for working with containers (every container could have its own file in /proc/litmus/domains/).  (Not that anyone is planning on starting container work anytime soon…)

I plan to submit a patch for liblitmus to use this information rather than its current fragile algorithm.  I’d also like to do away with the rt_task::cpu field and replace it with a domain ID.  I understand the historical reasons for this interface, but it’s really kludgy to have to find a CPU to assign to that field when you’re doing cluster scheduling.  liblitmus has some routines to make it less painful, but it’s still ugly.
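
To sketch what that could look like from the user’s side, here is a Python approximation of “give me a domain ID, get the right affinity mask.”  This is not the planned liblitmus patch (which would be C); the function names are hypothetical, and the file format is assumed to be the hex bitmask described above.  os.sched_setaffinity() is the standard Linux-only Python call for setting a CPU mask.

```python
import os

DOMAIN_DIR = "/proc/litmus/domains"  # directory proposed in this email


def domain_cpus(domain_id, base=DOMAIN_DIR):
    """Read one domain file and return the set of CPUs in its bitmask."""
    with open(os.path.join(base, str(domain_id))) as f:
        text = f.read().strip()
    # Assumption: the file holds a single hex mask, possibly with commas.
    value = int(text.replace(",", ""), 16)
    return {cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1}


def bind_to_domain(domain_id, pid=0, base=DOMAIN_DIR):
    """Restrict a task (pid 0 means the caller) to the CPUs of one domain."""
    cpus = domain_cpus(domain_id, base)
    if not cpus:
        raise ValueError("domain %s has no CPUs" % domain_id)
    os.sched_setaffinity(pid, cpus)  # Linux-only
    return cpus
```

The point is that the caller only names a domain; the cluster size and the CPU enumeration order never enter the picture.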

One last gripe: I’d also like to do away with the restriction that a task must have set its CPU mask to match its cluster prior to entering real-time mode.  What’s the point?  Why can’t the scheduler just migrate the task automatically?  More frustratingly, the plugins silently reject the task in admit_task(); nothing gets printk()’ed.

Comments and suggestions for improving the patches are welcome.

Here’s a link for the branch: https://github.com/LITMUS-RT/litmus-rt/compare/prop;sched-domains
(Note: The branch is based off of the staging branch.)

Thanks,
Glenn