[LITMUS^RT] prop/sched-domains

Björn Brandenburg bbb at mpi-sws.org
Wed Feb 5 08:52:48 CET 2014


On 05 Feb 2014, at 03:05, Glenn Elliott <gelliott at cs.unc.edu> wrote:
> 
> I’ve always found working with clusters in Litmus to be a little clunky.  Here’s an example of how it manifests: rtspin takes two parameters, cluster ID and cluster size.  liblitmus uses this information to automatically set up CPU affinity masks and set rt_task::cpu.  Unfortunately, liblitmus’s algorithm breaks when the CPUs within a cluster are NOT enumerated adjacently.  For example, this happens on UNC’s Ludwig system, where CPUs 0 and 4 share an L2 cache.  liblitmus incorrectly assumes that it is CPUs 0 and 1 that share the L2.  (Generally speaking, I think the algorithm is broken on systems where there are multiple levels of shared caches.)
> 
> liblitmus could be made more intelligent by examining cache information in /sys.  However, working directly with /sys is a royal pain-in-the-sys when using C’s file APIs.  Further, liblitmus must still rely on the user to give it cluster ID and cluster size information.  It would be much nicer if a user just had to specify a cluster ID (sched domain ID).  This is much cleaner.


Hi Glenn,

thanks a lot for your patches. I haven’t looked at the code carefully yet, but I generally agree with the idea. However, I think the old liblitmus approach also had something going for it. By just having a cpu/cluster field (name it however you like), liblitmus didn’t need *any* understanding of clusters. The user simply had to pick any CPU in a cluster (not necessarily the first), and the task was admitted just fine. This kept liblitmus simple by pushing all clustering logic into the experimental framework(s) that launch the tasks. These are typically not written in C and are much more flexible, and everyone can come up with their own conventions as they please. This worked just fine for the first five years of LITMUS^RT.
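
To illustrate the "pick any CPU in the cluster" convention, here is a minimal sketch of what a launcher might do. The topology table is hypothetical (8 CPUs, with CPUs i and i+4 sharing an L2, loosely modelled on the CPU 0/CPU 4 example quoted above), and the function names are made up for illustration; none of this is liblitmus code.

```c
/* Hypothetical topology: 8 CPUs, 4 clusters; CPUs i and i+4 share an L2. */
static const int cpu_to_cluster[] = { 0, 1, 2, 3, 0, 1, 2, 3 };
enum { NUM_CPUS = sizeof(cpu_to_cluster) / sizeof(cpu_to_cluster[0]) };

/* Return the lowest-numbered CPU in the given cluster, or -1 if none exists. */
static int any_cpu_in_cluster(int cluster)
{
	for (int cpu = 0; cpu < NUM_CPUS; cpu++)
		if (cpu_to_cluster[cpu] == cluster)
			return cpu;
	return -1;
}

/* The adjacency assumption: cluster k of size s starts at CPU k * s. */
static int first_cpu_if_adjacent(int cluster, int cluster_size)
{
	return cluster * cluster_size;
}
```

With this table, the adjacency assumption places cluster 2 (size 2) at CPU 4, which actually belongs to cluster 0. A table lookup does not have that problem, and the table itself can be produced by whatever tooling the experimental framework prefers.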

However, if we are going to have all the cluster parsing etc. in C in liblitmus, then the new patches appear to make a lot of sense.

> […]
> I plan to submit a patch for liblitmus to use this information rather than its current fragile algorithm.  I’d also like to do away with the rt_task::cpu field and replace it with a domain ID.  I understand the historical reasons for this interface, but it’s really kludgy to have to find a CPU to assign to that field when you’re doing cluster scheduling.

It used to be you assigned any CPU that’s part of the cluster that you want the task to execute in. Personally, that didn’t offend my taste too much.

>  liblitmus has some routines to make it less painful, but it’s still ugly.

What I don’t like about these patches is that they break liblitmus compatibility yet again. I had to fix up a number of repositories depending on liblitmus the last time liblitmus gained cluster awareness instead of the simpler cpu field. Now we are changing the interfaces yet again, so anyone doing anything with liblitmus is going to have to change all their code again. Is the new interface now flexible enough? Or are we going to need yet another API change next year…? Having just a simple cpu/cluster field in the task is maybe not quite as elegant as having a full-blown API, but it requires a lot less code to be maintained in liblitmus and the kernel and is easier to abuse (that is, “extend and adapt to new uses”) since each plugin can freely interpret that field however it chooses.

To circumvent the naming ugliness, we could make it a union.

union {
	int partition;
	int cluster;
	int domain;
	int container;
};

Now everyone can pick whatever name they find appropriate.
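
As a rough sketch of how that would look in use: the struct below is a hypothetical stand-in, not the actual rt_task definition, and the helper name is invented. Anonymous unions require C11 or GNU C.

```c
/* Hypothetical parameter struct for illustration; not the real rt_task. */
struct example_task_params {
	union {
		int partition;
		int cluster;
		int domain;
		int container;
	}; /* anonymous union: all four names alias the same int */
};

/* A plugin reads the field under whichever name suits it. */
static int example_plugin_cluster(const struct example_task_params *p)
{
	return p->cluster;
}
```

A launcher could write p.partition while a clustered plugin reads p->cluster; both refer to the same storage, so the struct is no larger than with a single int field.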

By the way, the new API doesn’t handle the case where you’d want tasks to have arbitrary processor affinities. That’s something that we are working on, so we’ll need yet another hack in the future.

> One last gripe: I’d also like to do away with the restriction that a task must have set its CPU mask to match its cluster prior to entering real-time mode.  What’s the point?  

It makes the scheduler simpler. You don’t have to deal with the situation that a real-time task shows up on the “wrong” CPU / in the “wrong” cluster. One corner case less to worry about.

> Why can’t the scheduler just migrate the task automatically? More frustratingly, the plugins silently reject the task in admit_task()—nothing gets printk()’ed.

It could. I’m happy to merge well-tested patches that do this on a plugin-by-plugin basis. However, I wouldn’t remove this as a general rule. We have some plugins where we’d like to keep enforcing that restriction.

By the way, plugins should not print anything on a *successful* task admission. Systems configured to redirect printk() to the serial port (e.g., our 64-core server that’s hooked up to RAC) can incur ridiculous latencies when talking to the serial port driver. For Felipe’s OSPERT’13 paper, we had to remove some printk()s from the non-error path because the serial port driver gave us interrupt latencies in the millisecond range (when flushing printk()s, it apparently spins with interrupts off, waiting for the other side to ack writes).
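
A user-space model of the two points above might look like the following. It is only a sketch: example_admit() is a made-up name, not a plugin's admit_task(); fprintf() stands in for printk(); and "match" is assumed here to mean that the task's mask is non-empty and contained in the cluster's mask. Whether logging on rejection is acceptable is also an assumption; the only firm rule stated above is to stay silent on success.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical user-space model; a real plugin would run in the kernel. */
static int example_admit(unsigned long task_mask, unsigned long cluster_mask)
{
	/* Assumption: "match" means non-empty and contained in the cluster. */
	if (task_mask == 0 || (task_mask & ~cluster_mask) != 0) {
		/* Error path only: a message here does not affect admitted tasks. */
		fprintf(stderr, "admit: mask %#lx outside cluster %#lx\n",
			task_mask, cluster_mask);
		return -EINVAL;
	}
	return 0; /* success path: no output */
}
```

For a cluster made up of CPUs 0 and 4, a task pinned to CPU 4 is admitted without any output, while a task pinned to CPU 1 is rejected with -EINVAL and a single diagnostic line.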

Thanks,
Björn


