[LITMUS^RT] prop/sched-domains

Glenn Elliott gelliott at cs.unc.edu
Wed Feb 5 14:23:10 CET 2014


On Feb 5, 2014, at 2:52 AM, Björn Brandenburg <bbb at mpi-sws.org> wrote:

> 
> On 05 Feb 2014, at 03:05, Glenn Elliott <gelliott at cs.unc.edu> wrote:
>> 
>> I’ve always found working with clusters in Litmus to be a little clunky.  Here’s an example of how it manifests: rtspin takes two parameters, cluster ID and cluster size.  liblitmus uses this information to automatically set up CPU affinity masks and set rt_task::cpu.  Unfortunately, liblitmus’s algorithm breaks when the CPUs within a cluster are NOT enumerated adjacently.  For example, this happens on UNC’s Ludwig system, where CPUs 0 and 4 share an L2 cache.  liblitmus incorrectly assumes that it is CPUs 0 and 1 that share the L2.  (Generally speaking, I think the algorithm is broken on systems where there are multiple levels of shared caches.)
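>> 
>> To make the failure concrete: the adjacency assumption amounts to something like the following (a paraphrase of the idea, not the actual liblitmus code; cpu_set_t and friends come from <sched.h> with _GNU_SOURCE defined).
>> 
>> 	/* Naive mapping: assume cluster k consists of the cluster_size
>> 	 * consecutively numbered CPUs starting at k * cluster_size. */
>> 	cpu_set_t mask;
>> 	CPU_ZERO(&mask);
>> 	int first_cpu = cluster_id * cluster_size;
>> 	for (int i = 0; i < cluster_size; i++)
>> 		CPU_SET(first_cpu + i, &mask);
>> 
>> On Ludwig (clusters of size 2), cluster 0 should be {0, 4}, but this computes {0, 1}.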
>> 
>> liblitmus could be made more intelligent by examining cache information in /sys.  However, working directly with /sys is a royal pain-in-the-sys when using C’s file APIs.  Further, liblitmus must still rely on the user to give it both a cluster ID and a cluster size.  It would be much cleaner if a user only had to specify a cluster ID (a sched domain ID).
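>> 
>> For illustration, pulling the L2 sharing information out of /sys boils down to something like this (error handling elided; whether index2 really is the L2 should itself be checked via the per-index level file, which is part of the pain):
>> 
>> 	char path[128], buf[256];
>> 	snprintf(path, sizeof(path),
>> 	         "/sys/devices/system/cpu/cpu%d/cache/index2/shared_cpu_list",
>> 	         cpu);
>> 	FILE *f = fopen(path, "r");
>> 	if (f && fgets(buf, sizeof(buf), f))
>> 		parse_cpu_list(buf);	/* hypothetical helper: "0,4" -> CPU mask */
>> 	if (f)
>> 		fclose(f);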
> 
> 
> Hi Glenn,
> 
> thanks a lot for your patches. I haven’t looked at the code carefully yet, but generally agree with the idea. However, I think the old liblitmus approach also had something going for it. By just having a cpu/cluster field (name it however you like), liblitmus didn’t need *any* understanding of clusters, etc. The user simply had to pick any CPU in a cluster (not necessarily the first), and the task was admitted just fine. This had the advantage of keeping liblitmus simple by pushing all clustering logic etc. into the experimental framework(s) that launch the tasks (typically not written in C and much more flexible, also everyone can come up with their own conventions as they please). This worked just fine for the first five years of LITMUS^RT.
> 
> However, if we are going to have all the cluster parsing etc. in C in liblitmus, then the new patches appear to make a lot of sense.
> 
>> […]
>> I plan to submit a patch for liblitmus to use this information rather than its current fragile algorithm.  I’d also like to do away with the rt_task::cpu field and replace it with a domain ID.  I understand the historical reasons for this interface, but it’s really kludgy to have to find a CPU to assign to that field when you’re doing cluster scheduling.
> 
> It used to be that you assigned any CPU that’s part of the cluster you want the task to execute in. Personally, that didn’t offend my taste too much.
> 
>>  liblitmus has some routines to make it less painful, but it’s still ugly.
> 
> What I don’t like about these patches is that they break liblitmus compatibility yet again. I had to fix up a number of repositories depending on liblitmus the last time liblitmus gained cluster awareness instead of the simpler cpu field. Now we are changing the interfaces yet again, so anyone doing anything with liblitmus is going to have to change all their code again. Is the new interface now flexible enough? Or are we going to need yet another API change next year…? Having just a simple cpu/cluster field in the task is maybe not quite as elegant as having a full-blown API, but it requires a lot less code to be maintained in liblitmus and the kernel and is easier to abuse (that is, “extend and adapt to new uses”) since each plugin can freely interpret that field however it chooses.
> 
> To circumvent the naming ugliness, we could make it a union.
> 
> /* in struct rt_task, in place of the plain cpu field: */
> union {
> 	int partition;
> 	int cluster;
> 	int domain;
> 	int container;
> };
> 
> Now everyone can pick whatever name they find appropriate.
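> 
> A quick sketch of how that reads at the call site (assuming the union sits in struct rt_task in place of the plain cpu field; anonymous unions are C11/GNU C, which the kernel already relies on):
> 
> 	struct rt_task params;
> 	params.cluster = 2;	/* a clustered plugin reads this as a cluster ID */
> 	/* params.partition, params.domain, and params.container all name
> 	 * the same storage; each plugin documents the name it honors. */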
> 
> By the way, the new API doesn’t handle the case where you’d want tasks to have arbitrary processor affinities. That’s something that we are working on, so we’ll need yet another hack in the future.
> 
>> One last gripe: I’d also like to do away with the restriction that a task must have set its CPU mask to match its cluster prior to entering real-time mode.  What’s the point?  
> 
> It makes the scheduler simpler. You don’t have to deal with the situation where a real-time task shows up on the “wrong” CPU / in the “wrong” cluster. One less corner case to worry about.
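> 
> For reference, with the restriction in place the whole check is about one line in a plugin’s admit_task() callback, roughly like this (accessor names quoted from memory, to be checked against the source):
> 
> 	static long demo_admit_task(struct task_struct *tsk)
> 	{
> 		/* reject tasks that are not already running on a CPU in
> 		 * their assigned partition/cluster */
> 		return task_cpu(tsk) == tsk_rt(tsk)->task_params.cpu ?
> 			0 : -EINVAL;
> 	}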

Hi Björn,

I can concede the point that rt_task::cpu is still useful, and we should keep it moving forward.  I also understand why it’s helpful to ensure that a task has already migrated to an appropriate cluster before it becomes a real-time task.  I hadn’t considered that an otherwise simple plugin would have to implement migration logic just to handle an initial “wrong CPU” case like this.

It seems to me that your concerns focus mainly on breaking liblitmus’s API.  We can keep the old functions and tag them as deprecated.  Under these patches, the clusterSize field would just be ignored.
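
For instance, GCC’s deprecated attribute would let old callers keep compiling while steering them toward the new interface (the function and parameter names here are only illustrative):

	/* old entry point, kept for source compatibility; cluster_size
	 * is accepted but ignored under the domain-aware scheme */
	int be_migrate_to_cluster(int cluster, int cluster_size)
		__attribute__((deprecated("use the domain-based calls instead")));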

>> Why can’t the scheduler just migrate the task automatically? More frustratingly, the plugins silently reject the task in admit_task()—nothing gets printk()’ed.
> 
> It could. I’m happy to merge well-tested patches that do this on a plugin-by-plugin basis. However, I wouldn’t remove this as a general rule. We have some plugins where we’d like to keep enforcing that restriction.

I can put together a simple patch that puts a printk() on the failure path.
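
Roughly like this, in the common admission path (the exact file and variable names need to be checked against the current kernel; the message text is just a suggestion):

	ret = litmus->admit_task(tsk);
	if (ret)
		printk(KERN_INFO "litmus: plugin %s rejected task %s/%d "
		       "(err=%d); is the task running on a CPU inside its "
		       "assigned cluster?\n",
		       litmus->plugin_name, tsk->comm, tsk->pid, ret);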

> By the way, plugins should not print anything on a *successful* task admission. Systems configured to redirect printk() to the serial port (e.g., our 64-core server that’s hooked up to RAC) can incur ridiculous latencies when talking to the serial port driver. For Felipe’s OSPERT’13 paper, we had to remove some printk()s from the non-error path because the serial port driver gave us interrupt latencies in the millisecond range (when flushing printk()s, it apparently spins with interrupts off, waiting for the other side to ack writes).
> 
> Thanks,
> Björn


I’m flexible with regard to liblitmus API changes and compatibility.  The case I am trying to make is that plugins should explicitly report their cluster configurations, rather than leave it up to userspace code (or scripts) to deduce them.  That deduction is hard: you have to determine which CPU is the release master (if any), look at cache configurations in /sys, examine the clustering configuration in /proc/litmus/plugins/<plugin>/cluster, and understand the inner workings of the plugin in question.  The /proc/litmus interface I am proposing is cleaner and far more foolproof.
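
For concreteness, the kind of export I have in mind might look like this (the file name and format are hypothetical, just to illustrate the idea):

	$ cat /proc/litmus/plugins/C-EDF/domains
	0: 0,4
	1: 1,5
	2: 2,6
	3: 3,7

One line per scheduling domain: the domain ID, then the CPUs it contains (here, Ludwig-style pairs sharing an L2). liblitmus could parse this directly instead of reverse-engineering it.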

Thanks for the feedback!

-Glenn