[LITMUS^RT] Hash-Based Tie-Breaking for EDF

Mon Aug 6 03:22:17 CEST 2012

Hi All,

I've run into an interesting issue in Litmus in my particular system use-case.  Suppose you have a task set with implicit deadlines and all tasks have the same period.  Further suppose that job releases are in sync.  You might see a task set like this with media processing.  In my case, it was computer vision processing of video streams.

Let's say we schedule our system with *-EDF.  Currently, Litmus tie-breaks jobs with equal deadlines by PID.  Tasks with lower PIDs get priority over those with higher PIDs.  Basically, the system becomes a static-priority system on synchrounsly-released sets of jobs.  Though tardiness may or may not be bounded, the jobs with greater PIDs always suffer the greatest levels of tardiness. This can make analysis of task set runs cumbersome since there can be a large skew in average response times of tasks.  I am of the opinion that if all tasks have the same periods and all have the same execution behaviors, it would be nice if all tasks had the same average response time (at least in the soft real-time case).

How do we tie-break if not by PID?

We could try to tie-break by release-time, but this doesn't work when all tasks share the same period.  We could try prioritizing jobs with the latest completion time of their prior job over others, but this requires carrying state information from the prior job into the next; this seems cumbersome.

I'd like to propose the following solution:  Instead of tie-breaking by PID, let's instead generate a single hash for the key (pid, job number)-- that is, the pair makes up the hash key (from a crypto-perspective, PID is the salt and job number is the input).  With deadlines, this gives us a total sorted order of all active jobs.  Presuming that the hash function is uniform, response time skew due to PID-base tie-breaks should be eliminated.  In theory, tardiness should be averaged out across all tasks.

I see several issues to consider with hash-based tie breaking:
1) Repeatability in experiments will be more challenging since PIDs change from run to run.  (Bugs are also harder to reproduce.)
2) I realize that "arbitrary tie-breaking" is not the same thing as "randomized tie-breaking."  Introducing randomness is usually the last thing you want to add to a real-time system, but perhaps this is okay for tie-breaking.
3) Real-time analysis may sometimes require a consistent tie-breaking method.  Hash-based tie-breaking breaks such an underlying assumption.  Perhaps Jeremy can comment as to if this is a real concern.
4) I suspect that per-task response-time jitter may increase under this approach.  However, in systems like the one I described earlier, there should be less variance among average response times of each task.

I've attached a prototype patch (based upon August 2012 Litmus) that implements hash-based tie-breaking for EDF.  It uses hash functions already implemented by Linux (include/linux/hash.h).  In the unlikely event of a hash collision, the tie-break falls back to a PID-based break.

Hash-based tie-breaking is enabled with a compile-time option (and is _enabled_ by default).

There are two issues with the implementation that I haven't fully worked out:
1) Should hashes be generated on job release?  Hashes are currently computed every time on-the-fly in edf_higher_prio(); results are then discarded.  This is an optimistic approach, since hashing overhead is only incurred as needed.  However, I wonder if it might be better to generate a hash on job release and cache it in the job parameters.  Note that hashes are computed by bit-shifts, so overheads should still be low.
2) I did some off-line testing of the Linux kernel hash functions.  They're not so great.  hash_64() performs VERY poorly when PIDs and job numbers are close together (I expect this would be our common case).  For some reason, hash_64() isn't uniformly mapping these close-together keys very well.  I found another solution based upon nested hash_32() calls (http://lkml.indiana.edu/hypermail/linux/kernel/0605.1/0258.html).  This seems to perform much better.  However, the distribution isn't always uniform.  I ran tests where I repetitively compared two fixed PIDs with incrementing jobs and found that even after 10,000 comparisons, one task could win tie breaks by +1.5% of the time.  However, on most input, the bias was < 0.5%.  Still, I would hope for something better.  Perhaps someone can suggest a better hash function that still has low overheads?

Let me know what you all think!

Note: I've pushed this patch to wip-better-break to rtsrv.cs.unc.edu for those with access.

(As an aside, the logic in the return statement of edf_higher_prio() is getting long and complex.  I had to stare at my modifications for some time before I felt that I hadn't broken anything.  Could we consider breaking it up into a series of if-else statements?  Would the compiler generate just-as-efficient code?)

Thanks,
Glenn

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Implement-hash-based-EDF-tie-breaking.patch
Type: application/octet-stream
Size: 4845 bytes
Desc: not available
URL: <http://lists.litmus-rt.org/pipermail/litmus-dev/attachments/20120805/f1fd6646/attachment.obj>
-------------- next part --------------