[LITMUS^RT] Hash-Based Tie-Breaking for EDF

Glenn Elliott gelliott at cs.unc.edu
Tue Aug 21 00:10:42 CEST 2012


On Wed, Aug 15, 2012 at 5:58 PM, Jeremy Erickson <jerickso at cs.unc.edu> wrote:
> Comments to selected things inline.
>
>>
>>
>> > 2) For example, why not simply leave out tie breaks altogether (i.e.,
>> > remove the PID check)? Then the insertion order into the ready queue would
>> > create "arbitrary" tie breaks. While these tie breaks would not be
>> > necessarily consistent, I don't think this would affect worst-case analysis,
>> > which typically assumes that ties are broken against the job under analysis
>> > anyway.
>>
>> I like this idea and it is something I will consider.  However, a
>> job's priority can change if it suspends (either for a lock or I/O).
>> This may be harmless, but I want to make sure that there aren't any
>> strange edge cases.  With regard to priority inheritance, I suppose
>> there would be no inheritance between jobs with equal deadlines.
>> Still, suppose we have a self-suspension within a critical section.
>> Could an inversion occur, since the lock holder's priority would
>> decrease due to its suspension?  If there is an inversion, would it
>> be bounded without priority inheritance?
>>
> Although I don't think doing this would invalidate any of our worst-case
> analytical results (for exactly the reason Bjoern mentioned), it does break
> some of the proofs as written.  For example, all of the Uma-style tardiness
> proofs (including mine) work by inducting over the set of all jobs in
> priority order.  If ties are not broken consistently between jobs, no such
> ordering really exists, so the proof would need to be tweaked a bit.  As
> Glenn pointed out, we no longer have job-level static priorities, which I
> think could cause issues for synchronization.  So I wouldn't jump on this
> method too quickly.
>
>>
>> > 3) If your goal is to equally distribute tardiness among jobs, why not
>> > directly tie-break by the current tardiness estimate? To deal with tasks
>> > with greatly differing periods, the tardiness tie-break should probably be
>> > normalized with regard to the period (i.e., tie-break by relative
>> > tardiness). A job's estimated tardiness (and thus its estimated relative
>> > tardiness), assuming it is scheduled right away, can be inferred from the
>> > current time, its deadline, and its remaining execution budget. In fact,
>> > when breaking a tie between two jobs A and B, you could even factor in A's
>> > remaining execution budget to estimate B's tardiness if B is *not* scheduled
>> > right away.
>>
>> I think this approach could perform better than hashes, but I would
>> rather avoid methods that rely on execution time estimates and
>> budgets.  Execution times can vary, and we sometimes run with budget
>> enforcement disabled.  Perhaps this approach would not work well for
>> my GPU use cases, but it might for others.  We could add it as a
>> compile-time (or runtime?) tie-break method.
>>
>> > 4) Alternatively, you could tie-break by the last job's actual
>> > tardiness. This would require only one additional lt_t variable in struct
>> > task_struct, which I think would be acceptable.
>>
>> I like this idea more than the previous one.  I imagine that tasks
>> would naturally take turns being the tie-break victim (in
>> round-robin?).  As you suggested in #3, I suppose this would have to
>> be normalized?  Should we normalize by relative deadline instead of
>> period since we now support arbitrary deadlines?  Do we also need to
>> consider utilization/density?
>>
> I do kind of like this method.  It might even be possible to exploit
> it analytically with new analysis.  The previous method has similar
> potential, but looks more complicated to implement.
>
>>
>> > 5) Another simple solution for your particular use case might be to have
>> > a round-robin "tie-break" token, but this is probably also quite difficult
>> > to exploit analytically.
>>
>> This was my first thought, but I couldn't come up with a
>> straightforward implementation.  Also, I'm not sure how it would work
>> with tasks of different relative deadlines.
>>
> My intuition is that this would be harder to exploit analytically than the
> previous two, although I haven't thought about it enough to have any actual
> results.
>
>>
>> > Personally, I would prefer simple solutions like 2) or 4), or some other
>> > solution with a clear analytical benefit.
>>
>> I agree, but I would also like to avoid dependencies on things like
>> execution times and budgets.  Perhaps one of the theory guys would
>> like to chime in and share their thoughts?
>>
> Method 4 depends only on deadlines, not execution times or budgets.  It
> seems like the most promising option to me, although at present it doesn't
> buy you anything analytically (like all the proposals here).
>
> -Jeremy Erickson
>
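
As a rough illustration of the estimate Bjorn describes in point 3
above, one could compute something like the following (a userspace
sketch; every name here is hypothetical, not a LITMUS^RT identifier):

#include <stdio.h>

typedef unsigned long long lt_t;

/* If a job with 'remaining' execution budget is scheduled right away
 * at time 'now', it completes no earlier than now + remaining.  Its
 * estimated tardiness is the amount by which that exceeds its
 * deadline; dividing by the period gives the relative tardiness
 * suggested for the tie-break. */
static double estimated_relative_tardiness(lt_t now, lt_t deadline,
                                           lt_t remaining, lt_t period)
{
    long long tardiness =
        (long long)(now + remaining) - (long long)deadline;
    if (tardiness < 0)
        tardiness = 0;
    return (double)tardiness / (double)period;
}

int main(void)
{
    /* Deadline 100, 30 units of budget left at time 90: the job
     * finishes around time 120, i.e., 20 units tardy (0.40 periods). */
    printf("%.2f\n", estimated_relative_tardiness(90, 100, 30, 50));
    return 0;
}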


I've implemented an alternative to hash-based tie-breaks.  This
alternative implements Bjorn's suggestion to tie-break on tardiness.
Jobs that were tardy in their last invocation get priority.  That is,
the tardier a task's prior job was, the higher its priority in
tie-breaking.  Please note that instead of strict tardiness, I compute
lateness.  Unlike tardiness, lateness can be negative: a job's
lateness is negative if it finishes before its deadline.  I believe
Jeremy's 2012 ECRTS paper relies on this definition.
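
To make the resulting comparison concrete, here is a minimal
userspace sketch (the struct and names below are made up for
illustration; they are not the actual identifiers from the patch):

#include <stdio.h>

typedef unsigned long long lt_t;   /* LITMUS^RT time type */

struct job {
    lt_t deadline;        /* absolute deadline of the current job */
    long long lateness;   /* lateness of the task's previous job;
                           * negative if it finished early */
};

/* EDF with lateness tie-breaking: the earlier deadline wins; on equal
 * deadlines, the task whose previous job was later (tardier) wins. */
static int edf_higher_prio(const struct job *a, const struct job *b)
{
    if (a->deadline != b->deadline)
        return a->deadline < b->deadline;
    return a->lateness > b->lateness;
}

int main(void)
{
    struct job a = { .deadline = 100, .lateness = 5 };
    struct job b = { .deadline = 100, .lateness = -3 };
    printf("a has higher priority: %d\n", edf_higher_prio(&a, &b));
    return 0;
}

Of course, when the lateness values are themselves equal, presumably a
final deterministic tie-break (such as the old PID check) is still
needed.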

Some things to note:
1) Lateness is not normalized against relative deadline, so I believe
some biases may emerge for some task sets.  For example, suppose we
have an implicit-deadline periodic task set where all tasks have equal
periods, but execution times vary from task to task.  I believe that
in the long term, tasks with longer execution times will be favored in
tie-breaks (they will have greater lateness values, on average, even
if they consistently win tie-breaks).  At first I thought this was a
problem, but now I think it may actually be a desirable property.
2) Lateness is computed in prepare_for_next_period().  While this
patch only uses lateness for EDF scheduling, I thought the information
might be useful in future plugins.  It could also be exposed to user
space.  I'm not sure whether that is really of any value, though.
3) The computation of lateness casts (unsigned long long) values to
(long long) before computing the difference (see the sketch after
this list).  I originally had slightly more complicated code that
avoided the casts, but then I realized that our deadline comparators,
i.e., lt_after(), do this casting as well.  If we're not concerned
about overflow there, I figured I shouldn't worry about it here
either.  (Although it does make me wonder why lt_t is (unsigned long
long) and not (long long).)
4) The idea of changing EDF tie-breaking was generally well received,
so the tie-break method is no longer a compile-time option.  I can add
it back in if others object.
5) As per our earlier discussions, I've also recoded the EDF logic to
be more readable in a separate patch.
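
For reference, notes 2 and 3 boil down to something like the
following sketch (the helper name is made up; in the patch this
happens inside prepare_for_next_period()):

#include <stdio.h>

typedef unsigned long long lt_t;

/* Casting both lt_t operands to signed (long long) before subtracting
 * lets the result go negative when a job completes early, mirroring
 * the casts already done by comparators such as lt_after(). */
static long long compute_lateness(lt_t completion_time, lt_t deadline)
{
    return (long long)completion_time - (long long)deadline;
}

int main(void)
{
    printf("%lld\n", compute_lateness(120, 100)); /* prints 20: tardy */
    printf("%lld\n", compute_lateness(90, 100));  /* prints -10: early */
    return 0;
}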

Two patches are attached.  Patch #1 makes the EDF logic more readable.
Patch #2 implements lateness-based tie-breaks.  (Patch #2 depends on
Patch #1.)  UNC folks can view these patches on rtsrv.cs.unc.edu
(branch: wip-edf-tie-break).  Note: I do not believe these patches
include Bjorn's recent emergency fixes for partitioned scheduling.

Thoughts?

-Glenn
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Improve-readability-of-EDF-comparisons.patch
Type: application/octet-stream
Size: 2071 bytes
Desc: not available
URL: <http://lists.litmus-rt.org/pipermail/litmus-dev/attachments/20120820/dddf9033/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-EDF-priority-tie-breaks-by-lateness.patch
Type: application/octet-stream
Size: 4129 bytes
Desc: not available
URL: <http://lists.litmus-rt.org/pipermail/litmus-dev/attachments/20120820/dddf9033/attachment-0001.obj>

