[LITMUS^RT] Schedulability checking

Björn Brandenburg bbb at mpi-sws.org
Sun Feb 19 15:38:07 CET 2012


On Feb 17, 2012, at 6:28 PM, Felipe Cerqueira wrote:

> Hmm... The overhead costs obtained from ft_tools are not 100% exact because of the system's unpredictability, but at least the approach doesn't depend on scheduling decisions, right? What I was trying to do is much less reliable.

Any approach relying on *measured* overheads is not truly safe. To get absolute certainty, you need to analytically derive worst-case execution time (WCET) bounds.
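
For concreteness, here is a minimal sketch (Python; the file name and one-column CSV format are my own assumptions, not part of ft_tools) of what a measurement-based "bound" really is: just the largest value observed so far.

    import csv

    # Sketch: derive the *observed* maximum from a trace of overhead
    # samples (e.g., exported from Feather-Trace data to a CSV file).
    def observed_max(path):
        with open(path) as f:
            samples = [float(row[0]) for row in csv.reader(f) if row]
        # This is only the largest value that happened to be observed;
        # the true worst case may well be larger.
        return max(samples)

    print(observed_max('sched-overhead.csv'))  # hypothetical file name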

On x86, analytically deriving such bounds is essentially infeasible, for several reasons.

1) The required processor documentation is not available (you can't derive an exact processor model for the WCET analysis).

2) Existing x86 processors are highly unpredictable and tuned towards good average-case throughput at the expense of worst-case delays.

3) The processor interconnects used in x86 multicores are highly unpredictable (and also not openly documented).

Even if these obstacles could somehow be overcome, Linux (and hence LITMUS^RT) is not well suited to WCET analysis, since it uses unbounded loops, copious amounts of function pointers, etc.

Therefore, some compromises have to be made both when working with any RTOS on x86 and when working with Linux on any platform.

Does this mean that schedulability analysis and overhead accounting are useless? Not at all.

With proper WCET analysis, you get the following guarantee if a task set passes a schedulability test.

	[1] "hardware does not fail" && "processor model correct" && "WCET analysis correct" =>  "no deadlines will be missed"

If instead you use measured overheads and measured execution times, you get the following property.

	[2] "hardware does not fail" && "actual WCET does not exceed assumed/measured WCET" && "actual overheads do not exceed assumed/measured WCET" => "no deadlines will be missed"

While [2] is not as strong as [1], it is still a lot stronger than what you get from simply running and observing the system for X time units (e.g., doing a test run for 24 hours), which is the following:

	[3] "hardware does not fail" && "actual WCET does not exceed WCET during test run" && "actual overheads do not exceed overheads as they occurred during test run" && "actual arrival sequence is not 'more difficult to schedule' than the one that occurred during the test run" => "no deadlines will be missed"

The key difference is that [2] holds true for *any* arrival sequence and *any* combination of overheads within the assumed limits, whereas [3] only applies to a particular execution sequence and a particular combination of overheads, which may or may not be representative of the worst case. I think that makes a huge difference in the degree of "trustworthiness" achieved by the final system. It's not perfect, but in my opinion it's a whole lot better than the "ship it if it didn't crash during testing" approach.
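
To make [2] concrete: the usual approach is to inflate each task's WCET by the assumed overhead bounds and then run a standard schedulability test on the inflated task set. Here is a minimal sketch (Python; the accounting scheme is deliberately simplified, and the test shown is the classic uniprocessor EDF utilization bound, not one of the multiprocessor tests LITMUS^RT targets):

    # Sketch: overhead-aware schedulability check. tasks = (wcet, period)
    # pairs; overheads are assumed upper bounds (e.g., measured maxima
    # plus a margin); all values in the same time unit.

    def inflate(wcet, release_oh, sched_oh, cxs_oh):
        # Simplified accounting: charge each job one release overhead
        # plus two scheduler invocations and two context switches.
        return wcet + release_oh + 2 * (sched_oh + cxs_oh)

    def edf_schedulable(tasks, release_oh, sched_oh, cxs_oh):
        total_util = sum(inflate(c, release_oh, sched_oh, cxs_oh) / p
                         for (c, p) in tasks)
        return total_util <= 1.0  # EDF: schedulable iff U <= 1

    # Illustrative numbers only (think microseconds).
    tasks = [(1000, 10000), (2000, 20000), (5000, 50000)]
    print(edf_schedulable(tasks, release_oh=50, sched_oh=30, cxs_oh=20))

If the inflated task set passes, property [2] holds as long as the assumed WCETs and overhead bounds are never exceeded at runtime.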
 
Note that [2] can be made more resilient in practice by assuming higher WCETs and overheads than were actually measured; that is, it is reasonable to add some "engineering margin" to measurements in practical systems.
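
For example, such a margin could be added as follows (a trivial sketch; the factor and slack values are made up for illustration, and picking them is a system- and risk-specific judgment call):

    # Sketch: inflate a measured maximum by an "engineering margin"
    # before feeding it into a schedulability test.
    def with_margin(observed_max, factor=1.2, slack=10):
        # 20% multiplicative headroom plus a small additive cushion;
        # both values are illustrative only.
        return observed_max * factor + slack

    print(with_margin(950))  # 950 observed -> 1150.0 assumed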

Sorry for the long reply, but I hope it provides some context for what LITMUS^RT is and what it is aimed at.

Cheers,
Björn

PS: I've also touched on this topic on pages 161-164 in http://www.cs.unc.edu/~bbb/diss/brandenburg-diss.pdf.
