[LITMUS^RT] running rtspin and calculate deadline miss ratio? -- Sisu

Wed Apr 17 19:43:23 CEST 2013

Hi Sisu,

I believe you should compute deadline misses by analyzing sched_trace logs.  Each CPU has its own character device, /dev/litmus/sched_trace#, where # is the CPU number.  See "Recording Scheduling Traces" here: https://wiki.litmus-rt.org/litmus/Tracing

You will have to write your own tool to combine the per-CPU binary recordings (each record is timestamped).  The easiest way to do this is:
1) mmap() each sched_trace file into your analysis application.
2) Treat each mmap()'ed file as a giant C-array of "struct st_event_record".  You may want to #include sched_trace.h from litmus-rt/include/litmus to get the struct definitions.
3) Records can appear out of order, so for each stream, take the first 50-or-so records and put them into a single timestamp-ordered minheap.
4) Process records one at a time by popping the earliest (smallest-timestamp) record off the min-heap.  Keep the min-heap full by moving more records in from the arrays.
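A minimal C sketch of steps 1-4, assuming the trace data has already been saved to ordinary files.  It uses a HYPOTHETICAL simplified record type (demo_record) in place of struct st_event_record, so field names and sizes will differ from the real sched_trace.h; the helper names (map_trace, merger_init, merger_next) are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* HYPOTHETICAL stand-in for struct st_event_record (see sched_trace.h for
 * the real layout); 'when' is the timestamp we order by. */
struct demo_record { uint32_t pid, job; uint64_t when; };

/* Steps 1-2: map a recorded trace file and view it as an array of records. */
const struct demo_record *map_trace(const char *path, size_t *count)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after close() */
    if (p == MAP_FAILED)
        return NULL;
    *count = (size_t)st.st_size / sizeof(struct demo_record);
    return p;
}

#define WINDOW 50 /* look-ahead per stream ("the first 50-or-so records") */

struct stream { const struct demo_record *rec; size_t n, next; };
struct entry  { const struct demo_record *rec; size_t stream; };
struct merger { struct stream *s; struct entry *h; size_t len; };

void heap_push(struct merger *m, struct entry e)
{
    size_t i = m->len++;
    while (i > 0 && m->h[(i - 1) / 2].rec->when > e.rec->when) {
        m->h[i] = m->h[(i - 1) / 2]; /* move the later parent down */
        i = (i - 1) / 2;
    }
    m->h[i] = e;
}

struct entry heap_pop(struct merger *m)
{
    struct entry top = m->h[0], last = m->h[--m->len];
    size_t i = 0, c;
    while ((c = 2 * i + 1) < m->len) {
        if (c + 1 < m->len && m->h[c + 1].rec->when < m->h[c].rec->when)
            c++; /* pick the earlier child */
        if (last.rec->when <= m->h[c].rec->when)
            break;
        m->h[i] = m->h[c];
        i = c;
    }
    m->h[i] = last;
    return top;
}

/* Move the next unread record of stream s (if any) into the heap. */
void refill(struct merger *m, size_t s)
{
    if (m->s[s].next < m->s[s].n) {
        struct entry e = { &m->s[s].rec[m->s[s].next++], s };
        heap_push(m, e);
    }
}

/* Step 3: seed one shared min-heap with the first WINDOW records of each stream. */
int merger_init(struct merger *m, struct stream *s, size_t ns)
{
    m->s = s;
    m->len = 0;
    m->h = malloc((ns ? ns : 1) * WINDOW * sizeof *m->h);
    if (!m->h)
        return -1;
    for (size_t i = 0; i < ns; i++)
        for (size_t k = 0; k < WINDOW; k++)
            refill(m, i);
    return 0;
}

/* Step 4: pop the earliest record and top the heap back up from the same
 * stream; returns NULL once every stream is exhausted. */
const struct demo_record *merger_next(struct merger *m)
{
    if (m->len == 0)
        return NULL;
    struct entry e = heap_pop(m);
    refill(m, e.stream);
    return e.rec;
}
```

Fill one struct stream per CPU from map_trace(), call merger_init() once, then call merger_next() until it returns NULL (and free() the heap afterwards).  Refilling from the stream the popped record came from keeps about WINDOW records of look-ahead per CPU, which only produces a correct global order if records within a stream are never displaced by much more than WINDOW positions.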

(An alternative approach is to qsort() each array, or just its first X elements, and then repeatedly process whichever record at the heads of the streams has the smallest timestamp.)
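The qsort() alternative could look roughly like this.  It again uses a HYPOTHETICAL demo_record in place of struct st_event_record, and sort_streams and next_record are illustrative names, not LITMUS^RT APIs.  Passing prefix = 0 sorts each whole array; a nonzero prefix sorts only the first X elements.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* HYPOTHETICAL stand-in for struct st_event_record; 'when' is the sort key. */
struct demo_record { uint32_t pid, job; uint64_t when; };

/* qsort() comparator: ascending timestamp (avoids overflow of a - b). */
static int by_time(const void *a, const void *b)
{
    uint64_t x = ((const struct demo_record *)a)->when;
    uint64_t y = ((const struct demo_record *)b)->when;
    return (x > y) - (x < y);
}

struct stream { struct demo_record *rec; size_t n, next; };

/* Sort each stream in place: all of it, or only its first 'prefix' records. */
void sort_streams(struct stream *s, size_t ns, size_t prefix)
{
    for (size_t i = 0; i < ns; i++) {
        size_t k = (prefix && prefix < s[i].n) ? prefix : s[i].n;
        qsort(s[i].rec, k, sizeof *s[i].rec, by_time);
        s[i].next = 0;
    }
}

/* Return the record with the smallest timestamp among the stream heads,
 * advancing that stream; NULL when all streams are exhausted. */
const struct demo_record *next_record(struct stream *s, size_t ns)
{
    size_t best = ns;
    for (size_t i = 0; i < ns; i++)
        if (s[i].next < s[i].n &&
            (best == ns ||
             s[i].rec[s[i].next].when < s[best].rec[s[best].next].when))
            best = i;
    return best == ns ? NULL : &s[best].rec[s[best].next++];
}
```

If you sort an mmap()'ed file in place, map it with PROT_READ | PROT_WRITE and MAP_PRIVATE so you get private copy-on-write pages; writing to a read-only mapping raises SIGSEGV.  A linear scan over the stream heads is fine for a handful of CPUs.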

Detecting deadline misses is then pretty straightforward once you've got this set up.  A job is uniquely identified by its TID and job number.  Every job has one release record, which includes a deadline.  Correlate this release record (by matching <TID, Job#>) with a unique completion record.  A deadline miss has occurred if the completion time is later than the deadline.
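The matching step might be sketched as below.  The record layout (demo_event with an explicit type, TID, job number, time and absolute deadline) is a HYPOTHETICAL simplification; the real release and completion records are variants of struct st_event_record, so adapt the field accesses to sched_trace.h.  count_misses and miss_ratio are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL simplified record; the real layout is in sched_trace.h. */
enum demo_type { DEMO_RELEASE, DEMO_COMPLETION };
struct demo_event {
    enum demo_type type;
    uint32_t tid, job;  /* <TID, Job#> uniquely identifies a job */
    uint64_t time;      /* release time or completion time */
    uint64_t deadline;  /* absolute deadline (release records only) */
};

struct miss_stats { size_t completed, missed, unmatched; };

/* Pair each completion with its release by <TID, Job#> and count the
 * completions that happen after the deadline.  Linear search keeps the
 * sketch short; a hash table keyed on <TID, Job#> scales better. */
struct miss_stats count_misses(const struct demo_event *ev, size_t n)
{
    struct miss_stats st = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        if (ev[i].type != DEMO_COMPLETION)
            continue;
        const struct demo_event *rel = NULL;
        for (size_t j = 0; j < n && !rel; j++)
            if (ev[j].type == DEMO_RELEASE && ev[j].tid == ev[i].tid &&
                ev[j].job == ev[i].job)
                rel = &ev[j];
        if (!rel) {
            st.unmatched++; /* e.g. release record lost to buffer overflow */
            continue;
        }
        st.completed++;
        if (ev[i].time > rel->deadline)
            st.missed++;
    }
    return st;
}

/* Deadline miss ratio over the jobs that could be matched. */
double miss_ratio(struct miss_stats st)
{
    return st.completed ? (double)st.missed / (double)st.completed : 0.0;
}
```

Jobs with no completion record at the end of the trace are not counted here, and completions without a matching release are reported separately rather than guessed at; how to treat either case is a policy decision for your analysis.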

Some tips:
(1) The user process's notion of a job can differ from the kernel's notion of a job if you use budget enforcement.  Accounting for this requires more complex processing of the sched_trace data.  I have some patches in the works that make this correlation easier, but they're not ready for prime time.
(2) If your system is severely overutilized, the regular tasks that read the sched_trace buffers and write them to disk can be starved.  This can cause the sched_trace ring buffers to overflow, resulting in the loss of tracing data.  You can control the size of the sched_trace buffers at Litmus compilation time (see CONFIG_SCHED_TASK_TRACE_SHIFT).  However, the kernel may refuse to compile if CONFIG_SCHED_TASK_TRACE_SHIFT and NR_CPUS lead to too much sched_trace buffer space, because the binary kernel image becomes too big.  Getting larger buffers requires hacking other parts of the kernel, which is tricky.  Let me know if you run into this problem.
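To get a feel for how quickly the static buffer space grows, here is a back-of-the-envelope helper.  It ASSUMES each per-CPU buffer holds 2^CONFIG_SCHED_TASK_TRACE_SHIFT records and that each record costs a fixed number of bytes (roughly 24 bytes for struct st_event_record plus a flag byte, as I understand it; check sizeof and the Kconfig help text in your own tree).  sched_trace_static_bytes is a hypothetical name.

```c
#include <stdint.h>

/* ASSUMPTION: each per-CPU sched_trace buffer holds 2^shift records and each
 * record costs about bytes_per_record bytes (verify against your tree). */
uint64_t sched_trace_static_bytes(unsigned nr_cpus, unsigned shift,
                                  unsigned bytes_per_record)
{
    return (uint64_t)nr_cpus * ((uint64_t)1 << shift) * bytes_per_record;
}
```

For example, 4 CPUs with a shift of 9 and 25 bytes per record comes to 4 * 512 * 25 = 51,200 bytes.  Raising the shift by one doubles the total, and so does doubling NR_CPUS, which is why the two settings together can push the image over the limit.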
(3) (This holds for ft_tracing as well.)  I find it easier to dump logs to shared memory (a RAM disk) during tracing because this incurs less overhead.  I then copy the trace data out of RAM and write it to disk after the experiment is over.  Ubuntu already has a RAM disk set up for you at /dev/shm.  Just dump data to /dev/shm and read the files back later.  Of course, your system has to have enough RAM to make this work.
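Before a long run it may be worth checking that /dev/shm actually has room for the traces, since tmpfs mounts are size-limited (by default to half of RAM).  A minimal sketch using POSIX statvfs(); fs_free_bytes and has_room_for_traces are hypothetical helper names, and estimating how many bytes you will record is up to you.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/statvfs.h>

/* Bytes available to unprivileged users on the file system holding
 * 'path' (e.g. "/dev/shm"), or -1 if statvfs() fails. */
int64_t fs_free_bytes(const char *path)
{
    struct statvfs sv;
    if (statvfs(path, &sv) != 0)
        return -1;
    return (int64_t)sv.f_bavail * (int64_t)sv.f_frsize;
}

/* Nonzero if 'path' has room for 'needed' bytes of trace data. */
int has_room_for_traces(const char *path, uint64_t needed)
{
    int64_t avail = fs_free_bytes(path);
    return avail >= 0 && (uint64_t)avail >= needed;
}
```

For instance, call has_room_for_traces("/dev/shm", expected_bytes) before starting the tracers, then copy the files to disk once the experiment is over.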

-Glenn

On Apr 17, 2013, at 12:47 PM, Sisu Xi <xisisu at gmail.com> wrote:

> Hi, all:
> 
> Is there any tutorial on running multiple rt tasks (say, rtspin) for some time and calculating the deadline miss ratio, like the ones you presented in the paper?
> 
> I ran a single rtspin task with a WCET of 5 and a period of 10 for 100 seconds. It produced no output.
> 
> I can trace the task execution via reading /dev/litmus/log, it shows:
> 
> 208107 P2: rt: adding rtspin/1357 (5000000, 10000000, 10000000) rel=1088752672478 to ready queue at 1088754177964
> 208108 P2: check_for_preemptions: attempting to link task 1357 to 1
> 208110 P2: (rtspin/1357:2350) blocks:0 out_of_time:0 np:0 sleep:1 preempt:0 state:0 sig:0
> 208111 P2: (rtspin/1357:2350) job_completion().
> 208112 P2: rt: adding rtspin/1357 (5000000, 10000000, 10000000) rel=1088762672478 to ready queue at 1088764173889
> 208113 P2: check_for_preemptions: attempting to link task 1357 to 1
> 208115 P2: (rtspin/1357:2351) blocks:0 out_of_time:0 np:0 sleep:1 preempt:0 state:0 sig:0
> 208116 P2: (rtspin/1357:2351) job_completion().
> 
> I assume 1357 is the PID and the number following it (2350, 2351, etc.) is the job ID. However, I don't know the exact job release and completion times, so I can't tell whether a job missed its deadline.
> 
> How do you guys trace the deadline miss ratio?
> 
> Thanks very much!
> 
> Sisu
> 
> 
> 
> -- 
> Sisu Xi, PhD Candidate
> 
> http://www.cse.wustl.edu/~xis/
> Department of Computer Science and Engineering
> Campus Box 1045
> Washington University in St. Louis
> One Brookings Drive
> St. Louis, MO 63130
> _______________________________________________
> litmus-dev mailing list
> litmus-dev at lists.litmus-rt.org
> https://lists.litmus-rt.org/listinfo/litmus-dev
