[LITMUS^RT] RFC: kernel-style events for Litmus^RT

Tue Feb 14 20:59:30 CET 2012

On 02/14/2012 12:05 AM, Glenn Elliott wrote:
> 
> On Feb 11, 2012, at 4:17 PM, Andrea Bastoni wrote:
> 
>> Hi all,
>>
>> I've managed to expand and polish a bit a patch that I've had around for a
>> while. It basically enables the same sched_trace_XXX() functions that we
>> currently use to trace scheduling events, but it does so using kernel-style
>> events (/sys/kernel/debug/tracing/ etc.).
>>
>> So, why another tracing infrastructure:
>> - Litmus tracepoints can be recorded and analyzed together (single
>>  time reference) with all other kernel tracing events (e.g.,
>>  sched:sched_switch, etc.). It's easier to correlate the effects
>>  of kernel events on litmus tasks.
>>
>> - It enables a quick way to visualize and process schedule traces
>>  using the trace-cmd utility and the kernelshark visualizer.
>>  Kernelshark lacks unit-trace's schedule-correctness checks, but
>>  it enables a fast view of schedule traces and it has several
>>  filtering options (for all kernel events, not only Litmus').
>>
>> Attached (I hope the ML won't filter images ;)) you can find the visualization
>> of a simple set of rtspin tasks. Particularly, getting the trace of a single
>> task is straightforward using trace-cmd:
>>
>> # trace-cmd record -e sched:sched_switch -e litmus:* ./rtspin -p 0 50 100 2
>>
>> and to visualize it:
>>
>> # kernelshark trace.dat
>>
>> trace-cmd can be fetched from here:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git
>>
>> (kernelshark is just the "make gui" target of trace-cmd; trace-cmd and
>> kernelshark have a lot more features than simple filtering and visualization;
>> hopefully they will be a good help for debugging.)
>>
>> The patch is on "wip-tracepoints" on the main repository and on jupiter.
>>
>> Info on trace-cmd, kernelshark, and ftrace is available here:
>>
>> http://lwn.net/Articles/341902/
>> http://lwn.net/Articles/425583/
>> http://rostedt.homelinux.com/kernelshark/
>> http://lwn.net/Articles/365835/
>> http://lwn.net/Articles/366796/
> 
> 
> I saw these tracing tools at RTLWS this year and thought it would be nice to
> leverage the OS tracing and visualization tools. The validation methods of
> unit-trace are nice, but have fallen out of use. Unit-trace is mostly used for
> visual inspection/validation and I think kernelshark is probably more robust
> than unit-trace, right?

Umm, I think the major strength of this approach is that it's easier to
correlate (also visually) Linux tasks and Litmus tasks. It also enables a quick
way to visualize schedule traces, but ATM:

- unit-trace schedule plots are prettier! :)
  When you visualize plots with kernelshark you also get (if you don't disable
  them) all the "spam" from other events/tracepoints.

- unit-trace can automatically check for deadline misses

> Questions:
> (1) I guess this would completely remove the feather-trace under-pinnings to sched_trace in favor of this?

Nope, as I said in a previous email, it adds to sched_trace_XXX(). You can have
both enabled, both disabled, or one enabled and the other disabled. The defines
in [include/litmus/sched_trace.h] do the enable/disable trick.

> (2) How might this affect the analysis tools we use in sched_trace.git?  Can
> we merely update to new struct formats, or is it more complicated than that?

Umm, you're always more than welcome to update them if you want! :) I don't see
any problem with using both methods. It's always nice to have Litmus-only traces
without all the spam that can be generated by kernel function tracers. (You can
play with "./trace-cmd record -e all /bin/ls" to get an idea of how many events
will be recorded... and you're just tracing events, not all the functions!)
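
To put a number on that "spam", something like the following could count the
recorded events per type. It is only a sketch: count_trace_events is a
hypothetical helper, and it assumes that each line printed by "trace-cmd report"
looks like "task-pid [cpu] timestamp: event: info" (other trace-cmd versions or
options may print extra columns, in which case the field offsets need
adjusting).

```shell
# Hypothetical helper: count events per type in "trace-cmd report" output.
# Assumes lines of the form: task-pid [cpu] timestamp: event: info
count_trace_events() {
    awk '
    {
        # Locate the first "[cpu]" field; the event name is two fields later.
        for (i = 1; i < NF - 1; i++) {
            if ($i ~ /^\[[0-9]+\]$/) {
                ev = $(i + 2)
                sub(/:$/, "", ev)   # drop the trailing colon
                count[ev]++
                break
            }
        }
    }
    END { for (ev in count) print count[ev], ev }
    ' | sort -k2
}

# Example (needs a trace.dat recorded beforehand):
#   trace-cmd report -i trace.dat | count_trace_events
```

Lines without a "[cpu]" field (e.g., the header lines of the report) are
skipped, so the counts only cover actual events.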

> (3) How big is the buffer used by the Linux tracing?  Using
> feather-trace-based tracing, I've seen dropped events in systems that are
> temporarily overutilized.  This is because ft-trace gets starved for CPU
> time.  I've made the sched_trace buffers huge to counter this, but this "fix"
> doesn't always work.  Would Linux tracing make dropped events more or less
> likely?  What recourse do we have if we find that events are being dropped?

[snip]
> Info on trace-cmd, kernelshark, and ftrace are available here:
>
[snip]
> http://lwn.net/Articles/366796/

Have a look at buffer_size_kb (see the article above): enlarging the ring
buffer, and perhaps starting/stopping the trace from within the kernel, may
help.
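
As a rough sketch of the user-space side of this (the helper names are
hypothetical; TRACING_DIR defaults to the debugfs path mentioned earlier in the
thread, and I'm assuming the standard ftrace files buffer_size_kb and
tracing_on are present there):

```shell
# Hypothetical helpers around the ftrace control files.
# TRACING_DIR is an assumption; adjust it if debugfs is mounted elsewhere.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Resize the per-CPU ring buffer; $1 is the new size in KiB.
set_trace_buffer_kb() {
    size_kb="$1"
    case "$size_kb" in
        ''|*[!0-9]*)
            echo "usage: set_trace_buffer_kb <KiB>" >&2
            return 1
            ;;
    esac
    echo "$size_kb" > "$TRACING_DIR/buffer_size_kb"
}

# Pause/resume recording into the ring buffer from user space.
trace_off() { echo 0 > "$TRACING_DIR/tracing_on"; }
trace_on()  { echo 1 > "$TRACING_DIR/tracing_on"; }

# Example (as root):
#   set_trace_buffer_kb 16384
#   trace_on; ./rtspin -p 0 50 100 2; trace_off
```

Note that buffer_size_kb is a per-CPU size, so the total memory used grows with
the number of CPUs. Whether a bigger buffer is enough to avoid dropped events
under overload is something we would have to measure.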

Thanks,
- Andrea

> -Glenn