[LITMUS^RT] RFC: kernel-style events for Litmus^RT

Glenn Elliott gelliott at cs.unc.edu
Tue Feb 14 00:05:16 CET 2012


On Feb 11, 2012, at 4:17 PM, Andrea Bastoni wrote:

> Hi all,
> 
> I've managed to expand and polish a patch that I've had around for a
> while. It provides the same sched_trace_XXX() functions that we currently
> use to trace scheduling events, but implements them using kernel-style
> events (/sys/kernel/debug/tracing/, etc.).
> 
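(As an aside for anyone who hasn't used the kernel's tracepoint machinery
before: once the events are compiled in, the debugfs interface alone is
enough to poke at them, no trace-cmd required. A minimal sketch, assuming
debugfs is mounted at /sys/kernel/debug and the patch registers the events
under a "litmus" subsystem, as the -e litmus:* example below suggests:

# ls /sys/kernel/debug/tracing/events/litmus/             # list the new events
# echo 1 > /sys/kernel/debug/tracing/events/litmus/enable # enable all of them
# echo 1 > /sys/kernel/debug/tracing/tracing_on
# cat /sys/kernel/debug/tracing/trace                     # human-readable log
# echo 0 > /sys/kernel/debug/tracing/events/litmus/enable

This is the standard ftrace event interface, so it should apply unchanged
to the new litmus events.)
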
> So, why another tracing infrastructure:
> - Litmus tracepoints can be recorded and analyzed together (single
>  time reference) with all other kernel tracing events (e.g.,
>  sched:sched_switch). This makes it easier to correlate kernel
>  events with their effects on Litmus tasks.
> 
> - It enables a quick way to visualize and process schedule traces
>  using the trace-cmd utility and the kernelshark visualizer.
>  Kernelshark lacks unit-trace's schedule-correctness checks, but
>  it provides a fast view of schedule traces and has several
>  filtering options (for all kernel events, not only Litmus').
> 
> Attached (I hope the ML won't filter images ;)) you can find a visualization
> of a simple set of rtspin tasks. In particular, getting the trace of a single
> task is straightforward using trace-cmd:
> 
> # trace-cmd record -e sched:sched_switch -e litmus:* ./rtspin -p 0 50 100 2
> 
> and to visualize it:
> 
> # kernelshark trace.dat
> 
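(For non-interactive processing, trace-cmd can also dump the same trace.dat
as plain text, which is handy for scripting; a couple of standard
invocations, untested against this particular patch:

# trace-cmd report                 # text dump of trace.dat in event order
# trace-cmd report | grep rtspin   # quick-and-dirty filter on one task

)
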
> trace-cmd can be fetched here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git
> 
> (kernelshark is just the "make gui" target of trace-cmd; trace-cmd and
> kernelshark have a lot more features than simple filtering and visualization;
> hopefully they will be a good help for debugging.)
> 
> The patch is on the "wip-tracepoints" branch of the main repository and on
> jupiter.
> 
> Info on trace-cmd, kernelshark, and ftrace is available here:
> 
> http://lwn.net/Articles/341902/
> http://lwn.net/Articles/425583/
> http://rostedt.homelinux.com/kernelshark/
> http://lwn.net/Articles/365835/
> http://lwn.net/Articles/366796/


I saw these tracing tools at RTLWS this year and thought it would be nice to leverage the OS's tracing and visualization tools.  The validation methods of unit-trace are nice, but they have fallen out of use.  Unit-trace is now mostly used for visual inspection/validation, and I suspect kernelshark is more robust than unit-trace on that front, right?

Questions:
(1) I take it this would completely remove the feather-trace underpinnings of sched_trace in favor of this new infrastructure?
(2) How might this affect the analysis tools we use in sched_trace.git?  Can we merely update them to the new struct formats, or is it more complicated than that?
(3) How big is the buffer used by the Linux tracing infrastructure?  With feather-trace-based tracing, I've seen dropped events on systems that are temporarily overutilized, because ft-trace gets starved for CPU time.  I've made the sched_trace buffers huge to counter this, but this "fix" doesn't always work.  Would Linux tracing make dropped events more or less likely?  And what recourse do we have if we find that events are being dropped?
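For what it's worth, assuming the patch uses the stock ftrace ring buffer unchanged: the buffer is allocated per CPU and is resizable at runtime, and overruns are at least visible in the per-CPU stats.  A sketch of the knobs involved (paths assume debugfs at /sys/kernel/debug; the 64 MB figure is just an example):

# cat /sys/kernel/debug/tracing/buffer_size_kb           # per-CPU size in KB
# echo 65536 > /sys/kernel/debug/tracing/buffer_size_kb  # grow to 64 MB per CPU
# trace-cmd record -b 65536 -e litmus:* ./rtspin -p 0 50 100 2
# cat /sys/kernel/debug/tracing/per_cpu/cpu0/stats       # "overrun" = lost events

So we would at least have a per-CPU count of dropped events to check after a run, rather than losing them silently.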

-Glenn