[LITMUS^RT] RFC: kernel-style events for Litmus^RT

Jonathan Herman hermanjl at cs.unc.edu
Thu Feb 16 00:36:15 CET 2012


This is really, really nice. I'll give it a couple of days for everyone to
check it out, then probably merge it into staging. It has inspired another
question: should we move sched_trace towards this infrastructure?

I need to add visualization for container scheduling into something so that
I can practically debug my implementation. The unit-trace visualization code
is a tad obtuse and I was not looking forward to adding container support.
The code for kernelshark seems modularized and slick; I would much rather
add code to it. I could add visualization for releases / deadlines /
blocking, etc. fairly easily.

Other / future work (Glenn's interrupts, Chris's memory management) on
Litmus would benefit from an easily extensible tracing framework. I don't
want to extend unit-trace if we'll have to abandon it for tracepoints
anyway.

Chris, Glenn, Mac, and I are in favor of abandoning unit-trace for kernel
visualization. Bjoern and Andrea, what do you think? Going forward, I see
us dropping unit-trace for kernel visualization, but could we replace
sched_trace entirely in the long term? Would we want to?

For those who didn't get a chance to play with it, this also supports
dynamically enabling / disabling events as well as a task-centric view of
system events, so that you can list rtspin processes and see how they are
behaving.
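
For example, toggling just the Litmus events at runtime should look roughly
like this (assuming the usual debugfs mount point and that the patch puts
the events under an "events/litmus" directory -- I haven't checked the
exact names):

# echo 1 > /sys/kernel/debug/tracing/events/litmus/enable   (all litmus:* events on)
# echo 0 > /sys/kernel/debug/tracing/events/litmus/enable   (and off again)
# trace-cmd record -e litmus:* -F ./rtspin -p 0 50 100 2    (trace only this one task)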

On Tue, Feb 14, 2012 at 2:59 PM, Andrea Bastoni <bastoni at cs.unc.edu> wrote:

> On 02/14/2012 12:05 AM, Glenn Elliott wrote:
> >
> > On Feb 11, 2012, at 4:17 PM, Andrea Bastoni wrote:
> >
> >> Hi all,
> >>
> >> I've managed to expand and polish a bit a patch that I've had around
> >> for a while. It basically enables the same sched_trace_XXX() functions
> >> that we currently use to trace scheduling events, but it does so using
> >> kernel-style events (/sys/kernel/debug/tracing/ etc.).
> >>
> >> So, why another tracing infrastructure:
> >> - Litmus tracepoints can be recorded and analyzed together (single
> >>  time reference) with all other kernel tracing events (e.g.,
> >>  sched:sched_switch, etc.). It's easier to correlate the effects
> >>  of kernel events on litmus tasks.
> >>
> >> - It enables a quick way to visualize and process schedule traces
> >>  using trace-cmd utility and kernelshark visualizer.
> >>  Kernelshark lacks unit-trace's schedule-correctness checks, but
> >>  it enables a fast view of schedule traces and it has several
> >>  filtering options (for all kernel events, not only Litmus').
> >>
> >> Attached (I hope the ML won't filter images ;)) you can find the
> >> visualization of a simple set of rtspin tasks. Particularly, getting
> >> the trace of a single task is straightforward using trace-cmd:
> >>
> >> # trace-cmd record -e sched:sched_switch -e litmus:* ./rtspin -p 0 50 100 2
> >>
> >> and to visualize it:
> >>
> >> # kernelshark trace.dat
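> >>
> >> (If you prefer a plain-text dump instead of the GUI, something like
> >>
> >> # trace-cmd report
> >>
> >> run in the directory that holds trace.dat should print the recorded
> >> events in time order -- I haven't double-checked the exact options.)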
> >>
> >> trace-cmd can be fetched here:
> >>
> >> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git
> >>
> >> (kernelshark is just the "make gui" of trace-cmd; trace-cmd and
> >> kernelshark have a lot more features than simple filtering and
> >> visualization; hopefully they should be a good help for debugging.)
> >>
> >> The patch is on the "wip-tracepoints" branch on the main repository and
> >> on jupiter.
> >>
> >> Info on trace-cmd, kernelshark, and ftrace are available here:
> >>
> >> http://lwn.net/Articles/341902/
> >> http://lwn.net/Articles/425583/
> >> http://rostedt.homelinux.com/kernelshark/
> >> http://lwn.net/Articles/365835/
> >> http://lwn.net/Articles/366796/
> >
> >
> > I saw these tracing tools at RTLWS this year and thought it would be
> > nice to leverage the OS tracing and visualization tools. The validation
> > methods of unit-trace are nice, but have fallen out of use. Unit-trace
> > is mostly used for visual inspection/validation and I think kernelshark
> > is probably more robust than unit-trace, right?
>
> Umm, I think the major strength of this approach is that it's easier to
> correlate (also visually) Linux tasks and Litmus tasks. It also enables a
> quick way to visualize schedule traces, but ATM:
>
> - unit-trace schedule plots are prettier! :)
>   When you visualize plots with kernelshark you also get (if you don't
>   disable them) all the "spam" from other events/tracing points.
>
> - unit-trace can automatically check for deadline misses
>
> > Questions:
> > (1) I guess this would completely remove the feather-trace
> > underpinnings to sched_trace in favor of this?
>
> Nope, as I said in a previous email, it adds to sched_trace_XXX(). You
> can have both enabled, both disabled, or one enabled and the other
> disabled. The defines in [include/litmus/sched_trace.h] do the
> enable/disable trick.
>
> > (2) How might this affect the analysis tools we use in sched_trace.git?
> > Can we merely update to new struct formats, or is it more complicated
> > than that?
>
> Umm, you're always more than welcome to update them if you want! :) I
> don't see problems in using both methods. It's always nice to have
> Litmus-only traces without all the spam that can be generated by kernel
> function tracers. (You can play with "./trace-cmd record -e all /bin/ls"
> to get an idea of how many events will be recorded... and you're just
> tracing events, not all the functions!)
>
> > (3) How big is the buffer used by the Linux tracing?  Using
> > feather-trace-based tracing, I've seen dropped events in systems that
> > are temporarily overutilized.  This is because ft-trace gets starved
> > for CPU time.  I've made the sched_trace buffers huge to counter this,
> > but this "fix" doesn't always work.  Would Linux tracing make dropped
> > events more or less likely?  What recourse do we have if we find that
> > events are being dropped?
>
> [snip]
> > Info on trace-cmd, kernelshark, and ftrace are available here:
> >
> [snip]
> > http://lwn.net/Articles/366796/
>
> buffer_size_kb can be used to enlarge the buffers; and perhaps
> starting/stopping the trace from the kernel may work.
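>
> Something along these lines should do it (untested sketch; paths assume
> debugfs is mounted at /sys/kernel/debug):
>
> # echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb    (per-CPU buffer size, in KB)
> # echo 0 > /sys/kernel/debug/tracing/tracing_on            (pause recording)
> # echo 1 > /sys/kernel/debug/tracing/tracing_on            (resume recording)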
>
> Thanks,
> - Andrea
>
>
> > -Glenn



-- 
Jonathan Herman
Department of Computer Science at UNC Chapel Hill

