[LITMUS^RT] Outlier

Thu Aug 8 18:11:46 CEST 2013

Hi Björn,

Thank you for your in-depth explanation.
When I measure the SEND-RESCHED overhead in the G-EDF plugin with
everything related to power management disabled in the BIOS and the
kernel configuration, the maximum overhead is quite low.
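
One way to double-check that deep sleep states are really out of the
picture is to list the idle states the kernel still exposes. The sketch
below assumes a Linux kernel that provides the cpuidle sysfs directory
(/sys/devices/system/cpu/cpuN/cpuidle/stateX with "name" and "latency"
files); the function name list_idle_states is made up for illustration
and is not part of LITMUS^RT or its tools.

```python
from pathlib import Path


def list_idle_states(cpu_dir):
    """Return (name, exit_latency_us) pairs for the cpuidle states of one CPU.

    cpu_dir is a per-CPU sysfs directory such as
    /sys/devices/system/cpu/cpu0 (assumed layout: cpuidle/state*/name
    and cpuidle/state*/latency). An empty list means that no cpuidle
    states are exposed under that directory.
    """
    states = []
    for state_dir in Path(cpu_dir).glob("cpuidle/state*"):
        name = (state_dir / "name").read_text().strip()
        latency_us = int((state_dir / "latency").read_text())
        states.append((name, latency_us))
    # Shallow states first, deepest (slowest to wake) last.
    return sorted(states, key=lambda state: state[1])
```

Calling list_idle_states("/sys/devices/system/cpu/cpu0") on the test
machine and looking at the last entries shows whether any state with an
exit latency in the 100us range is still available.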

- Hiro

On Tue, Aug 6, 2013 at 2:54 AM, Björn Brandenburg <bbb at mpi-sws.org> wrote:

>
> On Aug 5, 2013, at 9:14 PM, Hiroyuki Chishiro <chishiro at cs.unc.edu> wrote:
>
> > I measured overheads in the recent developer version of LITMUS^RT, which
> > I got from GitHub (Linux kernel version 3.0 and a modified version of 2012.3).
> > However, I found an outlier even though "statistical outlier filtering is no
> > longer required" according to https://wiki.litmus-rt.org/litmus/Releases.
>
> Let me clarify. Here's what I meant by "statistical outlier filtering is
> no longer required".
>
> Previously, we had measurement errors that produced outliers that did not
> correspond to true events that actually happened. Let's call those
> "erroneous outliers", because they were created by weaknesses in the
> measurement process.
>
> Of course there is always the possibility of "true outliers", that is,
> samples that are much higher than the average or median or xth percentile
> or whatever your definition of "typical" is. True outliers reflect
> unpredictability or bad worst-case behavior of the actual system.
>
> The difference is that erroneous outliers *should* be filtered, as they
> distort the observations, whereas true outliers *must not* be filtered, as
> discarding them introduces inaccuracy (essentially, you are underestimating
> worst-case costs when accidentally filtering true outliers).
>
> Because we had no good way of identifying which samples were true
> outliers and which samples were erroneous outliers, and because the rate
> of erroneous outliers was high enough to prevent drawing useful conclusions
> w.r.t. maxima from the collected data, in the past we had to accept the
> risk of removing true outliers due to statistical outlier removal
> techniques.
>
> With the changes introduced as part of my RTAS'13 work, it is now possible
> to (I believe) reliably tell apart erroneous outliers from true outliers.
> This is accomplished by tracking and identifying the causes of erroneous
> outliers (interrupts, out-of-order samples, gaps in the traces), and by
> filtering ONLY samples for which it is known that they were disturbed.
>
> Outlier filtering is still required, but not *statistical*, indiscriminate
> outlier filtering. Erroneous outliers are thus removed, whereas true
> outliers are left untouched.
>
> > For example, when I measure SEND-RESCHED overhead using many task sets
> > in the G-EDF plugin, most of the worst-case overheads are less than 2us, but
> > one outlier is about 120us.
> > I think that this is an outlier.
>
> Yes, but is it a true or an erroneous outlier? If there are long code
> segments that disable interrupts, it is entirely possible that true
> outliers of that magnitude exist.
>
> > Overhead data are collected by experiment-scripts in
> > https://github.com/brandenburg/experiment-scripts.
>
> These are not my scripts; they are all due to Jonathan afaik. From all
> that I've heard, they are really great, but I have not yet had time to study
> them and don't know how they work or how they process sampling data. By
> default, ft2csv should reject samples that correspond to erroneous
> outliers, but it's possible to suppress interrupt filtering.
>
> Note that you need to run ftsort *prior* to running ft2csv. I don't know
> if Jonathan's scripts do this yet, as my RTAS'13 work happened in parallel
> with Jonathan's work.
>
> So there are several possibilities:
>
> 1) The outlier that you observed is a true outlier (likely, in my opinion).
>
> 2) There's a bug in ft2csv, ftsort, or the kernel that makes some erroneous
> outliers appear to be true outliers (possible, but I have collected ~500GB
> of data over 24+ hours without ever encountering such a problem).
>
> 3) Jonathan's scripts disable interrupt filtering (unlikely).
>
> 4) Jonathan's scripts do not sort the trace files (I have no idea).
> However, even if this is the case, I would expect this to mask true
> outliers, not generate erroneous outliers.
>
> I hope this clarifies the situation. Obviously, I can't claim that
> LITMUS^RT or any other Linux-based system is free from true outliers. I
> meant exactly what I said: *statistical* outlier filtering is no longer
> required, because we now have better, more accurate methods of identifying
> erroneous outliers.
>
> Btw, if I had to guess, I'd bet your outlier is caused by sleep states.
> Waking a core that went into a deep sleep state can easily take 100+
> microseconds. I'd try disabling everything related to power management in
> the BIOS and the kernel configuration.
>
> - Björn
>
>
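
The difference between cause-based and statistical filtering described
above can be illustrated with a small sketch. This is not how ft2csv is
implemented; the Sample type, the "disturbed" flag, and both function
names are hypothetical, the sample values are made up for illustration,
and the IQR rule merely stands in for "some statistical outlier test".

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class Sample:
    overhead_us: float
    # Hypothetical flag: True if the sample is known to have been disturbed
    # (interrupt, out-of-order record, or gap in the trace).
    disturbed: bool


def cause_based_filter(samples):
    """Drop only samples that are known to have been disturbed."""
    return [s for s in samples if not s.disturbed]


def statistical_filter(samples, k=1.5):
    """Drop every sample above Q3 + k * IQR, regardless of its cause."""
    q1, _, q3 = quantiles([s.overhead_us for s in samples], n=4)
    upper = q3 + k * (q3 - q1)
    return [s for s in samples if s.overhead_us <= upper]


# Made-up data: twelve typical samples, one erroneous and one true outlier.
typical = [1.1, 1.2, 1.2, 1.3, 1.3, 1.4, 1.4, 1.5, 1.5, 1.6, 1.7, 1.8]
samples = [Sample(v, False) for v in typical]
samples.append(Sample(95.0, True))    # erroneous outlier (disturbed)
samples.append(Sample(120.0, False))  # true outlier (e.g., a slow wake-up)

max_cause_based = max(s.overhead_us for s in cause_based_filter(samples))
max_statistical = max(s.overhead_us for s in statistical_filter(samples))
```

With these numbers, the cause-based filter keeps the undisturbed 120us
sample as the observed maximum, whereas the statistical filter discards
both large samples and would report a maximum below 2us.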

-- 

Hiroyuki Chishiro

Visiting Scholar

Department of Computer Science

The University of North Carolina at Chapel Hill