[LITMUS^RT] Question about the scheduling overhead of GSN-EDF scheduler in LITMUS

Meng Xu xumengpanda at gmail.com
Sun Oct 11 16:57:44 CEST 2015


Hi Bjorn,

Thank you so much for your detailed reply! It is really helpful!
I have one follow-up question about your answer; I will reply inline below.

2015-10-11 5:26 GMT-04:00 Björn Brandenburg <bbb at mpi-sws.org>:

>
> On 11 Oct 2015, at 05:50, Meng Xu <xumengpanda at gmail.com> wrote:
>
> We are measuring the scheduling overhead (the SCHED and SCHED2 events) with
> the Feather-Trace tool in LITMUS^RT. When we randomly generate 450 tasks
> (rtspin) as a task set and release them with arbitrary offsets, we find
> that the worst-case scheduling overhead of the GSN-EDF plugin is less than
> 40us. The hardware we use is a Freescale i.MX6 ARM board with 4 cores.
> (We generated multiple such task sets and varied the number of tasks from
> 50 to 450, as Dr. Brandenburg did in his RTSS09 paper [1]; the observed
> worst-case scheduling overhead grows from 12us to 20us as the number of
> tasks increases from 50 to 450.)
>
> However, the data in Dr. Brandenburg's RTSS09 paper [1] shows that the
> worst-case scheduling overhead of GSN-EDF is at least 80us (Fig. 6 on page
> 13 of [1]) when the number of tasks is 450.
>
>
> The implementation included in mainline LITMUS^RT corresponds to the
> CEm/CE1 graphs in [1], so actually the observed maximum costs go up to
> ~200us.
>
> *My general question is:*
> Is the scheduling overhead we measured reasonable?
>
>
> Obviously, specific measured numbers are going to depend on the hardware
> platform. For a four-core platform, the magnitude of your measurements
> sounds ok.
>
>
> *My specific questions are:*
> 1) Do we have to release the tasks at the same time to reproduce overhead
> values similar to those reported for GSN-EDF in [1]?
>
>
> Yes. If you have a synchronous task set release (= all tasks release the
> first job at the same time) and periodic arrivals, you are going to see
> much higher peak contention. If you are interested in observing
> near-worst-case behavior, your test workload should trigger such peak
> contention scenarios. With random arrivals, you are extremely unlikely to
> trigger scenarios in which all cores need to access scheduling data
> structures at the same time.
>
> Have a look at the ‘-w’ flag in rtspin and the ‘release_ts’ tool to set up
> synchronous task set releases.
>
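
Thanks for the pointer. To make sure I set this up correctly: my
understanding is that each task should block in wait_for_ts_release() and
that release_ts then triggers the synchronous release, roughly like this
(a minimal sketch based on the liblitmus API as I remember it; error
handling and the real job body are omitted, and the budget/period values
are just placeholders):

#include <litmus.h>
#include <unistd.h>

int main(void)
{
	struct rt_task param;
	int i;

	init_litmus();

	/* Placeholder parameters: 10ms budget, 100ms period. */
	init_rt_task_param(&param);
	param.exec_cost = ms2ns(10);
	param.period    = ms2ns(100);
	set_rt_task_param(getpid(), &param); /* single-threaded: PID == TID */

	task_mode(LITMUS_RT_TASK);

	/* Block here until release_ts triggers the synchronous release. */
	wait_for_ts_release();

	for (i = 0; i < 1000; i++) {
		/* ... job body (busy work) omitted ... */
		sleep_next_period();
	}

	task_mode(BACKGROUND_TASK);
	return 0;
}

Once all tasks are blocked in wait_for_ts_release(), I would run release_ts
to start them at the same time. Is that the intended workflow?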

> 2) Is it possible that LITMUS^RT has been improved since the RTSS09 paper,
> and that those improvements reduce the scheduling overhead, so that the
> smaller overhead values we measured are reasonable?
>
>
> The GSN-EDF plugin has gotten better at avoiding superfluous or “bad”
> migrations. Specifically, it schedules tasks locally if the
> interrupt-handling core is idle, and if it has to migrate, it tries to
> consider the cache topology to find a “nearby” core. The latter
> cache-topology-aware logic was contributed by Glenn Elliott and can be
> toggled with a configuration option.
>

Yes, I saw it in the source code and disabled the cache-topology-aware
logic. Since all four cores share the same cache, it should not matter
anyway.
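
Just to double-check my reading of the decision order you describe, this is
roughly how I picture it (an illustrative sketch only, not the actual plugin
code; the helper functions here are made up for the example):

/*
 * Hypothetical helpers, not real LITMUS^RT functions:
 *   cpu_is_idle(), find_idle_cpu_sharing_cache(), cpu_with_lowest_prio()
 */
static int pick_target_cpu(int irq_cpu)
{
	int nearby;

	/* 1) Prefer the interrupt-handling core itself, if it is idle. */
	if (cpu_is_idle(irq_cpu))
		return irq_cpu;

	/* 2) Otherwise, with the cache-topology-aware option enabled, look
	 *    for an idle core that shares cache with irq_cpu ("nearby"). */
	nearby = find_idle_cpu_sharing_cache(irq_cpu);
	if (nearby >= 0)
		return nearby;

	/* 3) Otherwise, preempt the core running the lowest-priority job. */
	return cpu_with_lowest_prio();
}

Please correct me if that is not what the plugin actually does.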


>
> However, for a “small” single-socket, 4-core platform, it shouldn’t make a
> difference.
>
> 3) The platform we use has only 4 cores, so the lock contention among
> cores may be (much) smaller than in the experiments Dr. Brandenburg ran
> in 2009. (We are not quite sure how many cores were used in the RTSS09
> paper's overhead experiments.) Is it possible that we observe much smaller
> overhead values, such as 20us, because of the lower lock contention?
>
>
> Yes. The platform used in the 2009 paper was a SUN Niagara with 32
> hardware threads (= CPUs from Linux’s point of view). (Further hardware
> details are given in the first paragraph of Section 4 of the paper.)
> Obviously, contention is *much* higher with 32 cores than with 4 cores, so
> I would indeed expect to see much lower overheads on your four-core
> platform.
>
> 4) Do we have to use RT tasks other than rtspin for the overhead
> measurements in order to reproduce the results in [1]?
>
>
> One thing that makes a big difference is whether you have heavy
> contention, or even thrashing, in the memory hierarchy. In an attempt to
> trigger “near-worst-case” conditions, I collected overheads while running
> 32 background tasks (= one per core) that randomly read and write a large
> array (performing bogus computations). Basically, it drives up the cost of
> cache misses. In my experience, this has a large effect on the observed
> maxima.
>

*I actually have one question about this*:
Why do the cache-intensive background tasks affect the scheduling overhead?
Is it because the ready-queue data may be evicted from the caches by the
background tasks, so that it takes longer to dequeue a task from the ready
queue, which in turn lengthens the critical section?
I'm not sure whether there is any other reason why the background tasks
would increase the scheduling overhead.
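
Independent of the answer, to get closer to your setup I plan to run one
cache-thrashing background process per core, roughly along these lines (a
sketch; the working-set size and access pattern are my own guesses, not
taken from the paper):

#include <stdlib.h>
#include <stdint.h>

#define WSS (64 * 1024 * 1024)	/* 64 MiB, much larger than the shared L2 */

int main(void)
{
	volatile uint8_t *buf = malloc(WSS);
	uint64_t x = 0;

	if (!buf)
		return 1;

	srand(0);
	for (;;) {
		size_t i = (size_t)rand() % WSS;
		x += buf[i];          /* random read                  */
		buf[i] = (uint8_t)x;  /* random write (bogus update)  */
	}
	return 0;
}

I would pin one instance to each of the four cores (e.g., with taskset) and
re-run the overhead traces while these are running.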


>
> I hope that helps answer your questions. Let me know if not.
>

Thank you very much for taking the time to answer these questions!

Best regards,

Meng
-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/