<div dir="ltr"><div class="gmail_default" style="font-size:small">Hi Bjorn,</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">Thank you so much for your detailed reply! They are really helpful!</div><div class="gmail_default" style="font-size:small">I actually have one question about your answer. I will reply below your answer.</div><div class="gmail_extra"><br><div class="gmail_quote">2015-10-11 5:26 GMT-04:00 Björn Brandenburg <span dir="ltr"><<a href="mailto:bbb@mpi-sws.org" target="_blank">bbb@mpi-sws.org</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><br><div><span class=""><blockquote type="cite"><div>On 11 Oct 2015, at 05:50, Meng Xu <<a href="mailto:xumengpanda@gmail.com" target="_blank">xumengpanda@gmail.com</a>> wrote:</div><br><div><div style="font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-size:small">We are measuring the scheduling overhead (SCHED and SCHED2 event) with the feather trace tool in LITMUS. When we randomly generate 450 tasks (rtspin) as a taskset and release them with arbitrary offset, we found that the worst-case scheduling overhead of GSN-EDF plugin is less than 40us.</div><div style="font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-size:small">The hardware we use is Freescale IMX6 ARM board, which has 4 cores. </div><div style="font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-size:small">(We did generate multiple such tasksets and vary the number of tasks from 50 to 450 as Dr. Brandenburg did in his RTSS09 paper[1], the scheduling overhead is from 12us to 20us when task number increases from 50 to 450.)</div><div style="font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-size:small"><br></div><div style="font-family:Helvetica;font-size:13px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">However, the data in Dr. Brandenburg's RTSS09 paper [1] shows that the worst-case scheduling overhead of GSN-EDF is at least 80us (Fig. 
>> However, the data in Dr. Brandenburg's RTSS'09 paper [1] shows that the
>> worst-case scheduling overhead of GSN-EDF is at least 80us (Fig. 6 on
>> page 13 of [1]) when the number of tasks is 450.
>
> The implementation included in mainline LITMUS^RT corresponds to the
> CEm/CE1 graphs in [1], so actually the observed maximum costs go up to
> ~200us.
>
>> My general question is:
>> Is the scheduling overhead we measured reasonable?
>
> Obviously, specific measured numbers are going to depend on the hardware
> platform. For a four-core platform, the magnitude of your measurements
> sounds ok.
>
>> My specific questions are:
>> 1) Do we have to release all tasks at the same time to reproduce overhead
>> values similar to those reported for GSN-EDF in [1]?
>
> Yes. If you have a synchronous task set release (= all tasks release their
> first job at the same time) and periodic arrivals, you are going to see
> much higher peak contention. If you are interested in observing
> near-worst-case behavior, your test workload should trigger such
> peak-contention scenarios. With random arrivals, you are extremely
> unlikely to trigger scenarios in which all cores need to access the
> scheduling data structures at the same time.
>
> Have a look at the '-w' flag in rtspin and the 'release_ts' tool to set
> up synchronous task set releases.
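Thanks, we will rerun the measurements with a synchronous release. To make
sure I understand the intended usage, the setup would look roughly like
this (the WCET/period/duration values below are made up for illustration):

    # start each task with -w so it waits for the synchronous release
    rtspin -w 10 100 30 &
    rtspin -w 20 200 30 &
    # ... one invocation per task in the task set ...
    # once all tasks are waiting, release them simultaneously
    release_ts

Please correct me if that is not how the two tools are meant to be combined.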
>> 2) Is it possible that LITMUS^RT has made some improvements since the
>> RTSS'09 paper? Such improvements would reduce the scheduling overhead,
>> and the smaller overhead values we measured would therefore be
>> reasonable.
>
> The GSN-EDF plugin has gotten better at avoiding superfluous or "bad"
> migrations. Specifically, it schedules tasks locally if the
> interrupt-handling core is idle, and if it has to migrate, it tries to
> consider the cache topology to find a "nearby" core. The latter
> cache-topology-aware logic was contributed by Glenn Elliott and can be
> toggled with a configuration option.

Yes, I saw it in the source code and disabled the cache-topology-aware
logic. Actually, since all 4 cores share the same cache, it should not
matter.

> However, for a "small" single-socket, 4-core platform, it shouldn't make
> a difference.
>
>> 3) The platform we use has only 4 cores, so the lock contention among
>> cores may be (much) smaller than in the experiment Dr. Brandenburg ran
>> in 2009. (We are not quite sure how many cores were used in the RTSS'09
>> paper's overhead experiment.) Is it possible that we would observe a
>> much smaller overhead value, like 20us, simply due to less lock
>> contention?
>
> Yes. The platform used in the 2009 paper was a Sun Niagara with 32
> hardware threads (= CPUs from Linux's point of view). (Further hardware
> details are given in the first paragraph of Section 4 of the paper.)
> Obviously, contention is *much* higher with 32 cores than with 4 cores,
> so I would indeed expect to see much lower overheads on your four-core
> platform.
>
>> 4) Do we have to use other RT tasks instead of rtspin to run the
>> overhead measurement in order to reproduce the results in [1]?
>
> One thing that makes a big difference is whether you have heavy
> contention, or even thrashing, in the memory hierarchy. In an attempt to
> trigger "near-worst-case" conditions, I collected overheads while running
> 32 background tasks (= one per core) that randomly read and write a large
> array (performing bogus computations). Basically, this drives up the cost
> of cache misses. In my experience, it has a large effect on the observed
> maxima.

I actually have one question about this:
Why would cache-intensive background tasks affect the scheduling overhead?
Is it because the cache lines holding the ready_queue data may be evicted
by the background tasks, so that it takes longer to dequeue a task from the
ready_queue, which in turn lengthens the critical section?
I'm not sure if there is any other reason why the background tasks could
increase the scheduling overhead.
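In the meantime, we plan to replicate your setup with background tasks
along the following lines (a minimal sketch based on your description
above; the array size and the random read/write pattern are our
assumptions, not the exact code used for [1]):

    /* Cache-thrashing background task: randomly read and write a large
     * array so that scheduler data structures are evicted from the caches. */
    #include <stdio.h>
    #include <stdlib.h>

    /* should well exceed the last-level cache size of the platform */
    #define ARENA_SIZE (64UL * 1024 * 1024)

    int main(void)
    {
        char *arena = malloc(ARENA_SIZE);
        unsigned long sum = 0;

        if (!arena) {
            perror("malloc");
            return 1;
        }
        for (;;) { /* run until killed */
            /* assumes RAND_MAX >= ARENA_SIZE, which holds with glibc */
            size_t idx = (size_t) rand() % ARENA_SIZE;
            sum += arena[idx];        /* random read */
            arena[idx] = (char) sum;  /* random write ("bogus computation") */
        }
    }

Does this roughly match the kind of background load you used?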