<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On Apr 22, 2014, at 2:03 PM, Thijs Raets <<a href="mailto:thijsraets@gmail.com">thijsraets@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr">Ok unit-trace is maybe not the best solution but right now my problem is not unit-trace. So there isn't an easy way to get the start time of my task? If I would use the completion time of job 3 as a reference (since this is the last one indicating task set release preparation) then I would have to wait almost a full second for job 4 (the real first job) to start, rendering my absolute reference useless...</div>
<div class="gmail_extra"><br><br><div class="gmail_quote">2014-04-22 18:41 GMT+02:00 Glenn Elliott <span dir="ltr"><<a href="mailto:gelliott@cs.unc.edu" target="_blank">gelliott@cs.unc.edu</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5"><br>
On Apr 22, 2014, at 12:05 PM, Mac Mollison <<a href="mailto:mollison@cs.unc.edu">mollison@cs.unc.edu</a>> wrote:<br>
<br>
> I can't answer your specific question, but I do want to raise a separate<br>
> issue.<br>
><br>
> I am the former maintainer of unit-trace and stopped maintaining<br>
> it when I switched to working on a non-Litmus project.<br>
><br>
> My understanding is that there are now better tools available to do the<br>
> same thing that actually are being maintained. Perhaps someone in the<br>
> know can comment on these tools and their availability? I thought there<br>
> was some tool that uses KernelShark.<br>
><br>
> In fact, I see some relevant code here, though I couldn't find any<br>
> documentation about this on the LITMUS wiki:<br>
> <a href="https://github.com/LITMUS-RT/rt-kernelshark" target="_blank">https://github.com/LITMUS-RT/rt-kernelshark</a><br>
><br>
> If that is not a good option for some reason, perhaps we should look at<br>
> reviving unit-trace. Since it appers to have users who are suffering<br>
> from its lack of maintenance.<br>
><br>
> Another (probably better) option would be to just make a new tool.<br>
><br>
> I have found that visualizing scheduler traces by creating an SVG file<br>
> (which has HTML-like syntax) makes writing these tools trivial in<br>
> comparison to the way we did it for unit-trace.<br>
><br>
> I have such a tool for my userspace work and forking it to draw LITMUS<br>
> schedules would be a very easy project for someone who understands<br>
> the LITMUS trace format. The code is all in Python and it's only 379<br>
> lines.<br>
><br>
> - Mac<br>
><br>
> On Tue, 22 Apr 2014 16:50:48 +0200<br>
> Thijs Raets <<a href="mailto:thijsraets@gmail.com">thijsraets@gmail.com</a>> wrote:<br>
><br>
>> Hi,<br>
>><br>
>> The sys_release wasn't recognized by unit trace, I had to add the<br>
>> sys_release type a second time in the array of the unit-trace script<br>
>> since the type number was 11. I can record the task set release now<br>
>> but I have a new problem now. This is my output:<br>
>><br>
>><br>
>> .<br>
>> .<br>
>> .<br>
>><br>
>> Event ID: 7<br>
>> Job: 0.0<br>
>> Type: sys_release2<br>
>> Time: 672784686695748<br>
>><br>
>> Event ID: 8<br>
>> Job: 6904.1<br>
>> Type: resume<br>
>> Time: 672784686702656<br>
>><br>
>> Event ID: 9<br>
>> Job: 6904.2<br>
>> Type: release<br>
>> Time: 672784686705186<br>
>><br>
>> Event ID: 10<br>
>> Job: 6904.2<br>
>> Type: switch_to<br>
>> Time: 672784686727392<br>
>><br>
>> Event ID: 11<br>
>> Job: 6904.3<br>
>> Type: switch_away<br>
>> Time: 672784686731107<br>
>><br>
>> Event ID: 12<br>
>> Job: 6904.3<br>
>> Type: completion<br>
>> Time: 672784686736223<br>
>><br>
>> Event ID: 13<br>
>> Job: 6904.4<br>
>> Type: release<br>
>> Time: 672785686000000<br>
>><br>
>> Rtspin has a period of 100ms. The resume of job 1 is ignored I only<br>
>> look at release and completion. Job 2 is released but had no<br>
>> completion(no worries I want soft real-time results). Job 3 only has<br>
>> completion time. The real problem for me is the time difference<br>
>> between the release of job 4 and job 2. It's nowhere near 200ms the<br>
>> other jobs are released just fine, but they are all shifted because<br>
>> of job 4 releasing late.(This problem only occurs when using<br>
>> synchronous release)<br>
>><br>
>> Any ideas?<br>
>><br>
>> Thanks,<br>
>> Thijs<br>
>><br>
>><br>
>><br>
>><br>
>> 2014-04-17 12:48 GMT+02:00 Thijs Raets <<a href="mailto:thijsraets@gmail.com">thijsraets@gmail.com</a>>:<br>
>><br>
>>> Hi,<br>
>>><br>
>>> That was exactly what I was looking for thank you! Is it possible<br>
>>> that this sys_release event appears as type "resume" in my<br>
>>> unit-trace output?<br>
>>><br>
>>> Kind regards,<br>
>>> Thijs<br>
>>><br>
>>><br>
>>> 2014-04-17 0:15 GMT+02:00 Glenn Elliott <<a href="mailto:gelliott@cs.unc.edu">gelliott@cs.unc.edu</a>>:<br>
>>><br>
>>> Hi Thijs,<br>
>>>><br>
>>>> Are you using synchronous releases? (The ‘-w’ option for<br>
>>>> rtspin.) With ‘-w’, rtspin will block until the user execute’s<br>
>>>> liblitmus’s ‘release_ts’ (“release task set”). Task set release<br>
>>>> records a timestamped ’sys_release’ event in sched_trace. You can<br>
>>>> then discard any releases/completions that occur before the<br>
>>>> sys_release timestamp. I believe this will happen naturally if<br>
>>>> you tell unit-trace to discard the first job of every task with<br>
>>>> “-s 1”.<br>
>>>><br>
>>>> The “experiment-scripts” package (<br>
>>>> <a href="https://github.com/LITMUS-RT/experiment-scripts" target="_blank">https://github.com/LITMUS-RT/experiment-scripts</a>) can also be used<br>
>>>> to analyze traces. However, it also requires you to use its<br>
>>>> framework to set up experiments (this can take some time if you<br>
>>>> need to do something out of the ordinary). The tool currently<br>
>>>> only gives you information on job deadline misses, tardiness, and<br>
>>>> the amount of time tasks send blocked. However, it’s fairly easy<br>
>>>> to extend parse/sched.py to do other forms of analysis, such as<br>
>>>> job response time.<br>
>>>><br>
>>>> -Glenn<br>
>>>><br>
>>>><br>
>>>> On Apr 16, 2014, at 12:14 PM, Thijs Raets <<a href="mailto:thijsraets@gmail.com">thijsraets@gmail.com</a>><br>
>>>> wrote:<br>
>>>><br>
>>>> Hi,<br>
>>>><br>
>>>> I'm using feather trace to monitor the rtspin task. I parsed<br>
>>>> the .bin file with unit-trace and thus extracted release and<br>
>>>> completion. The problem is that I also would like to record the<br>
>>>> time at which I give the order to execute the rtspin task. I need<br>
>>>> this time to calculate the theoretical deadlines of the different<br>
>>>> jobs since the release time of the first job could have been<br>
>>>> postponed. I however do not know how time is registered within<br>
>>>> feather trace. I would like to use the same reference to record<br>
>>>> the start time. Where do I look?<br>
>>>><br>
>>>> Thanks in advance,<br>
>>>><br>
>>>> Thijs<br>
<br>
<br>
<br>
</div></div>Adding more to what Mac has said, rt-kernelshark is good for visualizing traces. kernelshark is a tool designed for Linux that we extend for visualizing Litmus traces. Owing to it’s Linux support, rt-kernelshark can give you a lot more information on what is going on within the Linux kernel than unit-trace’s visualization tool.<br>
<div class="HOEnZb"><div class="h5"><br>
<br>
_______________________________________________<br>
litmus-dev mailing list<br>
<a href="mailto:litmus-dev@lists.litmus-rt.org">litmus-dev@lists.litmus-rt.org</a><br>
<a href="https://lists.litmus-rt.org/listinfo/litmus-dev" target="_blank">https://lists.litmus-rt.org/listinfo/litmus-dev</a><br>
</div></div></blockquote></div><br></div>
_______________________________________________<br>litmus-dev mailing list<br><a href="mailto:litmus-dev@lists.litmus-rt.org">litmus-dev@lists.litmus-rt.org</a><br>https://lists.litmus-rt.org/listinfo/litmus-dev<br></blockquote></div><div><br></div><div><br></div><br><div>I’m not sure where your extra second is coming from. Here’s the code path in Litmus:</div><div><br></div><div>1) Here is where the “release_ts” system call wakes up all tasks waiting for a synchronous release. Observe that it sets “wait->ts_release_time=start”. This tells the tasks the absolute time of when they should be released. Note that the same ‘start’ value is logged in sched_trace. <a href="https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/sync.c#L107">https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/sync.c#L107</a></div><div><br></div><div>2) Here is waiting tasks go when woken up for release. Observe that the code uses the ’ts_release_time’ value set by the caller of release_ts. <a href="https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/sync.c#L54">https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/sync.c#L54</a></div><div><br></div><div>3) #2 above takes you to here. It is at this point that you may add code to reset the job counter if you so wish. <a href="https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/jobs.c#L50">https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/jobs.c#L50</a></div><div><br></div><div>4) The above then takes you to complete_job(), which calls schedule(). <a href="https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/jobs.c#L75">https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/jobs.c#L75</a></div><div><br></div><div>5) schedule() will trigger a call into the active Litmus plugin’s schedule(). The plugin will compare the current time and the release time of the job and send the job to either the ready-queue or release-queue, accordingly. Here’s schedule() for the G-EDF plugin: <a href="https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/sched_gsn_edf.c#L427">https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/sched_gsn_edf.c#L427</a></div><div><br></div><div>I don’t see where between Steps 2 & 3 a second delay could sneak in. Completions (the synchronization method used to wake up waiting tasks) are executed immediately—that is, the work isn’t deferred to a tasklet/work-queue—waiting tasks get woken up. Furthermore, the absolute release time is passed from 2 to 3.</div><div><br></div><div><br></div><div>Three questions:</div><div><br></div><div>1) Do you have debug tracing (“TRACE()”) enabled? (<a href="https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/Kconfig#L297">https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/Kconfig#L297</a>) Heavy debug tracing can really give you screwy timing behaviors. It should only be enabled if you’re actively debugging a plugin. Your kernel should never be compiled with this debug tracing enabled when you are measuring overheads, job response times, etc.</div><div><br></div><div>2) Is your system heavily loaded? That is, is system utilization > ~95%?</div><div><br></div><div>3) Which release policy are your tasks using? (List of policies: <a href="https://github.com/LITMUS-RT/litmus-rt/blob/master/include/litmus/rt_param.h#L39">https://github.com/LITMUS-RT/litmus-rt/blob/master/include/litmus/rt_param.h#L39</a>) liblitmus will use TASK_SPORADIC by default. However, if you trust your applications to call sleep_next_period() at the end of each application-level job, then I suggest you use TASK_PERIODIC. This ensures that your application and the kernel maintain consistent job counts. (TASK_SPORADIC can cause a job to finish “early" (job counter increment) unbeknownst to the application if/when you have budget enforcement enabled and budget has exhausted. I think it may also cause a job to complete when the task wakes up from blocking I/O and its deadline has already passed… but I’d have to look into that to be sure.</div><div><br></div><div>One last thing to be aware of: I suggest you make your sched_trace buffer as large as your system can tolerate (<a href="https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/Kconfig#L280">https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/Kconfig#L280</a>). This is because trace events can be lost if the kernel’s ring buffer overflows before events can be streamed to disk (or RAM disk). This will help ensure that you capture all events, even if the system becomes temporarily overutilized (starving the non-real-time task that is streaming events to disk).</div><div><br></div><div>-Glenn</div></body></html>