[LITMUS^RT] recording start real-time task

Thijs Raets thijsraets at gmail.com
Wed Apr 23 15:33:17 CEST 2014


Thanks again for your time, Glenn, and what a great answer (it eases my
quest to understand litmus-rt :))! My focus was all wrong: looking at the
trace events, I did not expect that an extra default delay of 1 s would be
added in userspace. Sorry, I should have seen this.

PS: I recompiled my kernel without TRACE() and with a larger buffer;
unit-trace was unable to create an output file from the .bin trace files
(the .bin files did contain data).


2014-04-23 6:00 GMT+02:00 Björn Brandenburg <bbb at mpi-sws.org>:

> On 22 Apr 2014, at 20:44, Glenn Elliott <gelliott at cs.unc.edu> wrote:
>
>
> On Apr 22, 2014, at 2:03 PM, Thijs Raets <thijsraets at gmail.com> wrote:
>
> OK, unit-trace is maybe not the best solution, but right now my problem is
> not unit-trace. So there isn't an easy way to get the start time of my
> task? If I used the completion time of job 3 as a reference (since it is
> the last event indicating task-set release preparation), then I would
> have to wait almost a full second for job 4 (the real first job) to start,
> rendering my absolute reference useless...
>
>
> 2014-04-22 18:41 GMT+02:00 Glenn Elliott <gelliott at cs.unc.edu>:
>
>>
>> On Apr 22, 2014, at 12:05 PM, Mac Mollison <mollison at cs.unc.edu> wrote:
>>
>> > I can't answer your specific question, but I do want to raise a separate
>> > issue.
>> >
>> > I am the former maintainer of unit-trace and stopped maintaining
>> > it when I switched to working on a non-Litmus project.
>> >
>> > My understanding is that there are now better tools available that do
>> > the same thing and are actually being maintained. Perhaps someone in the
>> > know can comment on these tools and their availability? I thought there
>> > was some tool that uses KernelShark.
>> >
>> > In fact, I see some relevant code here, though I couldn't find any
>> > documentation about this on the LITMUS wiki:
>> > https://github.com/LITMUS-RT/rt-kernelshark
>> >
>> > If that is not a good option for some reason, perhaps we should look at
>> > reviving unit-trace, since it appears to have users who are suffering
>> > from its lack of maintenance.
>> >
>> > Another (probably better) option would be to just make a new tool.
>> >
>> > I have found that visualizing scheduler traces by creating an SVG file
>> > (which has HTML-like syntax) makes writing these tools trivial in
>> > comparison to the way we did it for unit-trace.
>> >
>> > I have such a tool for my userspace work and forking it to draw LITMUS
>> > schedules would be a very easy project for someone who understands
>> > the LITMUS trace format. The code is all in Python and it's only 379
>> > lines.
>> >
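>> > As a rough, generic illustration of the approach (hypothetical code, not
>> > the actual tool, which is in Python), the sketch below just emits one SVG
>> > rectangle per (cpu, start, end) interval; everything else is parsing:
>> >
>> > #include <stdio.h>
>> >
>> > struct interval { int cpu; double start_ms; double end_ms; };
>> >
>> > /* Hypothetical example data: one scheduling interval per row. */
>> > static const struct interval demo[] = {
>> >     {0, 0.0, 3.5}, {1, 3.5, 9.0}, {0, 10.0, 13.5},
>> > };
>> >
>> > int main(void)
>> > {
>> >     const double scale = 10.0;    /* pixels per millisecond */
>> >     const int row_height = 20;    /* one row per CPU        */
>> >     size_t i;
>> >
>> >     printf("<svg xmlns=\"http://www.w3.org/2000/svg\" "
>> >            "width=\"800\" height=\"200\">\n");
>> >     for (i = 0; i < sizeof(demo) / sizeof(demo[0]); i++)
>> >         printf("  <rect x=\"%.1f\" y=\"%d\" width=\"%.1f\" "
>> >                "height=\"%d\" fill=\"steelblue\"/>\n",
>> >                demo[i].start_ms * scale,
>> >                demo[i].cpu * (row_height + 5),
>> >                (demo[i].end_ms - demo[i].start_ms) * scale,
>> >                row_height);
>> >     printf("</svg>\n");
>> >     return 0;
>> > }
>> >
>> > Redirect the output to a .svg file and open it in a browser.
>> >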
>> > - Mac
>> >
>> > On Tue, 22 Apr 2014 16:50:48 +0200
>> > Thijs Raets <thijsraets at gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> The sys_release event wasn't recognized by unit-trace; I had to add the
>> >> sys_release type a second time in the array of the unit-trace script,
>> >> since its type number was 11. I can record the task-set release now,
>> >> but I have a new problem. This is my output:
>> >>
>> >>
>> >> .
>> >> .
>> >> .
>> >>
>> >> Event ID: 7
>> >> Job: 0.0
>> >> Type: sys_release2
>> >> Time: 672784686695748
>> >>
>> >> Event ID: 8
>> >> Job: 6904.1
>> >> Type: resume
>> >> Time: 672784686702656
>> >>
>> >> Event ID: 9
>> >> Job: 6904.2
>> >> Type: release
>> >> Time: 672784686705186
>> >>
>> >> Event ID: 10
>> >> Job: 6904.2
>> >> Type: switch_to
>> >> Time: 672784686727392
>> >>
>> >> Event ID: 11
>> >> Job: 6904.3
>> >> Type: switch_away
>> >> Time: 672784686731107
>> >>
>> >> Event ID: 12
>> >> Job: 6904.3
>> >> Type: completion
>> >> Time: 672784686736223
>> >>
>> >> Event ID: 13
>> >> Job: 6904.4
>> >> Type: release
>> >> Time: 672785686000000
>> >>
>> >> rtspin has a period of 100 ms. The resume of job 1 is ignored; I only
>> >> look at release and completion. Job 2 is released but has no
>> >> completion (no worries, I want soft real-time results). Job 3 only has
>> >> a completion time. The real problem for me is the time difference
>> >> between the release of job 4 and the release of job 2: it is nowhere
>> >> near 200 ms (672785686000000 - 672784686705186 ≈ 999 ms). The other
>> >> jobs are released just fine, but they are all shifted because job 4 is
>> >> released late. (This problem only occurs when using synchronous
>> >> release.)
>> >>
>> >> Any ideas?
>> >>
>> >> Thanks,
>> >> Thijs
>> >>
>> >>
>> >>
>> >>
>> >> 2014-04-17 12:48 GMT+02:00 Thijs Raets <thijsraets at gmail.com>:
>> >>
>> >>> Hi,
>> >>>
>> >>> That was exactly what I was looking for, thank you! Is it possible
>> >>> that this sys_release event appears as type "resume" in my
>> >>> unit-trace output?
>> >>>
>> >>> Kind regards,
>> >>> Thijs
>> >>>
>> >>>
>> >>> 2014-04-17 0:15 GMT+02:00 Glenn Elliott <gelliott at cs.unc.edu>:
>> >>>
>> >>> Hi Thijs,
>> >>>>
>> >>>> Are you using synchronous releases?  (The ‘-w’ option for
>> >>>> rtspin.)  With ‘-w’, rtspin will block until the user executes
>> >>>> liblitmus’s ‘release_ts’ (“release task set”).  The task-set release
>> >>>> records a timestamped ‘sys_release’ event in sched_trace.  You can
>> >>>> then discard any releases/completions that occur before the
>> >>>> sys_release timestamp.  I believe this will happen naturally if
>> >>>> you tell unit-trace to discard the first job of every task with
>> >>>> “-s 1”.
>> >>>>
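>> >>>> For reference, the skeleton of a task that participates in a
>> >>>> synchronous release looks roughly like the sketch below (based on the
>> >>>> liblitmus API; exact field names and signatures may vary between
>> >>>> liblitmus versions, and error checking is omitted):
>> >>>>
>> >>>> #include <litmus.h>   /* liblitmus */
>> >>>>
>> >>>> int main(void)
>> >>>> {
>> >>>>     struct rt_task param;
>> >>>>     int i;
>> >>>>
>> >>>>     init_litmus();
>> >>>>
>> >>>>     init_rt_task_param(&param);
>> >>>>     param.exec_cost         = ms2ns(10);   /* worst-case execution time */
>> >>>>     param.period            = ms2ns(100);  /* e.g., a 100 ms period */
>> >>>>     param.relative_deadline = ms2ns(100);  /* implicit deadline */
>> >>>>     param.cls               = RT_CLASS_SOFT;
>> >>>>     set_rt_task_param(gettid(), &param);
>> >>>>
>> >>>>     task_mode(LITMUS_RT_TASK);
>> >>>>
>> >>>>     /* Block until release_ts is executed; this is what rtspin -w does,
>> >>>>      * and it is what produces the sys_release record. */
>> >>>>     wait_for_ts_release();
>> >>>>
>> >>>>     for (i = 0; i < 100; i++) {
>> >>>>         /* ... job body ... */
>> >>>>         sleep_next_period();   /* signal job completion */
>> >>>>     }
>> >>>>
>> >>>>     task_mode(BACKGROUND_TASK);
>> >>>>     return 0;
>> >>>> }
>> >>>>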
>> >>>> The “experiment-scripts” package (
>> >>>> https://github.com/LITMUS-RT/experiment-scripts) can also be used
>> >>>> to analyze traces.  However, it also requires you to use its
>> >>>> framework to set up experiments (this can take some time if you
>> >>>> need to do something out of the ordinary).  The tool currently
>> >>>> only gives you information on job deadline misses, tardiness, and
>> >>>> the amount of time tasks spend blocked. However, it’s fairly easy
>> >>>> to extend parse/sched.py to do other forms of analysis, such as
>> >>>> job response time.
>> >>>>
>> >>>> -Glenn
>> >>>>
>> >>>>
>> >>>> On Apr 16, 2014, at 12:14 PM, Thijs Raets <thijsraets at gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> I'm using Feather-Trace to monitor the rtspin task. I parsed
>> >>>> the .bin file with unit-trace and thus extracted release and
>> >>>> completion times. The problem is that I would also like to record
>> >>>> the time at which I start the rtspin task. I need this time to
>> >>>> calculate the theoretical deadlines of the different jobs, since
>> >>>> the release time of the first job could have been postponed.
>> >>>> However, I do not know how time is registered within Feather-Trace.
>> >>>> I would like to use the same reference to record the start time.
>> >>>> Where do I look?
>> >>>>
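>> >>>> (To be concrete, what I want to compute, assuming I can obtain a
>> >>>> reference time t0 in the same time base as the trace: for the k-th job
>> >>>> after that reference,
>> >>>>
>> >>>>     release_k  = t0 + k * period
>> >>>>     deadline_k = release_k + relative deadline
>> >>>>
>> >>>> and I would compare these against the recorded events.)
>> >>>>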
>> >>>> Thanks in advance,
>> >>>>
>> >>>> Thijs
>>
>>
>>
>> Adding to what Mac has said, rt-kernelshark is good for visualizing
>> traces.  kernelshark is a tool designed for Linux that we extend to
>> visualize Litmus traces.  Owing to its Linux support, rt-kernelshark can
>> give you a lot more information on what is going on inside the Linux
>> kernel than unit-trace’s visualization tool.
>>
>>
>
>
>
> I’m not sure where your extra second is coming from.  Here’s the code path
> in Litmus:
>
> 1) Here is where the “release_ts” system call wakes up all tasks waiting
> for a synchronous release.  Observe that it sets
> “wait->ts_release_time=start”. This tells the tasks the absolute time of
> when they should be released.  Note that the same ‘start’ value is logged
> in sched_trace.
> https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/sync.c#L107
>
> 2) Here is where waiting tasks go when they are woken up for release.
> Observe that the code uses the ‘ts_release_time’ value set by the caller
> of release_ts (the overall flow is sketched after step 5 below).
> https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/sync.c#L54
>
> 3) #2 above takes you to here.  It is at this point that you may add code
> to reset the job counter if you so wish.
> https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/jobs.c#L50
>
> 4) The above then takes you to complete_job(), which calls schedule().
> https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/jobs.c#L75
>
> 5) schedule() will trigger a call into the active Litmus plugin’s
> schedule().  The plugin will compare the current time and the release time
> of the job and send the job to either the ready-queue or release-queue,
> accordingly.  Here’s schedule() for the G-EDF plugin:
> https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/sched_gsn_edf.c#L427
>
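> Putting steps 1-5 together, the flow looks roughly like the following
> sketch (a simplified model for illustration only; the function names here
> are made up, and the real code is in the files linked above):
>
> /* Simplified model of the synchronous-release path (not kernel code). */
>
> typedef unsigned long long lt_t;      /* time in nanoseconds */
>
> struct waiter { lt_t ts_release_time; };
>
> /* Step 1: release_ts picks an absolute release time 'start' (logged as
>  * the sys_release event) and hands it to every waiting task. */
> static void model_release_ts(struct waiter *w, lt_t start)
> {
>     w->ts_release_time = start;
> }
>
> /* Steps 2-4: the woken task sets up its first job relative to that
>  * absolute time and then calls into the scheduler. */
> static void model_setup_first_job(const struct waiter *w, lt_t period,
>                                   lt_t *release, lt_t *deadline)
> {
>     *release  = w->ts_release_time;   /* the absolute time, not "now" */
>     *deadline = *release + period;    /* implicit deadline */
> }
>
> /* Step 5: the plugin compares 'now' with the job's release time and
>  * queues the job on the ready queue or the release queue accordingly. */
> static int model_is_released(lt_t now, lt_t release)
> {
>     return now >= release;
> }
>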
> I don’t see where, between Steps 2 and 3, a one-second delay could sneak
> in.  Completions (the synchronization mechanism used to wake up waiting
> tasks) are executed immediately; the work isn’t deferred to a tasklet or
> work-queue, so waiting tasks are woken up right away.  Furthermore, the
> absolute release time is passed from Step 2 to Step 3.
>
>
> Three questions:
>
> 1) Do you have debug tracing (“TRACE()”) enabled?  (
> https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/Kconfig#L297)
>  Heavy debug tracing can really give you screwy timing behaviors.  It
> should only be enabled if you’re actively debugging a plugin.  Your kernel
> should never be compiled with this debug tracing enabled when you are
> measuring overheads, job response times, etc.
>
> 2) Is your system heavily loaded?  That is, is system utilization > ~95%?
>
> 3) Which release policy are your tasks using?  (List of policies:
> https://github.com/LITMUS-RT/litmus-rt/blob/master/include/litmus/rt_param.h#L39)
>  liblitmus will use TASK_SPORADIC by default.  However, if you trust your
> applications to call sleep_next_period() at the end of each
> application-level job, then I suggest you use TASK_PERIODIC (see the
> sketch below).  This ensures that your application and the kernel maintain
> consistent job counts.  (TASK_SPORADIC can cause a job to finish “early”
> (i.e., the job counter is incremented) unbeknownst to the application
> if/when you have budget enforcement enabled and the budget has been
> exhausted.  I think it may also cause a job to complete when the task
> wakes up from blocking I/O and its deadline has already passed… but I’d
> have to look into that to be sure.)
>
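> A minimal sketch of what I mean (assuming the current liblitmus/rt_param.h
> field names; double-check against your version):
>
> #include <litmus.h>
>
> /* Configure the calling thread to use the PERIODIC release policy.
>  * This assumes the application calls sleep_next_period() after each job. */
> static void use_periodic_release(lt_t wcet_ns, lt_t period_ns)
> {
>     struct rt_task param;
>
>     init_rt_task_param(&param);
>     param.exec_cost         = wcet_ns;
>     param.period            = period_ns;
>     param.relative_deadline = period_ns;      /* implicit deadline */
>     param.release_policy    = TASK_PERIODIC;  /* default is TASK_SPORADIC */
>     set_rt_task_param(gettid(), &param);
> }
>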
> One last thing to be aware of: I suggest you make your sched_trace buffer
> as large as your system can tolerate (
> https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/Kconfig#L280).
>  This is because trace events can be lost if the kernel’s ring buffer
> overflows before events can be streamed to disk (or RAM disk).  This will
> help ensure that you capture all events, even if the system becomes
> temporarily overutilized (starving the non-real-time task that is streaming
> events to disk).
>
>
> Thanks for the excellent explanation, Glenn.
>
> Thijs, further note that userspace may add an arbitrary delay to the
> synchronous release time.
>
> https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/sync.c#L134
>
> release_ts sets this to one second by default:
>
> https://github.com/LITMUS-RT/liblitmus/blob/master/bin/release_ts.c#L46
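>
> For what it's worth, this matches the trace you posted (simple arithmetic
> on the timestamps above):
>
>     672785686000000 ns - 672784686705186 ns ≈ 999.3 ms ≈ 1 s
>
> i.e., job 4 is released roughly the 1 s default delay after job 2, rather
> than one 100 ms period later.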
>
> Regards,
> Björn
>
>
>
>

