[LITMUS^RT] Missing st_trace records

Glenn Elliott gelliott at cs.unc.edu
Thu Nov 6 03:41:10 CET 2014


Hi Mikyung,

I have comments inline below.

-Glenn

> On Nov 5, 2014, at 4:59 PM, Mikyung Kang <mkkang01 at gmail.com> wrote:
> 
> Thanks, Björn. After changing from rtspin to rt_launch, I could see that there are no missing records w/o changing anything.
> 
> 
> I have 3 simple questions about the st_job_stats data. Any comments are welcome!
> 
> *** Example: 8*(Period, WCET)=8*(200,180)ms on 8 Cores (both bare-metal and VM cases) [8 "same" tasks using rt_launch]
> 
> Using st_job_stats,  I could see [Task,   Job,     Period,   Response, DL Miss?,   Lateness,  Tardiness] records. 
> 
> 
> (1) Some files report the correct period (200ms), but others report a period of 0, as shown below. Does this mean that PID#13162 is not schedulable and only PID#13166 is? However, the Lateness and DL Miss? columns for PID#13162 show no deadline misses.
> 
> # task NAME=<unknown> PID=13162 COST=0 PERIOD=0 CPU=-1
>  13162,     2,          0,  180031469,        0,  -19968531,          0
>  13162,     3,          0,  180026058,        0,  -19973942,          0
>  13162,     4,          0,  180029476,        0,  -19970524,          0
>  13162,     5,          0,  180027542,        0,  -19972458,          0
> ....
> 
> # task NAME=rt_launch PID=13166 COST=180000000 PERIOD=200000000 CPU=0
>  13166,     2,  200000000,  180019319,        0,  -19980681,          0
>  13166,     3,  200000000,  180022003,        0,  -19977997,          0
>  13166,     4,  200000000,  180022586,        0,  -19977414,          0
>  13166,     5,  200000000,  180021609,        0,  -19978391,          0

It looks like COST is also zero.  This information is recorded in the “st_param_data” struct.  There should be one per real-time task.  Make sure that you begin tracing events _before_ launching any real-time tasks.  You may want to sleep for a second or two between the commencement of tracing and launching of real-time tasks.  If you are doing this, can you confirm whether or not st_param_data records are missing for the tasks with reported zero COST and PERIOD?
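
As a quick way to check which tasks are affected, here is a minimal sketch that scans st_job_stats output for task headers reporting zero COST or PERIOD. It assumes the header format shown in your output above; the function name and the regular expression are my own and only for illustration, not part of the tools.

```python
import re

# Matches st_job_stats task headers as they appear in the quoted output, e.g.
# "# task NAME=rt_launch PID=13166 COST=180000000 PERIOD=200000000 CPU=0"
# (assumption: other versions of the tool may format this line differently).
TASK_HEADER = re.compile(
    r"#\s*task\s+NAME=(?P<name>\S+)\s+PID=(?P<pid>\d+)\s+"
    r"COST=(?P<cost>\d+)\s+PERIOD=(?P<period>\d+)"
)


def tasks_missing_params(lines):
    """Return the PIDs whose header reports COST=0 or PERIOD=0."""
    missing = []
    for line in lines:
        m = TASK_HEADER.search(line)
        if m and (int(m["cost"]) == 0 or int(m["period"]) == 0):
            missing.append(int(m["pid"]))
    return missing


sample = [
    "# task NAME=<unknown> PID=13162 COST=0 PERIOD=0 CPU=-1",
    " 13162,     2,          0,  180031469,        0,  -19968531,          0",
    "# task NAME=rt_launch PID=13166 COST=180000000 PERIOD=200000000 CPU=0",
    " 13166,     2,  200000000,  180019319,        0,  -19980681,          0",
]
print(tasks_missing_params(sample))
```

Any PID this reports is a candidate for a lost or never-recorded st_param_data record.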

> (2) When I checked the total number of lines (i.e., the total number of jobs) for each PID, every task had exactly the same number of jobs in some cases, but in others the number of jobs differed slightly among the 8 tasks, as shown below. Is this expected? No records are missing among these lines; some tasks simply have 1 or 2 more jobs. Is that possible?
> 
> 116  116  115  115  114  115  114  114

rtspin executes for a configured duration of time, not for a configured number of jobs.  Due to the various sources of “noise” in the system, you may observe slight variations in the number of completed jobs.  You can modify the rtspin source code to compute the number of jobs that should be executed within the configured time interval (i.e., njobs = duration / period) and then execute exactly that many jobs, instead of exiting once the configured duration has elapsed.
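
To make the arithmetic concrete, here is a small sketch of the njobs computation. It is plain Python rather than a patch to rtspin; the function name is hypothetical, and the 10-second duration is only an illustrative value, not taken from your runs.

```python
def jobs_for_duration(duration_ns, period_ns):
    """Number of whole periods (jobs) that fit in the configured duration."""
    if period_ns <= 0:
        raise ValueError("period must be positive")
    return duration_ns // period_ns


period_ns = 200_000_000            # 200 ms period, as in the example task set
duration_ns = 10 * 1_000_000_000   # illustrative 10 s run (assumption)
print(jobs_for_duration(duration_ns, period_ns))  # prints 50
```

With a fixed job count, every task releases the same number of jobs regardless of small timing differences at start-up or shutdown.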

> (3) I want to repeat the test case 20 times and then average the schedulability. In either case (whether or not the period=0 jobs are counted as scheduled jobs), I see a lot of inter-run variation, as shown below. Is this expected? Can you get consistent trace records (a consistent fraction of schedulable task sets) every time?
>  
> 1.00 1.00 1.00 1.00 1.00 .13 1.00 1.00 1.00 .13 .13 1.00 .25 .13 .13 .13 .13 1.00 .25 1.00 

What is the task set utilization?  Which scheduler do you use?  Under partitioned scheduling, you can over-utilize a single processor even if the total task set utilization is not much more than 1.0 when your task partitioning is too imbalanced.  That is, you can overload one partition while all others are idle.  Also, LITMUS^RT, being based upon Linux, may not support hard real-time scheduling all that well when task set utilization is high.  You may observe deadline misses from time to time.  You may want to examine the maximum amount by which a deadline is missed (perhaps normalized by relative deadline or period), rather than whether a deadline was ever missed.
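
Both points can be sketched in a few lines. This is only an illustration under my own assumptions: the function names are hypothetical, the three-task set and its assignment are made up to show the imbalance, and the tardiness rows stand in for the Period and Tardiness columns of st_job_stats.

```python
def partition_utilizations(tasks, assignment, num_cpus):
    """Sum wcet/period per CPU.

    tasks: list of (wcet, period) pairs in the same time unit.
    assignment: CPU index for each task, in the same order as tasks.
    """
    util = [0.0] * num_cpus
    for (wcet, period), cpu in zip(tasks, assignment):
        util[cpu] += wcet / period
    return util


def max_normalized_tardiness(rows):
    """Largest tardiness/period over (period_ns, tardiness_ns) rows.

    Rows with period 0 (e.g. missing st_param_data) are skipped,
    since they cannot be normalized.
    """
    worst = 0.0
    for period, tardiness in rows:
        if period > 0:
            worst = max(worst, tardiness / period)
    return worst


# Made-up task set (ms): total utilization 1.3, but both heavy tasks share CPU 0.
tasks = [(120, 200), (120, 200), (20, 200)]
util = partition_utilizations(tasks, assignment=[0, 0, 1], num_cpus=4)
print([cpu for cpu, u in enumerate(util) if u > 1.0])  # overloaded CPUs

# Made-up rows: one job on time, one job 50 ms late with a 200 ms period.
rows = [(200_000_000, 0), (200_000_000, 50_000_000), (0, 0)]
print(max_normalized_tardiness(rows))
```

A bounded, small normalized tardiness across runs tells you more than a binary "missed at least one deadline" flag, which can flip between runs because of a single late job.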

> Could you please comment on these 3 questions, or even just 1?
> Thanks for your help in advance!
> 
> Mikyung
> 
> 
> 
> On Fri, Sep 19, 2014 at 4:21 AM, Björn Brandenburg <bbb at mpi-sws.org> wrote:
> 
> On 18 Sep 2014, at 06:29, Mikyung Kang <mkkang01 at gmail.com> wrote:
> >
> > I'm trying to get whole tracing information of RT task sets using LITMUS-RT Version 2014.1.
> >
> > * System has 8 Cores, no hyper-threading, 16G memory
> > * Tested both Bare-metal case and Virtualization case (Xen): similar result
> > * Ubuntu 12.04 (Linux 3.10.5)
> > * Generated 10 tasks w/ Utilization=[1.0, 8.0] using rtspin
> > * Run 10 seconds using GSN-EDF scheduler
> >
> > When I spawned only 1 task (Period=100ms, WCET=10ms) for 10 seconds, all records were saved to the .bin file correctly, with none missing.
> > But with more than 1 task, many records are always missing.
> 
> This sounds like something is broken. Even with 8x(10, 100) tasks you should have no tracing problems at all as there should be more than enough time for st_trace to catch up. Your system must be overutilized somehow.
> 
> >
> > To avoid record-loss, I tried the following options based on the thread: https://lists.litmus-rt.org/pipermail/litmus-dev/2013/000480.html.
> >
> > * Kernel config: CONFIG_SCHED_TASK_TRACE_SHIFT=13 (up to 8K events)
> > * Used /dev/shm/* instead of disk for the binary record file
> > * Removed unnecessary events for the calculation of deadline miss ratio (switch_to/from, block, resume, action, np_enter/exit)
> > * Current KERNEL_IMAGE_SIZE 512*1024*1024
> >
> > Then, around 4K events are being saved into one task-assigned core (st-*.bin).
> > When I got the information through st_job_stats, I could see that the number of recorded events per task is very different even though tasks have the same period.
> > Moreover, usually 5~20% records are being missed for each task set, even though utilization is very low. Sometimes, more than that.
> 
> This indicates that your system suffers from intervals of overload during which the tracing tools are starved. Are you sure this happens already with only two tasks?
> 
> >
> > Is this the expected record-loss ratio when using the st_trace tool?
> 
> No, unless the st_trace tool is being starved there should be no records lost.
> 
> > What should I check more? Is there any other way to reduce/remove record-loss?
> 
> You can try editing litmus/Kconfig to raise the limit for CONFIG_SCHED_TASK_TRACE_SHIFT. You can also try running st_trace as a real-time task (with rt_launch).
> 
> - Björn
> 
> 
> _______________________________________________
> litmus-dev mailing list
> litmus-dev at lists.litmus-rt.org
> https://lists.litmus-rt.org/listinfo/litmus-dev
