[LITMUS^RT] Help with Understanding Scheduling Results
Björn Brandenburg
bbb at mpi-sws.org
Fri Mar 13 04:39:09 CET 2015
> On 12 Mar 2015, at 23:06, Geoffrey Tran <gtran at isi.edu> wrote:
>
> I have been working with LITMUS-RT, but have come across some weird results
> and was hoping someone could help clarify them.
Hi Geoffrey,
thanks for your interest in LITMUS^RT. I’ve not observed any similar behavior; it indeed looks pretty strange. Some questions/comments inline below.
>
> The machine used is a 16-core system. 8 cores are dedicated to the host,
> while up to 8 of the remaining cores are used to run a Xen virtual machine.
> Just in case more background is needed, Xen is a popular hypervisor. The
> RTDS scheduler is a new real-time scheduler that originated in the RT-Xen
> project and is now part of upstream Xen. The scheduler gives each VCPU a
> guaranteed capacity based on the input parameters.
>
> LITMUS-RT v2014.2 is installed in the guest. The hypervisor is using the
> rtds scheduler, with each VCPU receiving the full allocation (Budget: 10000,
> Period: 10000).
Which LITMUS^RT scheduler are you using?
Can you reproduce the issue with LITMUS^RT running on bare metal?
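For reference, a quick sanity check of both layers might look roughly like the following. This is only a sketch; the exact paths and xl flags depend on your LITMUS^RT and Xen versions:

    # inside the guest: report the active LITMUS^RT scheduler plugin
    cat /proc/litmus/active_plugin

    # inside the guest: switch plugins if needed (liblitmus helper)
    setsched GSN-EDF

    # in dom0: list the RTDS budget/period currently assigned to the domains
    xl sched-rtds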
>
> The issue is as follows: when running two simple tasks, each with less than 50%
> single-core utilization (Cost=200ms, Period=500ms), jobs still miss their deadlines
> in certain cases.
> - When we use only 2VCPUs for the guest, success in all jobs is observed
> - When we use 3 or more VCPUs, misses are observed
Do you use budget enforcement?
> The results from st_job_stats and an excerpt of the logs for the 3 VCPU case
> are below:
> # Task, Job, Period, Response, DL Miss?, Lateness, Tardiness
> # task NAME=myapp PID=14442 COST=250000000 PERIOD=500000000 CPU=0
> 14442, 2, 500000000, 32592, 0, -499967408, 0
> 14442, 3, 500000000, 61805, 0, -499938195, 0
> 14442, 4, 500000000, 291581439, 0, -208418561, 0
> 14442, 5, 500000000, 251964671, 0, -248035329, 0
> 14442, 6, 500000000, 252424182, 0, -247575818, 0
> 14442, 7, 500000000, 251938074, 0, -248061926, 0
> 14442, 8, 500000000, 252145862, 0, -247854138, 0
> 14442, 9, 500000000, 251845811, 0, -248154189, 0
> 14442, 10, 500000000, 257706935, 0, -242293065, 0
> 14442, 11, 500000000, 251850581, 0, -248149419, 0
> 14442, 12, 500000000, 252553597, 0, -247446403, 0
> 14442, 13, 500000000, 251765063, 0, -248234937, 0
> 14442, 14, 500000000, 252902538, 0, -247097462, 0
> 14442, 15, 500000000, 1009091185, 1, 509091185, 509091185
> 14442, 16, 500000000, 760966632, 1, 260966632, 260966632
> 14442, 17, 500000000, 512866266, 1, 12866266, 12866266
> 14442, 18, 500000000, 264818921, 0, -235181079, 0
> 14442, 19, 500000000, 253024397, 0, -246975603, 0
> 14442, 20, 500000000, 252785150, 0, -247214850, 0
> 14442, 21, 500000000, 252466946, 0, -247533054, 0
> 14442, 22, 500000000, 1459862887, 1, 959862887, 959862887
> 14442, 23, 500000000, 1211903080, 1, 711903080, 711903080
> 14442, 24, 500000000, 963919848, 1, 463919848, 463919848
> # task NAME=myapp PID=14443 COST=200000000 PERIOD=500000000 CPU=0
> 14443, 2, 500000000, 58150, 0, -499941850, 0
> 14443, 3, 500000000, 3202178, 0, -496797822, 0
> 14443, 4, 500000000, 201662924, 0, -298337076, 0
> 14443, 5, 500000000, 213828161, 0, -286171839, 0
> 14443, 6, 500000000, 202532002, 0, -297467998, 0
> 14443, 7, 500000000, 961643647, 1, 461643647, 461643647
> 14443, 8, 500000000, 663707479, 1, 163707479, 163707479
> 14443, 9, 500000000, 365603701, 0, -134396299, 0
> 14443, 10, 500000000, 201910605, 0, -298089395, 0
> 14443, 11, 500000000, 209025099, 0, -290974901, 0
> 14443, 12, 500000000, 210602663, 0, -289397337, 0
> 14443, 13, 500000000, 247544048, 0, -252455952, 0
> 14443, 14, 500000000, 459680759, 0, -40319241, 0
> 14443, 15, 500000000, 202763167, 0, -297236833, 0
> 14443, 16, 500000000, 202368570, 0, -297631430, 0
> 14443, 17, 500000000, 201857551, 0, -298142449, 0
> 14443, 18, 500000000, 201999263, 0, -298000737, 0
> 14443, 19, 500000000, 958909178, 1, 458909178, 458909178
> 14443, 20, 500000000, 660591085, 1, 160591085, 160591085
> 14443, 21, 500000000, 362025653, 0, -137974347, 0
> 14443, 22, 500000000, 202565748, 0, -297434252, 0
> 14443, 23, 500000000, 202456104, 0, -297543896, 0
> 14443, 24, 500000000, 202528628, 0, -297471372, 0
>
> .....
> [ 328638446] RELEASE 14443/14 on CPU 0 328638946.00ms
> [ 328638446] RELEASE 14442/14 on CPU 1 328638946.00ms
> [ 328638446] SWITCH_TO 14442/14 on CPU 0
> [ 328638698] SWITCH_FROM 14442/14 on CPU 0
> [ 328638698] COMPLETION 14442/14 on CPU 0
> [ 328638704] SWITCH_TO 14443/14 on CPU 1
> [ 328638905] SWITCH_FROM 14443/14 on CPU 1
> [ 328638905] COMPLETION 14443/14 on CPU 1
> [ 328638946] RELEASE 14442/15 on CPU 0 328639446.00ms
> [ 328638946] RELEASE 14443/15 on CPU 1 328639446.00ms
> [ 328638946] SWITCH_TO 14443/15 on CPU 0
> [ 328639148] SWITCH_FROM 14443/15 on CPU 0
> [ 328639148] COMPLETION 14443/15 on CPU 0
> [ 328639446] RELEASE 14443/16 on CPU 0 328639946.00ms
> [ 328639446] RELEASE 14442/16 on CPU 1 328639946.00ms
Random observation: the next release of 14443 happens on CPU 0, whereas 14442 is still being released on CPU 1. This means CPU 1 must have set a timer for the release to occur. Usually, it shouldn’t do that unless it either (a) processed a completion or (b) prevented a budget overrun. Either event should have left a trace record (I think). Are you sure the trace is complete (i.e., no missing writes)?
> [ 328639446] SWITCH_TO 14443/16 on CPU 0
> [ 328639648] SWITCH_FROM 14443/16 on CPU 0
> [ 328639648] COMPLETION 14443/16 on CPU 0
> [ 328639703] SWITCH_TO 14442/15 on CPU 1
> [ 328639946] RELEASE 14442/17 on CPU 1 328640446.00ms
> [ 328639946] RELEASE 14443/17 on CPU 0 328640446.00ms
> [ 328639946] SWITCH_TO 14443/17 on CPU 0
> [ 328639955] SWITCH_FROM 14442/15 on CPU 1
> [ 328639955] COMPLETION 14442/15 on CPU 1
> [ 328639955] SWITCH_TO 14442/16 on CPU 1
> [ 328640147] SWITCH_FROM 14443/17 on CPU 0
> [ 328640147] COMPLETION 14443/17 on CPU 0
> [ 328640206] SWITCH_FROM 14442/16 on CPU 1
> [ 328640206] COMPLETION 14442/16 on CPU 1
> [ 328640206] SWITCH_TO 14442/17 on CPU 1
> .....
>
>
> It is interesting to note that when both jobs 15 are released, the job for PID 14443 is
> allowed to run. However, the job for PID 14442 does not run until the next period!
>
> It has also been observed that running the two tasks on a guest with 8 VCPUs produces
> the same deadline misses. However, when the system is artificially loaded by running 6
> instances of "yes > /dev/null", all jobs meet their deadlines.
Do these instances of yes run as LITMUS^RT tasks or just as regular SCHED_OTHER tasks?
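The distinction matters, because plain background load and real-time load interact with the plugin very differently. Purely as an illustrative sketch (the rt_launch cost/period below are made-up values in milliseconds; check your liblitmus version for the exact syntax):

    # background load as ordinary SCHED_OTHER processes
    for i in $(seq 6); do yes > /dev/null & done

    # the same load admitted as LITMUS^RT tasks via liblitmus' rt_launch
    # (hypothetical parameters: cost 100ms, period 500ms per instance)
    for i in $(seq 6); do rt_launch 100 500 yes > /dev/null & done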
Also, can you provide matching output of the debug trace log (the TRACE() stream)?
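In case it helps, the TRACE() stream can be captured in the guest roughly like this, assuming the kernel was built with CONFIG_SCHED_DEBUG_TRACE enabled:

    # start recording the debug log before launching the tasks
    cat /dev/litmus/log > debug-trace.txt &

    # ... run the experiment ...

    # stop recording afterwards
    kill %1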
I’ve not used LITMUS^RT on Xen. I don’t mean to blame Xen, but it would make it easier for us if you could reproduce this behavior on bare metal.
Thanks,
Björn