[LITMUS^RT] Help with Understanding Scheduling Results

Thu Mar 12 23:06:09 CET 2015

Hi,

I have been working with LITMUS-RT and have come across some puzzling results.
I was hoping someone could help me understand them.

The machine used is a 16-core system.  8 cores are dedicated to the host,
while up to the remaining 8 cores are used to run a Xen virtual machine.
Just in case more background is needed, Xen is a popular hypervisor.  The
RTDS scheduler is a new real-time scheduler that originated in the RT-Xen
project and is now part of the upstream Xen project.  The scheduler
gives each VCPU a guaranteed capacity based on the input parameters.

LITMUS-RT v2014.2 is installed in the guest.  The hypervisor is using the
rtds scheduler, with each VCPU receiving the full allocation (Budget: 10000,
Period: 10000).

The issue is as follows:  when running two simple tasks, each with at most 50%
single-core utilization (Cost=250ms and 200ms, Period=500ms), jobs still miss
their deadlines in certain cases.
- When we use only 2 VCPUs for the guest, all jobs meet their deadlines
- When we use 3 or more VCPUs, deadline misses are observed
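
As a sanity check on the utilization figures, here is a minimal Python sketch
that uses the COST and PERIOD values (in nanoseconds) from the st_job_stats
task headers in the output below.  The dictionary layout and the function name
are only illustrative.

```python
# Task parameters in nanoseconds, copied from the st_job_stats task headers.
tasks = {
    14442: {"cost": 250_000_000, "period": 500_000_000},
    14443: {"cost": 200_000_000, "period": 500_000_000},
}


def utilization(task):
    """Fraction of one core the task needs: cost divided by period."""
    return task["cost"] / task["period"]


total = sum(utilization(t) for t in tasks.values())
for pid, t in tasks.items():
    print(f"PID {pid}: U = {utilization(t):.2f}")
print(f"total U = {total:.2f}")
```

This gives 0.50 and 0.40 per task and 0.90 in total, which is below the
capacity of even 2 full VCPUs, so the misses do not look like a plain overload.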

The results from st_job_stats and an excerpt of the logs for the 3 VCPU case
are below:
	# Task,   Job,     Period,   Response, DL Miss?,   Lateness,  Tardiness
	# task NAME=myapp PID=14442 COST=250000000 PERIOD=500000000 CPU=0
	 14442,     2,  500000000,      32592,        0, -499967408,          0
	 14442,     3,  500000000,      61805,        0, -499938195,          0
	 14442,     4,  500000000,  291581439,        0, -208418561,          0
	 14442,     5,  500000000,  251964671,        0, -248035329,          0
	 14442,     6,  500000000,  252424182,        0, -247575818,          0
	 14442,     7,  500000000,  251938074,        0, -248061926,          0
	 14442,     8,  500000000,  252145862,        0, -247854138,          0
	 14442,     9,  500000000,  251845811,        0, -248154189,          0
	 14442,    10,  500000000,  257706935,        0, -242293065,          0
	 14442,    11,  500000000,  251850581,        0, -248149419,          0
	 14442,    12,  500000000,  252553597,        0, -247446403,          0
	 14442,    13,  500000000,  251765063,        0, -248234937,          0
	 14442,    14,  500000000,  252902538,        0, -247097462,          0
	 14442,    15,  500000000, 1009091185,        1,  509091185,  509091185
	 14442,    16,  500000000,  760966632,        1,  260966632,  260966632
	 14442,    17,  500000000,  512866266,        1,   12866266,   12866266
	 14442,    18,  500000000,  264818921,        0, -235181079,          0
	 14442,    19,  500000000,  253024397,        0, -246975603,          0
	 14442,    20,  500000000,  252785150,        0, -247214850,          0
	 14442,    21,  500000000,  252466946,        0, -247533054,          0
	 14442,    22,  500000000, 1459862887,        1,  959862887,  959862887
	 14442,    23,  500000000, 1211903080,        1,  711903080,  711903080
	 14442,    24,  500000000,  963919848,        1,  463919848,  463919848
	# task NAME=myapp PID=14443 COST=200000000 PERIOD=500000000 CPU=0
	 14443,     2,  500000000,      58150,        0, -499941850,          0
	 14443,     3,  500000000,    3202178,        0, -496797822,          0
	 14443,     4,  500000000,  201662924,        0, -298337076,          0
	 14443,     5,  500000000,  213828161,        0, -286171839,          0
	 14443,     6,  500000000,  202532002,        0, -297467998,          0
	 14443,     7,  500000000,  961643647,        1,  461643647,  461643647
	 14443,     8,  500000000,  663707479,        1,  163707479,  163707479
	 14443,     9,  500000000,  365603701,        0, -134396299,          0
	 14443,    10,  500000000,  201910605,        0, -298089395,          0
	 14443,    11,  500000000,  209025099,        0, -290974901,          0
	 14443,    12,  500000000,  210602663,        0, -289397337,          0
	 14443,    13,  500000000,  247544048,        0, -252455952,          0
	 14443,    14,  500000000,  459680759,        0,  -40319241,          0
	 14443,    15,  500000000,  202763167,        0, -297236833,          0
	 14443,    16,  500000000,  202368570,        0, -297631430,          0
	 14443,    17,  500000000,  201857551,        0, -298142449,          0
	 14443,    18,  500000000,  201999263,        0, -298000737,          0
	 14443,    19,  500000000,  958909178,        1,  458909178,  458909178
	 14443,    20,  500000000,  660591085,        1,  160591085,  160591085
	 14443,    21,  500000000,  362025653,        0, -137974347,          0
	 14443,    22,  500000000,  202565748,        0, -297434252,          0
	 14443,    23,  500000000,  202456104,        0, -297543896,          0
	 14443,    24,  500000000,  202528628,        0, -297471372,          0
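
To make the misses in the table above easier to spot, here is a rough Python
sketch that pulls out the jobs with "DL Miss?" set and reports their tardiness
in milliseconds.  It assumes the comma-separated column layout shown above;
the sample rows are copied from the output for PID 14442, and the function
name is my own.

```python
from collections import defaultdict

# A few rows copied from the st_job_stats output above (times in ns).
STATS = """\
# Task,   Job,     Period,   Response, DL Miss?,   Lateness,  Tardiness
 14442,    14,  500000000,  252902538,        0, -247097462,          0
 14442,    15,  500000000, 1009091185,        1,  509091185,  509091185
 14442,    16,  500000000,  760966632,        1,  260966632,  260966632
 14442,    17,  500000000,  512866266,        1,   12866266,   12866266
 14442,    18,  500000000,  264818921,        0, -235181079,          0
"""


def missed_jobs(text):
    """Map each PID to a list of (job, tardiness in ms) for jobs that missed."""
    misses = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#' header/comment lines
        pid, job, _period, _response, miss, _lateness, tardiness = (
            int(field) for field in line.split(",")
        )
        if miss:
            misses[pid].append((job, tardiness / 1e6))
    return dict(misses)


for pid, jobs in missed_jobs(STATS).items():
    for job, tardy_ms in jobs:
        print(f"PID {pid} job {job}: tardy by {tardy_ms:.1f} ms")
```

On these rows it lists jobs 15, 16 and 17, with the tardiness shrinking from
job to job, as if a backlog is being worked off after a single long stall.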

	 .....
	[ 328638446] RELEASE        14443/14    on CPU  0 328638946.00ms
	[ 328638446] RELEASE        14442/14    on CPU  1 328638946.00ms
	[ 328638446] SWITCH_TO      14442/14    on CPU  0
	[ 328638698] SWITCH_FROM    14442/14    on CPU  0
	[ 328638698] COMPLETION     14442/14    on CPU  0
	[ 328638704] SWITCH_TO      14443/14    on CPU  1
	[ 328638905] SWITCH_FROM    14443/14    on CPU  1
	[ 328638905] COMPLETION     14443/14    on CPU  1
	[ 328638946] RELEASE        14442/15    on CPU  0 328639446.00ms
	[ 328638946] RELEASE        14443/15    on CPU  1 328639446.00ms
	[ 328638946] SWITCH_TO      14443/15    on CPU  0
	[ 328639148] SWITCH_FROM    14443/15    on CPU  0
	[ 328639148] COMPLETION     14443/15    on CPU  0
	[ 328639446] RELEASE        14443/16    on CPU  0 328639946.00ms
	[ 328639446] RELEASE        14442/16    on CPU  1 328639946.00ms
	[ 328639446] SWITCH_TO      14443/16    on CPU  0
	[ 328639648] SWITCH_FROM    14443/16    on CPU  0
	[ 328639648] COMPLETION     14443/16    on CPU  0
	[ 328639703] SWITCH_TO      14442/15    on CPU  1
	[ 328639946] RELEASE        14442/17    on CPU  1 328640446.00ms
	[ 328639946] RELEASE        14443/17    on CPU  0 328640446.00ms
	[ 328639946] SWITCH_TO      14443/17    on CPU  0
	[ 328639955] SWITCH_FROM    14442/15    on CPU  1
	[ 328639955] COMPLETION     14442/15    on CPU  1
	[ 328639955] SWITCH_TO      14442/16    on CPU  1
	[ 328640147] SWITCH_FROM    14443/17    on CPU  0
	[ 328640147] COMPLETION     14443/17    on CPU  0
	[ 328640206] SWITCH_FROM    14442/16    on CPU  1
	[ 328640206] COMPLETION     14442/16    on CPU  1
	[ 328640206] SWITCH_TO      14442/17    on CPU  1
	.....

It is interesting to note that when job 15 of both tasks is released, the job for
PID 14443 is allowed to run right away.  However, the job for PID 14442 does not
run until the next period!
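
To put a number on that delay, here is a small Python sketch that measures the
time from RELEASE to the first SWITCH_TO for each job.  It assumes the log line
format shown in the excerpt above and that the bracketed timestamps are in
milliseconds, like the deadline printed on the RELEASE lines.  The regular
expression and the function name are my own.

```python
import re

# Lines for job 15 of both tasks, copied from the log excerpt above.
LOG = """\
[ 328638946] RELEASE        14442/15    on CPU  0 328639446.00ms
[ 328638946] RELEASE        14443/15    on CPU  1 328639446.00ms
[ 328638946] SWITCH_TO      14443/15    on CPU  0
[ 328639148] SWITCH_FROM    14443/15    on CPU  0
[ 328639148] COMPLETION     14443/15    on CPU  0
[ 328639703] SWITCH_TO      14442/15    on CPU  1
[ 328639955] SWITCH_FROM    14442/15    on CPU  1
[ 328639955] COMPLETION     14442/15    on CPU  1
"""

# [timestamp] EVENT pid/job on CPU n
LINE = re.compile(r"\[\s*(\d+)\]\s+(\w+)\s+(\d+)/(\d+)\s+on CPU\s+(\d+)")


def dispatch_delays(text):
    """Return {(pid, job): time between RELEASE and the first SWITCH_TO}."""
    released, delays = {}, {}
    for line in text.splitlines():
        m = LINE.search(line)
        if not m:
            continue  # ignore lines that do not look like trace events
        ts, event, pid, job = int(m[1]), m[2], int(m[3]), int(m[4])
        key = (pid, job)
        if event == "RELEASE":
            released[key] = ts
        elif event == "SWITCH_TO" and key in released and key not in delays:
            delays[key] = ts - released[key]
    return delays


for (pid, job), delay in sorted(dispatch_delays(LOG).items()):
    print(f"{pid}/{job}: first dispatched {delay} ms after release")
```

For these lines it reports 0 ms for 14443/15 and 757 ms for 14442/15, so the
job is first dispatched well after its 500 ms deadline has already passed.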

It has also been observed that running the two tasks on a guest with 8 VCPUs shows
the same deadline misses.  However, when the system is artificially loaded by
running 6 instances of "yes > /dev/null", all jobs meet their deadlines.

Any help with understanding this would be very much appreciated.  

Thank you,
Geoffrey