[LITMUS^RT] Help with Understanding Scheduling Results
Geoffrey Tran
gtran at isi.edu
Thu Mar 12 23:06:09 CET 2015
Hi,
I have been working with LITMUS-RT, but have come across some puzzling results
and was hoping someone could help clarify them.
The machine used is a 16-core system. 8 cores are dedicated to the host,
while up to the remaining 8 cores are used to run a Xen virtual machine.
Just in case more background is needed: Xen is a popular hypervisor, and the
RTDS scheduler is a new real-time scheduler that originated in the RT-Xen
project and is now part of upstream Xen. The scheduler gives each VCPU a
guaranteed capacity (a budget out of every period) based on the input parameters.
LITMUS-RT v2014.2 is installed in the guest. The hypervisor is using the
RTDS scheduler, with each VCPU receiving the full allocation of a physical
CPU (Budget: 10000, Period: 10000; both in microseconds).
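For completeness, the VCPU parameters were configured from dom0 with xl
along these lines (the guest name below is just a placeholder; xl sched-rtds
takes both period and budget in microseconds):

    # give every VCPU of the guest a full physical CPU
    xl sched-rtds -d litmus-guest -p 10000 -b 10000
    # read the parameters back to verify
    xl sched-rtds -d litmus-guest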
The issue is as follows: when running two simple tasks, each with at most 50%
single-core utilization (Cost=250 ms and 200 ms respectively, Period=500 ms,
per the stats below), jobs still miss their deadlines in certain cases.
- When we use only 2 VCPUs for the guest, all jobs complete on time
- When we use 3 or more VCPUs, deadline misses are observed
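For reference, a two-task workload with the same parameters can be launched
with the stock rtspin tool from liblitmus; this is only a sketch of how the
setup could be reproduced with standard tools (our experiments use our own
myapp binary), and the 30 s duration is arbitrary:

    # task 1: 250 ms budget every 500 ms; task 2: 200 ms every 500 ms
    rtspin -w 250 500 30 &
    rtspin -w 200 500 30 &
    release_ts    # synchronously release the waiting task set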
The results from st_job_stats (times in nanoseconds) and an excerpt of the
logs for the 3-VCPU case are below:
# Task, Job, Period, Response, DL Miss?, Lateness, Tardiness
# task NAME=myapp PID=14442 COST=250000000 PERIOD=500000000 CPU=0
14442, 2, 500000000, 32592, 0, -499967408, 0
14442, 3, 500000000, 61805, 0, -499938195, 0
14442, 4, 500000000, 291581439, 0, -208418561, 0
14442, 5, 500000000, 251964671, 0, -248035329, 0
14442, 6, 500000000, 252424182, 0, -247575818, 0
14442, 7, 500000000, 251938074, 0, -248061926, 0
14442, 8, 500000000, 252145862, 0, -247854138, 0
14442, 9, 500000000, 251845811, 0, -248154189, 0
14442, 10, 500000000, 257706935, 0, -242293065, 0
14442, 11, 500000000, 251850581, 0, -248149419, 0
14442, 12, 500000000, 252553597, 0, -247446403, 0
14442, 13, 500000000, 251765063, 0, -248234937, 0
14442, 14, 500000000, 252902538, 0, -247097462, 0
14442, 15, 500000000, 1009091185, 1, 509091185, 509091185
14442, 16, 500000000, 760966632, 1, 260966632, 260966632
14442, 17, 500000000, 512866266, 1, 12866266, 12866266
14442, 18, 500000000, 264818921, 0, -235181079, 0
14442, 19, 500000000, 253024397, 0, -246975603, 0
14442, 20, 500000000, 252785150, 0, -247214850, 0
14442, 21, 500000000, 252466946, 0, -247533054, 0
14442, 22, 500000000, 1459862887, 1, 959862887, 959862887
14442, 23, 500000000, 1211903080, 1, 711903080, 711903080
14442, 24, 500000000, 963919848, 1, 463919848, 463919848
# task NAME=myapp PID=14443 COST=200000000 PERIOD=500000000 CPU=0
14443, 2, 500000000, 58150, 0, -499941850, 0
14443, 3, 500000000, 3202178, 0, -496797822, 0
14443, 4, 500000000, 201662924, 0, -298337076, 0
14443, 5, 500000000, 213828161, 0, -286171839, 0
14443, 6, 500000000, 202532002, 0, -297467998, 0
14443, 7, 500000000, 961643647, 1, 461643647, 461643647
14443, 8, 500000000, 663707479, 1, 163707479, 163707479
14443, 9, 500000000, 365603701, 0, -134396299, 0
14443, 10, 500000000, 201910605, 0, -298089395, 0
14443, 11, 500000000, 209025099, 0, -290974901, 0
14443, 12, 500000000, 210602663, 0, -289397337, 0
14443, 13, 500000000, 247544048, 0, -252455952, 0
14443, 14, 500000000, 459680759, 0, -40319241, 0
14443, 15, 500000000, 202763167, 0, -297236833, 0
14443, 16, 500000000, 202368570, 0, -297631430, 0
14443, 17, 500000000, 201857551, 0, -298142449, 0
14443, 18, 500000000, 201999263, 0, -298000737, 0
14443, 19, 500000000, 958909178, 1, 458909178, 458909178
14443, 20, 500000000, 660591085, 1, 160591085, 160591085
14443, 21, 500000000, 362025653, 0, -137974347, 0
14443, 22, 500000000, 202565748, 0, -297434252, 0
14443, 23, 500000000, 202456104, 0, -297543896, 0
14443, 24, 500000000, 202528628, 0, -297471372, 0
.....
[ 328638446] RELEASE 14443/14 on CPU 0 328638946.00ms
[ 328638446] RELEASE 14442/14 on CPU 1 328638946.00ms
[ 328638446] SWITCH_TO 14442/14 on CPU 0
[ 328638698] SWITCH_FROM 14442/14 on CPU 0
[ 328638698] COMPLETION 14442/14 on CPU 0
[ 328638704] SWITCH_TO 14443/14 on CPU 1
[ 328638905] SWITCH_FROM 14443/14 on CPU 1
[ 328638905] COMPLETION 14443/14 on CPU 1
[ 328638946] RELEASE 14442/15 on CPU 0 328639446.00ms
[ 328638946] RELEASE 14443/15 on CPU 1 328639446.00ms
[ 328638946] SWITCH_TO 14443/15 on CPU 0
[ 328639148] SWITCH_FROM 14443/15 on CPU 0
[ 328639148] COMPLETION 14443/15 on CPU 0
[ 328639446] RELEASE 14443/16 on CPU 0 328639946.00ms
[ 328639446] RELEASE 14442/16 on CPU 1 328639946.00ms
[ 328639446] SWITCH_TO 14443/16 on CPU 0
[ 328639648] SWITCH_FROM 14443/16 on CPU 0
[ 328639648] COMPLETION 14443/16 on CPU 0
[ 328639703] SWITCH_TO 14442/15 on CPU 1
[ 328639946] RELEASE 14442/17 on CPU 1 328640446.00ms
[ 328639946] RELEASE 14443/17 on CPU 0 328640446.00ms
[ 328639946] SWITCH_TO 14443/17 on CPU 0
[ 328639955] SWITCH_FROM 14442/15 on CPU 1
[ 328639955] COMPLETION 14442/15 on CPU 1
[ 328639955] SWITCH_TO 14442/16 on CPU 1
[ 328640147] SWITCH_FROM 14443/17 on CPU 0
[ 328640147] COMPLETION 14443/17 on CPU 0
[ 328640206] SWITCH_FROM 14442/16 on CPU 1
[ 328640206] COMPLETION 14442/16 on CPU 1
[ 328640206] SWITCH_TO 14442/17 on CPU 1
.....
It is interesting to note that when both jobs 15 are released, the job of PID 14443 is
allowed to run right away; however, the job of PID 14442 does not run until the next period!
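(As a cross-check against the stats: job 14442/15 is released at t = 328638946 ms
but only completes at t = 328639955 ms, a response time of roughly 1009 ms, which
matches the 1009091185 ns response and ~509 ms tardiness reported for job 15 above.)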
We have also observed the same deadline misses when running the two tasks on a guest
with 8 VCPUs. However, when the system is artificially loaded by running 6 instances
of "yes > /dev/null", all jobs meet their deadlines.
Any help with understanding this would be very much appreciated.
Thank you,
Geoffrey