[LITMUS^RT] Help with Understanding Scheduling Results

Meng Xu xumengpanda at gmail.com
Mon Mar 16 20:48:56 CET 2015


Hi Geoffrey,

2015-03-16 15:07 GMT-04:00 Geoffrey Tran <gtran at isi.edu>:
> Hi Björn,
>
> Thanks for your reply. I've answered your questions below. I hope the formatting did not get disrupted.
>
>>----- Original Message -----
>>From: "Björn Brandenburg" <bbb at mpi-sws.org>
>>To: "Geoffrey Tran" <gtran at isi.edu>, litmus-dev at lists.litmus-rt.org
>>Cc: "Mikyung Kang" <mkkang at isi.edu>, "Stephen Crago" <crago at isi.edu>, "John Walters" <jwalters at isi.edu>
>>Sent: Thursday, March 12, 2015 5:39:09 PM
>>Subject: Re: [LITMUS^RT] Help with Understanding Scheduling Results
>>
>>
>>> On 12 Mar 2015, at 23:06, Geoffrey Tran <gtran at isi.edu> wrote:
>>>
>>> I have been working with LITMUS-RT, but have come across some weird results
>>> and was hoping someone could help clarify it please.
>>
>>Hi Geoffrey,
>>
>>thanks for your interest in LITMUS^RT. I’ve not observed any similar behavior; it indeed looks pretty strange. Some questions/comments inline below.
>>
>>>
>>> The machine used is a 16-core system.  8 cores are dedicated to the host,
>>> while up to the remaining 8 cores are used to run a Xen virtual machine.
>>> Just in case more background is needed, Xen is a popular hypervisor.  The
>>> RTDS scheduler is a new real-time scheduler that was originally the RT-Xen
>>> project.  It is now a part of the upstream Xen project.  The scheduler
>>> gives each VCPU a guaranteed capacity based on the input parameters.
>>>
>>> LITMUS-RT v2014.2 is installed in the guest.  The hypervisor is using the
>>> rtds scheduler, with each VCPU receiving the full allocation (Budget: 10000,
>>> Period: 10000).
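
For reference, this kind of per-VCPU reservation is normally set through Xen's xl
tool; a rough sketch, assuming a Xen >= 4.5 xl and "litmus-guest" as a placeholder
domain name:

    # give the guest a full RTDS reservation: 10000 us budget every 10000 us period
    xl sched-rtds -d litmus-guest -p 10000 -b 10000
    # print the resulting RTDS parameters to double-check
    xl sched-rtds -d litmus-guest
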
>
>>Which LITMUS^RT scheduler are you using?
>
> We are using the GSN-EDF scheduler.
>
>>Can you reproduce the issue with LITMUS^RT running on bare metal?
>
> When running on bare metal, the issue is not observed. All jobs of each task
> meet their deadlines.
>
>>>
>>> The issue is as follows: when running two simple tasks, each with less than 50%
>>> single-core utilization (Cost=200ms, Period=500ms), jobs still miss deadlines
>>> in certain cases.
>>> - When we use only 2 VCPUs for the guest, all jobs succeed
>>> - When we use 3 or more VCPUs, misses are observed
>>
>>Do you use budget enforcement?
>
> No, there is no budget enforcement. This could explain the "rollover", i.e., why
> the missed jobs come in batches, but not why the first job in a batch
> missed, right?
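
For concreteness, a task set with these parameters can be recreated with the
synthetic rtspin task that ships with liblitmus; a sketch (200 ms cost, 500 ms
period, 60 s run; check the usage string of your liblitmus version for the exact
argument order and units):

    # two CPU-bound LITMUS^RT tasks, 200 ms cost / 500 ms period, for 60 seconds
    rtspin 200 500 60 &
    rtspin 200 500 60 &
    wait

An actual binary such as myapp would presumably go through rt_launch instead,
with the same cost and period arguments.
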
>
>>
>>> The results from st_job_stats and an excerpt of the logs for the 3 VCPU case
>>> are below:
>>>         # Task,   Job,     Period,   Response, DL Miss?,   Lateness,  Tardiness
>>>         # task NAME=myapp PID=14442 COST=250000000 PERIOD=500000000 CPU=0
>>>          14442,     2,  500000000,      32592,        0, -499967408,          0
>>>          14442,     3,  500000000,      61805,        0, -499938195,          0
>>>          14442,     4,  500000000,  291581439,        0, -208418561,          0
>>>          14442,     5,  500000000,  251964671,        0, -248035329,          0
>>>          14442,     6,  500000000,  252424182,        0, -247575818,          0
>>>          14442,     7,  500000000,  251938074,        0, -248061926,          0
>>>          14442,     8,  500000000,  252145862,        0, -247854138,          0
>>>          14442,     9,  500000000,  251845811,        0, -248154189,          0
>>>          14442,    10,  500000000,  257706935,        0, -242293065,          0
>>>          14442,    11,  500000000,  251850581,        0, -248149419,          0
>>>          14442,    12,  500000000,  252553597,        0, -247446403,          0
>>>          14442,    13,  500000000,  251765063,        0, -248234937,          0
>>>          14442,    14,  500000000,  252902538,        0, -247097462,          0
>>>          14442,    15,  500000000, 1009091185,        1,  509091185,  509091185
>>>          14442,    16,  500000000,  760966632,        1,  260966632,  260966632
>>>          14442,    17,  500000000,  512866266,        1,   12866266,   12866266
>>>          14442,    18,  500000000,  264818921,        0, -235181079,          0
>>>          14442,    19,  500000000,  253024397,        0, -246975603,          0
>>>          14442,    20,  500000000,  252785150,        0, -247214850,          0
>>>          14442,    21,  500000000,  252466946,        0, -247533054,          0
>>>          14442,    22,  500000000, 1459862887,        1,  959862887,  959862887
>>>          14442,    23,  500000000, 1211903080,        1,  711903080,  711903080
>>>          14442,    24,  500000000,  963919848,        1,  463919848,  463919848
>>>         # task NAME=myapp PID=14443 COST=200000000 PERIOD=500000000 CPU=0
>>>          14443,     2,  500000000,      58150,        0, -499941850,          0
>>>          14443,     3,  500000000,    3202178,        0, -496797822,          0
>>>          14443,     4,  500000000,  201662924,        0, -298337076,          0
>>>          14443,     5,  500000000,  213828161,        0, -286171839,          0
>>>          14443,     6,  500000000,  202532002,        0, -297467998,          0
>>>          14443,     7,  500000000,  961643647,        1,  461643647,  461643647
>>>          14443,     8,  500000000,  663707479,        1,  163707479,  163707479
>>>          14443,     9,  500000000,  365603701,        0, -134396299,          0
>>>          14443,    10,  500000000,  201910605,        0, -298089395,          0
>>>          14443,    11,  500000000,  209025099,        0, -290974901,          0
>>>          14443,    12,  500000000,  210602663,        0, -289397337,          0
>>>          14443,    13,  500000000,  247544048,        0, -252455952,          0
>>>          14443,    14,  500000000,  459680759,        0,  -40319241,          0
>>>          14443,    15,  500000000,  202763167,        0, -297236833,          0
>>>          14443,    16,  500000000,  202368570,        0, -297631430,          0
>>>          14443,    17,  500000000,  201857551,        0, -298142449,          0
>>>          14443,    18,  500000000,  201999263,        0, -298000737,          0
>>>          14443,    19,  500000000,  958909178,        1,  458909178,  458909178
>>>          14443,    20,  500000000,  660591085,        1,  160591085,  160591085
>>>          14443,    21,  500000000,  362025653,        0, -137974347,          0
>>>          14443,    22,  500000000,  202565748,        0, -297434252,          0
>>>          14443,    23,  500000000,  202456104,        0, -297543896,          0
>>>          14443,    24,  500000000,  202528628,        0, -297471372,          0
>>>
>>>          .....
>>>         [ 328638446] RELEASE        14443/14    on CPU  0 328638946.00ms
>>>         [ 328638446] RELEASE        14442/14    on CPU  1 328638946.00ms
>>>         [ 328638446] SWITCH_TO      14442/14    on CPU  0
>>>         [ 328638698] SWITCH_FROM    14442/14    on CPU  0
>>>         [ 328638698] COMPLETION     14442/14    on CPU  0
>>>         [ 328638704] SWITCH_TO      14443/14    on CPU  1
>>>         [ 328638905] SWITCH_FROM    14443/14    on CPU  1
>>>         [ 328638905] COMPLETION     14443/14    on CPU  1
>>>         [ 328638946] RELEASE        14442/15    on CPU  0 328639446.00ms
>>>         [ 328638946] RELEASE        14443/15    on CPU  1 328639446.00ms
>>>         [ 328638946] SWITCH_TO      14443/15    on CPU  0
>>>         [ 328639148] SWITCH_FROM    14443/15    on CPU  0
>>>         [ 328639148] COMPLETION     14443/15    on CPU  0
>>>         [ 328639446] RELEASE        14443/16    on CPU  0 328639946.00ms
>>>         [ 328639446] RELEASE        14442/16    on CPU  1 328639946.00ms
>>
>>Random observation: the next release of 14443 happens on CPU 0, whereas 14442 is still being released on CPU 1. This means CPU 1 must have set a timer for the release to occur. Usually, it shouldn’t do that unless it either (a) processed a completion or (b) prevented a budget overrun. Either event should have left a trace record (I think). Are you sure the trace is complete (i.e., no missing writes)?
>>
>
> Yes, as far as I know, there were no reported failed writes at the completion of the tracing.
>
>>
>>>         [ 328639446] SWITCH_TO      14443/16    on CPU  0
>>>         [ 328639648] SWITCH_FROM    14443/16    on CPU  0
>>>         [ 328639648] COMPLETION     14443/16    on CPU  0
>>>         [ 328639703] SWITCH_TO      14442/15    on CPU  1
>>>         [ 328639946] RELEASE        14442/17    on CPU  1 328640446.00ms
>>>         [ 328639946] RELEASE        14443/17    on CPU  0 328640446.00ms
>>>         [ 328639946] SWITCH_TO      14443/17    on CPU  0
>>>         [ 328639955] SWITCH_FROM    14442/15    on CPU  1
>>>         [ 328639955] COMPLETION     14442/15    on CPU  1
>>>         [ 328639955] SWITCH_TO      14442/16    on CPU  1
>>>         [ 328640147] SWITCH_FROM    14443/17    on CPU  0
>>>         [ 328640147] COMPLETION     14443/17    on CPU  0
>>>         [ 328640206] SWITCH_FROM    14442/16    on CPU  1
>>>         [ 328640206] COMPLETION     14442/16    on CPU  1
>>>         [ 328640206] SWITCH_TO      14442/17    on CPU  1
>>>         .....
>>>
>>>
>>> It is interesting to note that when both tasks release job 15, the job of PID 14443 is
>>> allowed to run; however, the job of PID 14442 does not run until the next period!
>>>
>>> It has also been observed that running the two tasks on a guest with 8 VCPUs leads to the
>>> same deadline misses.  But by artificially loading the system with 6 instances of "yes >
>>> /dev/null", all jobs meet their deadlines.
>>
>>Do these instances of yes run as LITMUS^RT tasks or just as regular SCHED_OTHER tasks?
>
> The instances of yes run as regular tasks.  I just launched them from the shell without any calls to liblitmus.
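
Presumably something along these lines:

    # six best-effort CPU hogs, plain SCHED_OTHER, no liblitmus involved
    for i in 1 2 3 4 5 6; do yes > /dev/null & done
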
>
>>
>>Also, can you provide matching output of the debug trace log (the TRACE() stream)?
>
> I've attached two text documents containing the output from st_job_stats followed by the
> trace logs you've requested. One is for the bare-metal case, one is for the VM. Sorry,
> we don't have the logs for that exact run available, but this is from a run with the
> same parameters. The bare-metal case is a bit cluttered because the current system has
> preemption tracing enabled.
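
For anyone reproducing this, the TRACE() stream can be captured by reading the
debug device while the experiment runs; roughly (assuming the guest kernel was
built with TRACE() support enabled):

    # record the debug trace (TRACE() stream) in the background
    cat /dev/litmus/log > debug-trace.txt &
    CAT_PID=$!
    # ... run the task set ...
    kill $CAT_PID
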
>
>>
>>I’ve not used LITMUS^RT on Xen. I don’t mean to blame Xen, but it would make it easier for us if you could reproduce this behavior on bare metal.
>
> Again, thanks for your help and comments so far!
>
> To provide a little bit of context, we're virtualizing in this case because it
> looks to be a very interesting area.  The weird part is that the above issues
> do not occur on bare metal.  It is also very odd because we do not observe this
> on KVM either.

How did you configure the KVM setup? I'm not sure whether KVM provides a real-time
VCPU scheduler or not; if not, it will not be an apples-to-apples comparison.
(I'm not trying to defend Xen, I'm just trying to figure out what is
going wrong.)

>>> The issue is as follows: when running two simple tasks, each with less than 50%
>>> single-core utilization (Cost=200ms, Period=500ms), jobs still miss deadlines
>>> in certain cases.
>>> - When we use only 2 VCPUs for the guest, all jobs succeed
>>> - When we use 3 or more VCPUs, misses are observed

The weird thing is why allocating more resources to the VM makes a
schedulable task set unschedulable. This looks like a scheduling
anomaly.

Is it possible that the release timer is armed on VCPU 1, but VCPU 1 is
scheduled out right before the timer fires? That would delay the release.
In addition, if you dedicate a physical CPU to each VCPU and do not allow
that CPU to run any other VCPUs, does the same weird situation still happen?
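
For the pinning experiment, something like the following should work ("litmus-guest"
and the PCPU numbers are placeholders; in addition to the pinning you would want to
keep dom0 and any other domains off those PCPUs, e.g. with a separate cpupool):

    # pin each VCPU of the guest to its own dedicated physical CPU
    xl vcpu-pin litmus-guest 0 8
    xl vcpu-pin litmus-guest 1 9
    xl vcpu-pin litmus-guest 2 10
    # verify the resulting affinities
    xl vcpu-list litmus-guest
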

Thanks,

Meng



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania



