[LITMUS^RT] Help with Understanding Scheduling Results

Geoffrey Tran gtran at isi.edu
Wed Apr 1 04:22:26 CEST 2015


Hi Meng, 

I received the following information about the KVM setup:
<each VM>
8 cores
100% budget
raw qemu image
vda virtio
with the following kernel: Linux localhost 3.4.43-WR5.0.1.10_standard #1 SMP PREEMPT Sat Dec 21 16:28:51 EST 2013 x86_64 GNU/Linux

Also, we observe similar anomalies with the credit scheduler.  



Hello again Björn,

Would you happen to have any more comments or suggestions we could try?

Thank you very much,
Geoffrey


----- Original Message -----
From: "Meng Xu" <xumengpanda at gmail.com>
To: "Geoffrey Tran" <gtran at isi.edu>
Cc: litmus-dev at lists.litmus-rt.org, "Björn Brandenburg" <bbb at mpi-sws.org>, "Mikyung Kang" <mkkang at isi.edu>, "Stephen Crago" <crago at isi.edu>, "John Walters" <jwalters at isi.edu>
Sent: Thursday, March 19, 2015 6:04:14 AM
Subject: Re: [LITMUS^RT] Help with Understanding Scheduling Results

2015-03-19 2:48 GMT-04:00 Geoffrey Tran <gtran at isi.edu>:
> Hi Meng,
>
> Sorry for the late reply.
>
> I didn't do the setup for the KVM instance, so I can't comment on its characteristics. I will try to find out, though. I agree that if it isn't an RT scheduler, we cannot make a fair comparison. However, I mentioned it as an example of a virtualized system on which the anomaly does not occur.

Another way to compare Xen and KVM is to use the credit or credit2
scheduler on Xen and see if the scheduling anomaly occurs.

Thanks,

Meng


>
> It indeed is very strange that allocating more resources to the VM makes jobs start missing their deadlines. I did try pinning VCPUs to their own dedicated PCPUs, and I still see the same results.
>
> Thanks,
> Geoffrey
>
> ----- Original Message -----
> From: "Meng Xu" <xumengpanda at gmail.com>
> To: "Geoffrey Tran" <gtran at isi.edu>, litmus-dev at lists.litmus-rt.org
> Cc: "Björn Brandenburg" <bbb at mpi-sws.org>, "Mikyung Kang" <mkkang at isi.edu>, "Stephen Crago" <crago at isi.edu>, "John Walters" <jwalters at isi.edu>
> Sent: Monday, March 16, 2015 9:48:56 AM
> Subject: Re: [LITMUS^RT] Help with Understanding Scheduling Results
>
> Hi Geoffrey,
>
> 2015-03-16 15:07 GMT-04:00 Geoffrey Tran <gtran at isi.edu>:
>> Hi Björn,
>>
>> Thanks for your reply. I've answered your questions below. I hope the formatting did not get disrupted.
>>
>>>----- Original Message -----
>>>From: "Björn Brandenburg" <bbb at mpi-sws.org>
>>>To: "Geoffrey Tran" <gtran at isi.edu>, litmus-dev at lists.litmus-rt.org
>>>Cc: "Mikyung Kang" <mkkang at isi.edu>, "Stephen Crago" <crago at isi.edu>, "John Walters" <jwalters at isi.edu>
>>>Sent: Thursday, March 12, 2015 5:39:09 PM
>>>Subject: Re: [LITMUS^RT] Help with Understanding Scheduling Results
>>>
>>>
>>>> On 12 Mar 2015, at 23:06, Geoffrey Tran <gtran at isi.edu> wrote:
>>>>
>>>> I have been working with LITMUS-RT, but have come across some weird results
>>>> and was hoping someone could help clarify it please.
>>>
>>>Hi Geoffrey,
>>>
>>>thanks for your interest in LITMUS^RT. I’ve not observed any similar behavior; it indeed looks pretty strange. Some questions/comments inline below.
>>>
>>>>
>>>> The machine used is a 16-core system.  8 cores are dedicated to the host,
>>>> while up to the remaining 8 cores are used to run a Xen virtual machine.
>>>> Just in case more background is needed, Xen is a popular hypervisor.  The
>>>> RTDS scheduler is a new real-time scheduler that was originally the RT-Xen
>>>> project.  It is now a part of the upstream Xen project.  The scheduler
>>>> gives each VCPU a guaranteed capacity based on the input parameters.
>>>>
>>>> LITMUS-RT v2014.2 is installed in the guest.  The hypervisor is using the
>>>> rtds scheduler, with each VCPU receiving the full allocation (Budget: 10000,
>>>> Period: 10000).
>>
>>>Which LITMUS^RT scheduler are you using?
>>
>> We are using the GSN-EDF scheduler.
>>
>>>Can you reproduce the issue with LITMUS^RT running on bare metal?
>>
>> When running on bare metal, the issue is not observed. All jobs for each task
>> meet the deadlines.
>>
>>>>
>>>> The issue is as follows: when running two simple tasks, each with less than 50%
>>>> single-core utilization (Cost=200ms, Period=500ms), jobs still miss their deadlines
>>>> in certain cases.
>>>> - When we use only 2 VCPUs for the guest, all jobs meet their deadlines
>>>> - When we use 3 or more VCPUs, deadline misses are observed
>>>
>>>Do you use budget enforcement?
>>
>> No, there is no budget enforcement. This could explain the "rollover" effect,
>> i.e. why the misses come in batches, but not why the first job in each batch
>> misses its deadline in the first place, right?
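>>
>> For reference, each task in our runs is set up roughly like the sketch below
>> (illustrative only -- our real application differs, and liblitmus field names
>> may vary slightly between versions). budget_policy is left at NO_ENFORCEMENT,
>> matching the runs above; PRECISE_ENFORCEMENT would instead throttle a job that
>> overruns its budget:
>>
>>     #include <unistd.h>
>>     #include <litmus.h>
>>
>>     int main(void)
>>     {
>>         struct rt_task param;
>>
>>         init_rt_task_param(&param);
>>         param.exec_cost         = ms2ns(200);     /* Cost   = 200 ms */
>>         param.period            = ms2ns(500);     /* Period = 500 ms */
>>         param.relative_deadline = ms2ns(500);     /* implicit deadline */
>>         param.cls               = RT_CLASS_SOFT;
>>         param.budget_policy     = NO_ENFORCEMENT; /* no budget enforcement */
>>
>>         init_litmus();
>>         set_rt_task_param(getpid(), &param);      /* admit this process */
>>         task_mode(LITMUS_RT_TASK);                /* hand it to GSN-EDF */
>>
>>         for (int job = 0; job < 25; job++) {
>>             /* roughly 200 ms of busy work per job goes here */
>>             sleep_next_period();                  /* wait for the next release */
>>         }
>>
>>         task_mode(BACKGROUND_TASK);               /* leave real-time mode */
>>         return 0;
>>     }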
>>
>>>
>>>> The results from st_job_stats and an excerpt of the logs for the 3 VCPU case
>>>> are below:
>>>>         # Task,   Job,     Period,   Response, DL Miss?,   Lateness,  Tardiness
>>>>         # task NAME=myapp PID=14442 COST=250000000 PERIOD=500000000 CPU=0
>>>>          14442,     2,  500000000,      32592,        0, -499967408,          0
>>>>          14442,     3,  500000000,      61805,        0, -499938195,          0
>>>>          14442,     4,  500000000,  291581439,        0, -208418561,          0
>>>>          14442,     5,  500000000,  251964671,        0, -248035329,          0
>>>>          14442,     6,  500000000,  252424182,        0, -247575818,          0
>>>>          14442,     7,  500000000,  251938074,        0, -248061926,          0
>>>>          14442,     8,  500000000,  252145862,        0, -247854138,          0
>>>>          14442,     9,  500000000,  251845811,        0, -248154189,          0
>>>>          14442,    10,  500000000,  257706935,        0, -242293065,          0
>>>>          14442,    11,  500000000,  251850581,        0, -248149419,          0
>>>>          14442,    12,  500000000,  252553597,        0, -247446403,          0
>>>>          14442,    13,  500000000,  251765063,        0, -248234937,          0
>>>>          14442,    14,  500000000,  252902538,        0, -247097462,          0
>>>>          14442,    15,  500000000, 1009091185,        1,  509091185,  509091185
>>>>          14442,    16,  500000000,  760966632,        1,  260966632,  260966632
>>>>          14442,    17,  500000000,  512866266,        1,   12866266,   12866266
>>>>          14442,    18,  500000000,  264818921,        0, -235181079,          0
>>>>          14442,    19,  500000000,  253024397,        0, -246975603,          0
>>>>          14442,    20,  500000000,  252785150,        0, -247214850,          0
>>>>          14442,    21,  500000000,  252466946,        0, -247533054,          0
>>>>          14442,    22,  500000000, 1459862887,        1,  959862887,  959862887
>>>>          14442,    23,  500000000, 1211903080,        1,  711903080,  711903080
>>>>          14442,    24,  500000000,  963919848,        1,  463919848,  463919848
>>>>         # task NAME=myapp PID=14443 COST=200000000 PERIOD=500000000 CPU=0
>>>>          14443,     2,  500000000,      58150,        0, -499941850,          0
>>>>          14443,     3,  500000000,    3202178,        0, -496797822,          0
>>>>          14443,     4,  500000000,  201662924,        0, -298337076,          0
>>>>          14443,     5,  500000000,  213828161,        0, -286171839,          0
>>>>          14443,     6,  500000000,  202532002,        0, -297467998,          0
>>>>          14443,     7,  500000000,  961643647,        1,  461643647,  461643647
>>>>          14443,     8,  500000000,  663707479,        1,  163707479,  163707479
>>>>          14443,     9,  500000000,  365603701,        0, -134396299,          0
>>>>          14443,    10,  500000000,  201910605,        0, -298089395,          0
>>>>          14443,    11,  500000000,  209025099,        0, -290974901,          0
>>>>          14443,    12,  500000000,  210602663,        0, -289397337,          0
>>>>          14443,    13,  500000000,  247544048,        0, -252455952,          0
>>>>          14443,    14,  500000000,  459680759,        0,  -40319241,          0
>>>>          14443,    15,  500000000,  202763167,        0, -297236833,          0
>>>>          14443,    16,  500000000,  202368570,        0, -297631430,          0
>>>>          14443,    17,  500000000,  201857551,        0, -298142449,          0
>>>>          14443,    18,  500000000,  201999263,        0, -298000737,          0
>>>>          14443,    19,  500000000,  958909178,        1,  458909178,  458909178
>>>>          14443,    20,  500000000,  660591085,        1,  160591085,  160591085
>>>>          14443,    21,  500000000,  362025653,        0, -137974347,          0
>>>>          14443,    22,  500000000,  202565748,        0, -297434252,          0
>>>>          14443,    23,  500000000,  202456104,        0, -297543896,          0
>>>>          14443,    24,  500000000,  202528628,        0, -297471372,          0
>>>>
>>>>          .....
>>>>         [ 328638446] RELEASE        14443/14    on CPU  0 328638946.00ms
>>>>         [ 328638446] RELEASE        14442/14    on CPU  1 328638946.00ms
>>>>         [ 328638446] SWITCH_TO      14442/14    on CPU  0
>>>>         [ 328638698] SWITCH_FROM    14442/14    on CPU  0
>>>>         [ 328638698] COMPLETION     14442/14    on CPU  0
>>>>         [ 328638704] SWITCH_TO      14443/14    on CPU  1
>>>>         [ 328638905] SWITCH_FROM    14443/14    on CPU  1
>>>>         [ 328638905] COMPLETION     14443/14    on CPU  1
>>>>         [ 328638946] RELEASE        14442/15    on CPU  0 328639446.00ms
>>>>         [ 328638946] RELEASE        14443/15    on CPU  1 328639446.00ms
>>>>         [ 328638946] SWITCH_TO      14443/15    on CPU  0
>>>>         [ 328639148] SWITCH_FROM    14443/15    on CPU  0
>>>>         [ 328639148] COMPLETION     14443/15    on CPU  0
>>>>         [ 328639446] RELEASE        14443/16    on CPU  0 328639946.00ms
>>>>         [ 328639446] RELEASE        14442/16    on CPU  1 328639946.00ms
>>>
>>>Random observation: the next release of 14443 happens on CPU 0, whereas 14442 is still being released on CPU 1. This means CPU 1 must have set a timer for the release to occur. Usually, it shouldn’t do that unless it either (a) processed a completion or (b) prevented a budget overrun. Either event should have left a trace record (I think). Are you sure the trace is complete (i.e., no missing writes)?
>>>
>>
>> Yes, as far as I know, there were no reported failed writes at the completion of the tracing.
>>
>>>
>>>>         [ 328639446] SWITCH_TO      14443/16    on CPU  0
>>>>         [ 328639648] SWITCH_FROM    14443/16    on CPU  0
>>>>         [ 328639648] COMPLETION     14443/16    on CPU  0
>>>>         [ 328639703] SWITCH_TO      14442/15    on CPU  1
>>>>         [ 328639946] RELEASE        14442/17    on CPU  1 328640446.00ms
>>>>         [ 328639946] RELEASE        14443/17    on CPU  0 328640446.00ms
>>>>         [ 328639946] SWITCH_TO      14443/17    on CPU  0
>>>>         [ 328639955] SWITCH_FROM    14442/15    on CPU  1
>>>>         [ 328639955] COMPLETION     14442/15    on CPU  1
>>>>         [ 328639955] SWITCH_TO      14442/16    on CPU  1
>>>>         [ 328640147] SWITCH_FROM    14443/17    on CPU  0
>>>>         [ 328640147] COMPLETION     14443/17    on CPU  0
>>>>         [ 328640206] SWITCH_FROM    14442/16    on CPU  1
>>>>         [ 328640206] COMPLETION     14442/16    on CPU  1
>>>>         [ 328640206] SWITCH_TO      14442/17    on CPU  1
>>>>         .....
>>>>
>>>>
>>>> It is interesting to note that when job 15 of each task is released, the job for PID 14443 is
>>>> allowed to run. However, the job for PID 14442 does not run until the next period!
>>>>
>>>> It has also been observed that running the two tasks on a guest with 8 VCPUs shows the same
>>>> deadline misses.  But by artificially loading the system with 6 instances of "yes >
>>>> /dev/null", all jobs meet their deadlines.
>>>
>>>Do these instances of yes run as LITMUS^RT tasks or just as regular SCHED_OTHER tasks?
>>
>> The instances of yes run as regular tasks.  I just launched them from the shell without any calls to liblitmus.
>>
>>>
>>>Also, can you provide matching output of the debug trace log (the TRACE() stream)?
>>
>> I've attached two text documents containing the output from st_job_stats followed by the
>> trace logs you've requested. One is for the bare-metal case, one is for the VM. Sorry,
>> we don't have the logs for that exact run available, but this is from a run with the
>> same parameters. The bare-metal case is a bit cluttered because that system currently has
>> preemption tracing enabled.
>>
>>>
>>>I’ve not used LITMUS^RT on Xen. I don’t mean to blame Xen, but it would make it easier for us if you could reproduce this behavior on bare metal.
>>
>> Again, thanks for your help and comments so far!
>>
>> To provide a little bit of context, we're virtualizing in this case because it
>> looks to be a very interesting area.  The weirdness we are observing is that the
>> above issues do not occur on bare metal.  It is also very odd because we
>> do not observe this on KVM either.
>
> How did you configure the KVM instance? I'm not sure whether KVM has a real-time
> scheduler or not. If not, it will not be an apples-to-apples comparison.
> (I'm not trying to defend Xen, I'm just trying to figure out what is
> going wrong.)
>
>>>> The issue is as follows: when running two simple tasks, each with less than 50%
>>>> single-core utilization (Cost=200ms, Period=500ms), jobs still miss their deadlines
>>>> in certain cases.
>>>> - When we use only 2 VCPUs for the guest, all jobs meet their deadlines
>>>> - When we use 3 or more VCPUs, deadline misses are observed
>
> The weird thing is why allocating more resources to the VM makes a
> schedulable task set unschedulable. This looks more like a scheduling
> anomaly.
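>
> A quick sanity check (using the per-task costs reported by st_job_stats above,
> 250 ms and 200 ms over 500 ms periods) shows why:
>
>     U = 250/500 + 200/500 = 0.5 + 0.4 = 0.9
>
> With only n = 2 tasks and m >= 2 fully provisioned VCPUs, every released job
> can immediately run on its own VCPU, so under GSN-EDF no deadline should be
> missed as long as the VCPUs really deliver 100% of their budget.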
>
> Is it possible that the release timer is configured on VCPU 1, but
> VCPU 1 is scheduled out right before the release timer fires? That
> would cause the release delay.  In addition, if you dedicate a
> physical CPU to each VCPU and do not allow that CPU to run any other
> VCPUs, does the same weird situation still happen?
>
> Thanks,
>
> Meng
>
>
>
> --
>
>
> -----------
> Meng Xu
> PhD Student in Computer and Information Science
> University of Pennsylvania



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania



