[LITMUS^RT] Help with Understanding Scheduling Results

Björn Brandenburg bbb at mpi-sws.org
Thu Apr 2 18:48:04 CEST 2015


> On 01 Apr 2015, at 04:22, Geoffrey Tran <gtran at isi.edu> wrote:
> 
> Hi Meng, 
> 
> I received this email on the KVM setup:
> <each VM>
> 8 cores
> 100% budget
> raw qemu image
> vda virtio
> with the following kernel: Linux localhost 3.4.43-WR5.0.1.10_standard #1 SMP PREEMPT Sat Dec 21 16:28:51 EST 2013 x86_64 GNU/Linux
> 
> Also, we observe similar anomalies with the credit scheduler.  
> 
> 
> 
> Hello again Björn,
> 
> Would you happen to have any more comments or suggestions we could try?
> 
> Thank you very much,
> Geoffrey

Hi Geoffrey,

(CC'ing Glenn, who knows the affinity code best)

Sorry for the long pause. I finally had some time to look at the traces.

First, it looks like your bm_log.txt trace contains ONLY messages related to SCHED_STATE tracing, which is not very useful.

Second, could you please retrace what happens under Xen with CONFIG_SCHED_CPU_AFFINITY disabled? Something goes wrong around the affinity check. I'm not quite sure what's happening, but let's try disabling that code to narrow down the issue.

The first job to miss a deadline is myapp/1535:15, so let's look at that. 

The predecessor myapp/1535:14 completes before its deadline; the task is hence added to the release queue.

278 P1 [gsnedf_schedule at litmus/sched_gsn_edf.c:442]: (myapp/1535:14) blocks:0 out_of_time:0 np:0 sleep:1 preempt:0 state:0 sig:0
279 P1 [job_completion at litmus/sched_gsn_edf.c:364]: (myapp/1535:14) job_completion().
280 P1 [__add_release at litmus/rt_domain.c:348]: (myapp/1535:15) add_release(), rel=620115000000
281 P1 [gsnedf_schedule at litmus/sched_gsn_edf.c:490]: (myapp/1535:15) scheduled_on = NO_CPU

Some time later, the release interrupt is handled by CPU 1, which first links myapp/1536:15 to itself...

286 P1 [check_for_preemptions at litmus/sched_gsn_edf.c:302]: (myapp/1536:15) linking to local CPU 1 to avoid IPI

... and then initially tries to link myapp/1535:15 to CPU 4, which, however, we never see active in the trace.

287 P1 [check_for_preemptions at litmus/sched_gsn_edf.c:314]: check_for_preemptions: attempting to link task 1535 to 4

Then the affinity code kicks in and determines that CPU 0 is much closer than CPU 4 to CPU 1, which last scheduled the task.

288 P1 [gsnedf_get_nearest_available_cpu at litmus/sched_gsn_edf.c:275]: P0 is closest available CPU to P1

So far so good. Now myapp/1535:15 gets linked to P0. However, we never hear from P0 in the trace until much later.

289 P1 [gsnedf_schedule at litmus/sched_gsn_edf.c:445]: (swapper/1/0:0) will be preempted by myapp/1536
290 P1 [gsnedf_schedule at litmus/sched_gsn_edf.c:485]: (myapp/1536:15) scheduled_on = P1
291 P1 [litmus_schedule at kernel/sched/litmus.c:51]: (myapp/1536:15) migrate from 0
292 P1 [litmus_schedule at kernel/sched/litmus.c:65]: (myapp/1536:15) stack_in_use=-1
293 P1 [gsnedf_schedule at litmus/sched_gsn_edf.c:442]: (myapp/1536:15) blocks:0 out_of_time:0 np:0 sleep:1 preempt:0 state:0 sig:0
294 P1 [job_completion at litmus/sched_gsn_edf.c:364]: (myapp/1536:15) job_completion().
295 P1 [__add_release at litmus/rt_domain.c:348]: (myapp/1536:16) add_release(), rel=620615000000
296 P1 [gsnedf_schedule at litmus/sched_gsn_edf.c:490]: (myapp/1536:16) scheduled_on = NO_CPU
297 P1 [check_for_preemptions at litmus/sched_gsn_edf.c:302]: (myapp/1536:16) linking to local CPU 1 to avoid IPI
298 P1 [gsnedf_schedule at litmus/sched_gsn_edf.c:445]: (swapper/1/0:0) will be preempted by myapp/1536
299 P1 [gsnedf_schedule at litmus/sched_gsn_edf.c:485]: (myapp/1536:16) scheduled_on = P1
300 P0 [gsnedf_schedule at litmus/sched_gsn_edf.c:445]: (swapper/0/0:0) will be preempted by myapp/1535
301 P0 [gsnedf_schedule at litmus/sched_gsn_edf.c:485]: (myapp/1535:15) scheduled_on = P0

At this point, the *next* job of myapp/1536 has already been released, i.e., P0 "was out to lunch" for an entire period and so naturally myapp/1535 has a couple of tardy jobs while it catches up.

The more I think about it, the less I believe this has anything to do with the affinity code (but please try disabling it anyway). Rather, it could be explained by a "lost" IPI, in which case P0 simply idled until it received an interrupt for some other reason. Are you sure Xen always delivers all IPIs?

By the way, this would also explain why adding 'yes' background tasks helps: they prevent the CPUs from going to sleep and frequently cause I/O, which triggers the scheduler, so a lost IPI wouldn't be noticed.

(Alternatively, it could also be the case that P0 was not scheduled by the hypervisor for an extended amount of time, but I'm sure you've checked this already.)

To further investigate this issue, it would help to enable SCHED_STATE tracing in the kernel running under Xen. I'm happy to take another look once you've collected further traces.

Thanks,
Björn





More information about the litmus-dev mailing list