[LITMUS^RT] Help with Understanding Scheduling Results

Thu Apr 9 04:05:57 CEST 2015

Hi Björn,

We have gone ahead and done the following modification to the linux kernel:

In arch/x86/xen/smp.c:
In the function: static irqreturn_t xen_reschedule_interrupt(int irq, void *dev_id) we add a call to sched_state_ipi(), so
the code looks as follows

/*
 * Reschedule call back.
 */
static irqreturn_t xen_reschedule_interrupt(int irq, void *dev_id)
{
        inc_irq_stat(irq_resched_count);
        scheduler_ipi();

        sched_state_ipi();

        return IRQ_HANDLED;
}

Is there a process to submit a patch?  

>From what we can tell, this seems to have solved the scheduling anomaly. We ran the same 2 task case as previously and
all jobs meet their deadline. Furthermore, we tested one of our workloads in a single VM, and it looks like everything
is behaving as expected.

Thanks again for your help!
Geoffrey

----- Original Message -----
From: "Björn Brandenburg" <bbb at mpi-sws.org>
To: "Geoffrey Tran" <gtran at isi.edu>, litmus-dev at lists.litmus-rt.org
Cc: "Stephen Crago" <crago at isi.edu>, "John Walters" <jwalters at isi.edu>, "Mikyung Kang" <mkkang at isi.edu>
Sent: Tuesday, April 7, 2015 1:25:24 AM
Subject: Re: [LITMUS^RT] Help with Understanding Scheduling Results

Hi everyone,

the issue is caused either by (i) IPIs being dropped or (ii) IPIs being processed incorrectly.

Looking at the trace "2task_Affinity":

1) Job 8 of task 1432 is the first job to miss a deadline.

2) The prior job completes normally on P1.

> 153734 P1 [gsnedf_schedule at litmus/sched_gsn_edf.c:442]: (myapp/1432:7) blocks:0 out_of_time:0 np:0 sleep:1 preempt:0 state:0 sig:0
> 153735 P1 [job_completion at litmus/sched_gsn_edf.c:364]: (myapp/1432:7) job_completion().
> 153736 P1 [__add_release at litmus/rt_domain.c:348]: (myapp/1432:8) add_release(), rel=42135000000
> 153737 P1 [gsnedf_schedule at litmus/sched_gsn_edf.c:490]: (myapp/1432:8) scheduled_on = NO_CPU

3) Processor P1 handles the release of the jobs. First, myapp/1431 is linked locally.

> 153913 P1 [check_for_preemptions at litmus/sched_gsn_edf.c:302]: (myapp/1431:8) linking to local CPU 1 to avoid IPI
> 153914 P1 [sched_state_transition_on at include/litmus/preempt.h:99]: SCHED_STATE [P1] 0x1 (TASK_SCHEDULED) -> 0x2 (SHOULD_SCHEDULE)
> 153915 P1 [set_sched_state at include/litmus/preempt.h:75]: SCHED_STATE [P1] 0x2 (SHOULD_SCHEDULE) -> 0x4 (WILL_SCHEDULE)
> 153916 P1 [sched_state_will_schedule at litmus/preempt.c:36]: (swapper/1/0:0) set_tsk_need_resched() ret:ffffffff8133cf8f
> 153917 P1 [litmus_reschedule at litmus/preempt.c:99]: SCHED_STATE litmus_reschedule picked-ok:0 sched-ok:1

Second, task myapp/1432 is linked to P0.  An IPI is required to notify P0 that it needs to reschedule. The preemption state machine realizes this correctly.

> 153918 P1 [check_for_preemptions at litmus/sched_gsn_edf.c:314]: check_for_preemptions: attempting to link task 1432 to 2
> 153919 P1 [gsnedf_get_nearest_available_cpu at litmus/sched_gsn_edf.c:275]: P0 is closest available CPU to P1

Here, P0 is moved to SHOULD_SCHEDULE state by the *remote* P1.

> 153920 P1 [sched_state_transition_on at include/litmus/preempt.h:99]: SCHED_STATE [P0] 0x1 (TASK_SCHEDULED) -> 0x2 (SHOULD_SCHEDULE)
> 153921 P1 [litmus_reschedule at litmus/preempt.c:99]: SCHED_STATE litmus_reschedule picked-ok:0 sched-ok:1

Crucially, this should trigger an IPI to be sent:

	https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/preempt.c#L92

4) When an IPI is received by a CPU, it should log either "IPI -> set_tsk_need_resched" or "ignoring IPI in state", as evident in sched_state_ipi():

	https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/preempt.c#L41

> 
> /* Called by the IPI handler after another CPU called smp_send_resched(). */
> void sched_state_ipi(void)
> {
> 	/* If the IPI was slow, we might be in any state right now. The IPI is
> 	 * only meaningful if we are in SHOULD_SCHEDULE. */
> 	if (is_in_sched_state(SHOULD_SCHEDULE)) {
> 		/* Cause scheduler to be invoked.
> 		 * This will cause a transition to WILL_SCHEDULE. */
> 		set_tsk_need_resched(current);
> 		TRACE_STATE("IPI -> set_tsk_need_resched(%s/%d)\n",
> 			    current->comm, current->pid);
> 		TS_SEND_RESCHED_END;
> 	} else {
> 		/* ignore */
> 		TRACE_STATE("ignoring IPI in state %x (%s)\n",
> 			    get_sched_state(),
> 			    sched_state_name(get_sched_state()));
> 	}
> }

5) However, we see neither message in the trace, ever.

6) When we hear from P0, it realizes that it is in state SHOULD_SCHEDULE and correctly moves to WILL_SCHEDULE.

> 154266 P0 [set_sched_state at include/litmus/preempt.h:75]: SCHED_STATE [P0] 0x2 (SHOULD_SCHEDULE) -> 0x4 (WILL_SCHEDULE)
> 154267 P0 [sched_state_will_schedule at litmus/preempt.c:36]: (swapper/0/0:0) set_tsk_need_resched() ret:ffffffff8108e257
> 154268 P0 [set_sched_state at include/litmus/preempt.h:75]: SCHED_STATE [P0] 0x4 (WILL_SCHEDULE) -> 0x4 (WILL_SCHEDULE)
> 154269 P0 [gsnedf_schedule at litmus/sched_gsn_edf.c:445]: (swapper/0/0:0) will be preempted by myapp/1432
> 154270 P0 [gsnedf_schedule at litmus/sched_gsn_edf.c:485]: (myapp/1432:8) scheduled_on = P0
> 154271 P0 [set_sched_state at include/litmus/preempt.h:75]: SCHED_STATE [P0] 0x4 (WILL_SCHEDULE) -> 0x8 (TASK_PICKED)
> 154272 P0 [litmus_schedule at kernel/sched/litmus.c:51]: (myapp/1432:8) migrate from 1
> 154273 P0 [litmus_schedule at kernel/sched/litmus.c:65]: (myapp/1432:8) stack_in_use=-1
> 154274 P0 [sched_state_transition at include/litmus/preempt.h:85]: SCHED_STATE [P0] 0x8 (TASK_PICKED) -> 0x1 (TASK_SCHEDULED)

=> The scheduler worked ok **once it was notified there is something to do**.

7) set_tsk_need_resched() is called from resched_task() according to the provided .map file. However, resched_task() is not called by LITMUS^RT code, so this is an "coincidental" scheduler activation by some Linux background activity. It was definitely NOT triggered by an IPI, as it should have been.

=>  Conclusion: the IPI supposedly sent in (3) was never processed correctly.

Possible explanation 1: the IPI was dropped by the Xen hypervisor.

Possible explanation 2: the IPI was delivered, but sched_state_ipi() was not called by the IPI handler.

My guess is the explanation 2 is the right one. As part of porting LITMUS^RT to a new architecture, the call to sched_state_ipi() needs to be *manually inserted* into the IPI handler. On *native* x86, this can be seen here:

	https://github.com/LITMUS-RT/litmus-rt/blob/master/arch/x86/kernel/smp.c#L277

Mainline LITMUS^RT has *not* been ported to Xen (otherwise we would list it as a supported platform). A casual check shows that Xen has its own smp.c file (https://github.com/LITMUS-RT/litmus-rt/blob/master/arch/x86/xen/smp.c), which I'm not familiar with.

So... did you verify that in the branch of LITMUS^RT that you are using all architecture-specific changes required by LITMUS^RT have been ported to the Xen architecture-specific files?

- Björn