[LITMUS^RT] (SOLVED?) Re: linked but never scheduled

Mon Sep 17 12:12:56 CEST 2012

On Sep 16, 2012, at 9:16 PM, Glenn Elliott wrote:

> All right.  I believe I am getting closer to understanding what is going wrong here.  I think I can generalize it enough that others may be able to understand it from a higher level.  Perhaps they can shed additional light on the subject.
> 
> Things are breaking because I am exercising Litmus in a new way: A parent thread P forces a child thread C to become real-time.  C does not make the request to become real-time itself.

Yes, that's a very likely possibility. As far as I know, Litmus has not seen much use where one task causes another to become a real-time task.

> sched_state_validate_switch() is a part of the Litmus patch and is what transitions the CPU state from TASK_PICKED to TASK_SCHEDULED.  The function is called in schedule(), but not schedule_tail().

It does appear to be missing. I'm surprised this didn't cause problems earlier. 

> 
> SO: Should we call sched_state_validate_switch() in schedule_tail()?  What do we do if sched_state_validate_switch() fails?

If the validation fails, it means that the local scheduling decision raced with a preemption initiated by a remote processor. The right thing to do is to reschedule immediately to see if we picked the wrong task.

>  In schedule(), if sched_state_validate_switch() fails, we perform a retry.  Can we silently ignore a failure from sched_state_validate_switch()?  

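No, I suspect that could lead to incorrect results: if the failure is ignored, the CPU may keep running a task that a remote preemption has already superseded.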

> Also note that preempt_enable() is ifdef'ed.  I don't think it is called on x86 because this code appears to execute preemptively.  I get "BUG: using smp_processor_id() in preemptible" errors when I call smp_processor_id() within schedule_tail().  Unfortunately, I believe sched_state_validate_switch() assumes preemption is disabled so it can rule out the possibility of migrations mid-execution.

Why not add an extra layer of preempt_disable()/preempt_enable() before finish_task_switch() in schedule_tail()?

>  We may have to map the run-queue of the lock held in schedule_tail() to a CPU and then transition the state of that CPU remotely.  Is that safe?

Sorry, I'm not sure what you mean by this.

>  Do we need a whole new approach to handle schedule_tail()?

I think it should suffice to call sched_state_validate_switch() in schedule_tail() and to add the required non-preemptivity. If sched_state_validate_switch() fails, then call litmus_reschedule_local(), which will cause the scheduler to be invoked to reconsider its decision at the earliest possible moment.
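
To make the control flow concrete, here is a rough user-space sketch of the idea. It is not a kernel patch: every function below is a simplified stand-in that only borrows the kernel or LITMUS^RT name, the state machine is reduced to what this discussion needs, and schedule_tail_sketch() is a hypothetical name. Validating after finish_task_switch() is also an assumption; the exact placement would need to be checked against the real schedule_tail().

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for kernel/LITMUS^RT state. Everything here is a
 * user-space mock for illustration, not the real implementation. */
enum sched_state {
	TASK_SCHEDULED,
	WILL_SCHEDULE,
	TASK_PICKED,
	PICKED_WRONG_TASK
};

static enum sched_state cpu_state = TASK_SCHEDULED;
static int preempt_count = 0;     /* mock of the per-CPU preemption counter */
static int resched_requests = 0;  /* counts mock reschedule requests */

static void preempt_disable(void) { preempt_count++; }

static void preempt_enable(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/* Mock: succeeds only if no remote preemption raced with the local pick. */
static bool sched_state_validate_switch(void)
{
	/* The validation assumes the caller cannot migrate mid-execution. */
	assert(preempt_count > 0);

	if (cpu_state == TASK_PICKED) {
		cpu_state = TASK_SCHEDULED;
		return true;
	}
	/* Raced with a remote preemption: the decision must be redone. */
	cpu_state = WILL_SCHEDULE;
	return false;
}

/* Mock: the real function asks the local scheduler to run again soon. */
static void litmus_reschedule_local(void) { resched_requests++; }

/* Mock: the real function completes the context switch. */
static void finish_task_switch(void) { }

/* Hypothetical shape of the proposed change to schedule_tail(). */
void schedule_tail_sketch(void)
{
	preempt_disable();            /* extra non-preemptive layer */
	finish_task_switch();
	if (!sched_state_validate_switch())
		litmus_reschedule_local();  /* reconsider the decision */
	preempt_enable();
}
```

In this mock, a CPU that enters schedule_tail_sketch() in TASK_PICKED leaves in TASK_SCHEDULED without a reschedule request, while one in PICKED_WRONG_TASK ends up in WILL_SCHEDULE with exactly one request pending.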

I hope this helps; please let us know how it goes.

Thanks,
Björn