[LITMUS^RT] (SOLVED?) Re: linked but never scheduled
Björn Brandenburg
bbb at mpi-sws.org
Mon Sep 17 12:12:56 CEST 2012
On Sep 16, 2012, at 9:16 PM, Glenn Elliott wrote:
> All right. I believe I am getting closer to understanding what is going wrong here. I think I can generalize it enough that others may be able to understand it from a higher level. Perhaps they can shed additional light on the subject.
>
> Things are breaking because I am exercising Litmus in a new way: A parent thread P forces a child thread C to become real-time. C does not make the request to become real-time itself.
Yes, that's a very likely possibility. As far as I know, Litmus has not seen much use where one task causes another to become a real-time task.
> sched_state_validate_switch() is a part of the Litmus patch and is what transitions the CPU state from TASK_PICKED to TASK_SCHEDULED. The function is called in schedule(), but not schedule_tail().
It does appear to be missing. I'm surprised this didn't cause problems earlier.
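To make the step from TASK_PICKED to TASK_SCHEDULED concrete, here is a minimal user-space model in C11. It is a sketch under stated assumptions, not LITMUS^RT code: only the names TASK_PICKED and TASK_SCHEDULED come from the thread, while the enum values, struct model_cpu, the PICK_INVALIDATED state, and the model_* helpers are hypothetical. It only illustrates why the transition must be an atomic compare-and-swap: a remote CPU may change the state between the local pick and the validation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's scheduling state. TASK_PICKED and
 * TASK_SCHEDULED are the states named in the thread; PICK_INVALIDATED
 * is an invented stand-in for "a remote CPU preempted us after the pick". */
enum model_state {
	TASK_SCHEDULED   = 1,
	TASK_PICKED      = 2,
	PICK_INVALIDATED = 3,
};

struct model_cpu {
	_Atomic int state;
};

/* Atomically move from 'from' to 'to'. Fails (returns false) if the
 * state is no longer 'from', e.g. because another CPU changed it. */
static bool model_transition(struct model_cpu *cpu, int from, int to)
{
	int expected = from;
	return atomic_compare_exchange_strong(&cpu->state, &expected, to);
}

/* Model of the validation step: succeeds only if the CPU is still in
 * TASK_PICKED, i.e. no remote preemption raced with the local pick. */
static bool model_validate_switch(struct model_cpu *cpu)
{
	return model_transition(cpu, TASK_PICKED, TASK_SCHEDULED);
}

/* What a remote CPU does in this model when it preempts us after the
 * pick but before validation. */
static void model_remote_preempt(struct model_cpu *cpu)
{
	model_transition(cpu, TASK_PICKED, PICK_INVALIDATED);
}
```

Without interference the validation succeeds and the state becomes TASK_SCHEDULED; after model_remote_preempt() it fails and the state is left untouched, which is the signal that the local decision may be stale.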
>
> SO: Should we call sched_state_validate_switch() in schedule_tail()? What do we do if sched_state_validate_switch() fails?
If the validation fails, it means that the local scheduling decision raced with a preemption initiated by a remote processor. The right thing to do is to reschedule immediately to see if we picked the wrong task.
> In schedule(), if sched_state_validate_switch() fails, we perform a retry. Can we silently ignore a failure from sched_state_validate_switch()?
No, I suspect that could lead to incorrect scheduling decisions: the CPU could keep running a task that a remote preemption was meant to replace.
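The retry in schedule() can be illustrated with a similarly hypothetical model: the loop makes a decision, publishes TASK_PICKED, and returns only once validation succeeds. The injected remote preemption is a test hook, and struct model_cpu, PICK_INVALIDATED, and the model_* functions are illustrative assumptions rather than the actual kernel code path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model (not LITMUS^RT code) of the retry in schedule(). */
enum { TASK_SCHEDULED = 1, TASK_PICKED = 2, PICK_INVALIDATED = 3 };

struct model_cpu {
	_Atomic int state;
	int remote_preemptions_pending; /* test hook: preemptions to inject */
	int picks;                      /* number of scheduling decisions made */
};

/* Succeeds only if the state is still TASK_PICKED. */
static bool model_validate_switch(struct model_cpu *cpu)
{
	int expected = TASK_PICKED;
	return atomic_compare_exchange_strong(&cpu->state, &expected,
					      TASK_SCHEDULED);
}

static void model_schedule(struct model_cpu *cpu)
{
	for (;;) {
		cpu->picks++;
		atomic_store(&cpu->state, TASK_PICKED); /* local decision made */
		/* ... the context switch to the picked task would happen here ... */
		if (cpu->remote_preemptions_pending > 0) {
			/* Simulate a remote CPU preempting us mid-switch. */
			cpu->remote_preemptions_pending--;
			int expected = TASK_PICKED;
			atomic_compare_exchange_strong(&cpu->state, &expected,
						       PICK_INVALIDATED);
		}
		if (model_validate_switch(cpu))
			return; /* the decision is still valid */
		/* Ignoring the failure here would keep a stale decision;
		 * instead, go around the loop and decide again. */
	}
}
```

With one injected preemption the model makes two decisions before settling in TASK_SCHEDULED; silently ignoring the failed validation would instead leave the first, invalidated decision in place.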
> Also note that preempt_enable() is ifdef'ed. I don't think it is called on x86, because this code appears to run with preemption enabled. I get "BUG: using smp_processor_id() in preemptible" errors when I call smp_processor_id() within schedule_tail(). Unfortunately, I believe sched_state_validate_switch() assumes preemption is disabled so that it can rule out migrations mid-execution.
Why not add an extra layer of preempt_disable()/preempt_enable() before finish_task_switch() in schedule_tail()?
> We may have to map the run-queue of the lock held in schedule_tail() to a CPU and then transition the state of that CPU remotely. Is that safe?
Sorry, I'm not sure what you mean by this.
> Do we need a whole new approach to handle schedule_tail()?
I think it should suffice to call sched_state_validate_switch() in schedule_tail() and to add the required non-preemptivity. If sched_state_validate_switch() fails, then call litmus_reschedule_local(), which will cause the scheduler to be invoked to reconsider its decision at the earliest possible moment.
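The proposed fix can be sketched in the same kind of hypothetical user-space model. In Linux, a newly created task completes its first context switch in schedule_tail() (reached via ret_from_fork) rather than by returning into schedule(), so the validation that schedule() performs after the switch never runs on that path. In the model, preempt_count stands in for preempt_disable()/preempt_enable(), and model_reschedule_local() only approximates litmus_reschedule_local() by setting a flag; all model_* names are invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of schedule_tail() with the proposed change. */
enum { TASK_SCHEDULED = 1, TASK_PICKED = 2, PICK_INVALIDATED = 3 };

struct model_cpu {
	_Atomic int state;
	int preempt_count; /* stands in for preempt_disable()/preempt_enable() */
	bool need_resched; /* set by the stand-in for litmus_reschedule_local() */
};

static bool model_validate_switch(struct model_cpu *cpu)
{
	/* Per the thread, validation assumes preemption is disabled. */
	assert(cpu->preempt_count > 0);
	int expected = TASK_PICKED;
	return atomic_compare_exchange_strong(&cpu->state, &expected,
					      TASK_SCHEDULED);
}

/* Approximation of litmus_reschedule_local(): request that the
 * scheduler run again at the earliest opportunity. */
static void model_reschedule_local(struct model_cpu *cpu)
{
	cpu->need_resched = true;
}

static void model_schedule_tail(struct model_cpu *cpu)
{
	cpu->preempt_count++; /* preempt_disable() */
	/* ... finish_task_switch() would run here ... */
	if (!model_validate_switch(cpu))
		model_reschedule_local(cpu); /* pick raced with a remote preemption */
	cpu->preempt_count--; /* preempt_enable() */
}
```

If nothing interfered, the CPU leaves TASK_PICKED for TASK_SCHEDULED as it would in schedule(); if a remote preemption invalidated the pick, the failure is not ignored but turned into a prompt reschedule.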
I hope this helps; please let us know how it goes.
Thanks,
Björn
More information about the litmus-dev mailing list