[LITMUS^RT] linked but never scheduled

Glenn Elliott gelliott at cs.unc.edu
Sat Sep 15 20:34:18 CEST 2012


On Sep 15, 2012, at 12:17 PM, Glenn Elliott <gelliott at cs.unc.edu> wrote:

> 
> On Sep 14, 2012, at 8:21 PM, Jonathan Herman <hermanjl at cs.unc.edu> wrote:
> 
>> 
>> On Fri, Sep 14, 2012 at 7:57 PM, Glenn Elliott <gelliott at cs.unc.edu> wrote:
>>> Well, I am stuck in a strange case where a task is linked but never
>>> scheduled (G-EDF):
>>> 
>>> 3903 P0 [check_for_preemptions at litmus/sched_gsn_edf.c:365]:
>>> check_for_preemptions: attempting to link task 1795 to 0
>>> 3904 P0 [gsnedf_get_nearest_available_cpu at litmus/sched_gsn_edf.c:347]: Could
>>> not find an available CPU close to P0
>>> 3905 P0 [__add_ready at litmus/rt_domain.c:312]: rt: adding aux_threads/1799
>>> (0, 4611686018427387903, 4611686018427387903) [inh_task: (nil)/0 (0, 0 0)]
>>> rel=59812157846 to ready queue at 60312058176
>>> 3907 P0 [link_task_to_cpu at litmus/sched_gsn_edf.c:218]: (aux_threads/1799:1)
>>> linked = aux_threads/1795
>>> 3908 P0 [link_task_to_cpu at litmus/sched_gsn_edf.c:219]: (aux_threads/1799:1)
>>> entry->linked = aux_threads/1799
>>> 3909 P0 [link_task_to_cpu at litmus/sched_gsn_edf.c:261]: (aux_threads/1795:2)
>>> linked to 0.
>>> 3914 P1 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1796:1) tick
>>> 3915 P2 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1797:1) tick
>>> 3916 P3 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1798:1) tick
>>> 3917 P0 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1799:1) tick
>>> 3918 P1 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1796:1) tick
>>> 3919 P2 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1797:1) tick
>>> 3920 P3 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1798:1) tick
>>> 3921 P0 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1799:1) tick
>>> 3922 P1 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1796:1) tick
>>> 3923 P2 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1797:1) tick
>>> 3924 P3 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1798:1) tick
>>> 3925 P0 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1799:1) tick
>>> 3926 P1 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1796:1) tick
>>> 3927 P2 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1797:1) tick
>>> 3928 P3 [gsnedf_tick at litmus/sched_gsn_edf.c:472]: (aux_threads/1798:1) tick
>>> 
>>> 
>>> I updated gsnedf_tick to print the currently scheduled real-time task.  Task
>>> 1795 should be running on P0 instead of 1799.
>>> 
>>> Has anyone ever seen something like this before?
>>> 
>>> Thanks,
>>> Glenn
>>> 
>>> p.s. Ignore the crazy period/relative_deadline for 1799.  Those are just
>>> place holders.  1799 is really just a worker thread that has a statically
>>> low priority and should only run when the system is idle (or according to an
>>> inherited priority).
>>> 
>>> p.p.s. I think there is a bug in litmus.c::litmus_fork().  is_realtime() is
>>> returning false for forked children of a real-time task (expected), so the
>>> fork-copied rt_param state is not reset.  Thus, the child gets the same
>>> rt_params as the parent.  This is probably harmless if we're sure to re-init
>>> any children that transition to real-time.  However, for the sake of
>>> completeness, I wonder if we should execute "reinit_litmus_state(p,0);" for
>>> all children.  I am doing some where tasks are forced to become real-time
>>> from within the kernel (worker threads), and this possible bug bit me.
>>> 
>>> _______________________________________________
>>> litmus-dev mailing list
>>> litmus-dev at lists.litmus-rt.org
>>> https://lists.litmus-rt.org/listinfo/litmus-dev
>>> 
>> 
>> I would bet cashmoney that your task is not preemptable. Your
>> check_for_preempt calls preempt()->preempt_if_preemptable after
>> linking the task, which calls litmus_reschedule if the task is not
>> preemptable. litmus_reschedule ALWAYS hits line 98 of your preempt.c
>> and traces state.
>> 
>> -- 
>> Jonathan Herman
>> Department of Computer Science at UNC Chapel Hill
> 
> 
> I think something else could be going wrong.  I believe the task is preemptible.  litmus_reschedule() is indeed being called.  Here's the code, with some trace statements added in.  I've put the code path that is being followed in bold.
> 
>  66 void litmus_reschedule(int cpu)
>  67 {
>  68     int picked_transition_ok = 0;
>  69     int scheduled_transition_ok = 0;
>  70 
>  71     /* The (remote) CPU could be in any state. */
>  72 
>  73     /* The critical states are TASK_PICKED and TASK_SCHEDULED, as the CPU
>  74      * is not aware of the need to reschedule at this point. */
>  75 
>  76     /* is a context switch in progress? */
>  77     if (cpu_is_in_sched_state(cpu, TASK_PICKED)) {
>  78         picked_transition_ok = sched_state_transition_on(
>  79             cpu, TASK_PICKED, PICKED_WRONG_TASK);
>  80 
>  81         TRACE_CUR("cpu %d: picked_transition_ok = %d\n", cpu, picked_transition_ok);
>  82     }
>  83     else {
>  84         TRACE_CUR("cpu %d: picked_transition_ok = 0 (static)\n", cpu);
>  85     }
>  86 
>  87     if (!picked_transition_ok &&
>  88         cpu_is_in_sched_state(cpu, TASK_SCHEDULED)) {
>  89         /* We either raced with the end of the context switch, or the
>  90          * CPU was in TASK_SCHEDULED anyway. */
>  91         scheduled_transition_ok = sched_state_transition_on(
>  92             cpu, TASK_SCHEDULED, SHOULD_SCHEDULE);
>  93         TRACE_CUR("cpu %d: scheduled_transition_ok = %d\n", cpu, scheduled_transition_ok);
>  94     }
>  95     else {
>  96         TRACE_CUR("cpu %d: scheduled_transition_ok = 0 (static)\n", cpu);
>  97     }
>  98 
>  99     /* If the CPU was in state TASK_SCHEDULED, then we need to cause the
> 100      * scheduler to be invoked. */
> 101     if (scheduled_transition_ok) {
> 102         if (smp_processor_id() == cpu) {
> 103             set_tsk_need_resched(current);
> 104         }
> 105         else {
> 106             smp_send_reschedule(cpu);
> 107         }
> 108     }
> 109 
> 110     TRACE_STATE("%s picked-ok:%d sched-ok:%d\n",
> 111             __FUNCTION__,
> 112             picked_transition_ok,
> 113             scheduled_transition_ok);
> 114 }
> 
> Trace output:
> 3956 P1 [link_task_to_cpu at litmus/sched_gsn_edf.c:218]: (aux_threads/1820:1) linked = aux_threads/1816
> 3957 P1 [link_task_to_cpu at litmus/sched_gsn_edf.c:219]: (aux_threads/1820:1) entry->linked = aux_threads/1820
> 3958 P1 [link_task_to_cpu at litmus/sched_gsn_edf.c:261]: (aux_threads/1816:2) linked to 1. 
> 3962 P1 [preempt_if_preemptable at litmus/sched_plugin.c:36]: (aux_threads/1820:1) preempt_if_preemptable: aux_threads/1820
> 3963 P1 [preempt_if_preemptable at litmus/sched_plugin.c:45]: (aux_threads/1820:1) preempt local cpu.
> 3964 P1 [preempt_if_preemptable at litmus/sched_plugin.c:71]: (aux_threads/1820:1) calling litmus_reschedule()
> 3965 P1 [litmus_reschedule at litmus/preempt.c:81]: (aux_threads/1820:1) cpu 1: picked_transition_ok = 1
> 3966 P1 [litmus_reschedule at litmus/preempt.c:96]: (aux_threads/1820:1) cpu 1: scheduled_transition_ok = 0 (static)
> 3968 P1 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1820:1) tick 67358153479
> 3972 P1 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1820:1) tick 67359153359
> 
> 
> 
> So it looks like my linked task is not scheduled because the CPU state is TASK_PICKED and picked_transition_ok is set to 1.  I've never worked with this part of Litmus before, so I don't know what is "normal."  Is there supposed to be some sort of deferred safety-net that will schedule the linked task when picked_transition_ok==1 that is not being invoked?
> 
> -Glenn
> 


Following up, is there any way that a task could be scheduled while the CPU state remains TASK_PICKED?  Thread 1785 calls job_completion() on P0.  1789 is linked as the next task to run on P0 and the state of changes from WILL_SCHEDULE to TASK_PICKED.  As I understand it, the state of P0 should soon switch to TASK_SCHEDULED when kernel/sched.c::schedule() calls sched_state_validate_switch().  However, I'm not sure that this is ever being called.  As you see in the log below, gsnedf_tick() on P0 shows that 1789 is indeed scheduled, but the state of P0 never switched to TASK_SCHEDULED.  Is this the correct behavior?  I believe what may be happening is that P0 has an incorrect state; thus, when I try to link a new task to P0, the state transitions from TASK_PICKED to PICKED_WRONG_TASK.  Note that 1789 is in a user-space busy-wait loop, so it will never yield the processor-- it must always be preempted.

91576 P0 [gsnedf_schedule at litmus/sched_gsn_edf.c:857]: (aux_threads/1785:1) blocks:0 out_of_time:0 signal_budget: 0 np:0 sleep:1 preempt:0 state:0 sig:0 
91577 P0 [job_completion at litmus/sched_gsn_edf.c:423]: (aux_threads/1785:1) job_completion(). // 1785 calls job_completion() and wants to give up P0.
91578 P0 [link_task_to_cpu at litmus/sched_gsn_edf.c:218]: (aux_threads/1785:2) linked = (nil)/0
91579 P0 [link_task_to_cpu at litmus/sched_gsn_edf.c:219]: (aux_threads/1785:2) entry->linked = aux_threads/1785
91580 P0 [link_task_to_cpu at litmus/sched_gsn_edf.c:263]: NULL linked to 0. 
91583 P0 [__add_release at litmus/rt_domain.c:353]: (aux_threads/1785:2) add_release(), rel=62349575697
91584 P0 [check_for_preemptions at litmus/sched_gsn_edf.c:365]: check_for_preemptions: attempting to link task 1789 to 0
91585 P0 [gsnedf_get_nearest_available_cpu at litmus/sched_gsn_edf.c:347]: P0 is closest available CPU to P0 
91586 P0 [link_task_to_cpu at litmus/sched_gsn_edf.c:218]: (aux_threads/1785:2) linked = aux_threads/1789
91587 P0 [link_task_to_cpu at litmus/sched_gsn_edf.c:219]: (aux_threads/1785:2) entry->linked = (nil)/0
91588 P0 [link_task_to_cpu at litmus/sched_gsn_edf.c:261]: (aux_threads/1789:1) linked to 0.  // 1789 linked to P0
91593 P0 [preempt_if_preemptable at litmus/sched_plugin.c:36]: (aux_threads/1785:2) preempt_if_preemptable: aux_threads/1785
91594 P0 [preempt_if_preemptable at litmus/sched_plugin.c:45]: (aux_threads/1785:2) preempt local cpu.
91595 P0 [preempt_if_preemptable at litmus/sched_plugin.c:72]: (aux_threads/1785:2) calling litmus_reschedule()
91596 P0 [litmus_reschedule at litmus/preempt.c:84]: (aux_threads/1785:2) cpu 0: picked_transition_ok = 0 (static)
91597 P0 [litmus_reschedule at litmus/preempt.c:96]: (aux_threads/1785:2) cpu 0: scheduled_transition_ok = 0 (static)
91598 P0 [litmus_reschedule at litmus/preempt.c:113]: SCHED_STATE litmus_reschedule picked-ok:0 sched-ok:0
91600 P0 [gsnedf_schedule at litmus/sched_gsn_edf.c:914]: (aux_threads/1789:1) scheduled_on = P0
91601 P0 [gsnedf_schedule at litmus/sched_gsn_edf.c:919]: (aux_threads/1785:2) scheduled_on = NO_CPU
91602 P0 [set_sched_state at include/litmus/preempt.h:75]: SCHED_STATE [P0] 0x4 (WILL_SCHEDULE) -> 0x8 (TASK_PICKED)  // 1789 picked to run.
91603 P0 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1789:1) tick 61850029047  // 1789 is scheduled, but P0 has not transitioned to TASK_SCHEDULED
91604 P1 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1786:1) tick 61850150117
91605 P2 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1787:1) tick 61850278487
91606 P3 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1788:1) tick 61850399157
91607 P0 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1789:1) tick 61851025147  // 1789 still scheduled.  Still no transition.
91608 P1 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1786:1) tick 61851149077
91609 P2 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1787:1) tick 61851278397

… lots and lots of ticks …

93599 P0 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1789:1) tick 62349024417   // 1789 still ticking on P0
93600 P1 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1786:1) tick 62349148547
93601 P2 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1787:1) tick 62349278407
93602 P3 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1788:1) tick 62349398267
93603 P0 [gsnedf_release_jobs at litmus/sched_gsn_edf.c:401]: (aux_threads/1789:1) gsnedf_release_jobs  // new higher prio job (1785) released
93606 P0 [check_for_preemptions at litmus/sched_gsn_edf.c:365]: check_for_preemptions: attempting to link task 1785 to 0
93607 P0 [gsnedf_get_nearest_available_cpu at litmus/sched_gsn_edf.c:347]: Could not find an available CPU close to P0
93608 P0 [__add_ready at litmus/rt_domain.c:312]: rt: adding aux_threads/1789 (0, 4611686018427387903, 4611686018427387903) [inh_task: (nil)/0 (0, 0 0)] rel=61849803767 to ready queue at 62349602827
93610 P0 [link_task_to_cpu at litmus/sched_gsn_edf.c:218]: (aux_threads/1789:1) linked = aux_threads/1785
93611 P0 [link_task_to_cpu at litmus/sched_gsn_edf.c:219]: (aux_threads/1789:1) entry->linked = aux_threads/1789
93612 P0 [link_task_to_cpu at litmus/sched_gsn_edf.c:261]: (aux_threads/1785:2) linked to 0.
93616 P0 [preempt_if_preemptable at litmus/sched_plugin.c:36]: (aux_threads/1789:1) preempt_if_preemptable: aux_threads/1789
93617 P0 [preempt_if_preemptable at litmus/sched_plugin.c:45]: (aux_threads/1789:1) preempt local cpu.
93618 P0 [preempt_if_preemptable at litmus/sched_plugin.c:72]: (aux_threads/1789:1) calling litmus_reschedule()
93619 P0 [sched_state_transition_on at include/litmus/preempt.h:99]: SCHED_STATE [P0] 0x8 (TASK_PICKED) -> 0x10 (PICKED_WRONG_TASK)  // SCHED_STATE of P0 bad?  Shouldn't it have been TASK_SCHEDULED?
93620 P0 [litmus_reschedule at litmus/preempt.c:81]: (aux_threads/1789:1) cpu 0: picked_transition_ok = 1
93621 P0 [litmus_reschedule at litmus/preempt.c:96]: (aux_threads/1789:1) cpu 0: scheduled_transition_ok = 0 (static)
93622 P0 [litmus_reschedule at litmus/preempt.c:113]: SCHED_STATE litmus_reschedule picked-ok:1 sched-ok:0
93623 P0 [__edf_higher_prio at litmus/edf_common.c:95]: (aux_threads/1789:1) aux_threads/1789 >> aux_threads/1788 --- 0
93624 P0 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1789:1) tick 62350024927  // 1789 still ticking away. >:(
93625 P1 [gsnedf_tick at litmus/sched_gsn_edf.c:474]: (aux_threads/1786:1) tick 62350149077

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.litmus-rt.org/pipermail/litmus-dev/attachments/20120915/0b6afaad/attachment.html>


More information about the litmus-dev mailing list