<div>Hi Chris:</div><div><br></div>Thanks for your explanation. Although it is a global scheduler, it has a run queue for each core. Just like the PFair scheduling algorithm, we do some calculations and assign processes to each core for the next quantum.<div>
<br></div><div>Let me first have a look at Björn's solution.</div><div><br></div><div>Thanks.<br><br><div class="gmail_quote">On Sat, Aug 4, 2012 at 6:02 PM, Christopher Kenna <span dir="ltr"><<a href="mailto:cjk@cs.unc.edu" target="_blank">cjk@cs.unc.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Hang,<br>
<br>
It is difficult for me to understand some of the code you have<br>
provided without more information about what your data structures are,<br>
which macros you are using (or if you've redefined them), what locks<br>
are taken when, etc. However, I might be able to provide you with some<br>
advice in general.<br>
<br>
You said that your global scheduler is protected by only a single<br>
global lock. Why do you have one run queue per processor, then?<br>
I'm not saying that this approach is wrong, but I am not sure that you<br>
are gaining additional concurrency in doing this. Also, it might make<br>
it more difficult for you to debug race conditions and deadlocks in<br>
your code.<br>
<br>
A "retry-loop" is not the best way to handle process migration.<br>
<br>
The approach taken in LITMUS^RT for migrating processes is usually to<br>
use link-based scheduling. Björn's dissertation provides a good<br>
explanation of what this is in the "Lazy preemptions" section on<br>
page 201 (at the time of this writing, assuming that Björn does not<br>
change the PDF I link to):<br>
<a href="http://www.cs.unc.edu/~bbb/diss/brandenburg-diss.pdf" target="_blank">http://www.cs.unc.edu/~bbb/diss/brandenburg-diss.pdf</a><br>
Take a look at some of the link/unlink code in the existing LITMUS^RT<br>
plugins for another approach to handling migrations.<br>
<br>
I hope that this is helpful to you. Other group members might have<br>
additional advice beyond what I've given here.<br>
<br>
-- Chris<br>
<div><div class="h5"><br>
On Sat, Aug 4, 2012 at 3:05 PM, Hang Su <<a href="mailto:hangsu.cs@gmail.com">hangsu.cs@gmail.com</a>> wrote:<br>
> Hi all:<br>
><br>
> I am implementing a global scheduler. However, a process migration problem<br>
> has confused me for a couple of days. Let me first describe the problem.<br>
><br>
> I have a processor with two cores, and each core maintains a run queue (rq). I want to<br>
> migrate a running process (PID 1401) from RQ1 to RQ0 in the following case.<br>
> When a tick occurs on CPU0, it enters my scheduling algorithm, and the piece of<br>
> code about process migration below freezes my system, since it never<br>
> breaks out of the while loop.<br>
><br>
> int this_cpu = smp_processor_id();<br>
> /* my target task (PID 1401) is currently running on CPU1 */<br>
> src_rq = task_rq(job->task);<br>
> /* my target task's target CPU is CPU0 */<br>
> des_rq = cpu_rq(job->mapping_info[0].cpu);<br>
><br>
> if (src_rq != des_rq) {<br>
>     while (task_running(src_rq, job->task)) {<br>
>         smp_send_reschedule(src_rq->cpu);<br>
>         /* or: set_tsk_need_resched(src_rq->curr); */<br>
>     }<br>
> }<br>
><br>
> However, my tracing info shows that my target task (PID 1401) has been<br>
> switched out and replaced by a process with PID 23. At that time, rq(1)->curr is 23,<br>
> so I do not know why my piece of code on CPU0 cannot break out of the while loop.<br>
> My scheduling algorithm is protected by a single, global spin_lock.<br>
><br>
> Time:63625164872 MSG:context_switch prev->(1401),next->(23)<br>
> Time:63625166671 MSG:rq:1 curr(23)<br>
><br>
><br>
> In order to avoid freezing the system and to print error information, I had to<br>
> change my code as follows:<br>
><br>
> int this_cpu = smp_processor_id();<br>
> /* my target task (PID 1401) is currently running on CPU1 */<br>
> src_rq = task_rq(job->task);<br>
> /* my target task's target CPU is CPU0 */<br>
> des_rq = cpu_rq(job->mapping_info[0].cpu);<br>
><br>
> if (src_rq != des_rq) {<br>
>     count = 1000;<br>
>     while (task_running(src_rq, job->task) && count > 0) {<br>
>         smp_send_reschedule(src_rq->cpu);<br>
>         /* or: set_tsk_need_resched(src_rq->curr); */<br>
>         count--;<br>
>     }<br>
>     if (task_running(src_rq, job->task)) {<br>
>         snprintf(msg, MSG_SIZE, "zigzag_pack src_rq:%d des_rq:%d src_rq_curr_pid:%d pid:%d",<br>
>                  src_rq->cpu, des_rq->cpu, src_rq->curr->pid, job->task->pid);<br>
>         register_event(sched_clock(), msg, this_cpu);<br>
>         return 1;<br>
>     }<br>
> }<br>
><br>
> The tracing info at CPU0 is:<br>
> Time:63625166136 MSG:zigzag_pack src_rq:1 des_rq:0 src_rq_curr_pid:1401 pid:1401<br>
><br>
> I tried both solutions to trigger CPU1 to reschedule,<br>
> smp_send_reschedule(src_rq->cpu) and set_tsk_need_resched(src_rq->curr), but<br>
> neither works.<br>
><br>
><br>
> If any of you are experts on this issue, please give me some tips.<br>
><br>
> Thanks.<br>
><br>
><br>
</div></div>> _______________________________________________<br>
> litmus-dev mailing list<br>
> <a href="mailto:litmus-dev@lists.litmus-rt.org">litmus-dev@lists.litmus-rt.org</a><br>
> <a href="https://lists.litmus-rt.org/listinfo/litmus-dev" target="_blank">https://lists.litmus-rt.org/listinfo/litmus-dev</a><br>
><br>
<br>
_______________________________________________<br>
litmus-dev mailing list<br>
<a href="mailto:litmus-dev@lists.litmus-rt.org">litmus-dev@lists.litmus-rt.org</a><br>
<a href="https://lists.litmus-rt.org/listinfo/litmus-dev" target="_blank">https://lists.litmus-rt.org/listinfo/litmus-dev</a><br>
</blockquote></div><br></div>