<div>Hi Chris:</div><div><br></div>Thanks for your explanation. Although it is a global scheduler, it has a run queue for each core. Just like the PFair scheduling algorithm, we do some calculations and assign processes to each core for the next quantum.<div>
<br></div><div>Let me first have a look at Björn's solution.</div><div><br></div><div>Thanks.<br><br><div class="gmail_quote">On Sat, Aug 4, 2012 at 6:02 PM, Christopher Kenna <span dir="ltr"><<a href="mailto:cjk@cs.unc.edu" target="_blank">cjk@cs.unc.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Hang,<br>
<br>
It is difficult for me to understand some of the code you have<br>
provided without more information about what your data structures are,<br>
which macros you are using (or if you've redefined them), what locks<br>
are taken when, etc. However, I might be able to provide you with some<br>
advice in general.<br>
<br>
You said that your global scheduler is protected by only a single<br>
global lock. Why do you have one run queue per processor, then?<br>
I'm not saying that this approach is wrong, but I am not sure that you<br>
are gaining additional concurrency in doing this. Also, it might make<br>
it more difficult for you to debug race conditions and deadlocks in<br>
your code.<br>
<br>
A "retry-loop" is not the best way to handle process migration.<br>
<br>
The approach taken in LITMUS^RT for migrating processes is usually to<br>
use link-based scheduling. Björn's dissertation provides a good<br>
explanation of what this is in the "Lazy preemptions" section on<br>
page 201 (at the time of this writing, assuming that Björn does not<br>
change the PDF I link to):<br>
<a href="http://www.cs.unc.edu/~bbb/diss/brandenburg-diss.pdf" target="_blank">http://www.cs.unc.edu/~bbb/diss/brandenburg-diss.pdf</a><br>
Take a look at some of the link/unlink code in the existing LITMUS^RT<br>
plugins for another approach to handling migrations.<br>
<br>
I hope that this is helpful to you. Other group members might have<br>
additional advice beyond what I've given here.<br>
<br>
-- Chris<br>
<div><div class="h5"><br>
On Sat, Aug 4, 2012 at 3:05 PM, Hang Su <<a href="mailto:hangsu.cs@gmail.com">hangsu.cs@gmail.com</a>> wrote:<br>
> Hi all:<br>
><br>
> I am implementing a global scheduler. However, a process migration problem<br>
> has confused me for a couple of days. Let me first describe the problem.<br>
><br>
> I have a processor with two cores, and each core maintains a run queue (rq). I want to<br>
> migrate a running process (PID 1401) from RQ1 to RQ0 in the following case.<br>
> When a tick occurs on CPU0, it enters my scheduling algorithm, and the piece of<br>
> code about process migration below freezes my system, since it never<br>
> breaks out of the while loop.<br>
><br>
> int this_cpu = smp_processor_id();<br>
> /* my target task (PID 1401) is currently running on CPU1 */<br>
> src_rq = task_rq(job->task);<br>
> /* my target task's target CPU is CPU0 */<br>
> des_rq = cpu_rq(job->mapping_info[0].cpu);<br>
><br>
> if (src_rq != des_rq) {<br>
>     while (task_running(src_rq, job->task)) {<br>
>         smp_send_reschedule(src_rq->cpu);<br>
>         /* or: set_tsk_need_resched(src_rq->curr); */<br>
>     }<br>
> }<br>
><br>
> However, my tracing info shows that my target task (PID 1401) has been<br>
> switched out and replaced by a process with PID 23. At that time, rq(1)->curr is 23,<br>
> so I do not know why my piece of code on CPU0 cannot break out of the while loop.<br>
> My scheduling algorithm is protected by a single, global spin_lock.<br>
><br>
> Time:63625164872 MSG:context_switch prev->(1401),next->(23)<br>
> Time:63625166671 MSG:rq:1 curr(23)<br>
><br>
><br>
> In order to avoid freezing the system and to print error information, I had to<br>
> change my code as follows:<br>
><br>
> int this_cpu = smp_processor_id();<br>
> /* my target task (PID 1401) is currently running on CPU1 */<br>
> src_rq = task_rq(job->task);<br>
> /* my target task's target CPU is CPU0 */<br>
> des_rq = cpu_rq(job->mapping_info[0].cpu);<br>
><br>
> if (src_rq != des_rq) {<br>
>     count = 1000;<br>
>     while (task_running(src_rq, job->task) && count > 0) {<br>
>         smp_send_reschedule(src_rq->cpu);<br>
>         /* or: set_tsk_need_resched(src_rq->curr); */<br>
>         count--;<br>
>     }<br>
>     if (task_running(src_rq, job->task)) {<br>
>         snprintf(msg, MSG_SIZE, "zigzag_pack src_rq:%d des_rq:%d src_rq_curr_pid:%d pid:%d",<br>
>                  src_rq->cpu, des_rq->cpu, src_rq->curr->pid, job->task->pid);<br>
>         register_event(sched_clock(), msg, this_cpu);<br>
>         return 1;<br>
>     }<br>
> }<br>
><br>
> The tracing info at CPU0 is:<br>
> Time:63625166136 MSG:zigzag_pack src_rq:1 des_rq:0 src_rq_curr_pid:1401 pid:1401<br>
><br>
> I tried both solutions to trigger CPU1 to reschedule,<br>
> smp_send_reschedule(src_rq->cpu) and set_tsk_need_resched(src_rq->curr), but<br>
> neither works.<br>
><br>
><br>
> If any of you are experts on this issue, please give me some tips.<br>
><br>
> Thanks.<br>
><br>
><br>
</div></div>> _______________________________________________<br>
> litmus-dev mailing list<br>
> <a href="mailto:litmus-dev@lists.litmus-rt.org">litmus-dev@lists.litmus-rt.org</a><br>
> <a href="https://lists.litmus-rt.org/listinfo/litmus-dev" target="_blank">https://lists.litmus-rt.org/listinfo/litmus-dev</a><br>
><br>
<br>
_______________________________________________<br>
litmus-dev mailing list<br>
<a href="mailto:litmus-dev@lists.litmus-rt.org">litmus-dev@lists.litmus-rt.org</a><br>
<a href="https://lists.litmus-rt.org/listinfo/litmus-dev" target="_blank">https://lists.litmus-rt.org/listinfo/litmus-dev</a><br>
</blockquote></div><br></div>