[LITMUS^RT] Questions about Process Migration
Hang Su
hangsu.cs at gmail.com
Sun Aug 5 00:05:15 CEST 2012
Hi all,
I am implementing a global scheduler, and a process migration problem has
confused me for a couple of days. Let me first describe the problem.
I have a processor with two cores, and each core maintains its own run
queue (rq). I want to migrate a running process (PID 1401) from RQ1 to RQ0
in the following case. When a tick occurs on CPU0, it enters my scheduling
algorithm. The following process-migration code freezes my system, because
the while loop never exits:
    int this_cpu = smp_processor_id();
    /* Target task (PID 1401), currently running on and located at CPU1. */
    src_rq = task_rq(job->task);
    /* The target task's destination CPU is CPU0. */
    des_rq = cpu_rq(job->mapping_info[0].cpu);
    if (src_rq != des_rq) {
        /* count = 100; */
        while (task_running(src_rq, job->task) /* && count > 0 */) {
            smp_send_reschedule(src_rq->cpu);
            /* or: set_tsk_need_resched(src_rq->curr); */
            /* count--; */
        }
        /*
        if (task_running(src_rq, job->task)) {
            snprintf(msg, MSG_SIZE,
                     "zigzag_pack src_rq:%d des_rq:%d src_rq_curr_pid:%d pid:%d ",
                     src_rq->cpu, des_rq->cpu, src_rq->curr->pid,
                     job->task->pid);
            register_event(sched_clock(), msg, this_cpu);
            return 1;
        }
        */
    }
However, my tracing info shows that the target task (PID 1401) has been
switched out and replaced by a process with PID 23. At that time,
rq(1)->curr is 23, so I do not understand why my code on CPU0 cannot exit
the while loop. My scheduling algorithm is protected by a single global
spin_lock.
Time:63625164872 MSG:context_switch prev->(1401),next->(23)
Time:63625166671 MSG:rq:1 curr(23)
To avoid freezing the system and to print error information, I changed my
code to:
    int this_cpu = smp_processor_id();
    /* Target task (PID 1401), currently running on and located at CPU1. */
    src_rq = task_rq(job->task);
    /* The target task's destination CPU is CPU0. */
    des_rq = cpu_rq(job->mapping_info[0].cpu);
    if (src_rq != des_rq) {
        count = 1000;
        while (task_running(src_rq, job->task) && count > 0) {
            smp_send_reschedule(src_rq->cpu);
            /* or: set_tsk_need_resched(src_rq->curr); */
            count--;
        }
        /* Still running after 1000 attempts: log the state and give up. */
        if (task_running(src_rq, job->task)) {
            snprintf(msg, MSG_SIZE,
                     "zigzag_pack src_rq:%d des_rq:%d src_rq_curr_pid:%d pid:%d ",
                     src_rq->cpu, des_rq->cpu, src_rq->curr->pid,
                     job->task->pid);
            register_event(sched_clock(), msg, this_cpu);
            return 1;
        }
    }
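For anyone who wants to experiment with the bounded-retry logic outside the kernel, here is a minimal user-space sketch of the same pattern. It is only a model under stated assumptions: struct fake_cpu, fake_send_reschedule() and bounded_wait() are hypothetical names invented for this illustration, not LITMUS^RT or Linux APIs, and the "latency" field simply pretends that the remote CPU switches the task out after a fixed number of requests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the remote CPU's state (not a kernel type). */
struct fake_cpu {
    bool task_running; /* models task_running(src_rq, job->task) */
    int  latency;      /* requests needed before the fake CPU switches;
                          0 means it never reacts (the "frozen" case) */
};

/* Models smp_send_reschedule(): the fake CPU switches the task out
 * only when its remaining latency reaches zero. */
static void fake_send_reschedule(struct fake_cpu *cpu)
{
    if (cpu->latency > 0 && --cpu->latency == 0)
        cpu->task_running = false;
}

/* Bounded wait: returns 0 if the task was switched out within the
 * budget, 1 if it is still running when the budget is exhausted. */
static int bounded_wait(struct fake_cpu *cpu, int budget)
{
    assert(budget >= 0);
    while (cpu->task_running && budget > 0) {
        fake_send_reschedule(cpu);
        budget--;
    }
    return cpu->task_running ? 1 : 0;
}
```

With a budget of 1000, a fake CPU initialised as { true, 10 } makes bounded_wait() return 0, while { true, 2000 } or a CPU that never reacts ({ true, 0 }) makes it return 1, which corresponds to the path that logs the zigzag_pack message above.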
The tracing info at CPU0 is:
Time:63625166136 MSG:zigzag_pack src_rq:1 des_rq:0 src_rq_curr_pid:1401 pid:1401
I tried both ways of triggering CPU1 to reschedule,
smp_send_reschedule(src_rq->cpu) and set_tsk_need_resched(src_rq->curr);
neither works.
If any of you are experts on this issue, please give me some tips.
Thanks.