[LITMUS^RT] Questions about Process Migration

Hang Su hangsu.cs at gmail.com
Sun Aug 5 00:05:15 CEST 2012


Hi all:

I am implementing a global scheduler, but a process migration problem has
confused me for a couple of days. Let me describe the problem first.

I have a processor with two cores, and each core maintains its own run queue
(rq). I want to migrate a running process (PID 1401) from RQ1 to RQ0 in the
following case. When a tick occurs on CPU0, it enters my scheduling
algorithm. The following piece of process migration code freezes my system,
because it never breaks out of the while loop.

                 int this_cpu = smp_processor_id();
                 /* target task (PID 1401), currently running on CPU1 */
                 src_rq = task_rq(job->task);
                 /* the target task's destination CPU is CPU0 */
                 des_rq = cpu_rq(job->mapping_info[0].cpu);

                 if (src_rq != des_rq) {
                     /* count = 100; */
                     while (task_running(src_rq, job->task) /* && count > 0 */) {
                         /* alternative tried: set_tsk_need_resched(src_rq->curr); */
                         smp_send_reschedule(src_rq->cpu);
                         /* count--; */
                     }
                     /* disabled error reporting:
                     if (task_running(src_rq, job->task)) {
                         snprintf(msg, MSG_SIZE,
                                  "zigzag_pack src_rq:%d des_rq:%d src_rq_curr_pid:%d pid:%d ",
                                  src_rq->cpu, des_rq->cpu, src_rq->curr->pid,
                                  job->task->pid);
                         register_event(sched_clock(), msg, this_cpu);
                         return 1;
                     }
                     */
                 }

However, my tracing info shows that my target task (PID 1401) has been
switched out and replaced by a process with PID 23. At that time, rq(1)->curr
is 23, so I do not know why my code on CPU0 cannot break out of the while
loop. My scheduling algorithm is protected by a single global spin_lock.

Time:63625164872 MSG:context_switch prev->(1401),next->(23)
Time:63625166671 MSG:rq:1  curr(23)


To avoid freezing the system and to print error information, I changed my
code to:

                 int this_cpu = smp_processor_id();
                 /* target task (PID 1401), currently running on CPU1 */
                 src_rq = task_rq(job->task);
                 /* the target task's destination CPU is CPU0 */
                 des_rq = cpu_rq(job->mapping_info[0].cpu);

                 if (src_rq != des_rq) {
                     /* give up after a bounded number of attempts */
                     count = 1000;
                     while (task_running(src_rq, job->task) && count > 0) {
                         /* alternative tried: set_tsk_need_resched(src_rq->curr); */
                         smp_send_reschedule(src_rq->cpu);
                         count--;
                     }
                     /* still running on the source CPU: log it and bail out */
                     if (task_running(src_rq, job->task)) {
                         snprintf(msg, MSG_SIZE,
                                  "zigzag_pack src_rq:%d des_rq:%d src_rq_curr_pid:%d pid:%d ",
                                  src_rq->cpu, des_rq->cpu, src_rq->curr->pid,
                                  job->task->pid);
                         register_event(sched_clock(), msg, this_cpu);
                         return 1;
                     }
                 }

The tracing info on CPU0 is:
Time:63625166136 MSG:zigzag_pack src_rq:1 des_rq:0 src_rq_curr_pid:1401 pid:1401
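
To make the control flow of the bounded retry above easier to follow, here
is a minimal user-space sketch of the same pattern. It is not kernel code
and does not reproduce the problem. All names in it (fake_cpu, fake_kick,
bounded_wait) are hypothetical: fake_kick only stands in for
smp_send_reschedule(), and a plain flag stands in for task_running().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the remote CPU: the task counts as switched
 * out only after the CPU has been kicked kicks_needed times. */
struct fake_cpu {
    bool task_running;  /* models task_running(src_rq, task) */
    int kicks_seen;     /* number of reschedule requests received */
    int kicks_needed;   /* requests needed before the task stops */
};

/* Hypothetical stand-in for smp_send_reschedule(): records the kick and
 * clears task_running once enough kicks have arrived. */
static void fake_kick(struct fake_cpu *cpu)
{
    cpu->kicks_seen++;
    if (cpu->kicks_seen >= cpu->kicks_needed)
        cpu->task_running = false;
}

/* Kick the remote CPU until the task stops running or the retry budget
 * is used up. Returns 0 if the task stopped, 1 if it is still running. */
static int bounded_wait(struct fake_cpu *cpu, int max_tries)
{
    int count = max_tries;

    while (cpu->task_running && count > 0) {
        fake_kick(cpu);
        count--;
    }
    return cpu->task_running ? 1 : 0;
}
```

In this model, a fake_cpu that stops after 10 kicks makes
bounded_wait(&cpu, 1000) return 0. If the flag never clears within the
budget, the function returns 1, which corresponds to the error path that
logs the zigzag_pack message.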

I tried both ways of triggering a reschedule on CPU1,
smp_send_reschedule(src_rq->cpu) and set_tsk_need_resched(src_rq->curr),
but neither works.


If any of you are experts on this issue, please give me some tips.

Thanks.