[LITMUS^RT] litmus-dev Digest, Vol 60, Issue 2

Meng Xu xumengpanda at gmail.com
Sun Feb 19 23:34:52 CET 2017


On Sun, Feb 19, 2017 at 5:31 PM, Shuai Zhao <zs673 at york.ac.uk> wrote:
> Hi Meng
>
> Thank you for your fast response.
>
> Yes, CPU prefetching is already disabled, but there are no other hardware
> prefetching options available in the BIOS. I guess that should be OK.
>
> The kmalloc() call could well be one of the reasons. I will adjust the code.
>
> BTW, are there any other features or facilities that could be disabled to
> minimise system interference?

How about cache prefetching? Did you disable it?
You also need to keep the memory bus frequency constant.

But I think such a large variance mainly comes from kmalloc().
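
Roughly, the pre-allocation I suggested earlier could look like the sketch
below. Treat it only as an illustration of the idea: the "list_node" field
is made up, and you would have to adapt it to your task_list type and to
wherever task admission happens in your patch.

    /* Sketch: one preallocated list node per task, so that mrsp_lock()
     * never calls kmalloc() inside the measured region.
     * NOTE: the 'list_node' field is hypothetical. */

    /* Once per task, outside the critical section (sleeping is fine): */
    t->rt_param.task_params.list_node =
            kmalloc(sizeof(struct task_list), GFP_KERNEL);
    BUG_ON(t->rt_param.task_params.list_node == NULL);

    /* Inside mrsp_lock(), between TS_LOCK_START and TS_LOCK_END: */
    struct task_list *node = t->rt_param.task_params.list_node;

    node->task = t;
    INIT_LIST_HEAD(&node->next);
    list_add_tail(&node->next, &sem->tasks_queue->next);

That keeps the critical section down to a few pointer operations, whose
cost should be much closer to constant.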

Best,

Meng


>
>
> On 19 February 2017 at 21:27, <litmus-dev-request at lists.litmus-rt.org>
> wrote:
>>
>> Send litmus-dev mailing list submissions to
>>         litmus-dev at lists.litmus-rt.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         https://lists.litmus-rt.org/listinfo/litmus-dev
>> or, via email, send a message with subject or body 'help' to
>>         litmus-dev-request at lists.litmus-rt.org
>>
>> You can reach the person managing the list at
>>         litmus-dev-owner at lists.litmus-rt.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of litmus-dev digest..."
>>
>>
>> Today's Topics:
>>
>>    1. A question about feather-trace tool (Shuai Zhao)
>>    2. Re: A question about feather-trace tool (Shuai Zhao)
>>    3. Re: A question about feather-trace tool (Meng Xu)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Sun, 19 Feb 2017 20:32:17 +0000
>> From: Shuai Zhao <zs673 at york.ac.uk>
>> To: litmus-dev at lists.litmus-rt.org
>> Subject: [LITMUS^RT] A question about feather-trace tool
>> Message-ID:
>>
>> <CAA133hO+_4Svyx6dJUyUn_rsnY50AfUVxLyOHKcsUN6+vG8UiQ at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi Björn
>>
>> I am a student at the University of York, working with Alan and Andy on
>> the nested behaviour of MrsP.
>>
>> We now have a full implementation of nested MrsP under the LITMUS^RT P-FP
>> scheduler and are now evaluating the overheads of the implementation.
>>
>> We use the Feather-Trace tool to trace the overheads of the scheduler
>> (which includes the P-FP schedule function), the context switch (which
>> includes the finish_switch function), and the mrsp_lock and mrsp_unlock
>> functions.
>>
>> During evaluation, we fixed the CPU clock speed, bound interrupts to
>> CPU 0, and isolated the other CPUs for testing to minimise interference
>> from the system.
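>>
>> (For reference, the isolation setup is roughly as follows; the IRQ list
>> and CPU masks are machine-specific, so this is only a sketch of what we
>> do:)
>>
>>     # kernel command line: keep CPUs 1-3 away from the general scheduler
>>     isolcpus=1,2,3
>>
>>     # pin every IRQ that allows it to CPU 0 (affinity mask 0x1)
>>     for irq in /proc/irq/[0-9]*; do
>>             echo 1 > $irq/smp_affinity 2>/dev/null
>>     done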
>>
>>
>> However, the results seem weird. Using mrsp_lock as an example: we
>> evaluated the overhead of the following code using the timestamps
>> "TS_LOCK_START" and "TS_LOCK_END".
>>
>> TS_LOCK_START;
>>
>> if (t->rt_param.task_params.helper == NULL) {
>>         t->rt_param.task_params.priority =
>>                 sem->prio_per_cpu[get_partition(t)] < get_priority(t) ?
>>                 sem->prio_per_cpu[get_partition(t)] : get_priority(t);
>>         t->rt_param.task_params.migrated_time = -1;
>> }
>>
>> ticket = atomic_read(&sem->next_ticket);
>> t->rt_param.task_params.ticket = ticket;
>> atomic_inc(&sem->next_ticket);
>>
>> add_task(t, &(sem->tasks_queue->next));
>> t->rt_param.task_params.requesting_lock = sem;
>>
>> TS_LOCK_END;
>>
>> where the function add_task() is as follows:
>>
>> void add_task(struct task_struct *task, struct list_head *head) {
>>         struct task_list *taskPtr =
>>                 kmalloc(sizeof(struct task_list), GFP_KERNEL);
>>
>>         BUG_ON(taskPtr == NULL);
>>
>>         taskPtr->task = task;
>>         INIT_LIST_HEAD(&taskPtr->next);
>>         list_add_tail(&taskPtr->next, head);
>> }
>>
>> We expected the overhead of the code above to be stable, as its time
>> complexity is O(1). However, the test results tell a different story, as
>> shown below:
>>
>> Overhead  Overhead  Unit    Samples  MAX    99.9th perc.  99th perc.  95th perc.  avg       med  min  std      var
>> MRSP      LOCK      cycles  149985   20089  2220.016      1584        1287        853.9508  939  319  367.314  134918.7
>>
>> As we can see, the maximum overhead is 20089 cycles, which is far bigger
>> than the median/average values and even the 99.9th percentile value.
>>
>> I am confused by this result. Have you met this situation before? Is
>> there any explanation for a result like this, or any way to avoid it?
>>
>> Thank you in advance.
>>
>> Best wishes
>> Shuai
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Sun, 19 Feb 2017 20:37:52 +0000
>> From: Shuai Zhao <zs673 at york.ac.uk>
>> To: litmus-dev at lists.litmus-rt.org
>> Subject: Re: [LITMUS^RT] A question about feather-trace tool
>> Message-ID:
>>
>> <CAA133hM4boX5fLX1+PERwbxtvymOFOonn0wwV2r3q9s3hehBog at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> PS: I commented out the original "TS_LOCK_START" and "TS_LOCK_END"
>> timestamps in the sys_lock function. Thanks.
>>
>> On 19 February 2017 at 20:32, Shuai Zhao <zs673 at york.ac.uk> wrote:
>>
>> > Hi Björn
>> >
>> > I am a student at the University of York, working with Alan and Andy on
>> > the nested behaviour of MrsP.
>> >
>> > We now have a full implementation of nested MrsP under the LITMUS^RT
>> > P-FP scheduler and are now evaluating the overheads of the
>> > implementation.
>> >
>> > We use the Feather-Trace tool to trace the overheads of the scheduler
>> > (which includes the P-FP schedule function), the context switch (which
>> > includes the finish_switch function), and the mrsp_lock and mrsp_unlock
>> > functions.
>> >
>> > During evaluation, we fixed the CPU clock speed, bound interrupts to
>> > CPU 0, and isolated the other CPUs for testing to minimise interference
>> > from the system.
>> >
>> >
>> > However, the results seem weird. Using mrsp_lock as an example: we
>> > evaluated the overhead of the following code using the timestamps
>> > "TS_LOCK_START" and "TS_LOCK_END".
>> >
>> > TS_LOCK_START;
>> >
>> > if (t->rt_param.task_params.helper == NULL) {
>> >         t->rt_param.task_params.priority =
>> >                 sem->prio_per_cpu[get_partition(t)] < get_priority(t) ?
>> >                 sem->prio_per_cpu[get_partition(t)] : get_priority(t);
>> >         t->rt_param.task_params.migrated_time = -1;
>> > }
>> >
>> > ticket = atomic_read(&sem->next_ticket);
>> > t->rt_param.task_params.ticket = ticket;
>> > atomic_inc(&sem->next_ticket);
>> >
>> > add_task(t, &(sem->tasks_queue->next));
>> > t->rt_param.task_params.requesting_lock = sem;
>> >
>> > TS_LOCK_END;
>> >
>> > where the function add_task() is as follows:
>> >
>> > void add_task(struct task_struct *task, struct list_head *head) {
>> >         struct task_list *taskPtr =
>> >                 kmalloc(sizeof(struct task_list), GFP_KERNEL);
>> >
>> >         BUG_ON(taskPtr == NULL);
>> >
>> >         taskPtr->task = task;
>> >         INIT_LIST_HEAD(&taskPtr->next);
>> >         list_add_tail(&taskPtr->next, head);
>> > }
>> >
>> > We expected the overhead of the code above to be stable, as its time
>> > complexity is O(1). However, the test results tell a different story,
>> > as shown below:
>> >
>> > Overhead  Overhead  Unit    Samples  MAX    99.9th perc.  99th perc.  95th perc.  avg       med  min  std      var
>> > MRSP      LOCK      cycles  149985   20089  2220.016      1584        1287        853.9508  939  319  367.314  134918.7
>> >
>> > As we can see, the maximum overhead is 20089 cycles, which is far
>> > bigger than the median/average values and even the 99.9th percentile
>> > value.
>> >
>> > I am confused by this result. Have you met this situation before? Is
>> > there any explanation for a result like this, or any way to avoid it?
>> >
>> > Thank you in advance.
>> >
>> > Best wishes
>> > Shuai
>> >
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Sun, 19 Feb 2017 16:27:26 -0500
>> From: Meng Xu <xumengpanda at gmail.com>
>> To: litmus-dev at lists.litmus-rt.org
>> Subject: Re: [LITMUS^RT] A question about feather-trace tool
>> Message-ID:
>>
>> <CAENZ-+mfQQ+-5qxbJOyh2UEPFrhvPp2OzsB4dYXUG_r=tNj95A at mail.gmail.com>
>> Content-Type: text/plain; charset=UTF-8
>>
>> On Sun, Feb 19, 2017 at 3:32 PM, Shuai Zhao <zs673 at york.ac.uk> wrote:
>> >
>> > Hi Björn
>>
>>
>> Hi,
>>
>> Can I hijack the question? ;-)
>>
>> >
>> >
>> > I am a student at the University of York, working with Alan and Andy on
>> > the nested behaviour of MrsP.
>> >
>> > We now have a full implementation of nested MrsP under the LITMUS^RT
>> > P-FP scheduler and are now evaluating the overheads of the
>> > implementation.
>> >
>> > We use the Feather-Trace tool to trace the overheads of the scheduler
>> > (which includes the P-FP schedule function), the context switch (which
>> > includes the finish_switch function), and the mrsp_lock and mrsp_unlock
>> > functions.
>> >
>> > During evaluation, we fixed the CPU clock speed, bound interrupts to
>> > CPU 0, and isolated the other CPUs for testing to minimise interference
>> > from the system.
>>
>>
>> Did you disable the hardware prefetching mechanisms in the BIOS?
>> Maybe you want to disable them as well.
>>
>>
>> >
>> > However, the results seem weird. Using mrsp_lock as an example: we
>> > evaluated the overhead of the following code using the timestamps
>> > "TS_LOCK_START" and "TS_LOCK_END".
>> >
>> > TS_LOCK_START;
>> >
>> > if (t->rt_param.task_params.helper == NULL) {
>> >         t->rt_param.task_params.priority =
>> >                 sem->prio_per_cpu[get_partition(t)] < get_priority(t) ?
>> >                 sem->prio_per_cpu[get_partition(t)] : get_priority(t);
>> >         t->rt_param.task_params.migrated_time = -1;
>> > }
>> >
>> > ticket = atomic_read(&sem->next_ticket);
>> > t->rt_param.task_params.ticket = ticket;
>> > atomic_inc(&sem->next_ticket);
>> >
>> > add_task(t, &(sem->tasks_queue->next));
>> > t->rt_param.task_params.requesting_lock = sem;
>> >
>> > TS_LOCK_END;
>> >
>> > where the function add_task() is as follows:
>> >
>> > void add_task(struct task_struct *task, struct list_head *head) {
>> >         struct task_list *taskPtr =
>> >                 kmalloc(sizeof(struct task_list), GFP_KERNEL);
>>
>>
>> I guess the variance comes from the kmalloc() call.
>> kmalloc() latency varies a lot.
>>
>> You should avoid kmalloc() in the critical section.
>> One approach is to pre-allocate the space and re-initialize the value
>> every time you want to use it.
>>
>> You can also measure the time spent in this function to validate my
>> speculation.
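>>
>> As a rough sketch (hypothetical instrumentation, not part of your
>> patch): bracket the allocation inside add_task() with get_cycles()
>> from <linux/timex.h> and dump the delta:
>>
>>     /* Time just the allocation, to check whether kmalloc() accounts
>>      * for the 20089-cycle outliers. */
>>     cycles_t before, after;
>>
>>     before = get_cycles();
>>     taskPtr = kmalloc(sizeof(struct task_list), GFP_KERNEL);
>>     after = get_cycles();
>>
>>     trace_printk("add_task: kmalloc took %llu cycles\n",
>>                  (unsigned long long)(after - before));
>>
>> If the spikes follow the kmalloc() latency, you have your answer.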
>>
>>
>> >
>> >         BUG_ON(taskPtr == NULL);
>> >
>> >         taskPtr->task = task;
>> >         INIT_LIST_HEAD(&taskPtr->next);
>> >         list_add_tail(&taskPtr->next, head);
>> > }
>> >
>> > We expected the overhead of the code above to be stable, as its time
>> > complexity is O(1). However, the test results tell a different story,
>> > as shown below:
>> >
>> > Overhead  Overhead  Unit    Samples  MAX    99.9th perc.  99th perc.  95th perc.  avg       med  min  std      var
>> > MRSP      LOCK      cycles  149985   20089  2220.016      1584        1287        853.9508  939  319  367.314  134918.7
>>
>>
>> This is a bit difficult to read. Next time you send new results, could
>> you please send them in columns? ;-)
>>
>>
>> >
>> >
>> > As we can see, the maximum overhead is 20089 cycles, which is far
>> > bigger than the median/average values and even the 99.9th percentile
>> > value.
>> >
>> > I am confused by this result. Have you met this situation before? Is
>> > there any explanation for a result like this, or any way to avoid it?
>>
>>
>> You should be able to avoid this, as I mentioned above.
>>
>> Best,
>>
>> Meng
>>
>>
>> --
>> -----------
>> Meng Xu
>> PhD Student in Computer and Information Science
>> University of Pennsylvania
>> http://www.cis.upenn.edu/~mengxu/
>>
>>
>>
>> ------------------------------
>>
>> Subject: Digest Footer
>>
>> _______________________________________________
>> litmus-dev mailing list
>> litmus-dev at lists.litmus-rt.org
>> https://lists.litmus-rt.org/listinfo/litmus-dev
>>
>>
>> ------------------------------
>>
>> End of litmus-dev Digest, Vol 60, Issue 2
>> *****************************************
>
>
>
> _______________________________________________
> litmus-dev mailing list
> litmus-dev at lists.litmus-rt.org
> https://lists.litmus-rt.org/listinfo/litmus-dev
>



-- 
-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/


