<div dir="ltr"><div dir="ltr"><div dir="ltr"><div>Hi Björn, thanks again.<br><br>The bug was partially fixed: schedule() was missing a check to verify whether the task was blocked before calling requeue(). Even after that fix, some tasks still remained zombies, so I'm now adding more TRACE_TASK() calls to better understand the behavior.<br><br>I noticed that once a task is suspended, it never executes again. I'll try the next_became_invalid() callback, as suggested.<br><br>Regards,<br><br>Ricardo<br></div><br><div class="gmail_quote"><div dir="ltr">On Sat, 22 Dec 2018 at 08:01, <<a href="mailto:litmus-dev-request@lists.litmus-rt.org">litmus-dev-request@lists.litmus-rt.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Send litmus-dev mailing list submissions to<br>
<a href="mailto:litmus-dev@lists.litmus-rt.org" target="_blank">litmus-dev@lists.litmus-rt.org</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="https://lists.litmus-rt.org/listinfo/litmus-dev" rel="noreferrer" target="_blank">https://lists.litmus-rt.org/listinfo/litmus-dev</a><br>
or, via email, send a message with subject or body 'help' to<br>
<a href="mailto:litmus-dev-request@lists.litmus-rt.org" target="_blank">litmus-dev-request@lists.litmus-rt.org</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:litmus-dev-owner@lists.litmus-rt.org" target="_blank">litmus-dev-owner@lists.litmus-rt.org</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of litmus-dev digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. Re: litmus-dev Digest, Vol 79, Issue 2 (Björn Brandenburg)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Fri, 21 Dec 2018 12:58:06 +0100<br>
From: Björn Brandenburg <<a href="mailto:bbb@mpi-sws.org" target="_blank">bbb@mpi-sws.org</a>><br>
To: <a href="mailto:litmus-dev@lists.litmus-rt.org" target="_blank">litmus-dev@lists.litmus-rt.org</a><br>
Subject: Re: [LITMUS^RT] litmus-dev Digest, Vol 79, Issue 2<br>
Message-ID: <<a href="mailto:F48ECE7B-4375-4703-932E-CE487E500F3D@mpi-sws.org" target="_blank">F48ECE7B-4375-4703-932E-CE487E500F3D@mpi-sws.org</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
On 21. Dec 2018, at 03:24, Ricardo Teixeira <<a href="mailto:ricardo.btxr@gmail.com" target="_blank">ricardo.btxr@gmail.com</a>> wrote:<br>
> <br>
> I am implementing a protocol based on MrsP, which involves task migration, but the protocol does not account for task suspensions. I have not tried to reproduce this with the stock plugins. The plugin I'm using was implemented by a colleague; it also involves task migration.<br>
<br>
Sorry, not much I can say in that case. Migrations are tricky. There are many ways in which a race condition might arise such that LITMUS^RT’s migration sanity checker is triggered (which is the code you pointed to). <br>
<br>
Basically, the sanity check that triggered observes that the ‘next’ task, which was picked by the plugin to be dispatched next, and which needs to be pulled from another core’s runqueue, somehow changed state while the migration path had dropped the remote core’s runqueue lock. Maybe it self-suspended due to blocking I/O, maybe it received a signal, maybe it hit a page fault (for instance, by calling a library function not yet resolved by the dynamic linker), or maybe something else entirely? <br>
<br>
In all likelihood, this is a fatal bug in the scheduler or locking protocol, and you should just fix the bug. However, for edge cases, LITMUS^RT provides the next_became_invalid() callback in the plugin API, which allows you to do something more clever than just dropping all references to the task (which causes it to become an unkillable “zombie” task). That said, I would strongly urge you to not just work around the problem using next_became_invalid() unless you fully understand the root cause and realize that it’s a fundamental limitation and not just a simple bug. <br>
<br>
My approach to debugging a problem like this would be to simply “carpet-bomb” the scheduler plugin and locking protocol implementations with TRACE_TASK() and stare at debug traces until I understand exactly what sequence of events leads to the sanity checker triggering. Of course, with a sufficient number of TRACE_TASK() instances, you might just change the timing enough to hide the race, which would make it quite a bit harder to debug. Good luck. :-)<br>
<br>
Generally, protocols like the MrsP are very difficult to implement correctly precisely because it’s tricky to get the migrations right. This also applies to varying degrees to the MBWI, OMIP, and MC-IPC protocols. Just my 2 cents… <br>
<br>
- Björn<br>
<br>
<br>
<br>
<br>
------------------------------<br>
<br>
End of litmus-dev Digest, Vol 79, Issue 4<br>
*****************************************<br>
</blockquote></div></div></div></div>