[LITMUS^RT] litmus-dev Digest, Vol 79, Issue 4

Ricardo Teixeira ricardo.btxr at gmail.com
Sat Dec 22 14:41:18 CET 2018


Hi Björn, thanks again.

The bug is partially fixed. A check for whether the task was blocked was
missing before the call to requeue() inside schedule(). Even with that
check in place, some tasks still end up as zombies. I am now adding more
TRACE_TASK() calls to better understand the behavior.
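
To make the missing check concrete, here is a minimal user-space sketch of
the idea. It is not the plugin's actual code: struct sim_task, sim_requeue()
and sim_schedule_prev() are hypothetical stand-ins for the kernel task
descriptor, the plugin's requeue() and the relevant part of schedule().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's task descriptor. */
struct sim_task {
    int  pid;
    bool blocked;  /* true if the task has suspended, e.g. on blocking I/O */
    bool queued;   /* true if the task is in the ready queue */
};

/* Hypothetical stand-in for the plugin's requeue(): mark the task ready. */
static void sim_requeue(struct sim_task *t)
{
    assert(t != NULL);
    t->queued = true;
}

/*
 * Model of the guard in schedule(): the previously scheduled task goes
 * back into the ready queue only if it is still runnable. A blocked task
 * is left alone here and is assumed to be requeued when it wakes up.
 * Returns true if the task was requeued.
 */
static bool sim_schedule_prev(struct sim_task *prev)
{
    if (prev == NULL)
        return false;
    if (prev->blocked)
        return false;
    sim_requeue(prev);
    return true;
}
```

Without the blocked test, a suspended task would land in the ready queue
and could be picked again even though it cannot run.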

I noticed that once a task has been suspended, it never executes again.
I'll try the next_became_invalid() callback, as suggested.
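
As I understand the suggestion, the callback is invoked when the task picked
for dispatch changed state while the migration path had dropped the remote
runqueue lock. Below is a simplified user-space model of that re-check. All
names (sim_next, sim_invalid_cb, sim_finish_migration() and so on) are
hypothetical and do not reflect the real plugin API signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sim_state { SIM_RUNNABLE, SIM_SUSPENDED };

/* Hypothetical model of the task chosen to be dispatched next. */
struct sim_next {
    int            pid;
    enum sim_state state;
    bool           tracked;  /* does the plugin still reference the task? */
};

/* Hypothetical callback type, called when 'next' is no longer valid. */
typedef void (*sim_invalid_cb)(struct sim_next *next);

/* Default reaction: drop all references, so the task is forgotten. */
static void sim_drop_reference(struct sim_next *next)
{
    next->tracked = false;
}

/* Alternative reaction: keep the task tracked so a wake-up can requeue it. */
static void sim_keep_for_wakeup(struct sim_next *next)
{
    next->tracked = true;
}

/*
 * Re-check 'next' after the remote lock was dropped and re-acquired.
 * Returns true if the task is still runnable and may be dispatched.
 * Otherwise the callback (or the default reaction) decides its fate.
 */
static bool sim_finish_migration(struct sim_next *next, sim_invalid_cb on_invalid)
{
    assert(next != NULL);
    if (next->state == SIM_RUNNABLE)
        return true;
    if (on_invalid != NULL)
        on_invalid(next);
    else
        sim_drop_reference(next);
    return false;
}
```

In this model, passing NULL reproduces the zombie case: the suspended task
is dropped and nothing is left to requeue it.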

Regards,

Ricardo

On Sat, Dec 22, 2018 at 08:01, <litmus-dev-request at lists.litmus-rt.org>
wrote:

> Message: 1
> Date: Fri, 21 Dec 2018 12:58:06 +0100
> From: Björn Brandenburg <bbb at mpi-sws.org>
> To: litmus-dev at lists.litmus-rt.org
> Subject: Re: [LITMUS^RT] litmus-dev Digest, Vol 79, Issue 2
> Message-ID: <F48ECE7B-4375-4703-932E-CE487E500F3D at mpi-sws.org>
> Content-Type: text/plain; charset="utf-8"
>
> On 21. Dec 2018, at 03:24, Ricardo Teixeira <ricardo.btxr at gmail.com>
> wrote:
> >
> > I am implementing a protocol based on MrsP, which involves the migration
> > of tasks, but the protocol does not contemplate the suspension of tasks. I
> > have not tried to reproduce this for the stock plugins. The plugin I'm
> > using was implemented by a colleague; it also involves the migration of
> > tasks.
>
> Sorry, not much I can say in that case. Migrations are tricky. There are
> many ways in which a race condition might arise such that LITMUS^RT’s
> migration sanity checker is triggered (which is the code you pointed to).
>
> Basically, the code that triggered observes that the ‘next’ task, which
> was picked by the plugin to be dispatched next, and which needs to be
> pulled from another core’s runqueue, somehow changed state while the
> migration path dropped the remote core’s runqueue lock. Maybe it
> self-suspended due to blocking I/O, maybe it received a signal, maybe it
> hit a page fault, maybe it tried calling a library function not yet
> resolved by the dynamic linker thus triggering a page fault, maybe
> something else?
>
> In all likelihood, this is a fatal bug in the scheduler or locking
> protocol, and you should just fix the bug. However, for edge cases,
> LITMUS^RT provides the next_became_invalid() callback in the plugin API,
> which allows you to do something more clever than just dropping all
> references to the task (which causes it to become an unkillable “zombie”
> task). That said, I would strongly urge you to not just work around the
> problem using next_became_invalid() unless you fully understand the root
> cause and realize that it’s a fundamental limitation and not just a simple
> bug.
>
> My approach to debugging a problem like this would be to simply
> “carpet-bomb” the scheduler plugin and locking protocol implementations
> with TRACE_TASK() and stare at debug traces until I understand exactly what
> sequence of events leads to the sanity checker triggering. Of course, with
> a sufficient number of TRACE_TASK() instances, you might just change the
> timing enough to hide the race, which would make it quite a bit harder to
> debug. Good luck. :-)
>
> Generally, protocols like the MrsP are very difficult to implement
> correctly precisely because it’s tricky to get the migrations right. This
> also applies to varying degrees to the MBWI, OMIP, and MC-IPC protocols.
> Just my 2 cents…
>
> - Björn
>