[LITMUS^RT] Announcing PGM^RT!

Mon Feb 24 16:47:26 CET 2014

On Feb 24, 2014, at 5:51 AM, Björn Brandenburg <bbb at mpi-sws.org> wrote:

> 
> On 19 Feb 2014, at 23:15, Glenn Elliott <gelliott at cs.unc.edu> wrote:
>> 
>> I want to share with you a project that we’ve been working on here at UNC: PGM^RT.  PGM^RT is a middleware framework for supporting real-time dataflow graph applications (PGM stands for processing graph method, a formalism for describing dataflow graphs).  PGM^RT simplifies/unifies the signaling and message passing that occurs between threads, and it has a real-time-friendly design.  Graphs can be contained within a single process, or be broken up into separate processes.  PGM^RT also supports graphs that span a network.  Currently supported signaling/message-passing IPCs are: custom spinlock-based condition variables built directly on top of Linux futexes (low overheads! (x86/Linux only)), pthread condition variables, named pipes (aka FIFOs), POSIX message queues, and stream sockets (e.g., TCP).
> 
> Hi Glenn,
> 
> this looks quite impressive. Nice work! I looked a bit through the repository, but wasn't quite sure how I would use the framework. Maybe I missed it, but is there an easy tutorial-style example?

I’m afraid that I don’t have anything like liblitmus’s base_task.c yet.  However, there are a few examples that I can point to:

1) tools/basictest.cpp: This application sets up a three-level graph, with one source and one sink.  There are four interior nodes.  There are eight edges total.  Here, each of the eight edges uses one of the four non-network-based edge IPCs.  No data is actually passed among nodes in this example.

2) tools/datapassingtest.cpp: This application passes data between nodes.  There is an example of how a thread can get a reference to a data-passing edge’s write (producer) buffer (https://github.com/GElliott/pgm/blob/master/tools/datapassingtest.cpp#L66) and the edge’s read (consumer) buffer (https://github.com/GElliott/pgm/blob/master/tools/datapassingtest.cpp#L80).  An example of producing/consuming data is in the if/else block here: https://github.com/GElliott/pgm/blob/master/tools/datapassingtest.cpp#L113.

3) tools/sockstreamtest.cpp: Tests the basic TCP-based edge.

4) tools/pgmrt.cpp: You can think of this application as PGM^RT’s rtspin.  However, currently, all data is passed out-of-band and tasks synchronize through token passing.  That is, the PGM^RT edges don’t pass data themselves.  I don’t pass data over the edges in this example, because pgmrt.cpp was designed to investigate cache reuse among producers and consumers.  I have no idea how the cache would behave if I passed data through FIFOs, etc.

I haven’t included any PGM^RT tests for inter-process graphs.  I need to update those tests for PGM^RT’s current API.

>> Although PGM^RT should work on any POSIX-compatible system, lower overheads can be achieved by using LITMUS^RT.  However, a minor patch to LITMUS^RT is required to properly assign scheduling priorities for deadline schedulers (no patch is needed for fixed-priority scheduling).  Patches have been posted to the LITMUS^RT Publications page: http://wiki.litmus-rt.org/litmus/Publications
> 
> Is this something we should consider merging into LITMUS^RT?
> 
> Thanks,
> Björn

Maybe, maybe not.  I would like this, but there are a few issues to consider:

1) PGM^RT uses C++ under the hood, even though it has a very C-style API.  I know that liblitmus has a C-only policy.  I used C++ for the sake of expediency: boost C++ provides very nice abstractions for managing shared memory and dealing with the file system.  However, I limited myself to a C-style API in case PGM^RT ever had to be C-only.  With some effort, we could eliminate the dependence on C++.  Note: I don’t use boost’s IPCs.  I use the POSIX interfaces for these.

2) There are two patches needed in litmus to support PGM^RT as I designed it:
	(a) A recomputation of the deadline upon task wake-up.  Specifically, before a task blocks for tokens/data, it sets a “waiting” flag in the control page.  Upon task wake-up, litmus recomputes the task’s deadline based upon the current time (abs deadline = wake-up time + rel. deadline).  I think we could generalize this to support truly sporadic tasks in litmus.
	(b) Lazy priority boosting.  Producers enter a non-preemptive section while they pass data over the IPCs and wake consumers.  However, I made this non-preemption a little more robust.  After entering the np section, producers set a “pgm-signalling” flag in the control page. If an attempt is made to deschedule this task, the priority of the task is automatically boosted, and the scheduling decision is re-evaluated.  The pgm-signalling flag is cleared, and base priority restored, when the task exits the np section.
	Why do this?  Although PGM^RT uses non-blocking I/O, I was worried that bugs or new IPCs could cause a producer to block while it is in its np section.  I wanted to ensure that the task quickly regains the CPU as soon as it is unblocked.  The np flag alone can’t ensure this because it is not a part of task prioritization (and it probably shouldn’t be).  Np and boosting do similar things, but are handled differently within the kernel.  I would like further discussions on how np sections and boosting should play together in litmus before deciding whether or not lazy boosting is a good long-term solution.
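
As a rough illustration of the rule in (a), the deadline update on wake-up could be modeled as below.  This is only a sketch of the arithmetic (abs deadline = wake-up time + rel. deadline), not the actual LITMUS^RT patch: the TaskState struct, its field names, and on_wakeup() are hypothetical, and clearing the flag on wake-up is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-task state described in (a).
// These names are illustrative; they are not LITMUS^RT's control page.
struct TaskState {
    std::uint64_t rel_deadline_ns;  // relative deadline
    std::uint64_t abs_deadline_ns;  // current absolute deadline
    bool waiting;                   // set by the task before it blocks for tokens/data
};

// Conceptually invoked by the scheduler when the task wakes up.
// If the task flagged that it was waiting for input, its deadline is
// re-anchored at the wake-up time; otherwise the deadline is untouched.
inline void on_wakeup(TaskState &t, std::uint64_t now_ns) {
    assert(t.rel_deadline_ns > 0);  // a zero relative deadline is meaningless
    if (t.waiting) {
        t.abs_deadline_ns = now_ns + t.rel_deadline_ns;
        t.waiting = false;  // assumption: the flag is cleared on wake-up
    }
}
```

For example, a task with a relative deadline of 10 time units that wakes at time 100 with the flag set would get an absolute deadline of 110; a wake-up without the flag leaves the deadline as it was.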

3) PGM^RT is currently released under the BSD license.  I did this because we’re trying to court limited participation from industry partners.  (I want to be able to use it in their proprietary code.)  GPL is generally very scary to them.  However, since there has only been one contributor (me), it would be pretty easy to change the licensing scheme.  We could perhaps do LGPL, or use a dual licensing scheme.  I really have no idea how all this licensing stuff works.  I just picked a license known to be industry-friendly.

If we do want to support PGM^RT in mainline litmus, I would also like to explore the “collect()” system call that Björn suggested when I first started to look at supporting PGM graphs in litmus.  Unlike select(), which blocks until any file descriptor is ready for reading, collect() would block until all file descriptors are ready for reading.  I spent a fair amount of time investigating how collect() could be implemented.  However, filesystem code in Linux is huge, and I wasn’t sure where to begin.  It all looks very complicated.  For example, select(), poll(), and epoll() all do similar things but use different code paths.  At a high level, I imagine that collect() would set some sort of atomic counter to the number of file descriptors passed to it.  The counter would decrement whenever a file becomes ready.  The task would be awoken when the counter reaches 0.  Unfortunately, I’m not sure where to put this code in Linux.  I don’t know where collect() would hook into existing code paths.

There is one feature of PGM^RT that could be useful to liblitmus outside of the context of PGM^RT: There is an implementation of deadlock-free userspace spinlocks (ticketlocks).  The implementation can be compiled to use either litmus’s np sections or interrupt disabling.  There is also a futex-based condition variable that uses this spinlock (pthread’s condition variable requires the use of suspension-based mutexes).  These synchronization primitives could be handy in real-time applications.

-Glenn