[LITMUS^RT] Container implementation idea

Jonathan Herman hermanjl at cs.unc.edu
Mon Feb 27 20:54:02 CET 2012


On Mon, Feb 27, 2012 at 1:23 PM, Glenn Elliott <gelliott at cs.unc.edu> wrote:

> On Feb 26, 2012, at 9:40 PM, Jonathan Herman wrote:
>
> I will be adding container support to litmus over the next
> year. Exciting. My goals are to support a hierarchical container
> scheme where each container can have its own scheduling policy and a
> single container can be run on multiple CPUs simultaneously.
>
> This is my idea and I would love it to be critiqued.
>
> *-- Interface --*
> Tasks should be able to join and leave containers dynamically, but I
> don't intend to support (initially) dynamic creation of a container
> hierarchy. Instead, plugins will configure their own containers.
>
> I intend to completely copy the container proc interface from
> Linux. The interface looks something like this:
>  - <plugin>/<container-id>*/tasks
> Tasks are added and removed from containers by echoing their IDs into
> the tasks file of each container.
>
> For example, a CEDF plugin with containers would work as follows:
> 1. The scheduling plugin CODE would create a container for each
> cluster, and the container framework would automatically expose them
> under proc:
>  - CEDF/0/tasks
>  - CEDF/1/tasks
>  - CEDF/2/tasks
>  - CEDF/3/tasks
> 2. The scheduling plugin USER would create tasks, and echo their PIDs
> into these containers as he/she sees fit.
>
>
> Having to use fwrite() to join a container seems a bit heavy-handed.  Why
> the departure from the mechanism used to specify CPU partitioning?  Perhaps
> a system call could return to the user a description of the container
> hierarchy.  A task could traverse this hierarchy and join the container
> with a given identifier.  I would also appreciate an interface, to be used
> from within the kernel, for migrating tasks between containers.
>
> What happens when a scheduled task changes containers?
>

The following interface description might address these issues better:
A kernel interface will be provided for adding tasks to containers and for
moving tasks between them. Kernelspace (e.g. plugins) may use this interface
directly to migrate tasks between containers; it is also exposed to
userspace through the proc interface mentioned above.
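
To make this concrete, here is a rough sketch of what that kernel-side
interface could look like (the function names are placeholders I made up,
not a settled API):

/* Hypothetical kernel-side container-membership calls; names are
 * placeholders. Plugin code and the proc handler would both funnel
 * into these.
 */
int rt_container_add_task(struct rt_container *cont, struct rt_task *task);
int rt_container_remove_task(struct rt_container *cont, struct rt_task *task);
int rt_container_move_task(struct rt_container *source,
                           struct rt_container *dest,
                           struct rt_task *task);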

The proc interface is designed for simplicity when scripting and running
task sets. Using a method similar to the -p partitioning flag to navigate a
hierarchy of containers could get confusing (e.g., how do I remove a task
from a container?). Additionally, the proc interface can be extended later if
we want to add per-container features, such as memory allocation. However,
should tasks need to leave and join containers dynamically from userspace at
performance-critical times, the overhead of fwrite() may be an issue. For
those cases, container-membership calls would have to be added to liblitmus.
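
I picture those additions looking roughly like this (just a sketch; neither
call exists in liblitmus today, and the names are mine):

/* Hypothetical liblitmus wrappers; under the hood they would simply
 * write the PID into the container's tasks file, or use a dedicated
 * system call if the proc write proves too slow.
 */
int litmus_join_container(const char *container_path, pid_t pid);
int litmus_leave_container(const char *container_path, pid_t pid);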

Thanks for pointing that out; I was only picturing tasks entering and
leaving containers during initialization.

>
>
> What happens when a scheduled task changes containers?

I imagine the result would be:

source->policy->ops->remove(source, task);
/* As that container was previously running, it now selects
 * another task to run after task is unlinked.
 */
dest->policy->ops->add(dest, task);
/* If this container is running and task is of sufficiently
 * high priority, it will now run on one of dest's rt_cpus.
 */

So unless the destination container can run the task, the previously
scheduled task will have to stop.
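
So the hypothetical rt_container_move_task() from above would boil down to
roughly the following (still a sketch; the exact locking rules, e.g. which
container locks must be held, are an open question):

int rt_container_move_task(struct rt_container *source,
                           struct rt_container *dest,
                           struct rt_task *task)
{
        /* Unlink from the source; if the task was running there, the
         * source container picks another task to run.
         */
        source->policy->ops->remove(source, task);

        /* Add to the destination; the task runs again only if dest has
         * an rt_cpu it is eligible to win.
         */
        dest->policy->ops->add(dest, task);

        return 0;
}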

>
> *-- The framework --*
> The framework I'm showing is designed with the following thoughts:
> * Eventual support for less ad-hoc slack stealing
> * Not breaking locking code too much
> * Trying to minimize amount of extra data being passed around
> * Keeping as much container code out of the plugins as possible
>
> The basic idea is to have rt_tasks become the fundamental schedulable
> entity in Litmus. An rt_task can correspond to a struct task_struct,
> as they do now, or to an rt_cpu. I want to use these rt_tasks to abstract
> out the code needed to manage container hierarchies. Per-plugin
> scheduling code should not have to manage containers at all after
> initialization.
>
> An rt_cpu is a virtual processor which can run other tasks. It can
> have a task which is @linked to it, and it optionally enforces budget
> with an @timer. The virtual processor is run when its corresponding
> rt_task, or @server, is selected to run. When the rt_cpu is selected
> to run, it chooses a task to execute by asking its corresponding
> @container.
>
> struct rt_cpu {
>         unsigned int cpu;       /* Perhaps an RT_GLOBAL macro corresponds to
>                                  * a wandering global virtual processor?
>                                  */
>         struct rt_task *server; /* 0xdeadbeef for no server maybe?
>                                  * I'm thinking of doing something
>                                  * special for servers which have
>                                  * full utilization of a processor,
>                                  * as servers in the BASE container
>                                  * will.
>                                  */
>         struct rt_task *linked; /* What is logically running */
>
>         struct enforcement_timer timer;
>         struct bheap_node *hn;  /* For membership in heaps */
>
>         struct rt_container *container; /* Clearly necessary */
> };
>
> An rt_container struct is a group of tasks scheduled together. The
> container
> can run tasks when one or more @procs are selected to run. When a
> container can run a task, it selects the next task to run using a
> @policy.
>
> struct rt_container {
>         /* Potentially one of these for each CPU */
>         struct rt_cpu *procs;
>         cpumask_t cpus; /* Or perhaps num_cpus? I want O(1) access to
>                          * partitioned CPUs, but a container may also
>                          * have multiple global rt_cpus. I am not sure
>                          * how to accomplish O(1) access with global
>                          * rt_cpus. Ideas? I'll try to think of something.
>                          */
>
>         /* To create the container hierarchy */
>         struct rt_container *parent;
>
>         /* The per-container method for scheduling container tasks */
>         struct rt_policy *policy;
>
>         /* Metadata kept separate from the rest of the container
>          * because it is not used often. E.g. a task list, proc
>          * entries, or a container name would be stored in this.
>          */
>         struct rt_cont_data *data;
> };
>
>
> Is a given policy instance static?  That is, is a single G-EDF policy
> instance shared between containers, or are there several distinct G-EDF
> policy instances, one per container?
>

The policy->ops are static, in that all G-EDF policies will point to the
same ops struct. The data wrapped around the rt_policy struct, like
rt_domains, is distinct per container.
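
As a sketch of what I have in mind (the field and function names here are
tentative, not final):

/* One static ops table per policy type (G-EDF, P-EDF, ...), shared by
 * every container that uses that policy.
 */
struct rt_policy_ops {
        void (*add)(struct rt_container *cont, struct rt_task *task);
        void (*remove)(struct rt_container *cont, struct rt_task *task);
        struct rt_task* (*take_ready)(struct rt_container *cont);
};

struct rt_policy {
        const struct rt_policy_ops *ops; /* e.g. one shared gedf_ops */
        void *data;                      /* per-container state, e.g. an
                                          * rt_domain for G-EDF
                                          */
};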


>
> In general, I like this interface.  It appears clean to me.
>
Thanks!

-- 
Jonathan Herman
Department of Computer Science at UNC Chapel Hill

