[LITMUS^RT] Container implementation idea
Jonathan Herman
hermanjl at cs.unc.edu
Mon Feb 27 21:48:59 CET 2012
>
> Followup question: Do these changes allow for backwards-compatibility with
> existing Litmus plugins? That is, core data structures are only changed to
> an extent that existing non-container plugins would only need small (or no)
> updates.
Ack, forgot to mention that. I believe minor changes to existing plugins
will be required, but nothing major. Most of the work would involve using
struct rt_task instead of struct task_struct to make scheduling decisions.
For example, all plugins would return a struct rt_task from their
_schedule() methods, and some code would have to change to reflect the
migration of data from struct rt_param to task->rt_param.server (the
task's struct rt_task).
I want to keep the changes minor, but I don't think it will be possible to
avoid them entirely.
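To make that concrete, here is a rough sketch (not a working patch) of how
a plugin's _schedule() callback might change; demo_schedule, demo_domain,
and __take_ready_rt below are placeholder names, not existing functions:

/* Today a plugin's schedule callback returns a struct task_struct*.
 * Under the container framework it would traffic in struct rt_task
 * instead, with the per-task scheduling state reached through
 * tsk->rt_param.server.
 */
static struct rt_task* demo_schedule(struct rt_task *prev)
{
        /* Pick the highest-priority ready rt_task from this plugin's
         * domain; it may be an ordinary task or a container's server.
         */
        return __take_ready_rt(&demo_domain);
}
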
On Mon, Feb 27, 2012 at 3:04 PM, Glenn Elliott <gelliott at cs.unc.edu> wrote:
>
> On Feb 27, 2012, at 2:54 PM, Jonathan Herman wrote:
>
>
>
> On Mon, Feb 27, 2012 at 1:23 PM, Glenn Elliott <gelliott at cs.unc.edu>wrote:
>
>> On Feb 26, 2012, at 9:40 PM, Jonathan Herman wrote:
>>
>> I will be adding container support to litmus over the next
>> year. Exciting. My goals are to support a hierarchical container
>> scheme where each container can have its own scheduling policy and a
>> single container can be run on multiple CPUs simultaneously.
>>
>> This is my idea and I would love it to be critiqued.
>>
>> *-- Interface --*
>> Tasks should be able to join and leave containers dynamically, but I
>> don't intend to support dynamic creation of the container hierarchy
>> initially. Instead, plugins will configure their own containers.
>>
>> I intend to completely copy the container proc interface from
>> Linux. The interface looks something like this:
>> - <plugin>/<container-id>*/tasks
>> Tasks are added and removed from containers by echoing their IDs into
>> the tasks file of each container.
>>
>> For example, a CEDF plugin with containers would work as follows:
>> 1. The scheduling plugin CODE would create a container for each
>> cluster, and the container framework would automatically expose them
>> under proc:
>> - CEDF/0/tasks
>> - CEDF/1/tasks
>> - CEDF/2/tasks
>> - CEDF/3/tasks
>> 2. The scheduling plugin USER would create tasks, and echo their PIDs
>> into these containers as he/she sees fit.
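>>
>> (As a userspace sketch of step 2, assuming the files end up under
>> /proc/litmus/; the exact location is not decided. A task launcher could
>> do something like the following.)
>>
>> #include <stdio.h>
>> #include <sys/types.h>
>>
>> /* Write @pid into the tasks file of container @cont,
>>  * e.g. add_to_container("CEDF/2", getpid()).
>>  */
>> static int add_to_container(const char *cont, pid_t pid)
>> {
>>         char path[256];
>>         FILE *f;
>>
>>         snprintf(path, sizeof(path), "/proc/litmus/%s/tasks", cont);
>>         f = fopen(path, "w");
>>         if (!f)
>>                 return -1;
>>         fprintf(f, "%d\n", (int) pid);
>>         return fclose(f);
>> }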
>>
>>
>> Having to use fwrite() to join a container seems a bit heavy-handed. Why
>> the departure from the mechanism used to specify CPU partitioning? Perhaps
>> a system call could return to the user a description of the container
>> hierarchy. A task could traverse this hierarchy and join the container
>> with a given identifier. I would also appreciate an interface, to be used
>> from within the kernel, for migrating tasks between containers.
>>
>> What happens when a scheduled task changes containers?
>>
>
> The following interface description might address these issues better:
> A kernel interface will be provided for adding tasks to containers and for
> moving tasks between them. Kernel code (e.g., plugins) may use this
> interface directly to migrate tasks between containers, and it is exposed
> to userspace through the proc interface mentioned above.
>
> The proc interface is designed for simplicity when scripting and running
> task sets. Using a method similar to -p to access hierarchies of containers
> could get confusing (e.g., how would I remove a task from a container?).
> The interface can also be extended later if we want to add per-container
> features, such as memory allocation. However, should tasks need to leave
> and join containers dynamically from -userspace- at performance-critical
> times, the overhead of fwrite() may be an issue. For these cases, calls for
> container membership would have to be added to liblitmus (sketched below).
>
> Thanks for pointing that out; I was only picturing tasks entering and
> leaving containers during initialization.
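>
> The liblitmus additions I have in mind would just be thin wrappers over
> the proc files, something like the following (hypothetical prototypes,
> not an existing liblitmus API):
>
> int litmus_join_container(const char *container, pid_t pid);
> int litmus_leave_container(const char *container, pid_t pid);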
>
> >
> >
> > What happens when a scheduled task changes containers?
>
> I imagine the result would be:
>
> source->policy->ops->remove(source, task);
> /* As that container was previously running, it now selects
>  * another task to run after task is unlinked.
>  */
> dest->policy->ops->add(dest, task);
> /* If this container is running and task is of sufficiently
>  * high priority, it will now run on one of dest's rt_cpus.
>  */
>
> So unless the other container can run the task, the scheduled task will
> have to stop.
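>
> Roughly, the kernel interface for migration would just wrap those two
> calls; a minimal sketch, ignoring locking and the final ops names:
>
> /* Move @task from container @source to container @dest. Assumes the
>  * caller holds whatever lock ends up protecting both containers.
>  */
> static void container_migrate(struct rt_container *source,
>                               struct rt_container *dest,
>                               struct rt_task *task)
> {
>         source->policy->ops->remove(source, task);
>         dest->policy->ops->add(dest, task);
> }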
>
>>
>> *-- The framework --*
>> The framework I'm showing is designed with the following thoughts:
>> * Eventual support for less ad-hoc slack stealing
>> * Not breaking locking code too much
>> * Trying to minimize amount of extra data being passed around
>> * Keeping as much container code out of the plugins as possible
>>
>> The basic idea is to make rt_tasks the fundamental schedulable
>> entities in Litmus. An rt_task can correspond to a struct task_struct,
>> as it does now, or to an rt_cpu. I want to use these rt_tasks to abstract
>> out the code needed to manage container hierarchies. Per-plugin
>> scheduling code should not have to manage containers at all after
>> initialization.
>>
>> An rt_cpu is a virtual processor which can run other tasks. It can
>> have a task which is @linked to it, and it optionally enforces budget
>> with an @timer. The virtual processor is run when its corresponding
>> rt_task, or @server, is selected to run. When the rt_cpu is selected
>> to run, it chooses a task to execute by asking its corresponding
>> @container.
>>
>> struct rt_cpu {
>>         unsigned int cpu;        /* Perhaps an RT_GLOBAL macro corresponds
>>                                   * to a wandering global virtual processor?
>>                                   */
>>         struct rt_task *server;  /* 0xdeadbeef for no server maybe?
>>                                   * I'm thinking of doing something
>>                                   * special for servers which have
>>                                   * full utilization of a processor,
>>                                   * as servers in the BASE container
>>                                   * will.
>>                                   */
>>         struct rt_task *linked;  /* What is logically running */
>>
>>         struct enforcement_timer timer;
>>         struct bheap_node *hn;   /* For membership in heaps */
>>
>>         struct rt_container *container; /* Clearly necessary */
>> };
>>
>> An rt_container is a group of tasks scheduled together. The container
>> can run tasks when one or more of its @procs are selected to run. When a
>> container can run a task, it selects the next task to run using its
>> @policy.
>>
>> struct rt_container {
>>         /* Potentially one of these for each CPU */
>>         struct rt_cpu *procs;
>>         cpumask_t cpus;  /* Or perhaps num_cpus? I want O(1) access to
>>                           * partitioned CPUs, but a container may also
>>                           * have multiple global rt_cpus. I am not sure
>>                           * how to accomplish O(1) access with global
>>                           * rt_cpus. Ideas? I'll try to think of something.
>>                           */
>>
>>         /* To create the container hierarchy */
>>         struct rt_container *parent;
>>
>>         /* The per-container method for scheduling the container's tasks */
>>         struct rt_policy *policy;
>>
>>         /* Metadata kept separate from the rest of the container
>>          * because it is not used often. E.g. a task list, proc
>>          * entries, or a container name would be stored here.
>>          */
>>         struct rt_cont_data *data;
>> };
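>>
>> To show how the two structs fit together: when an rt_cpu's @server is
>> selected to run, the dispatch step would look roughly like this (a
>> sketch; next_task() and start_enforcement() are placeholder names):
>>
>> /* Called when @cpu's server has been selected to run: ask the rt_cpu's
>>  * container for the next task, link it, and start budget enforcement.
>>  */
>> static void rt_cpu_dispatch(struct rt_cpu *cpu)
>> {
>>         struct rt_container *cont = cpu->container;
>>
>>         cpu->linked = cont->policy->ops->next_task(cont, cpu);
>>         if (cpu->linked)
>>                 start_enforcement(&cpu->timer, cpu->linked);
>> }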
>>
>>
>> Is a given policy instance static? That is, is a single G-EDF policy
>> instance shared between containers, or are there several distinct G-EDF
>> policy instances, one per container?
>>
>
> The policy->ops are static, in that all G-EDF policies will point to the
> same ops struct. The data wrapped around the rt_policy struct, like
> rt_domains, is distinct per container.
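>
> In code, I picture that split roughly like this (a sketch; none of the
> names are final):
>
> /* One static ops table per scheduling policy, shared by every container
>  * that uses that policy.
>  */
> struct rt_policy_ops {
>         void (*add)(struct rt_container *cont, struct rt_task *task);
>         void (*remove)(struct rt_container *cont, struct rt_task *task);
>         struct rt_task* (*next_task)(struct rt_container *cont,
>                                      struct rt_cpu *cpu);
> };
>
> struct rt_policy {
>         struct rt_policy_ops *ops; /* shared between containers */
> };
>
> /* Each container then wraps rt_policy with its own state, e.g. an
>  * rt_domain for an EDF-based policy, so that data stays per-container.
>  */
> struct edf_policy {
>         struct rt_policy policy;
>         rt_domain_t domain;
> };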
>
>
>>
>> In general, I like this interface. It appears clean to me.
>>
> Thanks!
>
> --
> Jonathan Herman
> Department of Computer Science at UNC Chapel Hill
>
>
>
> Followup question: Do these changes allow for backwards-compatibility with
> existing Litmus plugins? That is, core data structures are only changed to
> an extent that existing non-container plugins would only need small (or no)
> updates.
>
> -Glenn
>
--
Jonathan Herman
Department of Computer Science at UNC Chapel Hill