[LITMUS^RT] Container implementation idea
Jonathan Herman
hermanjl at cs.unc.edu
Mon Feb 27 21:48:59 CET 2012
>
> Followup question: Do these changes allow for backwards-compatibility with
> existing Litmus plugins? That is, core data structures are only changed to
> an extent that existing non-container plugins would only need small (or no)
> updates.
Ack, forgot to mention that. I believe minor changes to existing plugins
will be required, but nothing major. Most of the work would involve using
struct rt_task instead of struct task_struct to make scheduling decisions.
For example, all plugins would return a struct rt_task from their
_schedule() methods, and some code would have to change to reflect the
migration of data from struct rt_param to task->rt_param.server (the
task's struct rt_task).
I want to keep the changes minor, but I don't think it will be possible to
avoid them entirely.
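To make that concrete, here is a rough sketch (not a working patch) of how
a plugin's _schedule() callback might change; demo_schedule, demo_domain,
and __take_ready_rt below are placeholder names, not existing functions:

/* Today a plugin's schedule callback returns a struct task_struct*.
 * Under the container framework it would traffic in struct rt_task
 * instead, with the per-task scheduling state reached through
 * tsk->rt_param.server.
 */
static struct rt_task* demo_schedule(struct rt_task *prev)
{
        /* Pick the highest-priority ready rt_task from this plugin's
         * domain; it may be an ordinary task or a container's server.
         */
        return __take_ready_rt(&demo_domain);
}
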
On Mon, Feb 27, 2012 at 3:04 PM, Glenn Elliott <gelliott at cs.unc.edu> wrote:
>
> On Feb 27, 2012, at 2:54 PM, Jonathan Herman wrote:
>
>
>
> On Mon, Feb 27, 2012 at 1:23 PM, Glenn Elliott <gelliott at cs.unc.edu>wrote:
>
>> On Feb 26, 2012, at 9:40 PM, Jonathan Herman wrote:
>>
>> I will be adding container support to litmus over the next
>> year. Exciting. My goals are to support a hierarchical container
>> scheme where each container can have its own scheduling policy and a
>> single container can be run on multiple CPUs simultaneously.
>>
>> This is my idea and I would love it to be critiqued.
>>
>> *-- Interface --*
>> Tasks should be able to join and leave containers dynamically, but I
>> don't intend to support dynamic creation of the container hierarchy
>> initially. Instead, plugins will configure their own containers.
>>
>> I intend to completely copy the container proc interface from
>> Linux. The interface looks something like this:
>> - <plugin>/<container-id>*/tasks
>> Tasks are added and removed from containers by echoing their IDs into
>> the tasks file of each container.
>>
>> For example, a CEDF plugin with containers would work as follows:
>> 1. The scheduling plugin CODE would create a container for each
>> cluster, and the container framework would automatically expose them
>> under proc:
>> - CEDF/0/tasks
>> - CEDF/1/tasks
>> - CEDF/2/tasks
>> - CEDF/3/tasks
>> 2. The scheduling plugin USER would create tasks, and echo their PIDs
>> into these containers as he/she sees fit.
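>>
>> (As a userspace sketch of step 2, assuming the files end up under
>> /proc/litmus/; the exact location is not decided. A task launcher could
>> do something like the following.)
>>
>> #include <stdio.h>
>> #include <sys/types.h>
>>
>> /* Write @pid into the tasks file of container @cont,
>>  * e.g. add_to_container("CEDF/2", getpid()).
>>  */
>> static int add_to_container(const char *cont, pid_t pid)
>> {
>>         char path[256];
>>         FILE *f;
>>
>>         snprintf(path, sizeof(path), "/proc/litmus/%s/tasks", cont);
>>         f = fopen(path, "w");
>>         if (!f)
>>                 return -1;
>>         fprintf(f, "%d\n", (int) pid);
>>         return fclose(f);
>> }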
>>
>>
>> Having to use fwrite() to join a container seems a bit heavy-handed. Why
>> the departure from the mechanism used to specify CPU partitioning? Perhaps
>> a system call could return to the user a description of the container
>> hierarchy. A task could traverse this hierarchy and join the container
>> with a given identifier. I would also appreciate an interface, to be used
>> from within the kernel, for migrating tasks between containers.
>>
>> What happens when a scheduled task changes containers?
>>
>
> The following interface description might address these issues better:
> A kernel interface will be provided for adding tasks to containers and for
> moving tasks between them. Kernel code (e.g., plugins) may use this
> interface directly to migrate tasks between containers, and it is exposed
> to userspace through the proc interface mentioned above.
>
> The proc interface is designed for simplicity when scripting and running
> task sets. Using a method similar to -p to access hierarchies of containers
> could get confusing (e.g., how would I remove a task from a container?).
> The interface can also be extended later if we want to add per-container
> features, such as memory allocation. However, should tasks need to leave
> and join containers dynamically from -userspace- at performance-critical
> times, the overhead of fwrite() may be an issue. For these cases, calls for
> container membership would have to be added to liblitmus (sketched below).
>
> Thanks for pointing that out; I was only picturing tasks entering and
> leaving containers during initialization.
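>
> The liblitmus additions I have in mind would just be thin wrappers over
> the proc files, something like the following (hypothetical prototypes,
> not an existing liblitmus API):
>
> int litmus_join_container(const char *container, pid_t pid);
> int litmus_leave_container(const char *container, pid_t pid);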
>
> >
> >
> > What happens when a scheduled task changes containers?
>
> I imagine the result would be:
>
> source->policy->ops->remove(source, task);
> /* As that container was previously running, it now selects
>  * another task to run after task is unlinked.
>  */
> dest->policy->ops->add(dest, task);
> /* If this container is running and task is of sufficiently
>  * high priority, it will now run on one of dest's rt_cpus.
>  */
>
> So unless the other container can run the task, the scheduled task will
> have to stop.
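>
> Roughly, the kernel interface for migration would just wrap those two
> calls; a minimal sketch, ignoring locking and the final ops names:
>
> /* Move @task from container @source to container @dest. Assumes the
>  * caller holds whatever lock ends up protecting both containers.
>  */
> static void container_migrate(struct rt_container *source,
>                               struct rt_container *dest,
>                               struct rt_task *task)
> {
>         source->policy->ops->remove(source, task);
>         dest->policy->ops->add(dest, task);
> }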
>
>>
>> *-- The framework --*
>> The framework I'm showing is designed with the following thoughts:
>> * Eventual support for less ad-hoc slack stealing
>> * Not breaking locking code too much
>> * Trying to minimize amount of extra data being passed around
>> * Keeping as much container code out of the plugins as possible
>>
>> The basic idea is to make rt_tasks the fundamental schedulable
>> entities in Litmus. An rt_task can correspond to a struct task_struct,
>> as it does now, or to an rt_cpu. I want to use these rt_tasks to abstract
>> out the code needed to manage container hierarchies. Per-plugin
>> scheduling code should not have to manage containers at all after
>> initialization.
>>
>> An rt_cpu is a virtual processor which can run other tasks. It can
>> have a task which is @linked to it, and it optionally enforces budget
>> with an @timer. The virtual processor is run when its corresponding
>> rt_task, or @server, is selected to run. When the rt_cpu is selected
>> to run, it chooses a task to execute by asking its corresponding
>> @container.
>>
>> struct rt_cpu {
>>         unsigned int cpu;        /* Perhaps an RT_GLOBAL macro corresponds
>>                                   * to a wandering global virtual processor?
>>                                   */
>>         struct rt_task *server;  /* 0xdeadbeef for no server maybe?
>>                                   * I'm thinking of doing something
>>                                   * special for servers which have
>>                                   * full utilization of a processor,
>>                                   * as servers in the BASE container
>>                                   * will.
>>                                   */
>>         struct rt_task *linked;  /* What is logically running */
>>
>>         struct enforcement_timer timer;
>>         struct bheap_node *hn;   /* For membership in heaps */
>>
>>         struct rt_container *container; /* Clearly necessary */
>> };
>>
>> An rt_container is a group of tasks scheduled together. The container
>> can run tasks when one or more of its @procs are selected to run. When a
>> container can run a task, it selects the next task to run using its
>> @policy.
>>
>> struct rt_container {
>>         /* Potentially one of these for each CPU */
>>         struct rt_cpu *procs;
>>         cpumask_t cpus;  /* Or perhaps num_cpus? I want O(1) access to
>>                           * partitioned CPUs, but a container may also
>>                           * have multiple global rt_cpus. I am not sure
>>                           * how to accomplish O(1) access with global
>>                           * rt_cpus. Ideas? I'll try to think of something.
>>                           */
>>
>>         /* To create the container hierarchy */
>>         struct rt_container *parent;
>>
>>         /* The per-container method for scheduling the container's tasks */
>>         struct rt_policy *policy;
>>
>>         /* Metadata kept separate from the rest of the container
>>          * because it is not used often. E.g. a task list, proc
>>          * entries, or a container name would be stored here.
>>          */
>>         struct rt_cont_data *data;
>> };
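>>
>> To show how the two structs fit together: when an rt_cpu's @server is
>> selected to run, the dispatch step would look roughly like this (a
>> sketch; next_task() and start_enforcement() are placeholder names):
>>
>> /* Called when @cpu's server has been selected to run: ask the rt_cpu's
>>  * container for the next task, link it, and start budget enforcement.
>>  */
>> static void rt_cpu_dispatch(struct rt_cpu *cpu)
>> {
>>         struct rt_container *cont = cpu->container;
>>
>>         cpu->linked = cont->policy->ops->next_task(cont, cpu);
>>         if (cpu->linked)
>>                 start_enforcement(&cpu->timer, cpu->linked);
>> }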
>>
>>
>> Is a given policy instance static? That is, is a single G-EDF policy
>> instance shared between containers, or are there several distinct G-EDF
>> policy instances, one per container?
>>
>
> The policy->ops are static, in that all G-EDF policies will point to the
> same ops struct. The data wrapped around the rt_policy struct, like
> rt_domains, is distinct per container.
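>
> In code, I picture that split roughly like this (a sketch; none of the
> names are final):
>
> /* One static ops table per scheduling policy, shared by every container
>  * that uses that policy.
>  */
> struct rt_policy_ops {
>         void (*add)(struct rt_container *cont, struct rt_task *task);
>         void (*remove)(struct rt_container *cont, struct rt_task *task);
>         struct rt_task* (*next_task)(struct rt_container *cont,
>                                      struct rt_cpu *cpu);
> };
>
> struct rt_policy {
>         struct rt_policy_ops *ops; /* shared between containers */
> };
>
> /* Each container then wraps rt_policy with its own state, e.g. an
>  * rt_domain for an EDF-based policy, so that data stays per-container.
>  */
> struct edf_policy {
>         struct rt_policy policy;
>         rt_domain_t domain;
> };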
>
>
>>
>> In general, I like this interface. It appears clean to me.
>>
> Thanks!
>
> --
> Jonathan Herman
> Department of Computer Science at UNC Chapel Hill
>
>
>
> Followup question: Do these changes allow for backwards-compatibility with
> existing Litmus plugins? That is, core data structures are only changed to
> an extent that existing non-container plugins would only need small (or no)
> updates.
>
> -Glenn
>
--
Jonathan Herman
Department of Computer Science at UNC Chapel Hill