[LITMUS^RT] Container implementation idea
Glenn Elliott
gelliott at cs.unc.edu
Mon Feb 27 21:04:43 CET 2012
On Feb 27, 2012, at 2:54 PM, Jonathan Herman wrote:
>
>
> On Mon, Feb 27, 2012 at 1:23 PM, Glenn Elliott <gelliott at cs.unc.edu> wrote:
> On Feb 26, 2012, at 9:40 PM, Jonathan Herman wrote:
>
>> I will be adding container support to litmus over the next
>> year. Exciting. My goals are to support a hierarchical container
>> scheme where each container can have its own scheduling policy and a
>> single container can be run on multiple CPUs simultaneously.
>>
>> This is my idea and I would love it to be critiqued.
>>
>> -- Interface --
>> Tasks should be able to join and leave containers dynamically, but I
>> don't intend to support (initially) dynamic creation of a container
>> hierarchy. Instead, plugins will configure their own containers.
>>
>> I intend to completely copy the container proc interface from
>> Linux. The interface looks something like this:
>> - <plugin>/<container-id>*/tasks
>> Tasks are added and removed from containers by echoing their IDs into
>> the tasks file of each container.
>>
>> For example, a CEDF plugin with containers would work as follows:
>> 1. The scheduling plugin CODE would create a container for each
>> cluster, and the container framework would automatically expose them
>> under proc:
>> - CEDF/0/tasks
>> - CEDF/1/tasks
>> - CEDF/2/tasks
>> - CEDF/3/tasks
>> 2. The scheduling plugin USER would create tasks, and echo their PIDs
>> into these containers as he/she sees fit.
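>>
>> As a rough sketch, "echoing a PID into the tasks file" done
>> programmatically might look like the following (the /proc/litmus/
>> prefix is only a guess at where the container files would live):
>>
>> #include <stdio.h>
>> #include <unistd.h>
>>
>> /* Hypothetical path; the exact proc location is not settled. */
>> #define CEDF0_TASKS "/proc/litmus/CEDF/0/tasks"
>>
>> /* Add the calling task to cluster 0 by writing its PID into the
>>  * container's tasks file -- the same effect as echoing the PID
>>  * into CEDF/0/tasks from a shell. */
>> static int join_cluster0(void)
>> {
>>         FILE *f = fopen(CEDF0_TASKS, "w");
>>         if (!f)
>>                 return -1;
>>         fprintf(f, "%d\n", (int) getpid());
>>         return fclose(f);
>> }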
>
> Having to use fwrite() to join a container seems a bit heavy-handed. Why the departure from the mechanism used to specify CPU partitioning? Perhaps a system call could return to the user a description of the container hierarchy. A task could then traverse this hierarchy and join the container with a given identifier. I would also appreciate an interface, to be used from within the kernel, for migrating tasks between containers.
>
> What happens when a scheduled task changes containers?
>
> The following interface description might address these issues better:
> A kernel interface will be provided for adding tasks to and moving tasks between containers. Kernelspace (e.g., plugins) may use this interface directly to migrate tasks between containers. The interface is exposed to userspace through the proc interface mentioned above.
>
> The proc interface is designed for simplicity when scripting and running task sets. Using a method similar to -p to access hierarchies of containers could get confusing (e.g., how do I remove a task from a container?). Additionally, the interface is expandable later if we want to add additional per-container features, such as memory allocation. However, should tasks need to dynamically leave and join containers at run time from -userspace- at performance-critical times, the overhead of fwrite() may be an issue. Calls would have to be added to liblitmus for container membership for these cases.
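>
> I picture those liblitmus additions looking something like this (the
> names and signatures below are made up, not an agreed-upon API):
>
> #include <sys/types.h>  /* pid_t */
>
> /* Hypothetical liblitmus calls: attach/detach a task to/from a
>  * container identified by its plugin-relative id, so that
>  * performance-critical membership changes can bypass the proc
>  * interface. Return 0 on success, negative on error. */
> int litmus_container_add_task(int container_id, pid_t pid);
> int litmus_container_remove_task(int container_id, pid_t pid);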
>
> Thanks for pointing that out; I was only picturing tasks entering and leaving containers during initialization.
>
> >
> >
> > What happens when a scheduled task changes containers?
>
> I imagine the result would be:
>
> source->policy->ops->remove(source, task);
> /* As that container was previously running, it now selects
> * another task to run after task is unlinked.
> */
> dest->policy->ops->add(dest, task);
> /* If this container is running and task is of sufficiently
> * high priority, it will now run on one of dest's rt_cpus.
> */
>
> So unless the destination container can run the task, the previously scheduled task will stop running.
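>
> Wrapped up as the in-kernel migration call mentioned above, that
> would be roughly the following (lock handling and the op names are
> placeholders):
>
> /* Sketch only: move a task between containers. The caller is
>  * assumed to hold whatever locks the two containers require. */
> static void rt_container_migrate(struct rt_container *source,
>                                  struct rt_container *dest,
>                                  struct rt_task *task)
> {
>         /* Unlink from source; if the task was running there, the
>          * source policy picks a replacement. */
>         source->policy->ops->remove(source, task);
>
>         /* Link into dest; the task runs on one of dest's rt_cpus
>          * only if dest's policy deems it high enough priority. */
>         dest->policy->ops->add(dest, task);
> }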
>
>> -- The framework --
>> The framework I'm showing is designed with the following thoughts:
>> * Eventual support for less ad-hoc slack stealing
>> * Not breaking locking code too much
>> * Trying to minimize amount of extra data being passed around
>> * Keeping as much container code out of the plugins as possible
>>
>> The basic idea is to have rt_tasks become the fundamental schedulable
>> entity in Litmus. An rt_task can correspond to a struct task_struct,
>> as they do now, or an rt_cpu. I want to use these rt_tasks to abstract
>> out the code needed to manage container hierarchies. Per-plugin
>> scheduling code should not have to manage containers at all after initialization.
>>
>> An rt_cpu is a virtual processor which can run other tasks. It can
>> have a task which is @linked to it, and it optionally enforces budget
>> with an @timer. The virtual processor is run when its corresponding
>> rt_task, or @server, is selected to run. When the rt_cpu is selected
>> to run, it chooses a task to execute by asking its corresponding
>> @container.
>>
>> struct rt_cpu {
>> unsigned int cpu; /* Perhaps an RT_GLOBAL macro corresponds to
>> * a wandering global virtual processor?
>> */
>> struct rt_task *server; /* 0xdeadbeef for no server maybe?
>> * I'm thinking of doing something
>> * special for servers which have
>> * full utilization of a processor,
>> * as servers in the BASE container
>> * will.
>> */
>> struct rt_task *linked; /* What is logically running */
>>
>> struct enforcement_timer timer;
>> struct bheap_node *hn; /* For membership in heaps */
>>
>> struct rt_container *container; /* Clearly necessary */
>> };
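>>
>> As a rough sketch (the op and helper names here are placeholders),
>> the dispatch path for an rt_cpu might look like:
>>
>> /* When the rt_cpu's @server is selected to run, the virtual
>>  * processor asks its container's policy for the next task. */
>> static struct rt_task *rt_cpu_dispatch(struct rt_cpu *vcpu)
>> {
>>         struct rt_container *cont = vcpu->container;
>>
>>         vcpu->linked = cont->policy->ops->take_ready(cont, vcpu);
>>         /* Budget enforcement, if used, would arm vcpu->timer here. */
>>         return vcpu->linked;
>> }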
>>
>> An rt_container represents a group of tasks scheduled together. The container
>> can run tasks when one or more @procs are selected to run. When a
>> container can run a task, it selects the next task to run using a
>> @policy.
>>
>> struct rt_container {
>> /* Potentially one of these for each CPU */
>> struct rt_cpu *procs;
>> cpumask_t cpus; /* Or perhaps num_cpus? I want O(1) access to
>> * partitioned CPUs, but a container may also
>> * have multiple global rt_cpus. I am not sure
>> * how to accomplish O(1) access with global
>> * rt_cpus. Ideas? I'll try to think of something.
>> */
>>
>> /* To create the container hierarchy */
>> struct rt_container *parent;
>>
>> /* The per-container method for scheduling the container's tasks */
>> struct rt_policy *policy;
>>
>> /* Metadata kept separate from the rest of the container
>> * because it is not used often. E.g. a task list, proc
>> * entries, or a container name would be stored in this.
>> */
>> struct rt_cont_data *data;
>> };
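>>
>> With those structures, C-EDF initialization might boil down to
>> something like the following (rt_container_create() and edf_ops are
>> invented names for illustration only):
>>
>> #include <linux/errno.h>
>>
>> /* Sketch: create one container per cluster. The framework would
>>  * allocate the per-container policy state around the shared edf_ops
>>  * and expose CEDF/<id>/tasks under proc automatically. */
>> static int cedf_create_containers(int num_clusters)
>> {
>>         int i;
>>         for (i = 0; i < num_clusters; i++) {
>>                 struct rt_container *cont =
>>                         rt_container_create("CEDF", i, &edf_ops);
>>                 if (!cont)
>>                         return -ENOMEM;
>>                 /* Attach the cluster's rt_cpus (procs/cpus) to
>>                  * cont here. */
>>         }
>>         return 0;
>> }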
>
> Is a given policy instance static? That is, is a single G-EDF policy instance shared between containers, or are there several distinct G-EDF policy instances, one per container?
>
> The policy->ops are static, in that all G-EDF policies will point to the same ops struct. The data wrapped around the rt_policy struct, like rt_domains, is distinct per container.
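>
> In other words, something shaped like this (the field and op names
> are just for illustration):
>
> /* One ops table shared by every G-EDF container... */
> struct rt_policy_ops {
>         void (*add)(struct rt_container *cont, struct rt_task *task);
>         void (*remove)(struct rt_container *cont, struct rt_task *task);
>         struct rt_task *(*take_ready)(struct rt_container *cont,
>                                       struct rt_cpu *vcpu);
> };
>
> /* ...with per-container state (e.g. an rt_domain) hanging off each
>  * rt_policy instance. */
> struct rt_policy {
>         const struct rt_policy_ops *ops;   /* shared, e.g. &gedf_ops */
>         void *state;                       /* distinct per container */
> };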
>
>
> In general, I like this interface. It appears clean to me.
> Thanks!
>
> --
> Jonathan Herman
> Department of Computer Science at UNC Chapel Hill
Followup question: do these changes allow for backwards compatibility with existing Litmus plugins? That is, are the core data structures changed only to an extent that existing non-container plugins would need small (or no) updates?
-Glenn