[LITMUS^RT] Container implementation idea

Jonathan Herman hermanjl at cs.unc.edu
Mon Feb 27 03:40:33 CET 2012


I will be adding container support to litmus over the next
year. Exciting. My goals are to support a hierarchical container
scheme where each container can have its own scheduling policy and a
single container can be run on multiple CPUs simultaneously.

This is my idea and I would love it to be critiqued.

*-- Interface --*
Tasks should be able to join and leave containers dynamically, but I
don't intend to support (initially) dynamic creation of a container
hierarchy. Instead, plugins will configure their own containers.

I intend to completely copy the container proc interface from
Linux. The interface looks something like this:
 - <plugin>/<container-id>*/tasks
Tasks are added to and removed from containers by echoing their IDs
into the tasks file of each container.

For example, a CEDF plugin with containers would work as follows:
1. The scheduling plugin CODE would create a container for each
cluster, and the container framework would automatically expose them
under proc:
 - CEDF/0/tasks
 - CEDF/1/tasks
 - CEDF/2/tasks
 - CEDF/3/tasks
2. The scheduling plugin USER would create tasks, and echo their PIDs
into these containers as he/she sees fit.
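
For example, a userspace helper for step 2 might look like the sketch
below. This assumes the tasks files end up under /proc/litmus/; the
exact mount point and the join_container() helper are only
illustrative, not part of the proposal.

/* Hypothetical userspace helper: write a PID into a container's tasks
 * file. The /proc/litmus/CEDF/0/tasks path is an assumption; only the
 * <plugin>/<container-id>/tasks layout above is proposed.
 */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static int join_container(const char *tasks_file, pid_t pid)
{
        FILE *f = fopen(tasks_file, "w");
        if (!f)
                return -1;
        fprintf(f, "%d\n", pid);
        return fclose(f);
}

/* e.g. join_container("/proc/litmus/CEDF/0/tasks", getpid()); */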

*-- The framework --*
The framework I'm showing is designed with the following thoughts:
* Eventual support for less ad-hoc slack stealing
* Not breaking locking code too much
* Trying to minimize the amount of extra data being passed around
* Keeping as much container code out of the plugins as possible

The basic idea is to have rt_task's become the fundamental schedulable
entity in Litmus. An rt_task can correspond to a struct task_struct,
as they do now, or to an rt_cpu. I want to use these rt_tasks to abstract
out the code needed to manage container hierarchies. Per-plugin
scheduling code should not have to manage containers at all after
initialization.

An rt_cpu is a virtual processor which can run other tasks. It can
have a task which is @linked to it, and it optionally enforces budget
with an @timer. The virtual processor is run when its corresponding
rt_task, or @server, is selected to run. When the rt_cpu is selected
to run, it chooses a task to execute by asking its corresponding
@container.

struct rt_cpu {
        unsigned int cpu;       /* Perhaps an RT_GLOBAL macro corresponds to
                                 * a wandering global virtual processor?
                                 */
        struct rt_task *server; /* 0xdeadbeef for no server maybe?
                                 * I'm thinking of doing something
                                 * special for servers which have
                                 * full utilization of a processor,
                                 * as servers in the BASE container
                                 * will.
                                 */
        struct rt_task *linked; /* What is logically running */

        struct enforcement_timer timer;
        struct bheap_node *hn;  /* For membership in heaps */

        struct rt_container *container; /* Clearly necessary */
};
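
To make the rt_cpu/@container interaction concrete, here is a rough
sketch of "the rt_cpu asks its @container for a task". The helper name
and the exact calling convention are not settled; only the fields used
come from the struct above.

/* Sketch only: when an rt_cpu's @server is selected to run, the rt_cpu
 * asks its @container's policy for the next task and links it.
 */
static struct rt_task *rt_cpu_pick_next(struct rt_cpu *vcpu)
{
        struct rt_container *cont = vcpu->container;

        /* hand back whatever was linked and ask the policy for a
         * (possibly different) replacement */
        vcpu->linked = cont->policy->ops->schedule(cont, vcpu->linked);
        return vcpu->linked;
}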

An rt_container struct is a group of tasks scheduled together. The
container can run tasks when one or more of its @procs are selected to
run; when it can run a task, it selects the next task to run using its
@policy.

struct rt_container {
        /* Potentially one of these for each CPU */
        struct rt_cpu *procs;
        cpumask_t cpus; /* Or perhaps num_cpus? I want O(1) access to
                         * partitioned CPUs, but a container may also
                         * have multiple global rt_cpus. I am not sure
                         * how to accomplish O(1) access with global
                         * rt_cpus. Ideas? I'll try to think of something.
                         */

        /* To create the container hierarchy */
        struct rt_container *parent;

        /* The per-container method for scheduling the container's tasks */
        struct rt_policy *policy;

        /* Metadata kept separate from the rest of the container
         * because it is not used often. E.g. a task list, proc
         * entries, or a container name would be stored in this.
         */
        struct rt_cont_data *data;
};
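
As a sketch of step 1 of the CEDF example above: a plugin would build
its containers at activation time. rt_container_create(),
base_container(), NUM_CLUSTERS, and cedf_activate_plugin() are made-up
names for illustration.

/* Sketch: a C-EDF plugin building one container per cluster. All
 * helpers here are placeholders; alloc_gedf_policy() is the same
 * allocator used in the edf_policy example below.
 */
#define NUM_CLUSTERS 4

static struct rt_container *cedf_clusters[NUM_CLUSTERS];

static long cedf_activate_plugin(void)
{
        int i;

        for (i = 0; i < NUM_CLUSTERS; i++) {
                /* each cluster is a child of the base container and is
                 * scheduled internally by G-EDF */
                cedf_clusters[i] =
                        rt_container_create(base_container(),
                                            alloc_gedf_policy());
                /* the framework then exposes CEDF/<i>/tasks in proc */
        }
        return 0;
}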

An rt_policy is a per-container scheduling policy. It consists of
@ops for selecting a task.

struct rt_policy {
       struct rt_policy_ops *ops;
       int type; /* GSN-EDF, PSN-EDF, PFAIR, MC, etc */
};

Implementations of rt_policy's will wrap the policy with necessary
state for decision making, like so:

struct edf_policy {
        struct rt_policy policy;
        struct rt_domain domains;
};

container.policy = alloc_gedf_policy();

The rt_policy_ops correspond to the methods of our plugins today. They
can access per-policy state using magic container-of macros. They are
passed an rt_container so that they can make decisions using that
container's @procs. My design of rt_policy's is pretty similar to how
we do litmus_lock's today and to what is recommended in the
"Object-oriented design patterns in the kernel" series on LWN.

struct rt_policy_ops {
        struct rt_task* (*schedule)(struct rt_container*, struct rt_task*);
        void (*block)(struct rt_container*, struct rt_task*);
        void (*wake_up)(struct rt_container*, struct rt_task*);

        /* Why not new and exit? Semantics. These are different from
         * a new task joining and leaving the system. Hypothetically,
         * servers could migrate between containers, but they are not
         * "exiting". I'm flexible on naming, though.
         */
        void (*remove)(struct rt_container*, struct rt_task*);

        /* Why no admit_task(), and why does add() return a long?
         * Conceivably, the admit_task() code could require access to
         * synchronized state. What if a task passes admit_task() in a
         * container, then the state changes so that it would fail, then
         * the task calls add()? Failure. Dealing with this would require
         * the caller to acquire locks for our sake, which I don't want
         * to do. Checks are done in the add method instead.
         */
        long (*add)(struct rt_container*, struct rt_task*);
};
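
As a sketch of the container-of pattern, a G-EDF schedule op might
recover its state from the edf_policy wrapper above like this.
gedf_schedule() and __edf_take_ready() are placeholder names.

/* Sketch: a policy op recovers its per-policy state from the
 * rt_container it is handed via container_of().
 */
static struct rt_task *gedf_schedule(struct rt_container *cont,
                                     struct rt_task *prev)
{
        struct edf_policy *edf =
                container_of(cont->policy, struct edf_policy, policy);

        /* requeue prev if needed and pick the highest-priority ready
         * task for one of cont->procs; details elided */
        return __edf_take_ready(&edf->domains);
}

static struct rt_policy_ops gedf_ops = {
        .schedule = gedf_schedule,
        /* .block, .wake_up, .remove, and .add are wired up similarly */
};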

The rt_param struct is a bit toned down, what with most of the work
moving into rt_task's. It is still the real-time data for system
processes.

struct rt_param {
        /* The server provided by the task */
        struct rt_task server;
        /* The USERSPACE view of the task */
        struct rt_job job;

        /* When there is no budget enforcement or slack stealing,
         * task.job will match task.server.job.
         * With budget enforcement and no slack stealing, the
         * server.job corresponds with the KERNELSPACE view of the
         * task, and the task.job corresponds with the USERSPACE view
         * of the task.
         * With slack stealing, the server might run other tasks
         * entirely.
         */

        /* Where the task is actually running. This only makes sense
         * when referring to a process, so it is left in this struct.
         */
        unsigned int scheduled_on;
};

The rt_task struct becomes the main schedulable entity, and will be
passed around by rt_domains and rt_policy's.

struct rt_task {
        lt_t wcet, period, phase;

        struct rt_job job;
        struct rt_container *container;

        /* To figure out how to schedule a task. When a plugin returns
         * an rt_task, the container framework will use this to choose
         * between returning a process or calling into a container.
         */
        task_type_t type;

        task_class_t     class;  /* BE, HRT, SRT, etc */
        budget_policy_t  policy; /* Budget enforcement */
        unsigned int     cpu;    /* Where the task must run */

#ifdef CONFIG_LITMUS_LOCKING
        /* Locking stuff, so that we can apply our locking protocols
         * within containers (unlikely to be used for a while)
         */
#endif
        /* Collection membership */
};

typedef enum {
VIRTUAL, PROCESS
} task_type_t;

Finally, rt_job represents what it always has: the current job of a
task.

struct rt_job {
        lt_t    release;
        lt_t    deadline;
        lt_t    exec_time;
        unsigned int    job_no;

        /* Where the task / process is logically running. This is in
         * rt_job (instead of rt_task) because, with slack stealing,
         * a process's server may be linked and running one task while
         * another server attempts to run this task. This makes a
         * separate link for the process and its server necessary.
         */
        unsigned int linked_on;
};

To put it all together, a scheduling decision works as follows:
1. Linux calls into Litmus's schedule method.
2. Litmus calls schedule on the base container.
3. The base container returns some rt_task.
4. If the rt_task corresponds to a task_struct, the task_struct is
returned. Otherwise...
5. The rt_task's container is found. Litmus calls schedule on that
container.
6. Repeat steps 4 and 5 until a task_struct or NULL is returned.
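
In rough code, that walk-down could look like the sketch below.
base_container(), rt_cpu_of(), and tsk_of() are placeholder helpers
(root container lookup, VIRTUAL rt_task to rt_cpu, PROCESS rt_task to
task_struct); prev-task handling is elided.

/* Sketch of the top-level scheduling walk (steps 2-6). */
static struct task_struct *container_schedule(void)
{
        struct rt_container *cont = base_container();
        struct rt_task *next;

        while (cont) {
                next = cont->policy->ops->schedule(cont, NULL);
                if (!next)
                        return NULL;         /* nothing to run */
                if (next->type == PROCESS)
                        return tsk_of(next); /* step 4 */
                /* VIRTUAL: descend into the container that this
                 * server's rt_cpu schedules from (step 5) */
                cont = rt_cpu_of(next)->container;
        }
        return NULL;
}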

Some decisions will work in reverse.
1. Linux calls into Litmus's block method for some task.
2. Litmus calls the task's container's block method.
3. That block method does some work. If it sees that the container can
no longer run anything on one of its @procs, it calls the parent
container's block method, passing in the rt_task corresponding to that
@proc.
4. Repeat steps 2 and 3 until there is either no parent container or
the called container does not have to block on a processor.
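
A sketch of that upward walk follows. find_idle_proc() is a
placeholder for "return a @proc of this container that now has nothing
to run, or NULL".

/* Sketch of the upward propagation in steps 2-4. */
static void container_block(struct rt_container *cont, struct rt_task *task)
{
        struct rt_cpu *proc;

        /* let the container's policy react to the blocking task */
        cont->policy->ops->block(cont, task);

        /* if some @proc of this container is now idle, its @server
         * must block in the parent container as well */
        proc = find_idle_proc(cont);
        if (proc && cont->parent)
                container_block(cont->parent, proc->server);
}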

Obviously, given how much work is going to be put into this, comments
and criticisms are welcome. I may also have been less than clear on
some points and will gladly clarify them.

I also assume I haven't thought of lots of things and would appreciate
being called out on that.

-- 
Jonathan Herman
Department of Computer Science at UNC Chapel Hill