[LITMUS^RT] Job Quantity Issue

Björn Brandenburg bbb at mpi-sws.org
Fri Aug 28 11:00:35 CEST 2015


> On 28 Aug 2015, at 03:16, Geoffrey Tran <gtran at isi.edu> wrote:
> 
> I was hoping to please get assistance with the following problem. It
> is somewhat related to the previous messages at:
> https://lists.litmus-rt.org/pipermail/litmus-dev/2015/001107.html
> 
> I have written a simple application based off of base_task.c from
> liblitmus.  However, there is some strange behaviour.  First of 
> all, by the time the first job is run, the job number is around
> 4 or 5.  
> 
> The second problem is that the number of jobs that show up in
> the traces is non-deterministic by a large range.  Below I 
> show two outputs from st_job_stats, one where it behaves
> somewhat as expected, and another where it does not. According to
> the input parameters, there should be 10 jobs, with a WCET of
> 1ms, period of 100ms. However, the issue does show up at other
> parameters also. 


Hi Geoffrey,

let’s see if we can figure it out. A couple of questions:

1) Which version of LITMUS^RT is this?

2) Which plugin are you using? Do you have local modifications?

3) Which hardware platform? Native, para-virtualized Xen, or some full system emulator (e.g., QEMU)?

4) How do you determine when to shut down the task? Your pseudocode says “while job count != 0”, which means it won’t terminate until your job counter wraps around. Is that intended?

5) Why do you expect exactly ten jobs in the traces? I’m not sure I understand your setup correctly. If you expect occasional budget overruns under precise enforcement, of course the number of “jobs” (= budget replenishments) is going to vary, depending on whether or not you overran a budget.

Say you have a budget of 10ms, a period of 100ms. For simplicity, let’s assume your task is the only real-time task in the system. Your task is invoked at time 0. Your task requires 11ms to complete the first invocation (i.e., the first iteration of the “job” loop). When done, your task calls sleep_next_period().  Under precise enforcement, the following is going to happen:

a) During [0, 10), the first 10ms of budget are going to be consumed.

b) Precise enforcement kicks in at 10ms, realizing a budget overrun. You get a “forced” job completion record in the sched_trace data and the task becomes ineligible for execution until its budget is replenished.

c) At time 100ms (i.e., after the period has elapsed), the budget is replenished. This is recorded as a new job release in the sched_trace stream. The kernel has no idea what you consider to be a “job” in your application; from the point of view of the kernel, one budget allocation == one job.

d) At time 101ms, the task completes processing the first invocation and calls sleep_next_period().

e) The kernel processes the sleep_next_period() system call by discarding the rest of the current allocation (9ms in this case) and by marking the task as ineligible to execute until the next budget replenishment. This is recorded as a job completion record with the “forced” field set to zero.

	https://github.com/LITMUS-RT/litmus-rt/blob/master/include/litmus/sched_trace.h#L55

f) At time 200ms, the budget is replenished and the task can process the second invocation. Note that the kernel’s notion of “job” and the task’s invocation count now disagree: due to precise budget enforcement, the task required two “jobs” (= budget allocations) to complete one invocation.

In other words, precise enforcement encapsulates each task in a server (in the sporadic server / CBS sense). The kernel tracks **server jobs** and has no insight into what userspace logically considers to be one invocation.

If your task has more work to do, and if it is using precise enforcement, it should not call sleep_next_period() unless you really, really want to discard your remaining budget. Note that you can also call wait_for_job_release(job_no) instead of sleep_next_period():

	https://github.com/LITMUS-RT/liblitmus/blob/master/include/litmus.h#L232

For example:

	unsigned int cur_job_no;

	while (!should_exit()) { /* whatever your termination condition is */
		get_job_no(&cur_job_no); /* current kernel-level job number */
		do_work();
		wait_for_job_release(cur_job_no + 1);
	}

This will act like sleep_next_period() if you do NOT overrun your budget, but does nothing (i.e., returns immediately) if you had to tap into the next job’s budget already.

I hope this helps to explain what’s happening. Please let us know if this solves the problem.

Best regards,
Björn
