[LITMUS^RT] Running LITMUS-RT on ARM64

Meng Xu xumengpanda at gmail.com
Tue Aug 29 16:17:21 CEST 2017


On Mon, Aug 28, 2017 at 5:49 PM, Björn Brandenburg <bbb at mpi-sws.org> wrote:
>
>> On 28. Aug 2017, at 20:59, Meng Xu <xumengpanda at gmail.com> wrote:
>>
>> As to the clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) function, I
>> agree that the workload in each job is fixed *if*
>> clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) provides the actual
>> execution time of the thread.
>> However, after searching around the internet, it seems to me that
>> clock_gettime() may not provide an accurate measurement of the
>> thread's execution time.
>>
>> According to: https://linux.die.net/man/3/clock_gettime,
>> [Quote]
>> "The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are
>> realized on many platforms using timers from the CPUs (TSC on i386,
>> AR.ITC on Itanium). These registers may differ between CPUs and as a
>> consequence these clocks may return bogus results if a process is
>> migrated to another CPU."
>> [/Quote]
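>>
>> (For concreteness, the measurement in question looks roughly like
>> this; a minimal sketch, not the exact code I run:)
>>
>> #include <time.h>
>>
>> /* Returns the calling thread's consumed CPU time in nanoseconds,
>>    as reported by the kernel's per-thread accounting, or -1 on error. */
>> static long long thread_cputime_ns(void)
>> {
>>         struct timespec ts;
>>         if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
>>                 return -1;
>>         return (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
>> }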
>
> The process clocks depend on the scheduler’s notion of “how long has this task had access to the CPU.” In LITMUS^RT, this is tracked here:
>
>         https://github.com/LITMUS-RT/litmus-rt/blob/linux-4.9-litmus/kernel/sched/litmus.c#L23
>
> Note that the ‘delta’ value is computed using the Linux scheduler’s default clock ‘rq->clock’:
>
>         https://github.com/LITMUS-RT/litmus-rt/blob/linux-4.9-litmus/kernel/sched/litmus.c#L17
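>
> Paraphrased, the tracking logic in that file is roughly the following
> (a sketch; see the links above for the actual code):
>
>         static void update_time_litmus(struct rq *rq, struct task_struct *p)
>         {
>                 /* time elapsed since this task last started running,
>                  * measured with the scheduler's clock */
>                 u64 delta = rq->clock - p->se.exec_start;
>
>                 /* guard against clock anomalies */
>                 if (unlikely((s64) delta < 0))
>                         delta = 0;
>
>                 /* charge the task: per-job and per-task counters */
>                 p->rt_param.job_params.exec_time += delta;
>                 p->se.sum_exec_runtime += delta;
>
>                 p->se.exec_start = rq->clock;
>         }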
>
> So LITMUS^RT tasks enjoy execution-time tracking that is as accurate as that of any other Linux process.
>
>>
>> Since vCPUs are scheduled across physical CPUs, is it possible that
>> clock_gettime() provides an inaccurate value?
>
> Unlikely. To my understanding, Linux uses cycle counters on ARM64, and these cycle counters are synchronized across cores.
>
> I do not know how Linux’s time-tracking code interacts with Xen’s virtualization of the cycle counters, in either native or para-virtualized mode.
>
> In particular, what happens when a vCPU is preempted? To the guest kernel, the currently scheduled guest process “is using the vCPU” (even though it is not making progress since the vCPU itself is not scheduled), so most likely it will be charged for that execution time (i.e., the time that the vCPU was preempted will be reflected by the computed ‘delta’ value the next time the scheduler is invoked).
>
> As a result, rtspin would become confused about how much of its simulated workload is already complete. This would explain why the 55% workload did not become backlogged in the 40% VM, and also why the rtspin jobs sometimes appear to be “stretched” (they span the gaps during which the vCPU was not receiving service).
>
> My conclusions:
>
> 1) rtspin works just fine on bare metal as intended. It was never designed for virtualized environments.
>
> 2) rtspin should not be used as a benchmarking tool under virtualization. You need a real workload, or at least something that carries out a known amount of work.
>
> 3) It’s not just rtspin that will be confused — likely all of LITMUS^RT’s budget tracking and enforcement code will not work as expected under virtualization (when given vCPUs with non-100% utilization). This is because the scheduler is not aware of any times during which a vCPU does not receive service, so tasks will be charged for bogus execution time.
>
> I’m not aware of any open-source real-time OS with non-trivial budget tracking and enforcement support that solves (3) correctly. Pointers appreciated.
>

I agree with your three conclusions. The budget accounting is broken
in a virtualized environment.

But I think we can still run periodic real-time tasks in LITMUS^RT
under virtualization, if the following three conditions hold:
1) the job() in each task performs a known amount of work. For
example, if it takes 1ms to perform 2M addition operations, we can let
the job do e * 2M addition operations to create a periodic task with
WCET = e ms;
2) the termination condition of each job() is that the job has
finished the known amount of work, not that it has run for the amount
of time it thinks it has;
3) the LITMUS^RT scheduler lets a task overrun even when the task has
exhausted its budget; in other words, LITMUS^RT does not enforce
budgets.

Conditions 1) and 2) can be satisfied by creating a real-time task
based on base_task.c (see the sketch below).
Condition 3) is satisfied by default in LITMUS^RT (2015), IIRC.
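
A minimal sketch of such a work-based task, modeled on base_task.c
from liblitmus (WORK_UNITS_PER_MS is a hypothetical calibration
constant that would have to be measured on the target CPU; here it is
2M additions per 1ms, as in the example above):

#include <litmus.h>

#define PERIOD_MS          100
#define WCET_MS             10          /* "e" in condition 1) above */
#define WORK_UNITS_PER_MS  2000000UL    /* assumed: 2M additions take ~1ms */

static volatile unsigned long sink;     /* keeps the loop from being optimized away */

/* Conditions 1) and 2): the job performs a fixed amount of work and
 * terminates when that work is done, regardless of any time measurement. */
static void job(void)
{
        unsigned long i, acc = 0;
        for (i = 0; i < WCET_MS * WORK_UNITS_PER_MS; i++)
                acc += i;
        sink = acc;
}

int main(void)
{
        struct rt_task param;
        int jobs = 1000;

        init_rt_task_param(&param);
        param.exec_cost = ms2ns(WCET_MS);
        param.period    = ms2ns(PERIOD_MS);

        if (init_litmus() != 0)
                return 1;
        if (set_rt_task_param(gettid(), &param) < 0)
                return 1;
        if (task_mode(LITMUS_RT_TASK) != 0)
                return 1;

        while (jobs--) {
                job();                  /* fixed amount of work, not fixed time */
                sleep_next_period();    /* suspend until the next release */
        }

        task_mode(BACKGROUND_TASK);
        return 0;
}

Since the loop terminates only when the additions are done, a
preempted vCPU merely stretches the job in wall-clock time; the amount
of work per job stays fixed.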

Besides what I said above, I'm wondering whether we should "fix" the
OS's budget tracking in virtualized environments.
Is this a known issue? (If it is designed this way on purpose by the
Linux developers, what is the design principle behind it?)
I may need to do some digging into this.
If you happen to know the answer, it would be great if you could share
some pointers.

Best regards,

Meng

-----------
Meng Xu
PhD Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/


