[LITMUS^RT] RT-litmus behaviour under real-workloads

Ashraf E. Suyyagh mrsuyyagh at gmail.com
Tue Feb 20 05:46:39 CET 2018


Hi,

Thanks for your help. I would like to point out the following to finalize
my observations and to help the community who will read the litmus
digest, especially those who might be forced to run litmus on single-board
computers with older kernels, which require older versions of litmus.

1. Despite what I was initially led to believe, namely that there are
significant changes between the 2014.2 tracing record formats and the recent
ones, it turns out that it all comes down to the ACET field added to the
completion struct in newer versions. This explains why using st-job-stats on
the 2014.2 traces yielded garbage data.

(latest versions)
struct st_completion_data {        /* A job completed. */
        u64 when;
        u64 forced:1;              /* Set to 1 if job overran and kernel
                                    * advanced to the next task automatically;
                                    * set to 0 otherwise. */
        u64 exec_time:63;          /* Actual execution time of job. */
};

(2014.2 version)
struct st_completion_data {        /* A job completed. */
        u64 when;
        u8  forced:1;              /* Set to 1 if job overran and kernel
                                    * advanced to the next task automatically;
                                    * set to 0 otherwise. */
        u8  __uflags:7;
        u8  __unused[7];
};
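
Both layouts are 16 bytes, so the records stay aligned in the trace file; the
difference is only that the newer parser's exec_time bit-field overlays the
bytes that 2014.2 wrote as __uflags/__unused. A minimal host-side sketch (my
own illustration, not LITMUS^RT code, using stdint types in place of the
kernel's u64/u8) of why the ACET column then comes out as zero or as a huge
bogus value:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 2014.2 record layout (16 bytes) */
struct completion_old {
        uint64_t when;
        uint8_t  flags;       /* forced:1 and __uflags:7 packed into one byte */
        uint8_t  unused[7];   /* whatever the old kernel left here */
};

/* current record layout (also 16 bytes) */
struct completion_new {
        uint64_t when;
        uint64_t forced:1;
        uint64_t exec_time:63;
};

int main(void)
{
        struct completion_old old = { .when = 123456789ULL, .flags = 1 };
        struct completion_new new_view;

        /* pretend the seven unused bytes hold stale, nonzero data */
        memset(old.unused, 0xAB, sizeof(old.unused));

        /* a newer parser reads the same 16 bytes with the new layout */
        memcpy(&new_view, &old, sizeof(new_view));

        /* exec_time now holds the old padding bytes (shifted past the
         * forced bit): zero if the padding was zeroed out, a huge bogus
         * ACET otherwise */
        printf("ACET seen by the new parser: %llu\n",
               (unsigned long long)new_view.exec_time);
        return 0;
}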

2. As for the wrong period being reported when it becomes too large: after
investigating, this problem is expected in *all* versions, *including the
recent ones*. It comes from *struct st_param_data*, where the period and
WCET are defined as *u32*. Since all timing is in nanoseconds, any execution
time or period exceeding the u32 range (roughly 4.29 seconds) will be
misrepresented in the trace. However, since the WCET and periods used
internally by litmus_rt are of type lt_t (long long), my expectation is that
Litmus itself will still schedule correctly. To correct the issue, a rework
and resizing of the event record struct is needed if one wishes to see large
periods/WCETs displayed correctly in st-job-stats. This might not be a
priority, but it is worth noting.
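
As a quick sanity check on that explanation (the arithmetic here is my own,
not output from the tools): a period of 6010 ms is 6,010,000,000 ns, which
does not fit in 32 bits, and truncating it to a u32 gives exactly the
1,715,032,704 ns that shows up as PERIOD in the trace quoted below.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* period requested in the bitcnts example below: 6010 ms, in ns */
        uint64_t period_ns = 6010ULL * 1000000ULL;  /* 6,010,000,000 ns */

        /* st_param_data stores the period in a 32-bit field,
         * so the recorded value wraps modulo 2^32 */
        uint32_t recorded = (uint32_t)period_ns;

        printf("requested: %llu ns\n", (unsigned long long)period_ns);
        printf("recorded : %u ns\n", recorded); /* 1715032704, as in the trace */
        return 0;
}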

3. I have found the old unit-trace tool which was used to parse the 2014.2
traces and older ones. Comparing the code, I realized that most of the
changes between 2014.2 and recent versions have nothing to do with the trace
formats and records, except for the ACET field. Most of the recent work went
into better parsers, drawing tools, job statistics, etc. The *st* tracing
itself remains essentially unchanged. So using the 2016.1 ft_tools with
2014.2 litmus works fine, as long as one discards the ACET field data.

4. The only puzzling issue remaining concerns five benchmarks out of the 45.
If such a benchmark is launched through the litmus_rt wrapper which runs it
as a periodic task, its execution time is stretched for no apparent reason.
For example, one benchmark with a WCET of 50ms as a non-rt executable shows
an execution time of 1500ms when launched as a real-time task. I opted to
drop these five benchmarks from my final set. Some of these troublesome
benchmarks were CPU-bound, too.

5. Thanks for the help. I hope that at some point, whenever time permits,
you could document more of the litmus files, and add some sort of overview
of how the files relate to each other. It would help the community in
understanding and using Litmus RT. Using cscope to navigate the code is
recommended for new users. Great job, though, on the work done.

regards,

On 14 February 2018 at 22:47, Ashraf E. Suyyagh <mrsuyyagh at gmail.com> wrote:

> Hi,
>
> First of all, my apologies for the long post, but I needed to cover the
> setup and results in detail. I am forced to use litmus 2014.2 on a 3.10
> kernel.
>
> I have a set of 45 embedded benchmarks. I have written a litmus RT main
> file (wrapper) for each benchmark which calls it as a periodic litmus RT
> job for a duration T. The tasks are compiled and linked against the litmus
> RT API and libraries. The attached file (main.c) is for benchmarks with
> arguments.
>
> The running platform is an ARM Odroid-XU4 board with an Exynos 5422
> octa-core SoC (ARM big.LITTLE, A15 and A7 cores).
>
> In a previously published research project of ours, the WCET of each
> benchmark was estimated on the same platform for each core type, so I have
> two sets of WCETs, one for the A15 and one for the A7 cores. These are safe
> WCET estimates; execution times never exceeded them. We choose the WCET
> based on which type of core the task runs on.
>
> I am using partitioned EDF (each core is a partition - L1 cache based). I
> have estimated the overhead of the wrapper and scheduler to be 0.36ms on
> the A15 core and 0.5ms on the A7 core. *Those overheads are factored into
> the final WCET when running the litmus task.*
>
> For now, *I am running each RT benchmark one at a time
> with synchronous release and forced completion.*
>
> A sample command runs the bitcnts benchmark; *the period is four
> times the WCET, so the implicit deadline is far away:*
> sudo ./bitcnts -p 1 -w 1502 6010 30 -- 1125000  &
>
> I have run each of the benchmarks and they do execute periodically. So
> there is no problem in launching the benchmarks, and I verified that my
> tasks are launching fine with correct results/outputs. However, the
> problems are as follows:
>
>    1. When running *st-trace-schedule* and collecting results with
>    *st-job-stats* for one task, ACET values are odd. In many cases they are
>    0; in others they are extremely high values which make no sense
>    (e.g. 3677604500000) (see file result_01).
>    2. This issue occurs frequently with most benchmarks; I will list one
>    as an example. In a task with a WCET of 107ms and a period of 500ms
>    running for 30 seconds, the results are odd. The ACET shows 0, as before.
>    And there are lots of forced jobs, in a pattern of 1,1,0,1,1,0 ... etc or
>    0,1,0,10,1 ...etc. Why would the task be forced? The execution time is
>    less than the WCET and the deadline is four times the WCET. There is no
>    way I can imagine that the job would exceed its deadline (see file
>    result_02). Do note, in a previous project we ran those benchmarks tens
>    of thousands of times each, so we have a solid idea of how long they
>    execute on average.
>    3. Is the period capped? I have a task which is run as "sudo ./bitcnts
>    -p 1 -w 1502 6010 30 -- 1125000 &". However, the trace shows a period of
>    roughly 1715 ms instead of 6010 ms:
>
> # Task,   Job,     Period,   Response, DL Miss?,   Lateness,  Tardiness,
> Forced?,       ACET,  Preemptions,   Migrations
>
> # task NAME=bitcnts PID=4052 COST=1502000000 PERIOD=1715032704 CPU=1
>
>   4052,     2, 1715032704, 1443401363,        0, -4566598637,          0,
>      0,          0,            0,            0
>
>   4052,     3, 1715032704, 1446788641,        0, -4563211359,          0,
>      0,          0,            0,            0
>
> Thank you
>
>
>