[LITMUS^RT] high miss ratio for low task utilization

Glenn Elliott gelliott at cs.unc.edu
Fri Nov 21 00:01:36 CET 2014


> On Nov 20, 2014, at 4:01 PM, jp walters <mailinglistjp39 at gmail.com> wrote:
> 
> Hi,
> 
> I'm just getting started with Litmus-RT, and I'm hoping someone can shed some light on the results that I'm seeing.  
> 
> I'm running the Litmus 2014.2 patch against the 3.10.41 upstream Linux kernel.  I've built and installed the kernel according to the directions on the wiki.
> 
> I've tested this on two systems with the same results.  The first is the Ubuntu VM that's linked on the wiki, mostly unmodified except for installing the development tools and the updated kernel/liblitmus.  I allow the VM 8GB RAM and 8 cores from a total of 128 physical cores on the machine.  The machine isn't loaded.  The second system is a non-virtualized 16-core machine, also not loaded.
> 
> In both cases, everything builds fine, but when I try to sanity-check my install using the experiment-scripts hosted on GitHub, I'm finding that the simple 3-task system generates some unexpected results.  I realize that the experiment scripts on GitHub aren't supported, but I'm hoping someone can comment on whether the results I'm getting seem reasonable.
> 
> Using the run_exps.py script, everything looks reasonable, but when parsing the results, I get the following:
> 
> root at ubuntu-qemu:~/experiment-scripts# ./parse_exps.py run-data/test1
> Loading experiments...
> Parsing data...
>  0.00%
> Writing csvs into parse-data...
> Too little data to make csv files, printing results.
> <ExpPoint-run-data/test1>
>            block-avg:  Avg:     0.000  Max:     0.000  Min:     0.000  Var:     0.000
>            block-max:  Avg:     0.000  Max:     0.000  Min:     0.000  Var:     0.000
>           miss-ratio:  Avg:     1.000  Max:     1.000  Min:     1.000  Var:     0.000
>          record-loss:  Avg:     0.000  Max:     0.000  Min:     0.000  Var:     0.000
>             tard-avg:  Avg:     0.000  Max:     0.000  Min:     0.000  Var:     0.000
>             tard-max:  Avg:     0.000  Max:     0.000  Min:     0.000  Var:     0.000
> 
> The 1.0 miss-ratio seems odd to me, given the resources available and the workload described in the 3-task example.  Am I missing something?
> 
> best,
> JP


Hi JP,

If you compiled Litmus with “CONFIG_SCHED_DEBUG_TRACE” enabled, then the experiment-scripts run_exps.py should have created a file named “trace.slog”.  Can you find such a file?  Take a look at it and see if any error messages jump out at you; they might give you a hint as to what is wrong.  If you can’t make heads or tails of the log file (which is completely understandable), tgz it up and post it to the mailing list (unless it’s tremendously big).

If I may inquire, what x86 system do you have that has 128 physical cores?  That’s pretty incredible!  I don’t believe that anyone has ever run Litmus on anything with more than 64 cores.  Come to think of it, some of liblitmus’s routines will break when P-EDF (and C-EDF with L1 clustering) is used on a system with more than 64 CPUs.  This is a limitation of the user-space library, not the Litmus kernel.

Here are links to the broken user-space code:
https://github.com/LITMUS-RT/liblitmus/blob/master/src/migration.c#L105
https://github.com/LITMUS-RT/liblitmus/blob/master/src/migration.c#L127

The limitation is this: a 64-bit mask is used to report the mapping between CPU clusters (“domains”) and CPUs.  If you have more than 64 CPUs or more than 64 domains, then these routines will break.  You haven’t hit this limit yet since your VM is constrained to 8 cores.  I must say, I didn’t think Litmus users would hit the 64-CPU limit so soon.  Maybe it’s time to come up with a solution.  Björn, should we use a __uint128_t, a struct, or CPU_SET?  I prefer __uint128_t to keep things simple, but I admit that only buys us time.  We’d have to do something else if someone wanted to run Litmus on Xeon Phi (available today), which has 244 hardware threads.
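To make the failure mode concrete, here is a minimal sketch of the pattern (illustrative names only, not the actual migration.c code), alongside the CPU_SET alternative mentioned above:

#define _GNU_SOURCE
#include <sched.h>   /* cpu_set_t, CPU_ZERO, CPU_SET (glibc) */
#include <stdint.h>
#include <stdio.h>

/* One 64-bit word per domain; bit i set means "CPU i belongs to
 * this domain".  This is the shape of the limitation: CPU indices
 * 64 and up simply cannot be encoded. */
static int mask_set_cpu(uint64_t *mask, int cpu)
{
        if (cpu >= 64)
                return -1;  /* would shift past the mask's width */
        *mask |= UINT64_C(1) << cpu;
        return 0;
}

int main(void)
{
        uint64_t domain = 0;
        printf("cpu 8:   %d\n", mask_set_cpu(&domain, 8));   /*  0: ok   */
        printf("cpu 100: %d\n", mask_set_cpu(&domain, 100)); /* -1: lost */

        /* glibc's cpu_set_t is CPU_SETSIZE (1024) bits wide, and
         * CPU_ALLOC can grow a set beyond even that, so it has no
         * 64-CPU ceiling. */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(100, &set);
        printf("cpu 100 in cpu_set_t: %d\n", CPU_ISSET(100, &set) ? 1 : 0);
        return 0;
}

A __uint128_t would be a drop-in widening of the same pattern, whereas cpu_set_t trades a clunkier API for no fixed ceiling at all.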

-Glenn


