[LITMUS^RT] prop/many-cpu-migration

Björn Brandenburg bbb at mpi-sws.org
Tue Jun 3 09:29:02 CEST 2014


On 26 May 2014, at 10:00, Björn Brandenburg <bbb at mpi-sws.org> wrote:

> On 20 May 2014, at 03:27, Glenn Elliott <gelliott at cs.unc.edu> wrote:
>> 
>> Björn stumbled upon a bug in the code I added to map between CPUs and clusters/partitions (aka ‘domains’).  The problem arises when the Linux kernel prints cpumasks larger than 32 bits to /proc/litmus/cpus/* and /proc/litmus/domains/*.  Namely, the kernel inserts ‘,’ every eight hex characters (32 bits).  I have posted a patch to prop/many-cpu-migration to parse these strings correctly.  Please review the code.  Also, could someone please try it out on a system with more than 32 CPUs (or better, more than 64 CPUs)?  Even though I compile with NR_CPUS = 4096, I can’t get cpumask_scnprintf() to print data for more than 64 bits.
>> 
>> Note #1: The liblitmus code now supports 4096 bits (or CPUs). However, looking at the code in litmus_mapping_proc_show(), I believe the output to proc is limited to 256 bits.
>> 
>> Note #2: This branch also includes a minor open() fix for DFLP.  The lock test won’t compile without the fix (GCC 4.8.1 on Ubuntu 13.10).
> 
> 
> Hi Glenn,
> 
> thanks a lot for the patches. I've merged them into staging.
> 
> - Björn


Hi Glenn,

unfortunately I’m seeing test failures that seem related to this patch. With the current liblitmus staging branch, I’m seeing the following:

> [root@litmus-rt ~]# setsched PSN-EDF
> [root@litmus-rt ~]# runtests
> ** LITMUS^RT test suite.
> ** Running tests for PSN-EDF.
> ** Testing: reject invalid rt_task pointers... ok.
> ** Testing: reject invalid rt_task values... ok.
> ** Testing: reject job control for non-rt tasks... ok.
> ** Testing: children of RT tasks are not automatically RT tasks... be_migrate_to_partition(): Invalid argument
> Warning: Could not initialize LITMUS^RT, be_migrate_to_partition() failed.
> 
> !! TEST FAILURE sporadic_partitioned(ms2ns(10), ms2ns(100), 0) -> -1, Invalid argument
>    at tests/core_api.c:124 (test_rt_fork_non_rt)
>    in task PID=1716
> ** Testing: tasks have write access to /dev/litmus/ctrl mappings... ok.
> ** Testing: admission control handles suspended tasks correctly... ok.
> ** Testing: admission control handles running tasks correctly... ok.
> ** Testing: reject invalid object descriptors... ok.
> ** Testing: reject invalid object types... ok.
> ** Testing: don't inherit FDSO handles across fork... be_migrate_to_partition(): Invalid argument
> Warning: Could not initialize LITMUS^RT, be_migrate_to_partition() failed.
> 
> !! TEST FAILURE sporadic_partitioned(ms2ns(10), ms2ns(100), 0) -> -1, Invalid argument
>    at tests/fdso.c:69 (test_not_inherit_od)
>    in task PID=1724
> be_migrate_to_partition(): Invalid argument
> Warning: Could not initialize LITMUS^RT, be_migrate_to_partition() failed.
> 
> !! TEST FAILURE sporadic_partitioned(ms2ns(10), ms2ns(100), 0) -> -1, Invalid argument
>    at tests/fdso.c:69 (test_not_inherit_od)
>    in task PID=1725
> ** Testing: don't let best-effort tasks lock FMLP semaphores... ok.
> ** Testing: don't let best-effort tasks open SRP semaphores... ok.
> ** Testing: SRP acquisition and release... be_migrate_to_partition(): Invalid argument
> Warning: Could not initialize LITMUS^RT, be_migrate_to_partition() failed.
> 
> !! TEST FAILURE sporadic_partitioned(ms2ns(10), ms2ns(100), 0) -> -1, Invalid argument
>    at tests/locks.c:57 (test_lock_srp)
>    in task PID=1728
> ** Testing: FMLP acquisition and release... be_migrate_to_partition(): Invalid argument
> Warning: Could not initialize LITMUS^RT, be_migrate_to_partition() failed.
> 
> !! TEST FAILURE sporadic_partitioned(ms2ns(10), ms2ns(100), 0) -> -1, Invalid argument
>    at tests/locks.c:89 (test_lock_fmlp)
>    in task PID=1729
> ** Testing: SRP task becomes non-RT task while holding lock... ok.
> ** Testing: FMLP task becomes non-RT task while holding lock... ok.
> ** Testing: SRP task exits while holding lock... ok.
> ** Testing: FMLP task exits while holding lock... ok.
> ** Testing: FMLP no nesting allowed... be_migrate_to_partition(): Invalid argument
> Warning: Could not initialize LITMUS^RT, be_migrate_to_partition() failed.
> 
> !! TEST FAILURE sporadic_partitioned(10, 100, 0) -> -1, Invalid argument
>    at tests/nesting.c:15 (test_lock_fmlp_nesting)
>    in task PID=1736
> ** Testing: FMLP no nesting with SRP resources allowed... be_migrate_to_partition(): Invalid argument
> Warning: Could not initialize LITMUS^RT, be_migrate_to_partition() failed.
> 
> !! TEST FAILURE sporadic_partitioned(10, 100, 0) -> -1, Invalid argument
>    at tests/nesting.c:50 (test_lock_fmlp_srp_nesting)
>    in task PID=1737
> ** Testing: SRP nesting allowed... be_migrate_to_partition(): Invalid argument
> Warning: Could not initialize LITMUS^RT, be_migrate_to_partition() failed.
> 
> !! TEST FAILURE sporadic_partitioned(10, 100, 0) -> -1, Invalid argument
>    at tests/nesting.c:85 (test_lock_srp_nesting)
>    in task PID=1738
> ** Testing: SRP ceiling blocking... ok.
> ** Testing: preempt lower-priority task when a higher-priority task resumes... ok.
> ** Result: 16 ok, 7 failed.


If I revert commit 5f2866d (Migration: Support systems with more than 32 CPUs, https://github.com/LITMUS-RT/liblitmus/commit/5f2866d43d9a2e33bc2961edf9966cad5708cc4d), then I get the following expected behavior:

> [root@litmus-rt ~]# setsched PSN-EDF
> [root@litmus-rt ~]# runtests
> ** LITMUS^RT test suite.
> ** Running tests for PSN-EDF.
> ** Testing: reject invalid rt_task pointers... ok.
> ** Testing: reject invalid rt_task values... ok.
> ** Testing: reject job control for non-rt tasks... ok.
> ** Testing: children of RT tasks are not automatically RT tasks... ok.
> ** Testing: tasks have write access to /dev/litmus/ctrl mappings... ok.
> ** Testing: admission control handles suspended tasks correctly... ok.
> ** Testing: admission control handles running tasks correctly... ok.
> ** Testing: reject invalid object descriptors... ok.
> ** Testing: reject invalid object types... ok.
> ** Testing: don't inherit FDSO handles across fork... ok.
> ** Testing: don't let best-effort tasks lock FMLP semaphores... ok.
> ** Testing: don't let best-effort tasks open SRP semaphores... ok.
> ** Testing: SRP acquisition and release... ok.
> ** Testing: FMLP acquisition and release... ok.
> ** Testing: SRP task becomes non-RT task while holding lock... ok.
> ** Testing: FMLP task becomes non-RT task while holding lock... ok.
> ** Testing: SRP task exits while holding lock... ok.
> ** Testing: FMLP task exits while holding lock... ok.
> ** Testing: FMLP no nesting allowed... ok.
> ** Testing: FMLP no nesting with SRP resources allowed... ok.
> ** Testing: SRP nesting allowed... ok.
> ** Testing: SRP ceiling blocking... ok.
> ** Testing: preempt lower-priority task when a higher-priority task resumes... ok.
> ** Result: 23 ok, 0 failed.


FYI, this is what the system looks like:

> [root@litmus-rt ~]# ls /proc/litmus/domains/
> 0  1  2  3
> [root@litmus-rt ~]# cat /proc/litmus/domains/*
> 000001
> 000002
> 000004
> 000008
> [root@litmus-rt ~]# ls /proc/litmus/cpus/
> 0  1  2  3
> [root@litmus-rt ~]# cat /proc/litmus/cpus/*
> 000001
> 000002
> 000004
> 000008
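
For reference, masks like these (and the comma-separated form that the kernel emits beyond 32 bits, as Glenn described) can be parsed by scanning the string from right to left. The following is only a minimal sketch to illustrate the idea, not the code in commit 5f2866d: parse_hex_mask() and first_cpu() are hypothetical helper names, and storing the mask as an array of 32-bit words is an assumption made for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of liblitmus: parse a hexadecimal CPU mask
 * such as "000008" or "00000001,00000000" into an array of 32-bit words,
 * with words[0] covering CPUs 0-31. Returns 0 on success, -1 on malformed
 * input or if a set bit does not fit into nwords words. */
static int parse_hex_mask(const char *str, uint32_t *words, size_t nwords)
{
	size_t len = strlen(str);
	size_t bit = 0;    /* position of the lowest bit of the next digit */
	size_t digits = 0; /* number of hex digits consumed so far */

	memset(words, 0, nwords * sizeof(*words));

	/* Ignore trailing whitespace, e.g. the newline ending a proc file. */
	while (len > 0 && isspace((unsigned char) str[len - 1]))
		len--;

	/* Scan right to left: the last digit holds the lowest-numbered CPUs. */
	while (len > 0) {
		unsigned char c = (unsigned char) str[--len];
		uint32_t val;

		if (c == ',')
			continue; /* group separator, carries no bits */
		if (!isxdigit(c))
			return -1;

		val = isdigit(c) ? (uint32_t) (c - '0')
		                 : (uint32_t) (tolower(c) - 'a' + 10);
		if (val) {
			if (bit / 32 >= nwords)
				return -1; /* set bit beyond the supported width */
			words[bit / 32] |= val << (bit % 32);
		}
		bit += 4;
		digits++;
	}
	return digits > 0 ? 0 : -1;
}

/* Return the lowest set CPU index in the mask, or -1 if the mask is empty. */
static int first_cpu(const uint32_t *words, size_t nwords)
{
	size_t i;
	int b;

	for (i = 0; i < nwords; i++)
		for (b = 0; b < 32; b++)
			if (words[i] & (UINT32_C(1) << b))
				return (int) (i * 32 + b);
	return -1;
}
```

With this sketch, the short six-digit strings above and a longer comma-separated string go through the same code path, since commas are simply skipped and leading zeros contribute no bits. For example, parsing "000008" should yield CPU 3 as the first CPU.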

Could you please investigate and propose a fix?

Thanks,
Björn




