[LITMUS^RT] Running LITMUS-RT on ARM64
Björn Brandenburg
bbb at mpi-sws.org
Thu Sep 21 12:08:57 CEST 2017
> On 18. Sep 2017, at 15:42, Andrii Anisov <andrii_anisov at epam.com> wrote:
>
>
>> When I ran the calibration loop and then plugged the obtained value for 1ms into -a, I saw calibration errors in excess of 50% when using -l. Is this expected?
> No, this is not as expected.
> Could you please check whether the loop() function is not inlined in your rtspin binary, e.g., with `${CROSS_COMPILE}objdump -t rtspin | grep loop`?
Hi Andrii,
this is what I’m seeing; seems to be ok:
objdump -t rtspin | grep loop
0000000000402ee0 l F .text 0000000000000065 loop
0000000000403040 l F .text 0000000000000084 loop_once_with_mem
00000000004031b0 l F .text 000000000000012d loop_for
When running the calibration loop (on an Intel Xeon CPU E5-2450 @ 2.10GHz):
./rtspin -a0
Probe 4096 loops for 1 ms:
Probe 8192 loops for 1 ms:
Probe 16384 loops for 1 ms:
Probe 32768 loops for 1 ms:
Probe 65536 loops for 1 ms:
Probe 131072 loops for 1 ms:
98304 loops elapsed in 0.00078678131103515625 s
114688 loops elapsed in 0.00091290473937988281 s
122880 loops elapsed in 0.00098991394042968750 s
126976 loops elapsed in 0.00100708007812500000 s
124928 loops elapsed in 0.00100111961364746094 s
123904 loops elapsed in 0.00098490715026855469 s
124416 loops elapsed in 0.00100708007812500000 s
124160 loops elapsed in 0.00098299980163574219 s
124288 loops elapsed in 0.00098085403442382812 s
124352 loops elapsed in 0.00098705291748046875 s
124384 loops elapsed in 0.00092506408691406250 s
124400 loops elapsed in 0.00092005729675292969 s
124408 loops elapsed in 0.00092291831970214844 s
124412 loops elapsed in 0.00092983245849609375 s
124414 loops elapsed in 0.00091719627380371094 s
124415 loops elapsed in 0.00092101097106933594 s
In 1 ms 124415 loops.
Probe 4096 loops for 10 ms:
Probe 8192 loops for 10 ms:
Probe 16384 loops for 10 ms:
Probe 32768 loops for 10 ms:
Probe 65536 loops for 10 ms:
Probe 131072 loops for 10 ms:
Probe 262144 loops for 10 ms:
Probe 524288 loops for 10 ms:
Probe 1048576 loops for 10 ms:
Probe 2097152 loops for 10 ms:
1572864 loops elapsed in 0.00922393798828125000 s
1835008 loops elapsed in 0.01025319099426269531 s
1703936 loops elapsed in 0.00898504257202148438 s
1769472 loops elapsed in 0.00889706611633300781 s
1802240 loops elapsed in 0.00862717628479003906 s
1818624 loops elapsed in 0.00828123092651367188 s
1826816 loops elapsed in 0.00806307792663574219 s
1830912 loops elapsed in 0.00781297683715820312 s
1832960 loops elapsed in 0.00758409500122070312 s
1833984 loops elapsed in 0.00734114646911621094 s
1834496 loops elapsed in 0.00710415840148925781 s
1834752 loops elapsed in 0.00692510604858398438 s
1834880 loops elapsed in 0.00677204132080078125 s
1834944 loops elapsed in 0.00649595260620117188 s
1834976 loops elapsed in 0.00631403923034667969 s
1834992 loops elapsed in 0.00604510307312011719 s
1835000 loops elapsed in 0.00607514381408691406 s
1835004 loops elapsed in 0.00609707832336425781 s
1835006 loops elapsed in 0.00610995292663574219 s
1835007 loops elapsed in 0.00610494613647460938 s
In 10 ms 1835007 loops.
Probe 4096 loops for 100 ms:
Probe 8192 loops for 100 ms:
Probe 16384 loops for 100 ms:
Probe 32768 loops for 100 ms:
Probe 65536 loops for 100 ms:
Probe 131072 loops for 100 ms:
Probe 262144 loops for 100 ms:
Probe 524288 loops for 100 ms:
Probe 1048576 loops for 100 ms:
Probe 2097152 loops for 100 ms:
Probe 4194304 loops for 100 ms:
Probe 8388608 loops for 100 ms:
Probe 16777216 loops for 100 ms:
Probe 33554432 loops for 100 ms:
25165824 loops elapsed in 0.08332800865173339844 s
29360128 loops elapsed in 0.09707999229431152344 s
31457280 loops elapsed in 0.10405802726745605469 s
30408704 loops elapsed in 0.10079097747802734375 s
29884416 loops elapsed in 0.09911704063415527344 s
30146560 loops elapsed in 0.09944319725036621094 s
30277632 loops elapsed in 0.10028100013732910156 s
30212096 loops elapsed in 0.09977984428405761719 s
30244864 loops elapsed in 0.10025095939636230469 s
30228480 loops elapsed in 0.09993004798889160156 s
30236672 loops elapsed in 0.10005903244018554688 s
30232576 loops elapsed in 0.10019898414611816406 s
30230528 loops elapsed in 0.10017704963684082031 s
30229504 loops elapsed in 0.10037112236022949219 s
30228992 loops elapsed in 0.10012698173522949219 s
30228736 loops elapsed in 0.09994411468505859375 s
30228864 loops elapsed in 0.09987020492553710938 s
30228928 loops elapsed in 0.10010409355163574219 s
30228896 loops elapsed in 0.10033512115478515625 s
30228880 loops elapsed in 0.09997582435607910156 s
30228888 loops elapsed in 0.09982991218566894531 s
30228892 loops elapsed in 0.09987282752990722656 s
30228894 loops elapsed in 0.10009694099426269531 s
30228893 loops elapsed in 0.10006999969482421875 s
In 100 ms 30228892 loops.
Probe 4096 loops for 1000 ms:
Probe 8192 loops for 1000 ms:
Probe 16384 loops for 1000 ms:
Probe 32768 loops for 1000 ms:
Probe 65536 loops for 1000 ms:
Probe 131072 loops for 1000 ms:
Probe 262144 loops for 1000 ms:
Probe 524288 loops for 1000 ms:
Probe 1048576 loops for 1000 ms:
Probe 2097152 loops for 1000 ms:
Probe 4194304 loops for 1000 ms:
Probe 8388608 loops for 1000 ms:
Probe 16777216 loops for 1000 ms:
Probe 33554432 loops for 1000 ms:
Probe 67108864 loops for 1000 ms:
Probe 134217728 loops for 1000 ms:
Probe 268435456 loops for 1000 ms:
Probe 536870912 loops for 1000 ms:
402653184 loops elapsed in 1.33262491226196289062 s
335544320 loops elapsed in 1.11118102073669433594 s
301989888 loops elapsed in 1.00045084953308105469 s
285212672 loops elapsed in 0.94368410110473632812 s
293601280 loops elapsed in 0.97163510322570800781 s
297795584 loops elapsed in 0.98513817787170410156 s
299892736 loops elapsed in 0.99275398254394531250 s
300941312 loops elapsed in 0.99712610244750976562 s
301465600 loops elapsed in 0.99880599975585937500 s
301727744 loops elapsed in 0.99890494346618652344 s
301858816 loops elapsed in 0.99928998947143554688 s
301924352 loops elapsed in 0.99973702430725097656 s
301957120 loops elapsed in 0.99970698356628417969 s
301973504 loops elapsed in 1.00021719932556152344 s
301965312 loops elapsed in 1.00005698204040527344 s
301961216 loops elapsed in 0.99930715560913085938 s
301963264 loops elapsed in 0.99909400939941406250 s
301964288 loops elapsed in 0.99878096580505371094 s
301964800 loops elapsed in 0.99935603141784667969 s
301965056 loops elapsed in 1.00014495849609375000 s
301964928 loops elapsed in 0.99921607971191406250 s
301964992 loops elapsed in 1.00018692016601562500 s
301964960 loops elapsed in 0.99950981140136718750 s
301964976 loops elapsed in 0.99894189834594726562 s
301964984 loops elapsed in 0.99995088577270507812 s
301964988 loops elapsed in 0.99937891960144042969 s
301964990 loops elapsed in 0.99993705749511718750 s
301964991 loops elapsed in 1.00036287307739257812 s
In 1 s 301964990 loops.
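For readers following the output above: the probe sequence is consistent with a doubling phase followed by a bisection. The following is only a minimal sketch of that search pattern, not the actual rtspin code. The names calibrate and elapsed_for are hypothetical, and elapsed_for stands in for "run n loop iterations and return the measured time in seconds".

```python
def calibrate(target_s, elapsed_for, start=4096):
    """Return the largest probed loop count that ran for less than target_s.

    elapsed_for(n) is a hypothetical callback: it runs n loop iterations
    and returns the elapsed time in seconds.
    Assumes that `start` iterations take less than target_s.
    """
    # Phase 1: double the probe size until the run is long enough.
    hi = start
    while elapsed_for(hi) < target_s:
        hi *= 2
    lo = hi // 2  # last probe that was still too short

    # Phase 2: bisect between lo (too short) and hi (long enough).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if elapsed_for(mid) < target_s:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Idealized machine: exactly 125000 loops per millisecond, no noise.
    print(calibrate(0.001, lambda n: n / 125_000_000.0))
```

Each bisection step relies on a single timing measurement, so under this sketch one noisy sample can steer the search to a different final value.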
So I pick 124415 (the 1 ms result) and get this:
./rtspin -a 124415 -l
Evaluating loop with 124415 cycles:
0.5000s: looped for 0.27058574s, delta=-0.22941426s, error=-45.8829%
0.4900s: looped for 0.20155855s, delta=-0.28844145s, error=-58.8656%
0.4800s: looped for 0.19776059s, delta=-0.28223941s, error=-58.7999%
0.4700s: looped for 0.19392000s, delta=-0.27608000s, error=-58.7404%
0.4600s: looped for 0.18951456s, delta=-0.27048544s, error=-58.8012%
0.4500s: looped for 0.18549689s, delta=-0.26450311s, error=-58.7785%
0.4400s: looped for 0.18138122s, delta=-0.25861878s, error=-58.7770%
0.4300s: looped for 0.17711085s, delta=-0.25288915s, error=-58.8114%
0.4200s: looped for 0.17304478s, delta=-0.24695522s, error=-58.7989%
0.4100s: looped for 0.16858504s, delta=-0.24141496s, error=-58.8817%
0.4000s: looped for 0.16444956s, delta=-0.23555044s, error=-58.8876%
0.3900s: looped for 0.16047009s, delta=-0.22952991s, error=-58.8538%
0.3800s: looped for 0.15612403s, delta=-0.22387597s, error=-58.9147%
0.3700s: looped for 0.15227059s, delta=-0.21772941s, error=-58.8458%
0.3600s: looped for 0.14818439s, delta=-0.21181561s, error=-58.8377%
0.3500s: looped for 0.14409752s, delta=-0.20590248s, error=-58.8293%
0.3400s: looped for 0.13996957s, delta=-0.20003043s, error=-58.8325%
0.3300s: looped for 0.13590484s, delta=-0.19409516s, error=-58.8167%
0.3200s: looped for 0.13206942s, delta=-0.18793058s, error=-58.7283%
0.3100s: looped for 0.12762452s, delta=-0.18237548s, error=-58.8308%
0.3000s: looped for 0.12348666s, delta=-0.17651334s, error=-58.8378%
0.2900s: looped for 0.11929376s, delta=-0.17070624s, error=-58.8642%
0.2800s: looped for 0.11530717s, delta=-0.16469283s, error=-58.8189%
0.2700s: looped for 0.11088594s, delta=-0.15911406s, error=-58.9311%
0.2600s: looped for 0.10693564s, delta=-0.15306436s, error=-58.8709%
0.2500s: looped for 0.10303040s, delta=-0.14696960s, error=-58.7878%
0.2400s: looped for 0.09873869s, delta=-0.14126131s, error=-58.8589%
0.2300s: looped for 0.09481494s, delta=-0.13518506s, error=-58.7761%
0.2200s: looped for 0.09065027s, delta=-0.12934973s, error=-58.7953%
0.2100s: looped for 0.08653805s, delta=-0.12346195s, error=-58.7914%
0.2000s: looped for 0.08241614s, delta=-0.11758386s, error=-58.7919%
0.1900s: looped for 0.07820506s, delta=-0.11179494s, error=-58.8394%
0.1800s: looped for 0.07410046s, delta=-0.10589954s, error=-58.8331%
0.1700s: looped for 0.07016598s, delta=-0.09983402s, error=-58.7259%
0.1600s: looped for 0.06586765s, delta=-0.09413235s, error=-58.8327%
0.1500s: looped for 0.06183796s, delta=-0.08816204s, error=-58.7747%
0.1400s: looped for 0.05761669s, delta=-0.08238331s, error=-58.8452%
0.1300s: looped for 0.05362693s, delta=-0.07637307s, error=-58.7485%
0.1200s: looped for 0.04958293s, delta=-0.07041707s, error=-58.6809%
0.1100s: looped for 0.04530530s, delta=-0.06469470s, error=-58.8134%
0.1000s: looped for 0.04111829s, delta=-0.05888171s, error=-58.8817%
0.0900s: looped for 0.03695448s, delta=-0.05304552s, error=-58.9395%
0.0800s: looped for 0.03312724s, delta=-0.04687276s, error=-58.5909%
0.0700s: looped for 0.02886074s, delta=-0.04113926s, error=-58.7704%
0.0600s: looped for 0.02479697s, delta=-0.03520303s, error=-58.6717%
0.0500s: looped for 0.02059629s, delta=-0.02940371s, error=-58.8074%
0.0400s: looped for 0.01654219s, delta=-0.02345781s, error=-58.6445%
0.0300s: looped for 0.01239347s, delta=-0.01760653s, error=-58.6884%
0.0200s: looped for 0.00826822s, delta=-0.01173178s, error=-58.6589%
That doesn’t look right. What should I be doing differently?
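As a cross-check of the numbers above, the calibration results can be normalized to loops per millisecond and compared. This is only a back-of-the-envelope sketch: it assumes that the 1000 ms result reflects the rate at which the loop actually runs during the -l evaluation, which the output does not prove. The helper name predicted_error_pct is made up for illustration.

```python
# Loop counts reported by ./rtspin -a0 above, keyed by window length in ms
# and normalized to loops per millisecond.
loops_per_ms = {
    1: 124415 / 1,
    10: 1835007 / 10,
    100: 30228892 / 100,
    1000: 301964990 / 1000,
}


def predicted_error_pct(calibrated, actual):
    """Relative error (in percent) if `calibrated` loops/ms is assumed
    while the loop really runs at `actual` loops/ms."""
    return (calibrated / actual - 1.0) * 100.0


for window_ms, rate in loops_per_ms.items():
    print(f"{window_ms:5d} ms window: {rate:12.1f} loops/ms")

# Assumption: the 1000 ms calibration approximates the true sustained rate.
err = predicted_error_pct(loops_per_ms[1], loops_per_ms[1000])
print(f"predicted error with the 1 ms value: {err:.1f}%")
```

Under that assumption the predicted error is close to the roughly -58.8% reported by -l, which suggests that the 1 ms calibration window underestimates the sustained loop rate on this machine.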
I just tried this on another machine, an Intel Core i5-4590 CPU @ 3.30GHz. There I get "In 1 ms 441849 loops.", which results in an error between 2% and 3%. That is much better, but still not great. For comparison, the old cputime()/TSC-based loop usually achieves an error of less than 0.002%.
Thanks,
Björn