[LITMUS^RT] Feather-Trace scalability & TSC calibration patches
Björn Brandenburg
bbb at mpi-sws.org
Tue Jan 14 10:56:21 CET 2014
Hi everyone,
here's another set of patches extracted from our RTAS'14 branch. When tracing LITMUS^RT on a 64-core platform, we ran into two issues:
1) Feather-Trace's global timestamp buffer became a major scalability bottleneck, and
2) the TSCs were not synchronized across cores, but differed by a constant offset.
(1) is a serious problem because it distorts the measurements (i.e., the overhead of Feather-Trace itself becomes much larger than the overhead that it is supposed to measure). (2) is not a problem for most measurements (CPU-local measurements are not affected by cross-CPU skew), but measuring IPI latencies becomes difficult if the TSCs do not share a common time zero, because the sending and the receiving timestamps are taken on different cores.
The following patches address (1) by changing Feather-Trace to record all timestamps into processor-local trace buffers, and provide a workaround for (2) by adding some benchmarking code that determines the offset between any two cores, which can then be used to patch up measurements to refer to a common time base.
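To illustrate how per-core offsets can be used to patch up measurements, here is a minimal post-processing sketch in Python. It is not taken from the patches: the function names, the record layout (tsc, event), the event labels, and the convention that offsets[cpu] is the TSC of that core minus the TSC of core 0 are all assumptions made for this example.

```python
import heapq


def to_common_base(cpu, tsc, offsets):
    """Translate a raw TSC value read on `cpu` into core 0's time base.

    Assumed convention: offsets[cpu] = (TSC of cpu) - (TSC of core 0), in cycles.
    """
    return tsc - offsets[cpu]


def merge_traces(per_cpu, offsets):
    """Merge per-core record lists into one list ordered by corrected time.

    per_cpu maps a core id to a list of (tsc, event) records that is already
    sorted by tsc, as a processor-local buffer filled in order would be.
    Subtracting a constant per core keeps each list sorted, so a k-way merge
    is sufficient.
    """
    streams = [
        [(to_common_base(cpu, tsc, offsets), cpu, event) for tsc, event in records]
        for cpu, records in per_cpu.items()
    ]
    return list(heapq.merge(*streams))


# Hypothetical example: core 1's TSC runs 5000 cycles ahead of core 0's.
offsets = {0: 0, 1: 5000}
per_cpu = {
    0: [(1000, "ipi_sent")],
    1: [(6400, "ipi_received")],
}
merged = merge_traces(per_cpu, offsets)
latency = merged[1][0] - merged[0][0]
```

With the raw timestamps, the apparent IPI latency in this made-up example would be 6400 - 1000 = 5400 cycles; after correction it is 400 cycles. The sketch relies on the offset being constant, as observed above; a drifting offset would need a different correction.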
https://github.com/LITMUS-RT/litmus-rt/commits/wip-ft-pcpu
The corresponding userspace patches can be found here:
https://github.com/LITMUS-RT/feather-trace-tools/commits/wip-ft-pcpu
This, of course, breaks userspace scripts in all sorts of ways because /litmus/dev/ft_trace0 goes away, new devices appear, etc. Nonetheless, I believe the pain is worth it because otherwise we won't be able to derive meaningful measurements on large multicore platforms. I'd appreciate help with testing and feedback.
Thanks,
Björn