[LITMUS^RT] Feather-Trace scalability & TSC calibration patches
Björn Brandenburg
bbb at mpi-sws.org
Wed Mar 19 18:29:46 CET 2014
On 05 Feb 2014, at 14:33, Glenn Elliott <gelliott at cs.unc.edu> wrote:
>
> On Feb 5, 2014, at 3:02 AM, Björn Brandenburg <bbb at mpi-sws.org> wrote:
>
>>
>> On 14 Jan 2014, at 10:56, Björn Brandenburg <bbb at mpi-sws.org> wrote:
>>
>>> Hi everyone,
>>>
>>> here's another set of patches extracted from our RTAS'14 branch. When tracing LITMUS^RT on a 64-core platform, we ran into two issues:
>>>
>>> 1) Feather-Trace's global timestamp buffer became a major scalability bottleneck, and
>>>
>>> 2) non-synchronized TSCs with a constant offset.
>>>
>>> (1) is a serious problem because it distorts the measurements (i.e., the overhead of Feather-Trace itself becomes much larger than the overhead that it is supposed to measure). (2) is not a problem for most measurements (CPU-local measurements are not affected by cross-CPU skew), but measuring IPI latencies becomes difficult if TSCs do not share a common time zero.
>>>
>>> The following patches address (1) by changing Feather-Trace to record all timestamps into processor-local trace buffers, and provide a workaround for (2) by adding some benchmarking code that determines the offset between any two cores, which can then be used to patch up measurements to refer to a common time base.
>>>
>>> https://github.com/LITMUS-RT/litmus-rt/commits/wip-ft-pcpu
>>>
>>> The corresponding userspace patches can be found here:
>>>
>>> https://github.com/LITMUS-RT/feather-trace-tools/commits/wip-ft-pcpu
>>>
>>> This of course breaks userspace scripts in all sorts of ways because /litmus/dev/ft_trace0 goes away, new devices appear, etc. Nonetheless, the pain is worth it I believe because otherwise we won't be able to derive meaningful measurements on large multicore platforms. I'd appreciate help with testing and feedback.
>>
>> Now that we are all out of ECRTS mode, are there any objections to or comments on merging this?
>>
>> Thanks,
>> Björn
>
> I have no objections. We’ll have to update Jonathan’s experiment-scripts code though. UNC can take care of this. (The patch might lag by a week or two though.)
FYI: I've merged the patches into staging.
- Björn
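
For anyone adapting scripts, the workaround for (2) quoted above amounts to subtracting a per-core constant from each timestamp before comparing timestamps across CPUs. The following is only a rough sketch of that idea, not code from the patches: the function names, the `offsets` mapping, and the convention that offsets are expressed relative to CPU 0 are all assumptions made for illustration.

```python
# Hypothetical sketch: the names and the data layout are assumptions,
# not the actual Feather-Trace or feather-trace-tools interfaces.

def to_common_time_base(timestamp, cpu, offsets):
    """Map a raw TSC value recorded on `cpu` onto CPU 0's time base.

    offsets[cpu] is assumed to hold (TSC of `cpu`) - (TSC of CPU 0),
    in cycles, as determined by some earlier calibration step.
    """
    return timestamp - offsets[cpu]


def ipi_latency(send_ts, send_cpu, recv_ts, recv_cpu, offsets):
    """Cross-CPU latency in cycles after removing the constant TSC skew."""
    return (to_common_time_base(recv_ts, recv_cpu, offsets)
            - to_common_time_base(send_ts, send_cpu, offsets))


# Made-up numbers: CPU 1's TSC is assumed to run 1000 cycles ahead of CPU 0's.
offsets = {0: 0, 1: 1000}
latency = ipi_latency(send_ts=5000, send_cpu=0,
                      recv_ts=6200, recv_cpu=1, offsets=offsets)
print(latency)
```

Note that when both timestamps come from the same CPU, the offset cancels out, which matches the observation that CPU-local measurements are not affected by cross-CPU skew.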