July 29, 2015
Additional Participants: Chris Mason, Christoph Lameter, Lai Jiangshan, and Peter Zijlstra.
People tagged: Jens Axboe, Jon Corbet, Mathieu Desnoyers, Paul E. McKenney, and Shaohua Li.
Andy Lutomirski suggests a discussion of a lightweight mechanism permitting user-mode code to implement per-CPU operations, calling out Paul Turner's patch, Mathieu Desnoyers's patch, and his own approach of using %gs on x86.
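To make the problem concrete, here is a minimal sketch of the naive user-space per-CPU counter that such mechanisms aim to improve on. It assumes Linux with glibc's sched_getcpu() and C11 atomics. The names percpu_counter_add() and percpu_counter_read(), the 256-slot limit, and the 64-byte cache line are hypothetical, illustrative choices and are not taken from any of the patches mentioned. Because a thread can be migrated between calling sched_getcpu() and updating its slot, every update must still use an atomic compare-and-swap (typically a locked cmpxchg on x86). That is the cost the proposed mechanisms try to eliminate.

```c
#define _GNU_SOURCE          /* must precede all includes for sched_getcpu() */
#include <assert.h>
#include <sched.h>           /* sched_getcpu(): glibc extension, Linux */
#include <stdalign.h>
#include <stdatomic.h>

/* Hypothetical sizing assumptions, chosen only for illustration. */
#define PERCPU_MAX_CPUS 256
#define CACHE_LINE      64

static_assert(PERCPU_MAX_CPUS > 0, "need at least one slot");

/* One counter slot per CPU, each on its own cache line to avoid false sharing. */
struct percpu_slot {
    alignas(CACHE_LINE) _Atomic long value;
};

static struct percpu_slot counter[PERCPU_MAX_CPUS];  /* zero-initialized */

/* Pick the slot for the CPU we are running on *right now*. */
static int current_slot(void)
{
    int cpu = sched_getcpu();
    if (cpu < 0)             /* sched_getcpu() failed: fall back to slot 0 */
        cpu = 0;
    return cpu % PERCPU_MAX_CPUS;
}

void percpu_counter_add(long delta)
{
    /* We may be migrated right after current_slot() returns, so another
     * thread can be updating the same slot concurrently. A compare-and-swap
     * loop keeps the update correct anyway, at the price of an atomic
     * read-modify-write instruction. */
    _Atomic long *slot = &counter[current_slot()].value;
    long old = atomic_load_explicit(slot, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(slot, &old, old + delta,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;  /* the failed CAS stored the current value in 'old'; retry */
}

/* Sum all slots; the total is correct no matter which slots were used. */
long percpu_counter_read(void)
{
    long sum = 0;
    for (int i = 0; i < PERCPU_MAX_CPUS; i++)
        sum += atomic_load_explicit(&counter[i].value, memory_order_relaxed);
    return sum;
}
```

For example, calling percpu_counter_add(5) and then percpu_counter_add(-2) leaves percpu_counter_read() returning 3, even if the thread migrates between the two calls. atomic_fetch_add_explicit() would work equally well for the update; the explicit loop is used only to mirror the cmpxchg-based approaches mentioned in the thread.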
Chris Mason said that his group has started experimenting with these patches and hopes to have performance data from production workloads soonish. Christoph Lameter applauded this, and suggested that these techniques might also be applied in-kernel.
Peter Zijlstra replied that in-kernel experimentation need not wait on an API, and argued that in-kernel use could rely on interrupt hooks instead of scheduler hooks. However, Peter suspects that forcing function calls for these operations will eat up much of the potential performance gains. Finally, Peter believes that %gs prefixes will have substantial performance advantages.
Christoph responded that one could avoid function-call overhead by moving the calling function into the special code region, and that some of the non-%gs approaches might avoid the implicit memory barriers that degrade the performance of read-modify-write instructions on x86.
Andy agreed that read-modify-write instructions can be slow, but noted that cmpxchg is pretty fast. Andy also suggested per-CPU memory mappings as a self-described crazy idea.
Christoph liked the per-CPU memory mappings, noting that this had been done on Itanium, but that x86 would require a separate page table for each CPU for each task.
Lai Jiangshan called out another disadvantage of a special code region, namely that all functions in that region must avoid invoking functions outside that region; however, he agrees that such a region simplifies the scheduler hooks. Lai also notes that in-kernel application of these techniques could simplify NMI handlers.