Date: Tue, 20 Oct 2020 16:50:30 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Axel Rasmussen <axelrasmussen@...gle.com>,
 Steven Rostedt <rostedt@...dmis.org>, Ingo Molnar <mingo@...hat.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Michel Lespinasse <walken@...gle.com>,
 Daniel Jordan <daniel.m.jordan@...cle.com>,
 Laurent Dufour <ldufour@...ux.ibm.com>, Jann Horn <jannh@...gle.com>,
 Chinwen Chang <chinwen.chang@...iatek.com>
Cc: Yafang Shao <laoar.shao@...il.com>, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org
Subject: Re: [PATCH v3 2/2] mmap_lock: add tracepoints around lock acquisition

On 10/10/20 12:05 AM, Axel Rasmussen wrote:
> The goal of these tracepoints is to be able to debug lock contention
> issues. This lock is acquired on most (all?) mmap / munmap / page fault
> operations, so a multi-threaded process which does a lot of these can
> experience significant contention.
>
> We trace just before we start acquisition, when the acquisition returns
> (whether it succeeded or not), and when the lock is released (or
> downgraded). The events are broken out by lock type (read / write).
>
> The events are also broken out by memcg path. For container-based
> workloads, users often think of several processes in a memcg as a single
> logical "task", so collecting statistics at this level is useful.
>
> The end goal is to get latency information. This isn't directly included
> in the trace events. Instead, users are expected to compute the time
> between "start locking" and "acquire returned", using e.g. synthetic
> events or BPF. The benefit we get from this is simpler code.
>
> Because we use tracepoint_enabled() to decide whether or not to trace,
> this patch has effectively no overhead unless tracepoints are enabled at
> runtime. If tracepoints are enabled, there is a performance impact, but
> how much depends on exactly what e.g. the BPF program does.
>
> Signed-off-by: Axel Rasmussen <axelrasmussen@...gle.com>

Yeah I agree with this approach that follows the page ref one.

...

> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> new file mode 100644
> index 000000000000..b849287bd12a
> --- /dev/null
> +++ b/mm/mmap_lock.c
> @@ -0,0 +1,87 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/mmap_lock.h>
> +
> +#include <linux/mm.h>
> +#include <linux/cgroup.h>
> +#include <linux/memcontrol.h>
> +#include <linux/mmap_lock.h>
> +#include <linux/percpu.h>
> +#include <linux/smp.h>
> +#include <linux/trace_events.h>
> +
> +/*
> + * We have to export these, as drivers use mmap_lock, and our inline functions
> + * in the header check if the tracepoint is enabled. They can't be GPL, as e.g.
> + * the nvidia driver is an existing caller of this code.

I don't think this argument works in the kernel community. I would just remove
this comment.

> + */
> +EXPORT_SYMBOL(__tracepoint_mmap_lock_start_locking);
> +EXPORT_SYMBOL(__tracepoint_mmap_lock_acquire_returned);
> +EXPORT_SYMBOL(__tracepoint_mmap_lock_released);

You can use EXPORT_TRACEPOINT_SYMBOL() here.

> +#ifdef CONFIG_MEMCG
> +
> +DEFINE_PER_CPU(char[MAX_FILTER_STR_VAL], trace_memcg_path);
> +
> +/*
> + * Write the given mm_struct's memcg path to a percpu buffer, and return a
> + * pointer to it. If the path cannot be determined, the buffer will contain the
> + * empty string.
> + *
> + * Note: buffers are allocated per-cpu to avoid locking, so preemption must be
> + * disabled by the caller before calling us, and re-enabled only after the
> + * caller is done with the pointer.
> + */
> +static const char *get_mm_memcg_path(struct mm_struct *mm)
> +{
> +	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
> +
> +	if (memcg != NULL && likely(memcg->css.cgroup != NULL)) {
> +		char *buf = this_cpu_ptr(trace_memcg_path);
> +
> +		cgroup_path(memcg->css.cgroup, buf, MAX_FILTER_STR_VAL);
> +		return buf;
> +	}
> +	return "";
> +}
> +
> +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \
> +	do { \
> +		if (trace_mmap_lock_##type##_enabled()) { \

Is this check really needed? We only got called from the functions inlined in
the .h file because tracepoint_enabled() was true in the first place, so this
seems redundant.

> +			get_cpu(); \
> +			trace_mmap_lock_##type(mm, get_mm_memcg_path(mm), \
> +					       ##__VA_ARGS__); \
> +			put_cpu(); \
> +		} \
> +	} while (0)
> +
> +#else /* !CONFIG_MEMCG */
> +
> +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \
> +	trace_mmap_lock_##type(mm, "", ##__VA_ARGS__)
> +
> +#endif /* CONFIG_MEMCG */
> +
> +/*
> + * Trace calls must be in a separate file, as otherwise there's a circular
> + * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h.
> + */
> +
> +void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write)
> +{
> +	TRACE_MMAP_LOCK_EVENT(start_locking, mm, write, true);

Seems wasteful to have an always-true success field here. Yeah, not reusing the
same event class for all three tracepoints means more code, but for tracing
efficiency it's worth it, IMHO.

> +}
> +EXPORT_SYMBOL(__mmap_lock_do_trace_start_locking);
> +
> +void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write,
> +					   bool success)
> +{
> +	TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, write, success);
> +}
> +EXPORT_SYMBOL(__mmap_lock_do_trace_acquire_returned);
> +
> +void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write)
> +{
> +	TRACE_MMAP_LOCK_EVENT(released, mm, write, true);

Ditto.

> +}
> +EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>
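
To illustrate the point about the redundant check, here is a rough userspace
model of the call structure, not kernel code. Every name in it
(model_tp_enabled, model_trace_start_locking, and so on) is made up for the
sketch, and a plain bool stands in for the static-key test behind
tracepoint_enabled(). The inline wrapper plays the role of the helper in the
header; the out-of-line function plays the role of
__mmap_lock_do_trace_start_locking(). The out-of-line function is only reached
when the wrapper has already seen the tracepoint enabled, so the model records
that as an asserted precondition instead of testing it again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the tracepoint's enabled state. */
static bool model_tp_enabled;

/* Counters that exist only so the behaviour can be observed. */
static int model_slow_path_calls;
static int model_events_emitted;

/*
 * Stand-in for the out-of-line function in the .c file. It is only
 * called from the inline wrapper below, so "enabled" is a precondition
 * here rather than something that needs a second runtime test.
 */
static void model_do_trace_start_locking(int mm_id, bool write)
{
	assert(model_tp_enabled);
	model_slow_path_calls++;

	/* The expensive work (memcg path lookup in the patch) goes here. */
	printf("start_locking: mm=%d write=%d\n", mm_id, write);
	model_events_emitted++;
}

/*
 * Stand-in for the inline helper in the header: one cheap test on the
 * fast path, and the out-of-line call only when tracing is enabled.
 */
static inline void model_trace_start_locking(int mm_id, bool write)
{
	if (model_tp_enabled)
		model_do_trace_start_locking(mm_id, write);
}
```

In the real code the tracepoint can in principle be disabled between the two
tests, but as far as I can tell trace_mmap_lock_##type() performs its own
enabled test internally anyway, so dropping the outer one should cost at most
one wasted memcg path lookup in that window.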