[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YjNwq+YrUULI/3QC@ip-172-31-19-208.ap-northeast-1.compute.internal>
Date: Thu, 17 Mar 2022 17:32:27 +0000
From: Hyeonggon Yoo <42.hyeyoo@...il.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, Will Deacon <will@...nel.org>,
Waiman Long <longman@...hat.com>,
Boqun Feng <boqun.feng@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>,
Byungchul Park <byungchul.park@....com>,
"Paul E. McKenney" <paulmck@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Arnd Bergmann <arnd@...db.de>,
Radoslaw Burny <rburny@...gle.com>, linux-arch@...r.kernel.org,
bpf@...r.kernel.org
Subject: Re: [PATCH 0/2] locking: Add new lock contention tracepoints (v3)
On Wed, Mar 16, 2022 at 03:45:46PM -0700, Namhyung Kim wrote:
> Hello,
>
> There have been some requests for low-overhead kernel lock contention
> monitoring. The kernel has CONFIG_LOCK_STAT to provide such an infra
> either via /proc/lock_stat or tracepoints directly.
>
> However it's not light-weight and hard to be used in production. So
> I'm trying to add new tracepoints for lock contention and using them
> as a base to build a new monitoring system.
Hello Namhyung,
I like this series so much.
Lock contentions became much more visible without serious overhead.
For the series:
Tested-by: Hyeonggon Yoo <42.hyeyoo@...il.com>
How would you use these tracepoints, is there a script you use?
For testing, I just wrote a simple bpftrace script:
$ sudo bpftrace -e 'BEGIN
{
printf("Collecting lockstats... Hit Ctrl-C to end.\n");
}
tracepoint:lock:contention_begin
{
@start_us[tid] = nsecs / 1000;
}
tracepoint:lock:contention_end
{
if (args->ret == 0) {
@stats[kstack] = hist(nsecs / 1000 - @start_us[tid]);
}
}
END
{
clear(@start_us);
}'
And it shows its distribution of slowpath wait time like below. Great.
@stats[
__traceiter_contention_end+76
__traceiter_contention_end+76
queued_spin_lock_slowpath+556
_raw_spin_lock+108
rmqueue_bulk+80
rmqueue+1060
get_page_from_freelist+372
__alloc_pages+208
alloc_pages_vma+160
alloc_zeroed_user_highpage_movable+72
do_anonymous_page+92
handle_pte_fault+320
__handle_mm_fault+252
handle_mm_fault+244
do_page_fault+340
do_translation_fault+100
do_mem_abort+76
el0_da+60
el0t_64_sync_handler+232
el0t_64_sync+420
]:
[2, 4) 1 |@ |
[4, 8) 30 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[8, 16) 25 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[16, 32) 33 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32, 64) 29 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[64, 128) 13 |@@@@@@@@@@@@@@@@@@@@ |
[128, 256) 2 |@@@ |
@stats[
__traceiter_contention_end+76
__traceiter_contention_end+76
rwsem_down_write_slowpath+1216
down_write_killable+100
do_mprotect_pkey.constprop.0+176
__arm64_sys_mprotect+40
invoke_syscall+80
el0_svc_common.constprop.0+76
do_el0_svc+52
el0_svc+48
el0t_64_sync_handler+164
el0t_64_sync+420
]:
[1] 13 |@@@@@@@@@@@ |
[2, 4) 42 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[4, 8) 5 |@@@@ |
[8, 16) 10 |@@@@@@@@ |
[16, 32) 60 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32, 64) 41 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[64, 128) 40 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[128, 256) 14 |@@@@@@@@@@@@ |
[256, 512) 7 |@@@@@@ |
[512, 1K) 6 |@@@@@ |
[1K, 2K) 2 |@ |
[2K, 4K) 1 | |
@stats[
__traceiter_contention_end+76
__traceiter_contention_end+76
queued_spin_lock_slowpath+556
_raw_spin_lock+108
futex_wake+168
do_futex+200
__arm64_sys_futex+112
invoke_syscall+80
el0_svc_common.constprop.0+76
do_el0_svc+52
el0_svc+48
el0t_64_sync_handler+164
el0t_64_sync+420
]:
[0] 3 | |
[1] 2515 |@ |
[2, 4) 8747 |@@@@@ |
[4, 8) 17052 |@@@@@@@@@@ |
[8, 16) 46706 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[16, 32) 82105 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32, 64) 46918 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[64, 128) 99 | |
[128, 256) 1 | |
[256, 512) 8 | |
[512, 1K) 0 | |
[1K, 2K) 0 | |
[2K, 4K) 0 | |
[4K, 8K) 0 | |
[8K, 16K) 0 | |
[16K, 32K) 5 | |
[32K, 64K) 12 | |
[64K, 128K) 34 | |
[128K, 256K) 68 | |
[256K, 512K) 7 | |
> * Changes in v3
> - move the tracepoints deeper in the slow path
> - remove the caller ip
> - don't use task state in the flags
>
> * Changes in v2
> - do not use lockdep infrastructure
> - add flags argument to lock:contention_begin tracepoint
>
> I added a flags argument in the contention_begin to classify locks in
> question. It can tell whether it's a spinlock, reader-writer lock or
> a mutex. With stacktrace, users can identify which lock is contended.
>
> The patch 01 added the tracepoints and move the definition to the
> mutex.c file so that it can see the tracepoints without lockdep.
>
> The patch 02 actually installs the tracepoints in the locking code.
> To minimize the overhead, they were added in the slow path of the code
> separately. As spinlocks are defined in the arch headers, I couldn't
> handle them all. I've just added it to generic queued spinlock and
> rwlocks only. Each arch can add the tracepoints later.
>
> This series base on the current tip/locking/core and you get it from
> 'locking/tracepoint-v3' branch in my tree at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
>
> Thanks,
> Namhyung
>
>
> Namhyung Kim (2):
> locking: Add lock contention tracepoints
> locking: Apply contention tracepoints in the slow path
>
> include/trace/events/lock.h | 54 +++++++++++++++++++++++++++++++++--
> kernel/locking/lockdep.c | 1 -
> kernel/locking/mutex.c | 6 ++++
> kernel/locking/percpu-rwsem.c | 3 ++
> kernel/locking/qrwlock.c | 9 ++++++
> kernel/locking/qspinlock.c | 5 ++++
> kernel/locking/rtmutex.c | 11 +++++++
> kernel/locking/rwbase_rt.c | 3 ++
> kernel/locking/rwsem.c | 9 ++++++
> kernel/locking/semaphore.c | 14 ++++++++-
> 10 files changed, 110 insertions(+), 5 deletions(-)
>
>
> base-commit: cd27ccfc727e99352321c0c75012ab9c5a90321e
> --
> 2.35.1.894.gb6a874cedc-goog
>
--
Thank you, You are awesome!
Hyeonggon :-)
Powered by blists - more mailing lists