lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YjNwq+YrUULI/3QC@ip-172-31-19-208.ap-northeast-1.compute.internal>
Date:   Thu, 17 Mar 2022 17:32:27 +0000
From:   Hyeonggon Yoo <42.hyeyoo@...il.com>
To:     Namhyung Kim <namhyung@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>, Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Steven Rostedt <rostedt@...dmis.org>,
        Byungchul Park <byungchul.park@....com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Arnd Bergmann <arnd@...db.de>,
        Radoslaw Burny <rburny@...gle.com>, linux-arch@...r.kernel.org,
        bpf@...r.kernel.org
Subject: Re: [PATCH 0/2] locking: Add new lock contention tracepoints (v3)

On Wed, Mar 16, 2022 at 03:45:46PM -0700, Namhyung Kim wrote:
> Hello,
> 
> There have been some requests for low-overhead kernel lock contention
> monitoring.  The kernel has CONFIG_LOCK_STAT to provide such an infra
> either via /proc/lock_stat or tracepoints directly.
> 
> However it's not light-weight and hard to be used in production.  So
> I'm trying to add new tracepoints for lock contention and using them
> as a base to build a new monitoring system.

Hello Namhyung,
I like this series so much.
Lock contentions became much more visible without serious overhead.

For the series:
Tested-by: Hyeonggon Yoo <42.hyeyoo@...il.com>

How would you use these tracepoints, is there a script you use?
For testing, I just wrote a simple bpftrace script:

$ sudo bpftrace -e 'BEGIN
{
	printf("Collecting lockstats... Hit Ctrl-C to end.\n");
}

tracepoint:lock:contention_begin
{
	@start_us[tid] = nsecs / 1000;
}

tracepoint:lock:contention_end
{
	if (args->ret == 0) {
		@stats[kstack] = hist(nsecs / 1000 - @start_us[tid]);
	}
}

END
{
	clear(@start_us);
}'

And it shows its distribution of slowpath wait time like below. Great.

@stats[
    __traceiter_contention_end+76
    __traceiter_contention_end+76
    queued_spin_lock_slowpath+556
    _raw_spin_lock+108
    rmqueue_bulk+80
    rmqueue+1060
    get_page_from_freelist+372
    __alloc_pages+208
    alloc_pages_vma+160
    alloc_zeroed_user_highpage_movable+72
    do_anonymous_page+92
    handle_pte_fault+320
    __handle_mm_fault+252
    handle_mm_fault+244
    do_page_fault+340
    do_translation_fault+100
    do_mem_abort+76
    el0_da+60
    el0t_64_sync_handler+232
    el0t_64_sync+420
]:
[2, 4)                 1 |@                                                   |
[4, 8)                30 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     |
[8, 16)               25 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             |
[16, 32)              33 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32, 64)              29 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
[64, 128)             13 |@@@@@@@@@@@@@@@@@@@@                                |
[128, 256)             2 |@@@                                                 |


@stats[
    __traceiter_contention_end+76
    __traceiter_contention_end+76
    rwsem_down_write_slowpath+1216
    down_write_killable+100
    do_mprotect_pkey.constprop.0+176
    __arm64_sys_mprotect+40
    invoke_syscall+80
    el0_svc_common.constprop.0+76
    do_el0_svc+52
    el0_svc+48
    el0t_64_sync_handler+164
    el0t_64_sync+420
]: 
[1]                   13 |@@@@@@@@@@@                                         |
[2, 4)                42 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                |
[4, 8)                 5 |@@@@                                                |
[8, 16)               10 |@@@@@@@@                                            |
[16, 32)              60 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32, 64)              41 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                 |
[64, 128)             40 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                  |
[128, 256)            14 |@@@@@@@@@@@@                                        |
[256, 512)             7 |@@@@@@                                              |
[512, 1K)              6 |@@@@@                                               |
[1K, 2K)               2 |@                                                   |
[2K, 4K)               1 |                                                    |

@stats[
    __traceiter_contention_end+76
    __traceiter_contention_end+76
    queued_spin_lock_slowpath+556
    _raw_spin_lock+108
    futex_wake+168
    do_futex+200
    __arm64_sys_futex+112
    invoke_syscall+80
    el0_svc_common.constprop.0+76
    do_el0_svc+52
    el0_svc+48
    el0t_64_sync_handler+164
    el0t_64_sync+420
]:
[0]                    3 |                                                    |
[1]                 2515 |@                                                   |
[2, 4)              8747 |@@@@@                                               |
[4, 8)             17052 |@@@@@@@@@@                                          |
[8, 16)            46706 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                       |
[16, 32)           82105 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32, 64)           46918 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                       |
[64, 128)             99 |                                                    |
[128, 256)             1 |                                                    |
[256, 512)             8 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)              0 |                                                    |
[16K, 32K)             5 |                                                    |
[32K, 64K)            12 |                                                    |
[64K, 128K)           34 |                                                    |
[128K, 256K)          68 |                                                    |
[256K, 512K)           7 |                                                    |


> * Changes in v3
>  - move the tracepoints deeper in the slow path
>  - remove the caller ip
>  - don't use task state in the flags
> 
> * Changes in v2
>  - do not use lockdep infrastructure
>  - add flags argument to lock:contention_begin tracepoint
> 
> I added a flags argument in the contention_begin to classify locks in
> question.  It can tell whether it's a spinlock, reader-writer lock or
> a mutex.  With stacktrace, users can identify which lock is contended.
>
> The patch 01 added the tracepoints and move the definition to the
> mutex.c file so that it can see the tracepoints without lockdep.
> 
> The patch 02 actually installs the tracepoints in the locking code.
> To minimize the overhead, they were added in the slow path of the code
> separately.  As spinlocks are defined in the arch headers, I couldn't
> handle them all.  I've just added it to generic queued spinlock and
> rwlocks only.  Each arch can add the tracepoints later.
>
> This series base on the current tip/locking/core and you get it from
> 'locking/tracepoint-v3' branch in my tree at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> 
> 
> Thanks,
> Namhyung
> 
> 
> Namhyung Kim (2):
>   locking: Add lock contention tracepoints
>   locking: Apply contention tracepoints in the slow path
> 
>  include/trace/events/lock.h   | 54 +++++++++++++++++++++++++++++++++--
>  kernel/locking/lockdep.c      |  1 -
>  kernel/locking/mutex.c        |  6 ++++
>  kernel/locking/percpu-rwsem.c |  3 ++
>  kernel/locking/qrwlock.c      |  9 ++++++
>  kernel/locking/qspinlock.c    |  5 ++++
>  kernel/locking/rtmutex.c      | 11 +++++++
>  kernel/locking/rwbase_rt.c    |  3 ++
>  kernel/locking/rwsem.c        |  9 ++++++
>  kernel/locking/semaphore.c    | 14 ++++++++-
>  10 files changed, 110 insertions(+), 5 deletions(-)
> 
> 
> base-commit: cd27ccfc727e99352321c0c75012ab9c5a90321e
> -- 
> 2.35.1.894.gb6a874cedc-goog
> 

-- 
Thank you, You are awesome!
Hyeonggon :-)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ