Message-ID: <CAEf4BzZuR883FEuKAXp3DY1iJcL+ST8eNq5ioq8oRpDyg0w8Kw@mail.gmail.com>
Date: Mon, 15 Jul 2024 10:10:57 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: oleg@...hat.com, mingo@...nel.org, andrii@...nel.org, 
	linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org, 
	rostedt@...dmis.org, mhiramat@...nel.org, jolsa@...nel.org, clm@...a.com, 
	paulmck@...nel.org
Subject: Re: [PATCH v2 00/11] perf/uprobe: Optimize uprobes

On Mon, Jul 15, 2024 at 7:45 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Jul 11, 2024 at 09:57:44PM -0700, Andrii Nakryiko wrote:
>
> > But then I also ran it on Linux built from perf/uprobes branch (these
> > patches), and after a few seconds I see that there is no more
> > attachment/detachment happening. Eventually I got splats, which you
> > can see in [1]. I used `sudo ./uprobe-stress -a10 -t5 -m5 -f3` command
> > to run it inside my QEMU image.
>
> So them git voodoo incantations did work and I got it built. I'm running
> that exact same line above (minus the sudo, because test box only has a
> root account I think) on real hardware.
>
> I'm now ~100 periods in and wondering what 'eventually' means...

So I was running in a QEMU setup with 16 cores on top of a bare-metal
host with an 80-core CPU (Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz). I
just tried it again, and I can reproduce it within the first few periods:

WORKING HARD!..

PERIOD #1 STATS:
FUNC CALLS               919632
UPROBE HITS              706351
URETPROBE HITS           641679
ATTACHED LINKS              951
ATTACHED UPROBES           2421
ATTACHED URETPROBES        2343
MMAP CALLS                33533
FORKS CALLS                 241

PERIOD #2 STATS:
FUNC CALLS                11444
UPROBE HITS               14320
URETPROBE HITS             9896
ATTACHED LINKS               26
ATTACHED UPROBES             75
ATTACHED URETPROBES          61
MMAP CALLS                39093
FORKS CALLS                  14

PERIOD #3 STATS:
FUNC CALLS                  230
UPROBE HITS                 152
URETPROBE HITS              145
ATTACHED LINKS                2
ATTACHED UPROBES              2
ATTACHED URETPROBES           2
MMAP CALLS                39121
FORKS CALLS                   0

PERIOD #4 STATS:
FUNC CALLS                    0
UPROBE HITS                   0
URETPROBE HITS                0
ATTACHED LINKS                0
ATTACHED UPROBES              0
ATTACHED URETPROBES           0
MMAP CALLS                39010
FORKS CALLS                   0

You can see that in the second period all the numbers except MMAP CALLS
drop, and by period #4 (about 20 seconds in) everything but mmap()ing
stops. When I said "eventually" I meant about a minute tops, however
long it takes for soft lockup detection to fire, 23 seconds this time.

So it should reproduce very quickly.
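
If it helps to spot the stall without eyeballing the output, here is a
minimal sketch in Python that parses stats in the format shown above and
reports the first period in which every counter except MMAP CALLS is
zero. The helper names (parse_periods, first_stalled_period) are
hypothetical and not part of uprobe-stress, and the sample text is just
periods #3 and #4 copied from the run above.

```python
import re

# Sample copied from the output above (periods #3 and #4 only).
SAMPLE = """\
PERIOD #3 STATS:
FUNC CALLS                  230
UPROBE HITS                 152
URETPROBE HITS              145
ATTACHED LINKS                2
ATTACHED UPROBES              2
ATTACHED URETPROBES           2
MMAP CALLS                39121
FORKS CALLS                   0

PERIOD #4 STATS:
FUNC CALLS                    0
UPROBE HITS                   0
URETPROBE HITS                0
ATTACHED LINKS                0
ATTACHED UPROBES              0
ATTACHED URETPROBES           0
MMAP CALLS                39010
FORKS CALLS                   0
"""


def parse_periods(text):
    """Return {period_number: {counter_name: value}} (hypothetical helper)."""
    periods = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"PERIOD #(\d+) STATS:", line)
        if header:
            current = periods.setdefault(int(header.group(1)), {})
            continue
        # Counter lines look like "<UPPERCASE NAME><spaces><integer>".
        counter = re.match(r"([A-Z][A-Z ]*?)\s+(\d+)$", line)
        if counter and current is not None:
            current[counter.group(1)] = int(counter.group(2))
    return periods


def first_stalled_period(periods, ignore=("MMAP CALLS",)):
    """Return the first period whose non-ignored counters are all zero."""
    for number in sorted(periods):
        others = [v for k, v in periods[number].items() if k not in ignore]
        if others and all(v == 0 for v in others):
            return number
    return None


if __name__ == "__main__":
    print(first_stalled_period(parse_periods(SAMPLE)))
```

MMAP CALLS is ignored by default because, in the run above, mmap()ing is
the one thing that keeps going after everything else stops.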

Note that I'm running with a debug kernel configuration (see [0] for
the full kernel config). Here are the debug-related settings, in case
that makes a difference:

$ cat ~/linux-build/default/.config | rg -i debug | rg -v '^#'
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_BLK_DEBUG_FS=y
CONFIG_PNP_DEBUG_MESSAGES=y
CONFIG_AIC7XXX_DEBUG_MASK=0
CONFIG_AIC79XX_DEBUG_MASK=0
CONFIG_SCSI_MVSAS_DEBUG=y
CONFIG_DM_DEBUG=y
CONFIG_MLX4_DEBUG=y
CONFIG_USB_SERIAL_DEBUG=m
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_IPOIB_DEBUG=y
CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=y
CONFIG_CIFS_DEBUG=y
CONFIG_DLM_DEBUG=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_DWARF4=y
CONFIG_DEBUG_INFO_COMPRESSED_NONE=y
CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
CONFIG_ARCH_HAS_DEBUG_WX=y
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
CONFIG_SCHED_DEBUG=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_IRQFLAGS=y
CONFIG_X86_DEBUG_FPU=y
CONFIG_FAULT_INJECTION_DEBUG_FS=y

  [0] https://gist.github.com/anakryiko/97a023a95b30fb0fe607ff743433e64b

>
> Also, this is a 2 socket, 10 core per socket, 2 threads per core
> ivybridge thing, are those parameters sufficient?

Should be, I guess? It might be VM vs bare metal differences, though.
I'll try to run this on bare metal with a more production-like kernel
configuration to see if I can still trigger this. Will let you know
the results when I get them.
