Message-ID: <ZBBnUlIQ5gLhoScx@kernel.org>
Date:   Tue, 14 Mar 2023 09:23:46 -0300
From:   Arnaldo Carvalho de Melo <acme@...nel.org>
To:     Namhyung Kim <namhyung@...nel.org>
Cc:     Jiri Olsa <jolsa@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>, Song Liu <song@...nel.org>,
        Hao Luo <haoluo@...gle.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Stephane Eranian <eranian@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-perf-users@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH 0/4] perf lock contention: Improve lock symbol display
 (v1)

On Mon, Mar 13, 2023 at 06:45:53PM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, Mar 13, 2023 at 01:48:21PM -0700, Namhyung Kim wrote:
> > Hello,
> > 
> > This patchset improves the symbolization of locks for -l/--lock-addr mode.
> > As of now it only shows global lock symbols present in the kallsyms.  But
> > we can add some more lock symbols by traversing pointers in the BPF program.
> > 
> > For example, mmap_lock can be reached from the mm_struct of the current task
> > (task_struct->mm->mmap_lock) and we can compare the address of the given lock
> > with it.  Similarly I've added 'siglock' for current->sighand->siglock.
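
To make the comparison described above concrete, here is a minimal
userspace sketch in Python. It is not the BPF code from the patchset:
the Task, Mm and Sighand classes and the symbolize() helper are
hypothetical stand-ins, and the addresses are borrowed from the example
output further down purely for illustration.

```python
# Hypothetical model of the lookup order: kallsyms first, then locks
# reachable from the current task. Not the code in the patchset.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mm:
    mmap_lock_addr: int      # stands in for &mm->mmap_lock


@dataclass
class Sighand:
    siglock_addr: int        # stands in for &sighand->siglock


@dataclass
class Task:
    mm: Optional[Mm]         # kernel threads have no mm
    sighand: Optional[Sighand]


def symbolize(lock_addr: int, current: Task, kallsyms: dict) -> str:
    """Return a name for lock_addr, or "" when nothing matches."""
    # Global locks are resolved by address from the symbol table.
    if lock_addr in kallsyms:
        return kallsyms[lock_addr]
    # Otherwise compare against locks reachable from the current task.
    if current.mm is not None and current.mm.mmap_lock_addr == lock_addr:
        return "mmap_lock"
    if current.sighand is not None and current.sighand.siglock_addr == lock_addr:
        return "siglock"
    return ""


# Illustrative values only; pairing these addresses with one task is an assumption.
kallsyms = {0xffffffff92e06080: "tasklist_lock"}
task = Task(mm=Mm(0xffff8cdc2872aaf8), sighand=Sighand(0xffff8cc717c90000))
print(symbolize(0xffff8cdc2872aaf8, task, kallsyms))
```

An address that matches neither the symbol table nor the current task
stays unnamed, which is why some rows in the example output only carry
a lock type.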

Hey, we can go a bit further by using something like pahole's
--expand_types and --expand_pointers, iterating over a type's members
and looking for locks, like:

⬢[acme@...lbox pahole]$ pahole task_struct | grep spinlock_t
	spinlock_t                 alloc_lock;           /*  3280     4 */
	raw_spinlock_t             pi_lock;              /*  3284     4 */
	seqcount_spinlock_t        mems_allowed_seq;     /*  3616     4 */
⬢[acme@...lbox pahole]$

Expanding pointers will find mmap_lock:

⬢[acme@...lbox pahole]$ pahole --expand_pointers -C task_struct | grep -B10 mmap_lock
								} *pgd;
								atomic_t                       membarrier_state;
								atomic_t                       mm_users;
								atomic_t                       mm_count;

								/* XXX 4 bytes hole, try to pack */

								atomic_long_t                  pgtables_bytes;
								int                            map_count;
								spinlock_t                     page_table_lock;
								struct rw_semaphore                          mmap_lock;
^C
⬢[acme@...lbox pahole]$


It's just too much expansion to see task_struct->mm, but it is there, of
course:

⬢[acme@...lbox pahole]$ pahole mm_struct | grep mmap_lock
		struct rw_semaphore mmap_lock;           /*   120    40 */
⬢[acme@...lbox pahole]$
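
The "iterate over members, follow pointers, look for locks" idea could
be sketched roughly as below. This is only an illustration: the TYPES
table is hand-written from the members shown in this thread (a real
tool would read them from BTF or DWARF), and LOCK_TYPES and
find_locks() are hypothetical names, not pahole or perf APIs.

```python
# Hypothetical, hand-written type table: (member, type name, is_pointer).
TYPES = {
    "task_struct": [
        ("alloc_lock", "spinlock_t", False),
        ("pi_lock", "raw_spinlock_t", False),
        ("mm", "mm_struct", True),
        ("sighand", "sighand_struct", True),
    ],
    "mm_struct": [
        ("page_table_lock", "spinlock_t", False),
        ("mmap_lock", "rw_semaphore", False),
    ],
    "sighand_struct": [
        ("siglock", "spinlock_t", False),
    ],
}

# Assumed set of type names to treat as locks.
LOCK_TYPES = {"spinlock_t", "raw_spinlock_t", "rw_semaphore", "mutex", "rwlock_t"}


def find_locks(type_name, max_depth=2, path=None, via="->", seen=frozenset()):
    """Yield (access path, lock type) for locks reachable from type_name."""
    path = type_name if path is None else path
    seen = seen | {type_name}          # avoid looping on self-referencing types
    for member, member_type, is_pointer in TYPES.get(type_name, []):
        member_path = f"{path}{via}{member}"
        if member_type in LOCK_TYPES and not is_pointer:
            yield member_path, member_type
        elif member_type in TYPES and member_type not in seen and max_depth > 0:
            # Descend into embedded structs (".") or pointed-to structs ("->").
            next_via = "->" if is_pointer else "."
            yield from find_locks(member_type, max_depth - 1,
                                  member_path, next_via, seen)


for lock_path, lock_type in find_locks("task_struct"):
    print(f"{lock_path} ({lock_type})")
```

The depth limit plays the role of stopping the expansion before it
becomes unreadable, which is the problem with the full
--expand_pointers output above.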

Also:

⬢[acme@...lbox pahole]$ pahole --contains rw_semaphore
address_space
signal_struct
key
inode
super_block
quota_info
user_namespace
blocking_notifier_head
backing_dev_info
anon_vma
tty_struct
cpufreq_policy
tcf_block
ipc_ids
autogroup
kvm_arch
posix_clock
listener_list
uprobe
kernfs_root
configfs_fragment
ext4_inode_info
ext4_group_info
btrfs_fs_info
extent_buffer
btrfs_dev_replace
btrfs_space_info
btrfs_inode
btrfs_block_group
tpm_chip
ib_device
ib_xrcd
blk_crypto_profile
controller
led_classdev
cppc_pcc_data
dm_snapshot
⬢[acme@...lbox pahole]$

And:

⬢[acme@...lbox pahole]$ pahole --find_pointers_to mm_struct
task_struct: mm
task_struct: active_mm
vm_area_struct: vm_mm
flush_tlb_info: mm
signal_struct: oom_mm
tlb_state: loaded_mm
linux_binprm: mm
mmu_gather: mm
trace_event_raw_xen_mmu_ptep_modify_prot: mm
trace_event_raw_xen_mmu_alloc_ptpage: mm
trace_event_raw_xen_mmu_pgd: mm
trace_event_raw_xen_mmu_flush_tlb_multi: mm
trace_event_raw_hyperv_mmu_flush_tlb_multi: mm
mmu_notifier: mm
mmu_notifier_range: mm
sgx_encl_mm: mm
rq: prev_mm
kvm: mm
cpuset_migrate_mm_work: mm
mmap_unlock_irq_work: mm
delayed_uprobe: mm
map_info: mm
trace_event_raw_mmap_lock: mm
trace_event_raw_mmap_lock_acquire_returned: mm
mm_walk: mm
make_exclusive_args: mm
mmu_interval_notifier: mm
mm_slot: mm
rmap_item: mm
trace_event_raw_mm_khugepaged_scan_pmd: mm
trace_event_raw_mm_collapse_huge_page: mm
trace_event_raw_mm_collapse_huge_page_swapin: mm
mm_slot: mm
move_charge_struct: mm
userfaultfd_ctx: mm
proc_maps_private: mm
remap_pfn: mm
intel_svm: mm
binder_alloc: vma_vm_mm
⬢[acme@...lbox pahole]$

- Arnaldo


> > On the other hand, we can traverse some of semi-global locks like per-cpu,
> > per-device, per-filesystem and so on.  I've added 'rq_lock' for each cpu's
> > runqueue lock.
> > 
> > It cannot cover all types of locks in the system but it'd be fairly useful
> > if we can add many of the often-contended locks.  I tried to add futex locks
> > but it failed to find the __futex_data symbol from BTF.  I'm not sure why but
> > I guess it's because the struct doesn't have a tag name.
> > 
> > Those locks are added just because they got caught during my test.
> > It'd be nice if you suggest which locks to add and how to do that. :)
> > I'm wondering if there's a way to track file-based locks (like epoll, etc).
> > 
> > Finally I also added a lock type name after the symbols (if any) so that we
> > can get some idea even though it has no symbol.  The example output looks
> > like below:
> > 
> >   $ sudo ./perf lock con -abl -- sleep 1
> >    contended   total wait     max wait     avg wait            address   symbol
> > 
> >           44      6.13 ms    284.49 us    139.28 us   ffffffff92e06080   tasklist_lock (rwlock)
> >          159    983.38 us     12.38 us      6.18 us   ffff8cc717c90000   siglock (spinlock)
> >           10    679.90 us    153.35 us     67.99 us   ffff8cdc2872aaf8   mmap_lock (rwsem)
> >            9    558.11 us    180.67 us     62.01 us   ffff8cd647914038   mmap_lock (rwsem)
> >           78    228.56 us      7.82 us      2.93 us   ffff8cc700061c00    (spinlock)
> >            5     41.60 us     16.93 us      8.32 us   ffffd853acb41468    (spinlock)
> >           10     37.24 us      5.87 us      3.72 us   ffff8cd560b5c200   siglock (spinlock)
> >            4     11.17 us      3.97 us      2.79 us   ffff8d053ddf0c80   rq_lock (spinlock)
> >            1      7.86 us      7.86 us      7.86 us   ffff8cd64791404c    (spinlock)
> >            1      4.13 us      4.13 us      4.13 us   ffff8d053d930c80   rq_lock (spinlock)
> >            7      3.98 us      1.67 us       568 ns   ffff8ccb92479440    (mutex)
> >            2      2.62 us      2.33 us      1.31 us   ffff8cc702e6ede0    (rwlock)
> > 
> > The tasklist_lock is global so it comes from kallsyms.  But others like
> > siglock, mmap_lock and rq_lock come from the BPF program.
> 
> Beautiful :-)
> 
> And the csets are _so_ small and demonstrate techniques that should be
> used in more and more tools.
> 
> Applied, testing.
> 
> - Arnaldo
>  
> > You can get the code at 'perf/lock-symbol-v1' branch in
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> > 
> > Thanks,
> > Namhyung
> > 
> > Namhyung Kim (4):
> >   perf lock contention: Track and show mmap_lock with address
> >   perf lock contention: Track and show siglock with address
> >   perf lock contention: Show per-cpu rq_lock with address
> >   perf lock contention: Show lock type with address
> > 
> >  tools/perf/builtin-lock.c                     | 46 +++++++----
> >  tools/perf/util/bpf_lock_contention.c         | 35 ++++++++-
> >  .../perf/util/bpf_skel/lock_contention.bpf.c  | 77 +++++++++++++++++++
> >  tools/perf/util/bpf_skel/lock_data.h          | 14 ++++
> >  4 files changed, 152 insertions(+), 20 deletions(-)
> > 
> > 
> > base-commit: b8fa3e3833c14151a47ebebbc5427dcfe94bb407
> > -- 
> > 2.40.0.rc1.284.g88254d51c5-goog
> > 
> 
> -- 
> 
> - Arnaldo

-- 

- Arnaldo
