lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fWF0u58zeJHhVDvKs8RFjoJmo6pZHPoKOjfU-jMzn1a8w@mail.gmail.com>
Date:   Tue, 24 Jan 2023 16:26:14 -0800
From:   Ian Rogers <irogers@...gle.com>
To:     Namhyung Kim <namhyung@...nel.org>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Jiri Olsa <jolsa@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Song Liu <song@...nel.org>, Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Adrian Hunter <adrian.hunter@...el.com>,
        linux-perf-users@...r.kernel.org, Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Davidlohr Bueso <dave@...olabs.net>, bpf@...r.kernel.org
Subject: Re: [RFC/PATCH] perf lock contention: Add -o/--lock-owner option

On Tue, Jan 24, 2023 at 9:59 AM Namhyung Kim <namhyung@...nel.org> wrote:
>
> Hello,
>
> Any feedback on this?
>
> On Thu, Jan 5, 2023 at 12:32 PM Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > When there're many lock contentions in the system, people sometimes
> > want to know who caused the contention, IOW who's the owner of the
> > locks.
> >
> > The -o/--lock-owner option tries to follow the lock owners for the
> > contended mutexes and rwsems from BPF, and then attributes the
> > contention time to the owner instead of the waiter.  It's a best
> > effort approach to get the owner info at the time of the contention
> > and doesn't guarantee to have the precise tracking of owners if it's
> > changing over time.
> >
> > Currently it only handles mutex and rwsem that have owner field in
> > their struct and it basically points to a task_struct that owns the
> > lock at the moment.
> >
> > Technically its type is atomic_long_t and it comes with some LSB bits
> > used for other meaninigs.  So it needs to clear them when casting it
> > to a pointer to task_struct.
> >
> > Also the atomic_long_t is a typedef of the atomic 32 or 64 bit types
> > depending on arch which is a wrapper struct for the counter value.
> > I'm not aware of proper ways to access those kernel atomic types from
> > BPF so I just read the internal counter value directly.  Please let me
> > know if there's a better way.
> >
> > When -o/--lock-owner option is used, it goes to the task aggregation
> > mode like -t/--threads option does.  However it cannot get the owner
> > for other lock types like spinlock and sometimes even for mutex.
> >
> >   $ sudo ./perf lock con -abo -- ./perf bench sched pipe
> >   # Running 'sched/pipe' benchmark:
> >   # Executed 1000000 pipe operations between two processes
> >
> >        Total time: 4.766 [sec]
> >
> >          4.766540 usecs/op
> >            209795 ops/sec
> >    contended   total wait     max wait     avg wait          pid   owner
> >
> >          403    565.32 us     26.81 us      1.40 us           -1   Unknown
> >            4     27.99 us      8.57 us      7.00 us      1583145   sched-pipe
> >            1      8.25 us      8.25 us      8.25 us      1583144   sched-pipe
> >            1      2.03 us      2.03 us      2.03 us         5068   chrome
> >
> > As you can see, the owner is unknown for the most cases.  But if we
> > filter only for the mutex locks, it'd more likely get the onwers.
> >
> >   $ sudo ./perf lock con -abo -Y mutex -- ./perf bench sched pipe
> >   # Running 'sched/pipe' benchmark:
> >   # Executed 1000000 pipe operations between two processes
> >
> >        Total time: 4.910 [sec]
> >
> >          4.910435 usecs/op
> >            203647 ops/sec
> >    contended   total wait     max wait     avg wait          pid   owner
> >
> >            2     15.50 us      8.29 us      7.75 us      1582852   sched-pipe
> >            7      7.20 us      2.47 us      1.03 us           -1   Unknown
> >            1      6.74 us      6.74 us      6.74 us      1582851   sched-pipe
> >
> > Signed-off-by: Namhyung Kim <namhyung@...nel.org>

Nothing to add on the coding side, and as the commit message says this
can only be a best effort.

Reviewed-by: Ian Rogers <irogers@...gle.com>

Thanks,
Ian

> > ---
> >  tools/perf/Documentation/perf-lock.txt        |  5 ++
> >  tools/perf/builtin-lock.c                     | 49 ++++++++++++---
> >  tools/perf/util/bpf_lock_contention.c         |  1 +
> >  .../perf/util/bpf_skel/lock_contention.bpf.c  | 60 +++++++++++++++++--
> >  tools/perf/util/lock-contention.h             |  1 +
> >  5 files changed, 102 insertions(+), 14 deletions(-)
> >
> > diff --git a/tools/perf/Documentation/perf-lock.txt b/tools/perf/Documentation/perf-lock.txt
> > index 0f9f720e599d..a41c8acc7002 100644
> > --- a/tools/perf/Documentation/perf-lock.txt
> > +++ b/tools/perf/Documentation/perf-lock.txt
> > @@ -172,6 +172,11 @@ CONTENTION OPTIONS
> >  --lock-addr::
> >         Show lock contention stat by address
> >
> > +-o::
> > +--lock-owner::
> > +       Show lock contention stat by owners.  Implies --threads and
> > +       requires --use-bpf.
> > +
> >  -Y::
> >  --type-filter=<value>::
> >         Show lock contention only for given lock types (comma separated list).
> > diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
> > index 718b82bfcdff..5a3ed5a2bd3d 100644
> > --- a/tools/perf/builtin-lock.c
> > +++ b/tools/perf/builtin-lock.c
> > @@ -58,6 +58,7 @@ static struct rb_root         thread_stats;
> >  static bool combine_locks;
> >  static bool show_thread_stats;
> >  static bool show_lock_addrs;
> > +static bool show_lock_owner;
> >  static bool use_bpf;
> >  static unsigned long bpf_map_entries = 10240;
> >  static int max_stack_depth = CONTENTION_STACK_DEPTH;
> > @@ -1567,7 +1568,8 @@ static void print_contention_result(struct lock_contention *con)
> >
> >                 switch (aggr_mode) {
> >                 case LOCK_AGGR_TASK:
> > -                       pr_info("  %10s   %s\n\n", "pid", "comm");
> > +                       pr_info("  %10s   %s\n\n", "pid",
> > +                               show_lock_owner ? "owner" : "comm");
> >                         break;
> >                 case LOCK_AGGR_CALLER:
> >                         pr_info("  %10s   %s\n\n", "type", "caller");
> > @@ -1607,7 +1609,8 @@ static void print_contention_result(struct lock_contention *con)
> >                 case LOCK_AGGR_TASK:
> >                         pid = st->addr;
> >                         t = perf_session__findnew(session, pid);
> > -                       pr_info("  %10d   %s\n", pid, thread__comm_str(t));
> > +                       pr_info("  %10d   %s\n",
> > +                               pid, pid == -1 ? "Unknown" : thread__comm_str(t));
> >                         break;
> >                 case LOCK_AGGR_ADDR:
> >                         pr_info("  %016llx   %s\n", (unsigned long long)st->addr,
> > @@ -1718,6 +1721,37 @@ static void sighandler(int sig __maybe_unused)
> >  {
> >  }
> >
> > +static int check_lock_contention_options(const struct option *options,
> > +                                        const char * const *usage)
> > +
> > +{
> > +       if (show_thread_stats && show_lock_addrs) {
> > +               pr_err("Cannot use thread and addr mode together\n");
> > +               parse_options_usage(usage, options, "threads", 0);
> > +               parse_options_usage(NULL, options, "lock-addr", 0);
> > +               return -1;
> > +       }
> > +
> > +       if (show_lock_owner && !use_bpf) {
> > +               pr_err("Lock owners are available only with BPF\n");
> > +               parse_options_usage(usage, options, "lock-owner", 0);
> > +               parse_options_usage(NULL, options, "use-bpf", 0);
> > +               return -1;
> > +       }
> > +
> > +       if (show_lock_owner && show_lock_addrs) {
> > +               pr_err("Cannot use owner and addr mode together\n");
> > +               parse_options_usage(usage, options, "lock-owner", 0);
> > +               parse_options_usage(NULL, options, "lock-addr", 0);
> > +               return -1;
> > +       }
> > +
> > +       if (show_lock_owner)
> > +               show_thread_stats = true;
> > +
> > +       return 0;
> > +}
> > +
> >  static int __cmd_contention(int argc, const char **argv)
> >  {
> >         int err = -EINVAL;
> > @@ -1742,6 +1776,7 @@ static int __cmd_contention(int argc, const char **argv)
> >                 .max_stack = max_stack_depth,
> >                 .stack_skip = stack_skip,
> >                 .filters = &filters,
> > +               .owner = show_lock_owner,
> >         };
> >
> >         session = perf_session__new(use_bpf ? NULL : &data, &eops);
> > @@ -2188,6 +2223,7 @@ int cmd_lock(int argc, const char **argv)
> >                      "Filter specific type of locks", parse_lock_type),
> >         OPT_CALLBACK('L', "lock-filter", NULL, "ADDRS/NAMES",
> >                      "Filter specific address/symbol of locks", parse_lock_addr),
> > +       OPT_BOOLEAN('o', "lock-owner", &show_lock_owner, "show lock owners instead of waiters"),
> >         OPT_PARENT(lock_options)
> >         };
> >
> > @@ -2258,14 +2294,9 @@ int cmd_lock(int argc, const char **argv)
> >                                              contention_usage, 0);
> >                 }
> >
> > -               if (show_thread_stats && show_lock_addrs) {
> > -                       pr_err("Cannot use thread and addr mode together\n");
> > -                       parse_options_usage(contention_usage, contention_options,
> > -                                           "threads", 0);
> > -                       parse_options_usage(NULL, contention_options,
> > -                                           "lock-addr", 0);
> > +               if (check_lock_contention_options(contention_options,
> > +                                                 contention_usage) < 0)
> >                         return -1;
> > -               }
> >
> >                 rc = __cmd_contention(argc, argv);
> >         } else {
> > diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
> > index 0236334fd69b..709d418be873 100644
> > --- a/tools/perf/util/bpf_lock_contention.c
> > +++ b/tools/perf/util/bpf_lock_contention.c
> > @@ -146,6 +146,7 @@ int lock_contention_prepare(struct lock_contention *con)
> >         /* these don't work well if in the rodata section */
> >         skel->bss->stack_skip = con->stack_skip;
> >         skel->bss->aggr_mode = con->aggr_mode;
> > +       skel->bss->lock_owner = con->owner;
> >
> >         lock_contention_bpf__attach(skel);
> >         return 0;
> > diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > index ad0ca5d50557..a035a267b08e 100644
> > --- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > +++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
> > @@ -10,6 +10,14 @@
> >  /* default buffer size */
> >  #define MAX_ENTRIES  10240
> >
> > +/* lock contention flags from include/trace/events/lock.h */
> > +#define LCB_F_SPIN     (1U << 0)
> > +#define LCB_F_READ     (1U << 1)
> > +#define LCB_F_WRITE    (1U << 2)
> > +#define LCB_F_RT       (1U << 3)
> > +#define LCB_F_PERCPU   (1U << 4)
> > +#define LCB_F_MUTEX    (1U << 5)
> > +
> >  struct tstamp_data {
> >         __u64 timestamp;
> >         __u64 lock;
> > @@ -83,6 +91,7 @@ int has_task;
> >  int has_type;
> >  int has_addr;
> >  int stack_skip;
> > +int lock_owner;
> >
> >  /* determine the key of lock stat */
> >  int aggr_mode;
> > @@ -131,17 +140,24 @@ static inline int can_record(u64 *ctx)
> >         return 1;
> >  }
> >
> > -static inline void update_task_data(__u32 pid)
> > +static inline int update_task_data(struct task_struct *task)
> >  {
> >         struct contention_task_data *p;
> > +       int pid, err;
> > +
> > +       err = bpf_core_read(&pid, sizeof(pid), &task->pid);
> > +       if (err)
> > +               return -1;
> >
> >         p = bpf_map_lookup_elem(&task_data, &pid);
> >         if (p == NULL) {
> > -               struct contention_task_data data;
> > +               struct contention_task_data data = {};
> >
> > -               bpf_get_current_comm(data.comm, sizeof(data.comm));
> > +               BPF_CORE_READ_STR_INTO(&data.comm, task, comm);
> >                 bpf_map_update_elem(&task_data, &pid, &data, BPF_NOEXIST);
> >         }
> > +
> > +       return 0;
> >  }
> >
> >  SEC("tp_btf/contention_begin")
> > @@ -178,6 +194,38 @@ int contention_begin(u64 *ctx)
> >                                                   BPF_F_FAST_STACK_CMP | stack_skip);
> >                 if (pelem->stack_id < 0)
> >                         lost++;
> > +       } else if (aggr_mode == LOCK_AGGR_TASK) {
> > +               struct task_struct *task;
> > +
> > +               if (lock_owner) {
> > +                       if (pelem->flags & LCB_F_MUTEX) {
> > +                               struct mutex *lock = (void *)pelem->lock;
> > +                               unsigned long owner = BPF_CORE_READ(lock, owner.counter);
> > +
> > +                               task = (void *)(owner & ~7UL);
> > +                       } else if (pelem->flags == LCB_F_READ || pelem->flags == LCB_F_WRITE) {
> > +                               struct rw_semaphore *lock = (void *)pelem->lock;
> > +                               unsigned long owner = BPF_CORE_READ(lock, owner.counter);
> > +
> > +                               task = (void *)(owner & ~7UL);
> > +                       } else {
> > +                               task = NULL;
> > +                       }
> > +
> > +                       /* The flags is not used anymore.  Pass the owner pid. */
> > +                       if (task)
> > +                               pelem->flags = BPF_CORE_READ(task, pid);
> > +                       else
> > +                               pelem->flags = -1U;
> > +
> > +               } else {
> > +                       task = bpf_get_current_task_btf();
> > +               }
> > +
> > +               if (task) {
> > +                       if (update_task_data(task) < 0 && lock_owner)
> > +                               pelem->flags = -1U;
> > +               }
> >         }
> >
> >         return 0;
> > @@ -207,8 +255,10 @@ int contention_end(u64 *ctx)
> >                 key.aggr_key = pelem->stack_id;
> >                 break;
> >         case LOCK_AGGR_TASK:
> > -               key.aggr_key = pid;
> > -               update_task_data(pid);
> > +               if (lock_owner)
> > +                       key.aggr_key = pelem->flags;
> > +               else
> > +                       key.aggr_key = pid;
> >                 break;
> >         case LOCK_AGGR_ADDR:
> >                 key.aggr_key = pelem->lock;
> > diff --git a/tools/perf/util/lock-contention.h b/tools/perf/util/lock-contention.h
> > index b99e83fccf5c..862b0617698a 100644
> > --- a/tools/perf/util/lock-contention.h
> > +++ b/tools/perf/util/lock-contention.h
> > @@ -128,6 +128,7 @@ struct lock_contention {
> >         int max_stack;
> >         int stack_skip;
> >         int aggr_mode;
> > +       int owner;
> >  };
> >
> >  #ifdef HAVE_BPF_SKEL
> > --
> > 2.39.0.314.g84b9a713c41-goog
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ