Date: Thu, 2 May 2024 11:14:50 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Ian Rogers <irogers@...gle.com>, Kan Liang <kan.liang@...ux.intel.com>, 
	Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>, 
	Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>, 
	LKML <linux-kernel@...r.kernel.org>, linux-perf-users@...r.kernel.org
Subject: Re: [PATCH 4/6] perf annotate-data: Check memory access with two registers

On Thu, May 2, 2024 at 7:05 AM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
>
> On Wed, May 01, 2024 at 11:00:09PM -0700, Namhyung Kim wrote:
> > The following instruction pattern is used to access a global variable.
> >
> >   mov     $0x231c0, %rax
> >   movsql  %edi, %rcx
> >   mov     -0x7dc94ae0(,%rcx,8), %rcx
> >   cmpl    $0x0, 0xa60(%rcx,%rax,1)     <<<--- here
> >
> > The first instruction set the address of the per-cpu variable (here, it
> > is 'runqueus' of struct rq).  The second instruction seems like a cpu
>
> You mean 'runqueues', i.e. this one:
>
> kernel/sched/core.c
> DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
>
> ?

Right, sorry for the typo.

>
> But that 0xa60 would be in an alignment hole, at least in:
>
> $ pahole --hex rq | egrep 0xa40 -A12
>         struct mm_struct *         prev_mm;              /* 0xa40   0x8 */
>         unsigned int               clock_update_flags;   /* 0xa48   0x4 */
>
>         /* XXX 4 bytes hole, try to pack */
>
>         u64                        clock;                /* 0xa50   0x8 */
>
>         /* XXX 40 bytes hole, try to pack */
>
>         /* --- cacheline 42 boundary (2688 bytes) --- */
>         u64                        clock_task __attribute__((__aligned__(64))); /* 0xa80   0x8 */
>         u64                        clock_pelt;           /* 0xa88   0x8 */
>         long unsigned int          lost_idle_time;       /* 0xa90   0x8 */
> $ uname -a
> Linux toolbox 6.7.11-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 27 16:50:39 UTC 2024 x86_64 GNU/Linux
> $

The layout depends on the kernel version, the config, and other
changes such as backports or local modifications.

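On my system, it was cpu_stop_work.arg: active_balance_work starts
at 0xa40 in struct rq, and arg sits at offset 0x20 within it, which
gives 0xa60.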

$ pahole --hex rq | grep 0xa40 -C1
    /* --- cacheline 41 boundary (2624 bytes) --- */
    struct cpu_stop_work       active_balance_work;  /* 0xa40  0x30 */
    int                        cpu;                  /* 0xa70   0x4 */

$ pahole --hex cpu_stop_work
struct cpu_stop_work {
    struct list_head           list;                 /*     0  0x10 */
    cpu_stop_fn_t              fn;                   /*  0x10   0x8 */
    long unsigned int          caller;               /*  0x18   0x8 */
    void *                     arg;                  /*  0x20   0x8 */
    struct cpu_stop_done *     done;                 /*  0x28   0x8 */

    /* size: 48, cachelines: 1, members: 5 */
    /* last cacheline: 48 bytes */
};


>
> The paragraph then reads:
>
> ----
> The first instruction set the address of the per-cpu variable (here, it
> is 'runqueues' of type 'struct rq').  The second instruction seems like
> a cpu number of the per-cpu base.  The third instruction get the base
> offset of per-cpu area for that cpu.  The last instruction compares the
> value of the per-cpu variable at the offset of 0xa60.
> ----
>
> Ok?

Yep, looks good.

Thanks,
Namhyung
