lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fXrCMJW-j9Y1qvrxxymLnjOt4GCQ5pdMPSPrT9ZmDPA3g@mail.gmail.com>
Date: Mon, 13 Jan 2025 11:45:39 -0800
From: Ian Rogers <irogers@...gle.com>
To: Adrian Hunter <adrian.hunter@...el.com>
Cc: Tavian Barnes <tavianator@...ianator.com>, linux-perf-users@...r.kernel.org, 
	Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	"Liang, Kan" <kan.liang@...ux.intel.com>, Andrew Kreimer <algonell@...il.com>, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] perf intel-pt: don't zero the whole perf_sample

On Mon, Jan 13, 2025 at 9:30 AM Ian Rogers <irogers@...gle.com> wrote:
>
> On Mon, Jan 13, 2025 at 12:15 AM Adrian Hunter <adrian.hunter@...el.com> wrote:
> >
> > On 11/01/25 19:56, Tavian Barnes wrote:
> > > C designated initializers like
> > >
> > >     struct perf_sample sample = { .ip = 0, };
> > >
> > > set every unmentioned field of the struct to zero.  But since
> > > sizeof(struct perf_sample) == 1384, this takes a long time.
> > >
> > > struct perf_sample does not need to be fully initialized, and even
> >
> > Yes it does need to be fully initialized.  Leaving members
> > uninitialized in the hope that they never get used adds to
> > code complexity e.g. how do you know they never are used,
> > or future members never will be used.
>
> First thought, perhaps we can use sanitizers to catch mistakes but
> leave the default optimized code to not have any overhead.
>
> Second thought, the perf_sample looks like:
> ```
> $ pahole perf
> ...
> struct perf_sample {
>        u64                        ip;                   /*     0     8 */
>        u32                        pid;                  /*     8     4 */
>        u32                        tid;                  /*    12     4 */
>        u64                        time;                 /*    16     8 */
>        u64                        addr;                 /*    24     8 */
>        u64                        id;                   /*    32     8 */
>        u64                        stream_id;            /*    40     8 */
>        u64                        period;               /*    48     8 */
>        u64                        weight;               /*    56     8 */
>        /* --- cacheline 1 boundary (64 bytes) --- */
>        u64                        transaction;          /*    64     8 */
>        u64                        insn_cnt;             /*    72     8 */
>        u64                        cyc_cnt;              /*    80     8 */
>        u32                        cpu;                  /*    88     4 */
>        u32                        raw_size;             /*    92     4 */
>        u64                        data_src;             /*    96     8 */
>        u64                        phys_addr;            /*   104     8 */
>        u64                        data_page_size;       /*   112     8 */
>        u64                        code_page_size;       /*   120     8 */
>        /* --- cacheline 2 boundary (128 bytes) --- */
>        u64                        cgroup;               /*   128     8 */
>        u32                        flags;                /*   136     4 */
>        u32                        machine_pid;          /*   140     4 */
>        u32                        vcpu;                 /*   144     4 */
>        u16                        insn_len;             /*   148     2 */
>        u8                         cpumode;              /*   150     1 */
>
>        /* XXX 1 byte hole, try to pack */
>
>        u16                        misc;                 /*   152     2 */
>        u16                        ins_lat;              /*   154     2 */
>        union {
>                u16                p_stage_cyc;          /*   156     2 */
>                u16                retire_lat;           /*   156     2 */
>        };                                               /*   156     2 */
>        union {
>                u16                        p_stage_cyc;          /*
> 0     2 */
>                u16                        retire_lat;           /*
> 0     2 */
>        };
>
>        _Bool                      no_hw_idx;            /*   158     1 */
>        char                       insn[16];             /*   159    16 */
>
>        /* XXX 1 byte hole, try to pack */
>
>        void *                     raw_data;             /*   176     8 */
>        struct ip_callchain *      callchain;            /*   184     8 */
>        /* --- cacheline 3 boundary (192 bytes) --- */
>        struct branch_stack *      branch_stack;         /*   192     8 */
>        u64 *                      branch_stack_cntr;    /*   200     8 */
>        struct regs_dump           user_regs;            /*   208   544 */
>        /* --- cacheline 11 boundary (704 bytes) was 48 bytes ago --- */
>        struct regs_dump           intr_regs;            /*   752   544 */
>        /* --- cacheline 20 boundary (1280 bytes) was 16 bytes ago --- */
>        struct stack_dump          user_stack;           /*  1296    24 */
>        struct sample_read         read;                 /*  1320    40 */
>         /* --- cacheline 21 boundary (1344 bytes) was 16 bytes ago --- */
>        struct aux_sample          aux_sample;           /*  1360    16 */
>        struct simd_flags          simd_flags;           /*  1376     8 */
>
>        /* size: 1384, cachelines: 22, members: 39 */
>        /* sum members: 1382, holes: 2, sum holes: 2 */
>        /* last cacheline: 40 bytes */
> };
> ```
> The two regs_dump structs are 79% of the size. They comprise of:
> ```
> struct regs_dump {
>        u64                        abi;                  /*     0     8 */
>        u64                        mask;                 /*     8     8 */
>        u64 *                      regs;                 /*    16     8 */
>        u64                        cache_regs[64];       /*    24   512 */
>        /* --- cacheline 8 boundary (512 bytes) was 24 bytes ago --- */
>        u64                        cache_mask;           /*   536     8 */
>
>        /* size: 544, cachelines: 9, members: 5 */
>        /* last cacheline: 32 bytes */
> };
> ```
> So the cache_regs are taking up the majority of the space. I have a
> WIP patch set to make user_regs and intr_regs optional/lazily
> allocated. I'll send it out shortly.

Sent as:
https://lore.kernel.org/lkml/20250113194345.1537821-1-irogers@google.com/

Thanks,
Ian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ