linux-kernel - Re: [PATCH v4 17/39] unwind_user/sframe: Add support for reading .sframe headers

Message-ID: <CAEf4BzZaWrscT1HrcUJkz45iMMuyCcA6ivfMffeHpxf=LmmXRg@mail.gmail.com>
Date: Mon, 27 Jan 2025 17:10:27 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Indu Bhagat <indu.bhagat@...cle.com>, Josh Poimboeuf <jpoimboe@...nel.org>
Cc: x86@...nel.org, Peter Zijlstra <peterz@...radead.org>, 
	Steven Rostedt <rostedt@...dmis.org>, Ingo Molnar <mingo@...nel.org>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, linux-kernel@...r.kernel.org, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Namhyung Kim <namhyung@...nel.org>, Ian Rogers <irogers@...gle.com>, 
	Adrian Hunter <adrian.hunter@...el.com>, linux-perf-users@...r.kernel.org, 
	Mark Brown <broonie@...nel.org>, linux-toolchains@...r.kernel.org, 
	Jordan Rome <jordalgo@...a.com>, Sam James <sam@...too.org>, linux-trace-kernel@...r.kernel.org, 
	Jens Remus <jremus@...ux.ibm.com>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, 
	Florian Weimer <fweimer@...hat.com>, Andy Lutomirski <luto@...nel.org>, 
	Masami Hiramatsu <mhiramat@...nel.org>, Weinan Liu <wnliu@...gle.com>
Subject: Re: [PATCH v4 17/39] unwind_user/sframe: Add support for reading
 .sframe headers

On Fri, Jan 24, 2025 at 2:14 PM Indu Bhagat <indu.bhagat@...cle.com> wrote:
>
> On 1/24/25 11:21 AM, Josh Poimboeuf wrote:
> > On Fri, Jan 24, 2025 at 10:00:52AM -0800, Andrii Nakryiko wrote:
> >> On Tue, Jan 21, 2025 at 6:32 PM Josh Poimboeuf <jpoimboe@...nel.org> wrote:
> >>> +static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end, unsigned long text_start, unsigned long text_end) { return -ENOSYS; }
> >>
> >> nit: very-very long, wrap it?
> >
> > That was intentional as it's just an empty stub, but yeah, maybe 160
> > chars is a bit much.
> >
> >>> +       if (shdr.preamble.magic != SFRAME_MAGIC ||
> >>> +           shdr.preamble.version != SFRAME_VERSION_2 ||
> >>> +           !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
> >>
> >> probably more a question to Indu, but why is this sorting not
> >> mandatory and part of SFrame "standard"? How realistically non-sorted
> >> FDEs would work in practice? Ain't nobody got time to sort them just
> >> to unwind the stack...
> >
> > No idea...
> >
> >>> +       if (!shdr.num_fdes || !shdr.num_fres) {
> >>
> >> given SFRAME_F_FRAME_POINTER in the header, is it really that
> >> nonsensical and illegal to have zero FDEs/FREs? Maybe we should allow
> >> that?
> >
> > It would seem a bit silly to create an empty .sframe section just to set
> > that SFRAME_F_FRAME_POINTER bit.  Regardless, there's nothing the kernel
> > can do with that.
> >
>
> Yes, in theory, it is allowed (as per the specification) to have an
> SFrame section with zero FDEs/FREs.  But since such a section
> will not be useful, I share the opinion that it makes sense to disallow
> it in the current unwinding contexts, for now (the JIT use case may change
> things later).
>

I disagree, actually. If it's legal per the spec, it shouldn't be
arbitrarily rejected. If we later make use of it, we'd have to be
careful not to accidentally cause problems on older kernels that
rejected empty FDE tables just because they didn't make sense at some
point (even though accepting them causes no issues).


> The SFRAME_F_FRAME_POINTER flag is currently not set by GAS/GNU ld at all.
>
> >>> +               dbg("no fde/fre entries\n");
> >>> +               return -EINVAL;
> >>> +       }
> >>> +
> >>> +       header_end = sec->sframe_start + SFRAME_HEADER_SIZE(shdr);
> >>> +       if (header_end >= sec->sframe_end) {
> >>
> >> if we allow zero FDEs/FREs, header_end == sec->sframe_end is legal, right?
> >
> > I suppose so, but again I'm not seeing any reason to support that.

Let's invert this. Is there any reason why it shouldn't be supported? ;)
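For illustration, a minimal user-space sketch of header validation that accepts a legal empty table instead of rejecting it, and uses `>` rather than `>=` for the header bound. The helper name sframe_check_header() and its `empty` out-parameter are hypothetical; the struct layout is copied from the patch, SFRAME_HEADER_SIZE() is assumed to be the fixed header plus auxhdr_len, and the constant values are assumed to match the SFrame v2 spec.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants assumed to match the SFrame v2 spec. */
#define SFRAME_MAGIC        0xdee2
#define SFRAME_VERSION_2    2
#define SFRAME_F_FDE_SORTED 0x1

struct sframe_preamble {
	uint16_t magic;
	uint8_t  version;
	uint8_t  flags;
} __attribute__((packed));

struct sframe_header {
	struct sframe_preamble preamble;
	uint8_t  abi_arch;
	int8_t   cfa_fixed_fp_offset;
	int8_t   cfa_fixed_ra_offset;
	uint8_t  auxhdr_len;
	uint32_t num_fdes;
	uint32_t num_fres;
	uint32_t fre_len;
	uint32_t fdes_off;
	uint32_t fres_off;
} __attribute__((packed));

/* Assumption: fixed header plus optional auxiliary header. */
#define SFRAME_HEADER_SIZE(h) (sizeof(struct sframe_header) + (h).auxhdr_len)

/*
 * Hypothetical helper: validate the header against the section bounds.
 * An empty table (no FDEs) is accepted and reported via *empty, so the
 * caller can simply skip lookups for it.
 */
static int sframe_check_header(const struct sframe_header *shdr,
			       unsigned long sframe_start,
			       unsigned long sframe_end, bool *empty)
{
	unsigned long header_end;

	if (shdr->preamble.magic != SFRAME_MAGIC ||
	    shdr->preamble.version != SFRAME_VERSION_2 ||
	    !(shdr->preamble.flags & SFRAME_F_FDE_SORTED))
		return -EINVAL;

	header_end = sframe_start + SFRAME_HEADER_SIZE(*shdr);

	/* '>' instead of '>=': a header-only section is fine when empty. */
	if (header_end < sframe_start || header_end > sframe_end)
		return -EINVAL;

	/* A non-empty table still needs data after the header. */
	if (shdr->num_fdes && header_end == sframe_end)
		return -EINVAL;

	*empty = !shdr->num_fdes;
	return 0;
}
```

With this shape, an empty section costs the unwinder nothing (it just finds no FDE), and a future user of such sections would not trip over older kernels.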

> >
> >>> +               dbg("header doesn't fit in section\n");
> >>> +               return -EINVAL;
> >>> +       }
> >>> +
> >>> +       num_fdes   = shdr.num_fdes;
> >>> +       fdes_start = header_end + shdr.fdes_off;
> >>> +       fdes_end   = fdes_start + (num_fdes * sizeof(struct sframe_fde));
> >>> +
> >>> +       fres_start = header_end + shdr.fres_off;
> >>> +       fres_end   = fres_start + shdr.fre_len;
> >>> +
> >>
> >> maybe use check_add_overflow() in all the above calculation, at least
> >> on 32-bit arches this all can overflow and it's not clear if below
> >> sanity check detects all possible overflows
> >
> > Ok, I'll look into it.
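A user-space sketch of the check_add_overflow() suggestion. The kernel's check_add_overflow() and check_mul_overflow() are built on the GCC/Clang __builtin_add_overflow()/__builtin_mul_overflow() builtins, so the sketch maps onto those directly; sframe_compute_ranges() and struct sframe_ranges are hypothetical names. It mirrors the patch's computation but rejects any step that wraps, as well as any range ending past the section.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* User-space stand-ins for the kernel helpers of the same name. */
#define check_add_overflow(a, b, d) __builtin_add_overflow(a, b, d)
#define check_mul_overflow(a, b, d) __builtin_mul_overflow(a, b, d)

struct sframe_ranges {
	unsigned long fdes_start, fdes_end;
	unsigned long fres_start, fres_end;
};

/*
 * Compute the FDE and FRE sub-section bounds from the header fields,
 * failing on any wraparound (a real risk with 32-bit unsigned long)
 * and on any range that ends past sframe_end.
 */
static int sframe_compute_ranges(unsigned long header_end,
				 unsigned long sframe_end,
				 uint32_t num_fdes, uint32_t fdes_off,
				 uint32_t fres_off, uint32_t fre_len,
				 unsigned long fde_size,
				 struct sframe_ranges *r)
{
	unsigned long fdes_size;

	if (check_add_overflow(header_end, (unsigned long)fdes_off, &r->fdes_start) ||
	    check_mul_overflow((unsigned long)num_fdes, fde_size, &fdes_size) ||
	    check_add_overflow(r->fdes_start, fdes_size, &r->fdes_end) ||
	    check_add_overflow(header_end, (unsigned long)fres_off, &r->fres_start) ||
	    check_add_overflow(r->fres_start, (unsigned long)fre_len, &r->fres_end))
		return -EINVAL;

	if (r->fdes_end > sframe_end || r->fres_end > sframe_end)
		return -EINVAL;

	return 0;
}
```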
> >
> >>> +struct sframe_preamble {
> >>> +       u16     magic;
> >>> +       u8      version;
> >>> +       u8      flags;
> >>> +} __packed;
> >>> +
> >>> +struct sframe_header {
> >>> +       struct sframe_preamble preamble;
> >>> +       u8      abi_arch;
> >>> +       s8      cfa_fixed_fp_offset;
> >>> +       s8      cfa_fixed_ra_offset;
> >>> +       u8      auxhdr_len;
> >>> +       u32     num_fdes;
> >>> +       u32     num_fres;
> >>> +       u32     fre_len;
> >>> +       u32     fdes_off;
> >>> +       u32     fres_off;
> >>> +} __packed;
> >>> +
> >>> +struct sframe_fde {
> >>> +       s32     start_addr;
> >>> +       u32     func_size;
> >>> +       u32     fres_off;
> >>> +       u32     fres_num;
> >>> +       u8      info;
> >>> +       u8      rep_size;
> >>> +       u16 padding;
> >>> +} __packed;
> >>
> >> I couldn't understand from SFrame itself, but why do sframe_header,
> >> sframe_preamble, and sframe_fde have to be marked __packed, if it's
> >> all naturally aligned (intentionally and by design)?..
> >
> > Right, but the spec says they're all packed.  Maybe the point is that
> > some future sframe version is free to introduce unaligned fields.
> >
>
> The SFrame specification aims to keep SFrame header and SFrame FDE members
> at aligned boundaries in future versions.
>
> Only SFrame FRE-related accesses may be unaligned.
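To make the header/FDE point concrete, a compile-time sketch (C11 static_assert) checking that in the packed v2 layouts quoted above every multi-byte field sits at an offset that is a multiple of its own size, so __packed does not actually change the layout. ASSERT_NATURAL() is a hypothetical helper, and real in-memory alignment additionally assumes the structs themselves start at 4-byte-aligned addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sframe_preamble {
	uint16_t magic;
	uint8_t  version;
	uint8_t  flags;
} __attribute__((packed));

struct sframe_header {
	struct sframe_preamble preamble;
	uint8_t  abi_arch;
	int8_t   cfa_fixed_fp_offset;
	int8_t   cfa_fixed_ra_offset;
	uint8_t  auxhdr_len;
	uint32_t num_fdes;
	uint32_t num_fres;
	uint32_t fre_len;
	uint32_t fdes_off;
	uint32_t fres_off;
} __attribute__((packed));

struct sframe_fde {
	int32_t  start_addr;
	uint32_t func_size;
	uint32_t fres_off;
	uint32_t fres_num;
	uint8_t  info;
	uint8_t  rep_size;
	uint16_t padding;
} __attribute__((packed));

/* Naturally aligned (relative to the struct start): offset % size == 0. */
#define ASSERT_NATURAL(type, field)                                    \
	static_assert(offsetof(type, field) %                          \
		      sizeof(((type *)0)->field) == 0,                 \
		      #type "." #field " is not naturally aligned")

ASSERT_NATURAL(struct sframe_preamble, magic);
ASSERT_NATURAL(struct sframe_header, num_fdes);
ASSERT_NATURAL(struct sframe_header, num_fres);
ASSERT_NATURAL(struct sframe_header, fre_len);
ASSERT_NATURAL(struct sframe_header, fdes_off);
ASSERT_NATURAL(struct sframe_header, fres_off);
ASSERT_NATURAL(struct sframe_fde, start_addr);
ASSERT_NATURAL(struct sframe_fde, func_size);
ASSERT_NATURAL(struct sframe_fde, fres_off);
ASSERT_NATURAL(struct sframe_fde, fres_num);
ASSERT_NATURAL(struct sframe_fde, padding);

static_assert(sizeof(struct sframe_header) == 28, "unexpected header size");
/* A multiple of 4, so 32-bit fields stay aligned across the FDE array. */
static_assert(sizeof(struct sframe_fde) == 20, "unexpected FDE size");
```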

Yeah, and it's actually bothering me quite a lot :) I have a tentative
proposal; maybe we can discuss it for SFrame v3? Let me briefly
outline the idea.

So, currently in v2, FREs within FDEs use an array-of-structs layout.
If we use pseudo-C type definitions, it would look something like this
for an FDE + its FREs:

struct FDE_and_FREs {
    struct sframe_func_desc_entry fde_metadata;

    union FRE {
        struct FRE8 {
            u8 sfre_start_address;
            u8 sfre_info;
            u8|u16|u32 offsets[M];
        };
        struct FRE16 {
            u16 sfre_start_address;
            u8 sfre_info;
            u8|u16|u32 offsets[M];
        };
        struct FRE32 {
            u32 sfre_start_address;
            u8 sfre_info;
            u8|u16|u32 offsets[M];
        };
    } fres[N] __packed;
};

where all fres[i] of a given FDE are the same one of FRE8/FRE16/FRE32,
so their start addresses have the same size, but each FRE can have a
different offset size (and count), so there is no common alignment and
everything has to be packed and accessed unaligned.
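For comparison, a sketch of decoding one FRE in the current v2 layout. It assumes the v2 sfre_info bit layout (offset count in bits 1-4, offset size code in bits 5-6, where 0/1/2 mean 1/2/4-byte offsets) and takes the start address size from the FDE; fre_decode() and struct fre_decoded are hypothetical. Since each FRE's length depends on its own info byte, every multi-byte read goes through memcpy(), the portable equivalent of the kernel's get_unaligned(). Values are read in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed v2 sfre_info layout: bits 1-4 offset count, bits 5-6 size code. */
#define FRE_INFO_OFFSET_COUNT(info) (((info) >> 1) & 0xf)
#define FRE_INFO_OFFSET_SIZE(info)  (((info) >> 5) & 0x3)

struct fre_decoded {
	uint32_t start_addr;
	uint8_t info;
	unsigned int nr_offsets;
	int32_t offsets[15];        /* 4-bit count: at most 15 offsets */
};

/* Unsigned 1/2/4-byte read from a possibly unaligned address. */
static uint32_t read_unaligned_u(const uint8_t *p, unsigned int size)
{
	uint16_t v16;
	uint32_t v32;

	switch (size) {
	case 1:
		return *p;
	case 2:
		memcpy(&v16, p, sizeof(v16));
		return v16;
	default:
		memcpy(&v32, p, sizeof(v32));
		return v32;
	}
}

/* Signed variant for the offsets, sign-extended to 32 bits. */
static int32_t read_unaligned_s(const uint8_t *p, unsigned int size)
{
	int8_t v8;
	int16_t v16;
	int32_t v32;

	switch (size) {
	case 1:
		memcpy(&v8, p, sizeof(v8));
		return v8;
	case 2:
		memcpy(&v16, p, sizeof(v16));
		return v16;
	default:
		memcpy(&v32, p, sizeof(v32));
		return v32;
	}
}

/*
 * Decode the FRE at p; addr_size (1, 2 or 4) comes from the FDE.
 * Returns the number of bytes consumed, or 0 for an invalid size code.
 */
static size_t fre_decode(const uint8_t *p, unsigned int addr_size,
			 struct fre_decoded *fre)
{
	const uint8_t *q = p;
	unsigned int code, off_size, i;

	fre->start_addr = read_unaligned_u(q, addr_size);
	q += addr_size;
	fre->info = *q++;

	code = FRE_INFO_OFFSET_SIZE(fre->info);
	if (code > 2)
		return 0;
	off_size = 1u << code;          /* 0 -> 1, 1 -> 2, 2 -> 4 bytes */
	fre->nr_offsets = FRE_INFO_OFFSET_COUNT(fre->info);

	/* The next FRE starts wherever this one ends: no alignment at all. */
	for (i = 0; i < fre->nr_offsets; i++, q += off_size)
		fre->offsets[i] = read_unaligned_s(q, off_size);

	return (size_t)(q - p);
}
```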

But what if we take a struct-of-arrays approach and represent it more like:

struct FDE_and_FREs {
    struct sframe_func_desc_entry fde_metadata;
    u8|u16|u32 start_addrs[N]; /* can extend to u64 as well */
    u8 sfre_infos[N];
    u8 offsets8[M8];
    u16 offsets16[M16] __aligned(2);
    u32 offsets32[M32] __aligned(4);
    /* we can naturally extend to support also u64 offsets */
};

i.e., we split all FRE records into their three constituents: start
addresses, info bytes, and offsets, where each FRE's offsets fall into
either the 8-, 16-, or 32-bit offsets "bucket". We collect all the
offsets, depending on their size, into these aligned offsets{8,16,32}
arrays (with natural extension to 64 bits, if necessary), wasting at
most 1-3 bytes on padding to ensure proper alignment everywhere.
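A sketch of where each array of the proposed layout would start, relative to the beginning of an FDE's FRE block, assuming that block base is 4-byte aligned and start_addrs comes first. soa_compute_layout() and struct soa_layout are hypothetical. Padding is only needed before offsets16 (at most 1 byte) and before offsets32 (at most 2 bytes), which is where the 1-3 byte bound comes from.

```c
#include <assert.h>
#include <stddef.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Byte offsets of each array, relative to the (4-byte aligned) block base. */
struct soa_layout {
	size_t start_addrs, sfre_infos, offsets8, offsets16, offsets32, end;
};

/*
 * Hypothetical: n FREs with addr_size-byte start addresses (1, 2 or 4),
 * plus m8/m16/m32 offsets of each width across all of those FREs.
 */
static struct soa_layout soa_compute_layout(size_t n, size_t addr_size,
					    size_t m8, size_t m16, size_t m32)
{
	struct soa_layout l;
	size_t pos = 0;

	l.start_addrs = pos;           /* aligned because the base is */
	pos += n * addr_size;
	l.sfre_infos = pos;
	pos += n;
	l.offsets8 = pos;
	pos += m8;
	pos = ALIGN_UP(pos, 2);        /* at most 1 byte of padding */
	l.offsets16 = pos;
	pos += 2 * m16;
	pos = ALIGN_UP(pos, 4);        /* at most 2 more bytes of padding */
	l.offsets32 = pos;
	pos += 4 * m32;
	l.end = pos;
	return l;
}
```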

Note that at this point we need to decide whether we want FREs to be
binary searchable.

If not, we don't really need anything extra. As we process each
start_addrs[i] and sfre_infos[i] to find the matching FRE, we keep track
of how many 8-, 16-, and 32-bit offsets the already-processed FREs
consumed, and when we find the right one, we know exactly its starting
index within offsets{8,16,32}. Done.
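A sketch of that non-searchable lookup, assuming the v2 sfre_info encoding is kept as-is (offset count in bits 1-4, size code in bits 5-6) and start addresses are sorted. struct soa_fres and soa_find_fre_linear() are hypothetical; the point is that three running counters are all the bookkeeping needed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: v2 sfre_info encoding kept (bits 1-4: offset count,
 * bits 5-6: size code, 0/1/2 for 8/16/32-bit offsets). */
#define FRE_NR_OFFSETS(info) (((info) >> 1) & 0xf)
#define FRE_SIZE_CODE(info)  (((info) >> 5) & 0x3)

/* Hypothetical decoded view of one FDE's FREs in the proposed layout. */
struct soa_fres {
	unsigned int n;             /* number of FREs */
	unsigned int addr_size;     /* 1, 2 or 4 bytes per start address */
	const void *start_addrs;    /* n naturally aligned entries, sorted */
	const uint8_t *infos;       /* n info bytes */
	const int8_t *offsets8;
	const int16_t *offsets16;
	const int32_t *offsets32;
};

static uint32_t soa_start_addr(const struct soa_fres *f, unsigned int i)
{
	switch (f->addr_size) {
	case 1:
		return ((const uint8_t *)f->start_addrs)[i];
	case 2:
		return ((const uint16_t *)f->start_addrs)[i];
	default:
		return ((const uint32_t *)f->start_addrs)[i];
	}
}

/*
 * Find the last FRE starting at or before ip_off. On success, *size_code
 * selects offsets8/16/32 and *first is the index of the FRE's first
 * offset in that array. Returns the FRE index, or -1 if none matches
 * or an info byte is invalid.
 */
static int soa_find_fre_linear(const struct soa_fres *f, uint32_t ip_off,
			       unsigned int *size_code, unsigned int *first)
{
	unsigned int used[3] = { 0, 0, 0 };  /* offsets consumed per width */
	int found = -1;
	unsigned int i;

	for (i = 0; i < f->n && soa_start_addr(f, i) <= ip_off; i++) {
		unsigned int code = FRE_SIZE_CODE(f->infos[i]);

		if (code > 2)
			return -1;
		found = (int)i;
		*size_code = code;
		*first = used[code];
		used[code] += FRE_NR_OFFSETS(f->infos[i]);
	}
	return found;
}
```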

But if we were to make FREs binary searchable, we basically need an
index of offset pointers to quickly find the offsetsX[j] position
corresponding to FRE #i. For that, we can have an extra array right
next to start_addrs, "semantically parallel" to it:

u8|u16|u32 start_addrs[N];
u8|u16|u32 offset_idxs[N];

where start_addrs[i] corresponds to offset_idxs[i], and offset_idxs[i]
points to the first offset of FRE #i in the offsetsX[] array
(depending on the FRE's "bitness"). This costs a bit more storage for
the offset index, but for FDEs with lots of FREs it might be a
worthwhile tradeoff.
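A sketch of the searchable variant: an upper-bound binary search over start_addrs, after which offset_idxs[i] gives the starting position directly, with no counting. To keep it short it fixes start addresses at 32 bits and offset_idxs at 16 bits, and assumes the v2 size-code bits in sfre_info; struct soa_fres_idx and soa_find_fre_bsearch() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FRE_SIZE_CODE(info) (((info) >> 5) & 0x3)  /* assumed v2 encoding */

/* Hypothetical searchable FDE; bitnesses fixed here for brevity. */
struct soa_fres_idx {
	unsigned int n;
	const uint32_t *start_addrs;   /* sorted ascending */
	const uint8_t *infos;
	const uint16_t *offset_idxs;   /* first offset of FRE #i in offsetsX[] */
};

/*
 * Binary search for the last FRE whose start address is <= ip_off.
 * Returns its index (with *size_code and *first filled in), or -1.
 */
static int soa_find_fre_bsearch(const struct soa_fres_idx *f, uint32_t ip_off,
				unsigned int *size_code, unsigned int *first)
{
	unsigned int lo = 0, hi = f->n;  /* search [lo, hi) for first addr > ip_off */

	while (lo < hi) {
		unsigned int mid = lo + (hi - lo) / 2;

		if (f->start_addrs[mid] <= ip_off)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* lo is now the first FRE starting after ip_off. */
	if (lo == 0)
		return -1;
	*size_code = FRE_SIZE_CODE(f->infos[lo - 1]);
	*first = f->offset_idxs[lo - 1];
	return (int)(lo - 1);
}
```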

A few points:
  a) we can decide this "binary searchability" per-FDE: for FDEs with
just 1-3 FREs we don't bother, while those with more FREs would be
searchable ones with an index. So we can combine fast lookups, natural
alignment of the on-disk format, and compactness. The presence of the
index is just another bit in FDE metadata.
  b) the bitness of offset_idxs[] can be coupled with the bitness of
start_addrs (for simplicity), or could be completely independent and
identified by the FDE's metadata (2 more bits to define it, just like
start_addr bitness is defined). Independent would probably be my
preference, so that the linker (or whoever produces the .sframe data)
can pick the smallest bitness that is sufficient to represent
everything.
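A producer-side sketch of points a) and b): pick the smallest width that fits, and only emit an index when an FDE has more than some threshold of FREs. The bit positions in the metadata byte, the 2-bit width codes, and the min_indexed threshold are all made up for illustration; nothing here is part of any SFrame version.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-bit width codes: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit,
 * 3 = 64-bit (the natural extension mentioned above). */
static unsigned int bitness_code(uint64_t max_value)
{
	if (max_value <= UINT8_MAX)
		return 0;
	if (max_value <= UINT16_MAX)
		return 1;
	if (max_value <= UINT32_MAX)
		return 2;
	return 3;
}

/* Hypothetical FDE metadata bits (positions are invented):
 * bits 0-1: start_addrs width, bits 2-3: offset_idxs width,
 * bit 4: FRE index present. */
#define FDE_META_ADDR_SHIFT 0
#define FDE_META_IDX_SHIFT  2
#define FDE_META_HAS_INDEX  (1u << 4)

/* Independent widths for start_addrs and offset_idxs; the index is only
 * built for FDEs with more than min_indexed FREs. */
static uint8_t fde_meta_encode(uint64_t max_start_addr, uint64_t max_offset_idx,
			       unsigned int nr_fres, unsigned int min_indexed)
{
	uint8_t meta = (uint8_t)(bitness_code(max_start_addr) << FDE_META_ADDR_SHIFT);

	if (nr_fres > min_indexed)
		meta |= (uint8_t)(FDE_META_HAS_INDEX |
				  (bitness_code(max_offset_idx) << FDE_META_IDX_SHIFT));
	return meta;
}
```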

Yes, it's a bit more complicated to draw and explain, but everything
will be nicely aligned, extensible to 64 bits, and (at least optionally)
binary searchable. Implementation-wise, it shouldn't be significantly
more involved on the kernel side. Maybe the compiler would need to be a
bit smarter when producing FDE data, but it's not rocket science.

Thoughts?