lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzbLt3b8xH3eSvRJdnorZvQfWzOFeV-gYRxDmaS6YVba2A@mail.gmail.com>
Date: Fri, 1 Nov 2024 11:38:47 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Indu Bhagat <indu.bhagat@...cle.com>
Cc: Josh Poimboeuf <jpoimboe@...nel.org>, x86@...nel.org, 
	Peter Zijlstra <peterz@...radead.org>, Steven Rostedt <rostedt@...dmis.org>, 
	Ingo Molnar <mingo@...nel.org>, Arnaldo Carvalho de Melo <acme@...nel.org>, linux-kernel@...r.kernel.org, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Namhyung Kim <namhyung@...nel.org>, Ian Rogers <irogers@...gle.com>, 
	Adrian Hunter <adrian.hunter@...el.com>, linux-perf-users@...r.kernel.org, 
	Mark Brown <broonie@...nel.org>, linux-toolchains@...r.kernel.org, 
	Jordan Rome <jordalgo@...a.com>, Sam James <sam@...too.org>, linux-trace-kernel@...r.kerne.org, 
	Jens Remus <jremus@...ux.ibm.com>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, 
	Florian Weimer <fweimer@...hat.com>, Andy Lutomirski <luto@...nel.org>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH v3 09/19] unwind: Introduce sframe user space unwinding

On Thu, Oct 31, 2024 at 2:38 PM Indu Bhagat <indu.bhagat@...cle.com> wrote:
>
> On 10/31/24 1:57 PM, Andrii Nakryiko wrote:
> > On Tue, Oct 29, 2024 at 10:53 PM Josh Poimboeuf <jpoimboe@...nel.org> wrote:
> >>
> >> On Tue, Oct 29, 2024 at 04:32:40PM -0700, Andrii Nakryiko wrote:
> >>> It feels like this patch is trying to do too much. There is both new
> >>> UAPI introduction, and SFrame format definition, and unwinder
> >>> integration, etc, etc. Do you think it can be split further into more
> >>> focused smaller patches?
> >>
> >> True, let me see if I can split it up.
> >>
> >>>> +
> >>>> +                       if ((eppnt->p_flags & PF_X) && k < start_code)
> >>>> +                               start_code = k;
> >>>> +
> >>>> +                       if ((eppnt->p_flags & PF_X) && k + eppnt->p_filesz > end_code)
> >>>> +                               end_code = k + eppnt->p_filesz;
> >>>> +                       break;
> >>>> +               }
> >>>> +               case PT_GNU_SFRAME:
> >>>> +                       sframe_phdr = eppnt;
> >>>
> >>> if I understand correctly, there has to be only one sframe, is that
> >>> right? Should we validate that?
> >>
> >> Yes, there shouldn't be more than one PT_GNU_SFRAME for the executable
> >> itself.  I can validate that.
> >>
> >>>> +                       break;
> >>>>                  }
> >>>>          }
> >>>>
> >>>> +       if (sframe_phdr)
> >>>> +               sframe_add_section(load_addr + sframe_phdr->p_vaddr,
> >>>> +                                  start_code, end_code);
> >>>> +
> >>>
> >>> no error checking?
> >>
> >> Good point.  I remember discussing this with some people at Cauldon/LPC,
> >> I just forgot to do it!
> >>
> >> Right now it does all the validation at unwind, which could really slow
> >> things down unnecessarily if the sframe isn't valid.
> >>
> >>>> +#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
> >>>> +
> >>>> +#define INIT_MM_SFRAME .sframe_mt = MTREE_INIT(sframe_mt, 0),
> >>>> +
> >>>> +extern void sframe_free_mm(struct mm_struct *mm);
> >>>> +
> >>>> +/* text_start, text_end, file_name are optional */
> >>>
> >>> what file_name? was that an extra argument that got removed?
> >>
> >> Indeed, that was for some old code.
> >>
> >>>>          case PR_RISCV_SET_ICACHE_FLUSH_CTX:
> >>>>                  error = RISCV_SET_ICACHE_FLUSH_CTX(arg2, arg3);
> >>>>                  break;
> >>>> +       case PR_ADD_SFRAME:
> >>>> +               if (arg5)
> >>>> +                       return -EINVAL;
> >>>> +               error = sframe_add_section(arg2, arg3, arg4);
> >>>
> >>> wouldn't it be better to make this interface extendable from the get
> >>> go? Instead of passing 3 arguments with fixed meaning, why not pass a
> >>> pointer to an extendable binary struct like seems to be the trend
> >>> nowadays with nicely extensible APIs. See [0] for one such example
> >>> (specifically, struct procmap_query). Seems more prudent, as we'll
> >>> most probably will be adding flags, options, extra information, etc)
> >>>
> >>>    [0] https://lore.kernel.org/linux-mm/20240627170900.1672542-3-andrii@kernel.org/
> >>
> >> This ioctl interface was admittedly hacked together.  I was hoping
> >> somebody would suggest something better :-)  I'll take a look.
> >>
> >>>> +static int find_fde(struct sframe_section *sec, unsigned long ip,
> >>>> +                   struct sframe_fde *fde)
> >>>> +{
> >>>> +       struct sframe_fde __user *first, *last, *found = NULL;
> >>>> +       u32 ip_off, func_off_low = 0, func_off_high = -1;
> >>>> +
> >>>> +       ip_off = ip - sec->sframe_addr;
> >>>
> >>> what if ip_off is larger than 4GB? ELF section can be bigger than 4GB, right?
> >>
> >> That's baked into sframe v2.
> >
> > I believe we do have large production binaries with more than 4GB of
> > text, what are we going to do about them? It would be interesting to
> > hear sframe people's opinion. Adding such a far-reaching new format in
> > 2024 with these limitations is kind of sad. At the very least maybe we
> > should allow some form of chaining sframe definitions to cover more
> > than 4GB segments? Please CC relevant folks, I'm wondering what
> > they're thinking about this.
> >
>
> SFrame V2 does have that limitation. We can try to have 64-bit
> representation for the 'ip' in the SFrame FDE and conditionalize it
> somehow (say, with a flag in the header) so as to not bloat the majority
> of applications.

Hi Indu,

I think that's prudent if we believe that SFrame is the solution here.
See my reply to Josh. Real-world already approach 4GB limits, and
things are not going to shrink in the years to come. So yeah, probably
we need some adjustments to the format to at least allow 64-bit
offsets (though trying to stick to 32-bit as much as possible, of
course, if they work).

I'm not really familiar with the nuances of the format just yet, so
can't really provide anything more useful at this point. What would be
the sort of gold reference for Sframe format to familiarize myself
thoroughly?

BTW, I wanted to ask. Are there any plans to add SFrame support to
Clang as well? It feels like without that there is no future for
SFrame as a general-purpose solution for stack traces.

>
> >>
> >>> and also, does it mean that SFrame doesn't support executables with
> >>> text bigger than 4GB?
> >>
> >> Yes, but is that a realistic concern?
> >
> > See above, yes. You'd be surprised. As somewhat corroborating
> > evidence, there were tons of problems and churn (within at least Meta)
> > with DWARF not supporting more than 2GB sizes, so yes, this is not an
> > abstract problem for sure. Modern production applications can be
> > ridiculously big.
> >
> >>
> >>>> +       } else {
> >>>> +               struct vm_area_struct *vma, *text_vma = NULL;
> >>>> +               VMA_ITERATOR(vmi, mm, 0);
> >>>> +
> >>>> +               for_each_vma(vmi, vma) {
> >>>> +                       if (vma->vm_file != sframe_vma->vm_file ||
> >>>> +                           !(vma->vm_flags & VM_EXEC))
> >>>> +                               continue;
> >>>> +
> >>>> +                       if (text_vma) {
> >>>> +                               pr_warn_once("%s[%d]: multiple EXEC segments unsupported\n",
> >>>> +                                            current->comm, current->pid);
> >>>
> >>> is this just something that fundamentally can't be supported by SFrame
> >>> format? Or just an implementation simplification?
> >>
> >> It's a simplification I suppose.
> >
> > That's a rather random limitation, IMO... How hard would it be to not
> > make that assumption?
> >
> >>
> >>> It's not illegal to have an executable with multiple VM_EXEC segments,
> >>> no? Should this be a pr_warn_once() then?
> >>
> >> I don't know, is it allowed?  I've never seen it in practice.  The
> >
> > I'm pretty sure you can do that with a custom linker script, at the
> > very least. Normally this probably won't happen, but I don't think
> > Linux dictates how many executable VMAs an application can have. And
> > it probably just naturally happens for JIT-ted applications (Java, Go,
> > etc).
> >
> > Linux kernel itself has two executable segments, for instance (though
> > kernel is special, of course, but still).
> >
> >> pr_warn_once() is not reporting that it's illegal but rather that this
> >> corner case actually exists and maybe needs to be looked at.
> >
> > This warn() will be logged across millions of machines in the fleet,
> > triggering alarms, people looking at this, making custom internal
> > patches to disable the known-to-happen warn. Why do we need all this?
> > This is an issue that is trivial to trigger by user process that's not
> > doing anything illegal. Why?
> >
> >>
> >> --
> >> Josh
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ