Message-ID: <CANpmjNNeH7+7H3y-5BCNGx+Yo11HG-F3M5TLqCAXd11Up5PTWA@mail.gmail.com>
Date: Thu, 29 Apr 2021 20:46:48 +0200
From: Marco Elver <elver@...gle.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Florian Weimer <fweimer@...hat.com>,
"David S. Miller" <davem@...emloft.net>,
Arnd Bergmann <arnd@...db.de>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Collingbourne <pcc@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>,
Alexander Potapenko <glider@...gle.com>,
sparclinux@...r.kernel.org,
linux-arch <linux-arch@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>, linux-api@...r.kernel.org,
kasan-dev <kasan-dev@...glegroups.com>
Subject: Re: siginfo_t ABI break on sparc64 from si_addr_lsb move 3y ago
On Thu, 29 Apr 2021 at 19:24, Eric W. Biederman <ebiederm@...ssion.com> wrote:
[...]
> > Granted, nobody seems to have noticed because I don't even know if these
> > fields have use on sparc64. But I don't yet see this as justification to
> > leave things as-is...
> >
> > The collateral damage of this, and the acute problem that I'm having is
> > defining si_perf in a sort-of readable and portable way in siginfo_t
> > definitions that live outside the kernel, where sparc64 does not yet
> > have broken si_addr_lsb. And the same difficulty applies to the kernel
> > if we want to unbreak sparc64, while not wanting to move si_perf for
> > other architectures.
> >
> > There are 2 options I see to solve this:
> >
> > 1. Make things simple again. We could just revert the change moving
> > si_addr_lsb into the union, and sadly accept we'll have to live with
> > that legacy "design" mistake. (si_perf stays in the union, but will
> > unfortunately change its offset for all architectures... this one-off
> > move might be ok because it's new.)
> >
> > 2. Add special cases to retain si_addr_lsb in the union on architectures
> > that do not have __ARCH_SI_TRAPNO (the majority). I have added a
> > draft patch that would do this below (with some refactoring so that
> > it remains sort-of readable), as an experiment to see how complicated
> > this gets.
> >
> > Which option do you prefer? Are there better options?
>
> Personally the most important thing to have is a single definition
> shared by all architectures so that we consolidate testing.
>
> A little piece of me cries a little whenever I see how badly we
> implemented the POSIX design. As specified by POSIX the fields can be
> placed in siginfo such that 32bit and 64bit share a common definition.
> Unfortunately we did not add padding after si_addr on 32bit to
> accommodate a 64bit si_addr.
I think it's even worse than that, see the fun I had with siginfo last
week: https://lkml.kernel.org/r/20210422191823.79012-1-elver@google.com
... because of the 3 initial ints and no padding after them, we can't
portably add __u64 fields to siginfo, and are forever forced to have
subtly different behaviour between 32-bit and 64-bit architectures.
:-/
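To make the padding problem concrete, here is a minimal sketch in C. The
struct and function names (demo_info_ptr, demo_info_u64, demo_union_offset_*)
are hypothetical stand-ins, not the kernel's siginfo_t. It only shows that a
union placed after three ints can start at a different offset once it gains a
64-bit member. On LP64 targets the union already starts at offset 16, so
nothing moves. On a 32-bit ABI that aligns uint64_t to 8 bytes inside structs
the start should move from 12 to 16, while on one that aligns it to 4 bytes
it should stay at 12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for siginfo_t, not the kernel's definition:
 * three leading ints followed by a union of per-signal payloads. */
struct demo_info_ptr {
	int signo, err, code;
	union {
		void *addr;		/* pointer-sized payload only */
	} fields;
};

struct demo_info_u64 {
	int signo, err, code;
	union {
		void *addr;
		uint64_t data;		/* may raise the union's alignment */
	} fields;
};

/* Offset at which the union starts when it holds only a pointer. */
static inline size_t demo_union_offset_ptr(void)
{
	return offsetof(struct demo_info_ptr, fields);
}

/* Offset at which the union starts once a 64-bit member is added. */
static inline size_t demo_union_offset_u64(void)
{
	return offsetof(struct demo_info_u64, fields);
}

/* Returns 1 if adding the 64-bit member moved the union on this ABI. */
static inline int demo_u64_moves_union(void)
{
	size_t before = demo_union_offset_ptr();
	size_t after = demo_union_offset_u64();

	assert(after >= before);	/* stricter alignment only pushes it later */
	return after != before;
}
```

Calling demo_u64_moves_union() on each target of interest shows whether that
ABI is one where a 64-bit member would shift every field in the union.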
> I find it unfortunate that we are adding yet another definition that
> requires translation between 32bit and 64bit, but I am glad
> that at least the translation is not architecture specific. That common
> definition is what has allowed this potential issue to be caught
> and that makes me very happy to see.
>
> Let's go with Option 3.
>
> Confirm BUS_MCEERR_AR, BUS_MCEERR_AO, SEGV_BNDERR, SEGV_PKUERR are not
> in use on any architecture that defines __ARCH_SI_TRAPNO, and then fixup
> the userspace definitions of these fields.
>
> To the kernel I would add some BUILD_BUG_ON's for whichever architecture
> implementing __ARCH_SI_TRAPNO is best maintained (sparc64?), just to
> confirm we don't create future regressions by accident.
>
> I did a quick search and the architectures that define __ARCH_SI_TRAPNO
> are sparc, mips, and alpha. All have 64bit implementations. A further
> quick search shows that none of those architectures have faults that
> use BUS_MCEERR_AR, BUS_MCEERR_AO, SEGV_BNDERR, SEGV_PKUERR, nor do
> they appear to use mm/memory-failure.c.
>
> So it doesn't look like we have an ABI regression to fix.
That sounds fine to me -- my guess was that they're not used on these
architectures, but I just couldn't make that call.
I have patches adding compile-time asserts for sparc64, arm, arm64
ready to go. I'll send them after some more testing.
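As a rough illustration of what such compile-time asserts can look like (this
is only a sketch, not the patches themselves), the C below pins field offsets
with _Static_assert, which plays the role BUILD_BUG_ON plays in the kernel.
The structs demo_fault_plain and demo_fault_trapno are hypothetical and only
mimic a fault payload with and without an extra trapno int. The hard-coded
offsets assume an LP64 target and are guarded by __LP64__.

```c
#include <stddef.h>

/* Hypothetical layouts for illustration only; not the kernel's siginfo_t. */
struct demo_fault_plain {
	int signo, err, code;
	union {
		struct {
			void *addr;
			short addr_lsb;
		} fault;
	} fields;
};

struct demo_fault_trapno {
	int signo, err, code;
	union {
		struct {
			void *addr;
			int trapno;	/* extra field, as with __ARCH_SI_TRAPNO */
			short addr_lsb;
		} fault;
	} fields;
};

#if defined(__LP64__)
/* Any layout change that moves these fields now fails the build. */
_Static_assert(offsetof(struct demo_fault_plain, fields.fault.addr) == 16,
	       "plain: addr moved");
_Static_assert(offsetof(struct demo_fault_plain, fields.fault.addr_lsb) == 24,
	       "plain: addr_lsb moved");
_Static_assert(offsetof(struct demo_fault_trapno, fields.fault.trapno) == 24,
	       "trapno: trapno moved");
_Static_assert(offsetof(struct demo_fault_trapno, fields.fault.addr_lsb) == 28,
	       "trapno: addr_lsb moved");
#endif

/* Runtime accessors, handy for printing the offsets on other ABIs. */
static inline size_t demo_lsb_offset_plain(void)
{
	return offsetof(struct demo_fault_plain, fields.fault.addr_lsb);
}

static inline size_t demo_lsb_offset_trapno(void)
{
	return offsetof(struct demo_fault_trapno, fields.fault.addr_lsb);
}
```

The two structs disagree on where addr_lsb lives, which is the kind of
divergence such asserts are meant to catch at build time.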
Thanks,
-- Marco