linux-kernel - Re: [PATCH 01/29] Revert "userfaultfd: don't fail on unrecognized features"

Message-ID: <CAJHvVcgiLbACcCr1O4ng7vrxC1Sok_HXDuzbvnVyAaeqGfdwuw@mail.gmail.com>
Date:   Fri, 31 Mar 2023 13:04:53 -0700
From:   Axel Rasmussen <axelrasmussen@...gle.com>
To:     Dmitry Safonov <0x7f454c46@...il.com>
Cc:     Peter Xu <peterx@...hat.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        Nadav Amit <nadav.amit@...il.com>,
        Leonardo Bras Soares Passos <lsoaresp@...hat.com>,
        David Hildenbrand <david@...hat.com>,
        linux-stable <stable@...r.kernel.org>
Subject: Re: [PATCH 01/29] Revert "userfaultfd: don't fail on unrecognized features"

On Fri, Mar 31, 2023 at 11:08 AM Dmitry Safonov <0x7f454c46@...il.com> wrote:
>
> On Fri, 31 Mar 2023 at 17:52, Axel Rasmussen <axelrasmussen@...gle.com> wrote:
> >
> > On Thu, Mar 30, 2023 at 3:27 PM Peter Xu <peterx@...hat.com> wrote:
> > >
> > > On Thu, Mar 30, 2023 at 12:04:09PM -0700, Axel Rasmussen wrote:
> > > > On Thu, Mar 30, 2023 at 8:57 AM Peter Xu <peterx@...hat.com> wrote:
> > > > >
> > > > > This is a proposal to revert commit 914eedcb9ba0ff53c33808.
> > > > >
> > > > > > I found this when writing a simple UFFDIO_API test to be the first unit
> > > > > > test in this set.  Two things break with the commit:
> > > > >
> > > > >   - UFFDIO_API check was lost and missing.  According to man page, the
> > > > >   kernel should reject ioctl(UFFDIO_API) if uffdio_api.api != 0xaa.  This
> > > > >   check is needed if the api version will be extended in the future, or
> > > > >   user app won't be able to identify which is a new kernel.
> > > > >
> > > > >   - Feature flags checks were removed, which means UFFDIO_API with a
> > > > >   feature that does not exist will also succeed.  According to the man
> > > > > >   page, we should (and it makes sense to) reject ioctl(UFFDIO_API) if
> > > > > >   unknown features are passed in.
>
> If features/flags are not checked in kernel, and the kernel doesn't return
> an error on an unknown flag/error, that makes the syscall non-extendable,
> meaning that adding any new feature may break existing software, which
> doesn't sanitize them properly.
> https://lwn.net/Articles/588444/

I don't think the same problem applies here. In the case of syscalls,
the problem is that the only way the kernel can communicate is via the
EINVAL return value. Without the check, if a call succeeds the caller
can't tell: was the flag supported + applied, or unrecognized +
ignored?
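
To make the ambiguity concrete, here is a toy model in plain C. Nothing
in it is a real kernel interface: the DEMO_* flags and both demo_call_*
functions are made up purely for illustration. It contrasts a handler
that silently ignores unknown flags with one that rejects them.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits, not taken from any real syscall. */
#define DEMO_FLAG_A 0x1u   /* understood by the "old" kernel */
#define DEMO_FLAG_B 0x2u   /* only understood by a "new" kernel */

/* Lenient model: unknown bits are dropped and the call still succeeds. */
static int demo_call_lenient(uint32_t flags, uint32_t known)
{
    uint32_t applied = flags & known;   /* unknown bits silently ignored */
    (void)applied;
    return 0;
}

/* Strict model: any unknown bit fails the whole call with -EINVAL. */
static int demo_call_strict(uint32_t flags, uint32_t known)
{
    if (flags & ~known)
        return -EINVAL;
    return 0;
}
```

With known == DEMO_FLAG_A, demo_call_lenient(DEMO_FLAG_B, known) returns
0 exactly as it would on a kernel that supports the flag, so the return
code alone carries no information. demo_call_strict() returns -EINVAL
in the same situation, which is the only signal a plain syscall has.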

With UFFDIO_API (we aren't talking about userfaultfd(2) itself), when
you pass in a set of flags, we return the subset of flags which were
enabled, in addition to the return code. So via that mechanism, one is
"able to check whether it is running on a kernel where [userfaultfd]
supports [the feature]" as the article describes - the only difference
is, the caller must check the returned set of features, instead of
checking for an error code. I don't think it's exactly *how* userspace
can check that's important, but rather *that* it can check.
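
As a rough sketch of what that check could look like from userspace
(the helper names uffd_negotiate() and uffd_missing_features() are
invented for this example, and the fd is closed right away only to
keep the probe short; a real user would keep it):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: bits we asked for that the kernel did not report. */
static uint64_t uffd_missing_features(uint64_t requested, uint64_t reported)
{
    return requested & ~reported;
}

/*
 * Hypothetical helper: run the UFFDIO_API handshake on a fresh fd.
 * Returns 0 and stores the reported feature bits on success, or -1 with
 * errno set (EINVAL is what a kernel that rejects unknown features
 * would give us).
 */
static int uffd_negotiate(uint64_t requested, uint64_t *reported)
{
    struct uffdio_api api;
    int fd, saved;

    assert(reported != NULL);

    fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (fd < 0)
        return -1;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = requested;

    if (ioctl(fd, UFFDIO_API, &api) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    *reported = api.features;   /* feature bits written back by the kernel */
    close(fd);
    return 0;
}
```

A caller would do something like uffd_negotiate(UFFD_FEATURE_MISSING_SHMEM,
&reported) and then treat a nonzero uffd_missing_features() result the
same way it would treat an EINVAL failure: the feature isn't available.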

Another important difference: I have a hard time imagining a case
where adding a new feature could break userspace, even with my
approach, but let's say for the sake of argument one arises in the
future. Unlike normal syscalls, we have the UFFD_API version check, so
we have the option of incrementing that to separate users relying on
the old behavior, from users willing to deal with the new behavior.

(Syscalls can kind of replicate this by adding a new syscall, like
clone() vs clone2(), but I think that's messier than the API version
check being built into the API.)

>
> See a bunch of painful exercises from syscalls with numbers in the end:
> https://lwn.net/Articles/792628/
> To adding an additional setsockopt() because an old one didn't have
> sanity checks for flags:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8917a777be3b
> (not the best example, as the new setsockopt() didn't check flags for
> sanity as well (sic!), but that's near the code I work on now)
>
> This is even documented nowadays:
> https://www.kernel.org/doc/html/latest/process/adding-syscalls.html#designing-the-api-planning-for-extension
>
> ...and everyone knows what happens when you blame userspace for breaking by
> not doing what you would have expected it to do:
> https://lkml.org/lkml/2012/12/23/75

100% agreed. :)

>
> [..]
> > > There's one reason that we may consider keeping the behavior.  IMHO it is
> > > when there is major software that uses the "wrong" ABI (let's say so;
> > > because it's not following the man pages).  If you're aware of any such major
> > > software (especially open source) that will break due to this revert patch,
> > > please shoot.
> >
> > Well, I did find one example, criu:
> > https://github.com/checkpoint-restore/criu/blob/criu-dev/criu/uffd.c#L266
>
> Mike can speak better than me about uffd, but AFAICS, CRIU correctly detects
> features with kerndat/kdat:
> https://github.com/checkpoint-restore/criu/blob/criu-dev/criu/kerndat.c#L1235

Ah, right, this is the simplest case where no optional features are
asked for. So, it's not a great example; this particular case would
look the same regardless of what the kernel does.

>
> So, doing a sane thing in kernel shouldn't break CRIU (at least here).
>
> Thanks,
>              Dmitry