Message-Id: <20211127123958.588350-1-agruenba@redhat.com>
Date:   Sat, 27 Nov 2021 13:39:58 +0100
From:   Andreas Gruenbacher <agruenba@...hat.com>
To:     Catalin Marinas <catalin.marinas@....com>
Cc:     Andreas Gruenbacher <agruenba@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Josef Bacik <josef@...icpanda.com>,
        David Sterba <dsterba@...e.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Will Deacon <will@...nel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        linux-btrfs <linux-btrfs@...r.kernel.org>
Subject: Re: [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults

On Sat, Nov 27, 2021 at 4:52 AM Andreas Gruenbacher <agruenba@...hat.com> wrote:
> On Sat, Nov 27, 2021 at 12:06 AM Catalin Marinas <catalin.marinas@....com> wrote:
> > On Fri, Nov 26, 2021 at 11:29:45PM +0100, Andreas Gruenbacher wrote:
> > > On Thu, Nov 25, 2021 at 11:42 PM Catalin Marinas <catalin.marinas@....com> wrote:
> > > > As per Linus' reply, we can work around this by doing
> > > > a sub-page fault_in_writeable(point_of_failure, align) where 'align'
> > > > should cover the copy_to_user() impreciseness.
> > > >
> > > > (of course, fault_in_writable() takes the full size argument but behind
> > > > the scene it probes the 'align' prefix at sub-page fault granularity)
> > >
> > > That doesn't make sense; we don't want fault_in_writable() to fail or
> > > succeed depending on the alignment of the address range passed to it.
> >
> > If we know that the arch copy_to_user() has an error of say maximum 16
> > bytes (or 15 rather on arm64), we can instead get fault_in_writeable()
> > to probe the first 16 bytes rather than 1.
>
> That isn't going to help one bit: [raw_]copy_to_user() is allowed to
> copy as little or as much as it wants as long as it follows the rules
> documented in include/linux/uaccess.h:
>
> [] If copying succeeds, the return value must be 0.  If some data cannot be
> [] fetched, it is permitted to copy less than had been fetched; the only
> [] hard requirement is that not storing anything at all (i.e. returning size)
> [] should happen only when nothing could be copied.  In other words, you don't
> [] have to squeeze as much as possible - it is allowed, but not necessary.
>
> When fault_in_writeable() tells us that an address range is accessible
> in principle, that doesn't mean that copy_to_user() will allow us to
> access it in arbitrary chunks. It's also not the case that
> fault_in_writeable(addr, size) is always followed by
> copy_to_user(addr, ..., size) for the exact same address range, not
> even in this case.
>
> These alignment restrictions have nothing to do with page or sub-page faults.
>
> I'm also fairly sure that passing in an unaligned buffer will send
> search_ioctl into an endless loop on architectures with copy_to_user()
> alignment restrictions; there don't seem to be any buffer alignment
> checks.

Let me retract that ...

The description in include/linux/uaccess.h leaves out permissible
reasons for fetching/storing less than requested. Thinking about it, if
the address range passed to one of the copy functions includes an
address that faults, it kind of makes sense to allow the copy function
to stop short instead of copying every last byte right up to the address
that fails.

If that's the only reason, then it would be great to have that included
in the description.  And then we can indeed deal with the alignment
effects in fault_in_writeable().
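
For the record, the caller pattern that ultimately has to terminate looks
roughly like this (a simplified sketch with placeholder names, not the
actual search_ioctl() code):

	size_t copied = 0;

	while (copied < len) {
		size_t left;

		pagefault_disable();
		left = copy_to_user(ubuf + copied, kbuf + copied, len - copied);
		pagefault_enable();

		copied += (len - copied) - left;
		if (!left)
			break;
		/*
		 * If copy_to_user() backed up to an alignment boundary
		 * before the address that actually faults, and
		 * fault_in_writeable() only probes at page granularity,
		 * the next iteration can fail in exactly the same spot:
		 * a live-lock.
		 */
		if (fault_in_writeable(ubuf + copied, len - copied))
			return -EFAULT;
	}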

> > > copy_to_user_nofault_unaligned() should be further optimized, maybe as
> > > mm/maccess.c:copy_from_kernel_nofault() and/or per architecture
> > > depending on the actual alignment rules; I'm not sure.
> > [...]
> > > --- a/fs/btrfs/ioctl.c
> > > +++ b/fs/btrfs/ioctl.c
> > > @@ -2051,13 +2051,30 @@ static noinline int key_in_sk(struct btrfs_key *key,
> > >       return 1;
> > >  }
> > >
> > > +size_t copy_to_user_nofault_unaligned(void __user *to, void *from, size_t size)
> > > +{
> > > +     size_t rest = copy_to_user_nofault(to, from, size);
> > > +
> > > +     if (rest) {
> > > +             size_t n;
> > > +
> > > +             for (n = size - rest; n < size; n++) {
> > > +                     if (copy_to_user_nofault(to + n, from + n, 1))
> > > +                             break;
> > > +             }
> > > +             rest = size - n;
> > > +     }
> > > +     return rest;
> >
> > That's what I was trying to avoid. That's basically a fall-back to byte
> > at a time copy (we do this in copy_mount_options(); at some point we
> > even had a copy_from_user_exact() IIRC).
>
> We could try 8/4/2 byte chunks if both buffers are 8/4/2-byte aligned.
> It's just not clear that it's worth it.
>
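
Something along these lines, for illustration (untested, and
copy_to_user_nofault_chunked() is a made-up name; since
copy_to_user_nofault() only reports 0 or -EFAULT, this simply gives up at
the first chunk that faults):

	size_t copy_to_user_nofault_chunked(void __user *to, void *from,
					    size_t size)
	{
		size_t n = 0;

		while (n < size) {
			/* Largest chunk both current positions are aligned to. */
			unsigned long align = ((unsigned long)to + n) |
					      ((unsigned long)from + n);
			size_t chunk;

			if (!(align & 7) && size - n >= 8)
				chunk = 8;
			else if (!(align & 3) && size - n >= 4)
				chunk = 4;
			else if (!(align & 1) && size - n >= 2)
				chunk = 2;
			else
				chunk = 1;

			if (copy_to_user_nofault(to + n, from + n, chunk))
				break;
			n += chunk;
		}
		return size - n;
	}

Each chunk is still a full uaccess round trip, though, so it's indeed not
obvious that it's worth it over the byte-at-a-time fallback.
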
> > Linus' idea (if I got it correctly) was instead to slightly extend the
> > probing in fault_in_writeable() for the beginning of the buffer from 1
> > byte to some per-arch range.
> >
> > I attempted the above here and it works ok:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=devel/btrfs-live-lock-fix
> >
> > but it was too late to post it this evening; I'll do it in the next day or so
> > as an alternative to this series.

I've taken a quick look.  Under the assumption that alignment effects
are tied to page / sub-page faults, I think we can really solve this
generically as Willy has proposed.  Maybe as shown below; no need for
arch-specific code.

Thanks,
Andreas

diff --git a/mm/gup.c b/mm/gup.c
index 2c51e9748a6a..a9b3d916b625 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1658,6 +1658,8 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
 }
 #endif /* !CONFIG_MMU */
 
+#define SUBPAGE_FAULT_SIZE 16
+
 /**
  * fault_in_writeable - fault in userspace address range for writing
  * @uaddr: start of address range
@@ -1673,8 +1675,19 @@ size_t fault_in_writeable(char __user *uaddr, size_t size)
 	if (unlikely(size == 0))
 		return 0;
 	if (!PAGE_ALIGNED(uaddr)) {
+		if (SUBPAGE_FAULT_SIZE &&
+		    !IS_ALIGNED((unsigned long)uaddr, SUBPAGE_FAULT_SIZE)) {
+			end = PTR_ALIGN(uaddr, SUBPAGE_FAULT_SIZE);
+			if (end - uaddr < size) {
+				if (unlikely(__put_user(0, uaddr) != 0))
+					return size;
+				uaddr = end;
+				if (unlikely(!end))
+					goto out;
+			}
+		}
 		if (unlikely(__put_user(0, uaddr) != 0))
-			return size;
+			goto out;
 		uaddr = (char __user *)PAGE_ALIGN((unsigned long)uaddr);
 	}
 	end = (char __user *)PAGE_ALIGN((unsigned long)start + size);
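
To spell out what this does, with SUBPAGE_FAULT_SIZE == 16 (the arm64 MTE
tag granule) and made-up addresses, say uaddr == 0x1009 and size == 200:

 - uaddr is neither page- nor 16-byte aligned, so end = PTR_ALIGN(uaddr, 16)
   = 0x1010, and end - uaddr < size;
 - the new __put_user(0, uaddr) probes the granule containing 0x1009,
   i.e. 0x1000..0x100f;
 - uaddr is advanced to 0x1010 and the pre-existing __put_user() probes the
   granule 0x1010..0x101f;
 - uaddr is then rounded up to the next page boundary and the per-page loop
   continues as before.

So both the granule containing the reported point of failure and the next
aligned granule get probed; a copy_to_user() that backed up by at most
SUBPAGE_FAULT_SIZE - 1 bytes can't fault again in a granule that hasn't
been checked.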
