Message-ID: <YZ6idVy3zqQC4atv@arm.com>
Date:   Wed, 24 Nov 2021 20:37:09 +0000
From:   Catalin Marinas <catalin.marinas@....com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Josef Bacik <josef@...icpanda.com>,
        David Sterba <dsterba@...e.com>,
        Andreas Gruenbacher <agruenba@...hat.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Will Deacon <will@...nel.org>, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-btrfs@...r.kernel.org
Subject: Re: [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware
 with sub-page faults

On Wed, Nov 24, 2021 at 08:03:58PM +0000, Matthew Wilcox wrote:
> On Wed, Nov 24, 2021 at 07:20:24PM +0000, Catalin Marinas wrote:
> > +++ b/fs/btrfs/ioctl.c
> > @@ -2223,7 +2223,8 @@ static noinline int search_ioctl(struct inode *inode,
> >  
> >  	while (1) {
> >  		ret = -EFAULT;
> > -		if (fault_in_writeable(ubuf + sk_offset, *buf_size - sk_offset))
> > +		if (fault_in_exact_writeable(ubuf + sk_offset,
> > +					     *buf_size - sk_offset))
> >  			break;
> >  
> >  		ret = btrfs_search_forward(root, &key, path, sk->min_transid);
> 
> Couldn't we avoid all of this nastiness by doing ...

I had a similar attempt initially but I concluded that it doesn't work:

https://lore.kernel.org/r/YS40qqmXL7CMFLGq@arm.com

> @@ -2121,10 +2121,9 @@ static noinline int copy_to_sk(struct btrfs_path *path,
>                  * problem. Otherwise we'll fault and then copy the buffer in
>                  * properly this next time through
>                  */
> -               if (copy_to_user_nofault(ubuf + *sk_offset, &sh, sizeof(sh))) {
> -                       ret = 0;
> +               ret = __copy_to_user_nofault(ubuf + *sk_offset, &sh, sizeof(sh));
> +               if (ret)

There is no requirement for the arch implementation to be exact and copy
the maximum number of bytes possible. It can fail early even when some
of the remaining bytes would not fault. The only requirement is that, if
restarted from the point where it faulted, it makes some progress (on
arm64 it may stop one byte short of the actual fault).

>                         goto out;
> -               }
>  
>                 *sk_offset += sizeof(sh);
> @@ -2196,6 +2195,7 @@ static noinline int search_ioctl(struct inode *inode,
>         int ret;
>         int num_found = 0;
>         unsigned long sk_offset = 0;
> +       unsigned long next_offset = 0;
>  
>         if (*buf_size < sizeof(struct btrfs_ioctl_search_header)) {
>                 *buf_size = sizeof(struct btrfs_ioctl_search_header);
> @@ -2223,7 +2223,8 @@ static noinline int search_ioctl(struct inode *inode,
>  
>         while (1) {
>                 ret = -EFAULT;
> -               if (fault_in_writeable(ubuf + sk_offset, *buf_size - sk_offset))
> +               if (fault_in_writeable(ubuf + sk_offset + next_offset,
> +                                       *buf_size - sk_offset - next_offset))
>                         break;
>  
>                 ret = btrfs_search_forward(root, &key, path, sk->min_transid);
> @@ -2235,11 +2236,12 @@ static noinline int search_ioctl(struct inode *inode,
>                 ret = copy_to_sk(path, &key, sk, buf_size, ubuf,
>                                  &sk_offset, &num_found);
>                 btrfs_release_path(path);
> -               if (ret)
> +               if (ret > 0)
> +                       next_offset = ret;

So after this point, ubuf+sk_offset+next_offset is writeable according
to fault_in_writeable(). If copy_to_user() were attempted at
ubuf+sk_offset+next_offset, all would be fine, but copy_to_sk() restarts
the copy from ubuf+sk_offset, so it returns exactly the same ret as in
the previous iteration.

-- 
Catalin
