lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181025135849.bu3cmjnrvz5yysye@macbook-pro-91.dhcp.thefacebook.com>
Date:   Thu, 25 Oct 2018 09:58:51 -0400
From:   Josef Bacik <josef@...icpanda.com>
To:     Jan Kara <jack@...e.cz>
Cc:     Josef Bacik <josef@...icpanda.com>, kernel-team@...com,
        hannes@...xchg.org, linux-kernel@...r.kernel.org, tj@...nel.org,
        david@...morbit.com, akpm@...ux-foundation.org,
        linux-fsdevel@...r.kernel.org, linux-btrfs@...r.kernel.org,
        riel@...com, linux-mm@...ck.org
Subject: Re: [PATCH 7/7] btrfs: drop mmap_sem in mkwrite for btrfs

On Thu, Oct 25, 2018 at 03:22:30PM +0200, Jan Kara wrote:
> On Thu 18-10-18 16:23:18, Josef Bacik wrote:
> > ->page_mkwrite is extremely expensive in btrfs.  We have to reserve
> > space, which can take 6 lifetimes, and we could possibly have to wait on
> > writeback on the page, another several lifetimes.  To avoid this simply
> > drop the mmap_sem if we didn't have the cached page and do all of our
> > work and return the appropriate retry error.  If we have the cached page
> > we know we did all the right things to set this page up and we can just
> > carry on.
> > 
> > Signed-off-by: Josef Bacik <josef@...icpanda.com>
> ...
> > @@ -8828,6 +8830,29 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
> >  
> >  	reserved_space = PAGE_SIZE;
> >  
> > +	/*
> > +	 * We have our cached page from a previous mkwrite, check it to make
> > +	 * sure it's still dirty and our file size matches when we ran mkwrite
> > +	 * the last time.  If everything is OK then return VM_FAULT_LOCKED,
> > +	 * otherwise do the mkwrite again.
> > +	 */
> > +	if (vmf->flags & FAULT_FLAG_USED_CACHED) {
> > +		lock_page(page);
> > +		if (vmf->cached_size == i_size_read(inode) &&
> > +		    PageDirty(page))
> > +			return VM_FAULT_LOCKED;
> > +		unlock_page(page);
> > +	}
> 
> I guess this is similar to Dave's comment: Why is i_size so special? What
> makes sure that file didn't get modified between time you've prepared
> cached_page and now such that you need to do the preparation again?
> And if indeed metadata prepared for a page cannot change, what's so special
> about it being that particular cached_page?
> 
> Maybe to phrase my objections differently: Your preparations in
> btrfs_page_mkwrite() are obviously related to your filesystem metadata. So
> why cannot you infer from that metadata (extent tree, whatever - I'd use
> extent status tree in ext4) whether that particular file+offset is already
> prepared for writing and just bail out with success in that case?
> 

I was just being overly paranoid, I was afraid of the case where we would
truncate and then extend in between, but now that I actually think about it that
would end up with the page not being on the mapping anymore so we would catch
that case.  I've dropped this part from my current version.  I'm getting some
testing on these patches in production and I'll post them sometime next week
once I'm happy with them.  Thanks,

Josef

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ