lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150811093708.GB906@dastard>
Date:	Tue, 11 Aug 2015 19:37:08 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Jan Kara <jack@...e.cz>
Cc:	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Matthew Wilcox <matthew.r.wilcox@...el.com>,
	linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, Davidlohr Bueso <dbueso@...e.de>
Subject: Re: [PATCH, RFC 2/2] dax: use range_lock instead of i_mmap_lock

On Tue, Aug 11, 2015 at 10:19:09AM +0200, Jan Kara wrote:
> On Mon 10-08-15 18:14:24, Kirill A. Shutemov wrote:
> > As we don't have struct pages for DAX memory, Matthew had to find an
> > replacement for lock_page() to avoid fault vs. truncate races.
> > i_mmap_lock was used for that.
> > 
> > Recently, Matthew had to convert locking to exclusive to address fault
> > vs. fault races. And this kills scalability completely.

I'm assuming this locking change is in a for-next git tree branch
somewhere as there isn't anything that I can see in a 4.2-rc6
tree. Can you point me to the git tree that has these changes in it?

> > The patch below tries to recover some scalability for DAX by introducing
> > per-mapping range lock.
> 
> So this grows noticeably (3 longs if I'm right) struct address_space and
> thus struct inode just for DAX. That looks like a waste but I don't see an
> easy solution.
> 
> OTOH filesystems in normal mode might want to use the range lock as well to
> provide truncate / punch hole vs page fault exclusion (XFS already has a
> private rwsem for this and ext4 needs something as well) and at that point
> growing generic struct inode would be acceptable for me.

It sounds to me like the way DAX has tried to solve this race is the
wrong direction. We really need to drive the truncate/page fault
serialisation higher up the stack towards the filesystem, not deeper
into the mm subsystem where locking is greatly limited.

As Jan mentions, we already have this serialisation in XFS, and I
think it would be better first step to replicate that locking model
in each filesystem that is supports DAX. I think this is a better
direction because it moves towards solving a whole class of problems
fileystem face with page fault serialisation, not just for DAX.

> My grand plan was to use the range lock to also simplify locking
> rules for read, write and esp. direct IO but that has issues with
> mmap_sem ordering because filesystems get called under mmap_sem in
> page fault path. So probably just fixing the worst issue with
> punch hole vs page fault would be good for now.

Right, I think adding a rwsem to the ext4 inode to handle the
fault/truncate serialisation similar to XFS would be sufficient to
allow DAX to remove the i_mmap_lock serialisation...

> Also for a patch set like this, it would be good to show some numbers - how
> big hit do you take in the single-threaded case (the lock is more
> expensive) and how much scalability do you get in the multithreaded case?

And also, if you remove the serialisation and run the test on XFS,
what do you get in terms of performance and correctness?

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ