Date:   Sun, 14 Jun 2020 09:26:50 -0700
From:   Matthew Wilcox <willy@...radead.org>
To:     linux-fsdevel@...r.kernel.org
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Hugh Dickins <hughd@...gle.com>
Subject: Re: [RFC v6 00/51] Large pages in the page cache

On Wed, Jun 10, 2020 at 01:12:54PM -0700, Matthew Wilcox wrote:
> Another fortnight, another dump of my current large pages work.

The generic/127 test has pointed out to me that range writeback is
broken by this patchset.  Here's how (may not be exactly what's going on,
but it's close):

The page cache allocates an order-2 page covering indices 40-43.
Bytes are written and the page is dirtied.
The test then calls fallocate(FALLOC_FL_COLLAPSE_RANGE) for a range which
starts in page 41.
XFS calls filemap_write_and_wait_range() which calls
__filemap_fdatawrite_range() which calls
do_writepages() which calls
iomap_writepages() which calls
write_cache_pages() which calls
tag_pages_for_writeback() which calls
xas_for_each_marked() starting at page 41.  Which doesn't find page
  41 because when we dirtied pages 40-43, we only marked index 40 as
  being dirty.

Annoyingly, the XArray actually handles this just fine ... if we were
using multi-order entries, we'd find it.  But we're still storing 2^N
entries for an order N page.

I can see two ways to fix this.  One is to bite the bullet and do the
conversion of the page cache to use multi-order entries.  The second
is to set and clear the marks on all entries.  I'm concerned about the
performance of the latter solution.  Not so bad for order-2 pages, but for
an order-9 page we have 520 bits to set, spread over 9 non-consecutive
cachelines.  Also, I'm unenthusiastic about writing code that I want to
throw away as quickly as possible.

So unless somebody has a really good alternative idea, I'm going to
convert the page cache over to multi-order entries.  This will have
several positive effects:

 - Get DAX and regular page cache using the XArray in a more similar way
 - Saves about 4.5kB of memory for every 2MB page in tmpfs/shmem
 - Prep work for converting hugetlbfs to use the page cache the same
   way as tmpfs
