linux-kernel - Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP partially

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LSU.2.11.2002261656300.1381@eggly.anvils>
Date:   Wed, 26 Feb 2020 17:37:22 -0800 (PST)
From:   Hugh Dickins <hughd@...gle.com>
To:     Yang Shi <yang.shi@...ux.alibaba.com>
cc:     Hugh Dickins <hughd@...gle.com>, kirill.shutemov@...ux.intel.com,
        aarcange@...hat.com, akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP
 partially

On Wed, 26 Feb 2020, Yang Shi wrote:
> On 2/24/20 7:46 PM, Hugh Dickins wrote:
> > 
> > I did willingly call my find_get_entries() stopping at PageTransCompound
> > a hack; but now think we should just document that behavior and accept it.
> > The contortions of your patch come from the need to release those 14 extra
> > unwanted references: much better not to get them in the first place.
> > 
> > Neither of us handle a failed split optimally, we treat every tail as an
> > opportunity to retry: which is good to recover from transient failures,
> > but probably excessive.  And we both have to restart the pagevec after
> > each attempt, but at least I don't get 14 unwanted extras each time.
> > 
> > What of other find_get_entries() users and its pagevec_lookup_entries()
> > wrapper: does an argument to select this "stop at PageTransCompound"
> > behavior need to be added?
> > 
> > No.  The pagevec_lookup_entries() calls from mm/truncate.c prefer the
> > new behavior - evicting the head from page cache removes all the tails
> > along with it, so getting the tails a waste of time there too, just as
> > it was in shmem_undo_range().
> 
> TBH I'm not a fun of this hack. This would bring in other confusion or
> complexity. Pagevec is supposed to count in the number of base page, now it
> would treat THP as one page, and there might be mixed base page and THP in
> one pagevec.

I agree that it would be horrid if find_get_entries() and
pagevec_lookup_entries() were switched to returning just one page
for a THP, demanding all callers to cope with its huge size along
with the small sizes of other pages in the vector.  I don't know how
to get such an interface to work at all: it's essential to be able
to deliver tail pages from a requested offset in the compound page.

No, that's not what the find_get_entries() modification does: it
takes advantage of the fact that no caller expects it to guarantee
a full pagevec, so terminates the pagevec early when it encounters
any head or tail subpage of the compound page.  Then the next call
to it (if caller does not have code to skip the extent - which
removal of head from page cache does) returns just the next tail,
etc, until all have been delivered.  All as small pages.

(Aside from the comments, I have made one adjustment to what I
showed before: though it appears now that hugetlbfs happens not
to use pagevec_lookup_entries(), not directly anyway, I'm more
comfortable checking PageTransHuge && !PageHuge, so that it would
not go one-at-a-time on hugetlbfs pages.  But found I was wrong
earlier when I said the "page->index + HPAGE_PMD_NR <= end" test
needed correcting for 32-bit: it's working on PageHead there, so
there's no chance of that "+ HPAGE_PMD_NR" wrapping around: left
unchanged, what it's doing is clearer that way than with macros.)

Hugh

> But, I tend to agree avoiding getting those 14 extra pins at the
> first place might be a better approach. All the complexity are used to
> release those extra pins.