linux-kernel - Re: [PATCH v3 7/8] mm/mempolicy: use a standard migration target allocation callback

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LSU.2.11.2010091521460.12013@eggly.anvils>
Date:   Fri, 9 Oct 2020 15:23:03 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>
cc:     Hugh Dickins <hughd@...gle.com>, Qian Cai <cai@....pw>,
        js1304@...il.com, Andrew Morton <akpm@...ux-foundation.org>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        kernel-team@....com, Vlastimil Babka <vbabka@...e.cz>,
        Christoph Hellwig <hch@...radead.org>,
        Roman Gushchin <guro@...com>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        Michal Hocko <mhocko@...e.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: Re: [PATCH v3 7/8] mm/mempolicy: use a standard migration target
 allocation callback

On Fri, 9 Oct 2020, Mike Kravetz wrote:
> On 10/8/20 10:50 PM, Hugh Dickins wrote:
> > 
> > It's a problem I've faced before in tmpfs, keeping a hold on the
> > mapping while page lock is dropped.  Quite awkward: igrab() looks as
> > if it's the right thing to use, but turns out to give no protection
> > against umount.  Last time around, I ended up with a stop_eviction
> > count in the shmem inode, which shmem_evict_inode() waits on if
> > necessary.  Something like that could be done for hugetlbfs too,
> > but I'd prefer to do it without adding extra, if there is a way.
> 
> Thanks.

I failed to come up with anything neater than a stop_eviction count
in the hugetlbfs inode.

But that's not unlike a very special purpose rwsem added into the
hugetlbfs inode: and now that you're reconsidering how i_mmap_rwsem
got repurposed, perhaps you will end up with an rwsem of your own in
the hugetlbfs inode.

So I won't distract you with a stop_eviction patch unless you ask:
that's easy, what you're looking into is hard - good luck!

Hugh

> >>
> >> As mentioned above, I hope all this can be removed.
> > 
> > If you continue to nest page lock inside i_mmap_rwsem for hugetlbfs,
> > then I think that part of hugetlb_page_mapping_lock_write() has to
> > remain.  I'd much prefer that hugetlbfs did not reverse the usual
> > nesting, but accept that you had reasons for doing it that way.
> 
> Yes, that is necessary with the change to lock order.
> 
> Yesterday I found another issue/case to handle in the hugetlb COW code
> caused by the difference in lock nesting.  As a result, I am rethinking
> the decision to change the locking order.
> 
> The primary reason for changing the lock order was to make the hugetlb
> page fault code not have to worry about pte pointers changing.  The issue
> is that the pte pointer you get (or create) while walking the table
> without the page table lock could go away due to unsharing the PMD.  We
> can walk the table again after acquiring the lock and validate and possibly
> retry.  However, the backout code in this area which previously had to
> deal with truncation as well, was quite fragile and did not work in all
> corner cases.  This was mostly due to lovely huge page reservations.
> I thought that adding more complexity to the backout code was going to
> cause more issues.  Changing the locking order eliminated the pte pointer
> race as well as truncation.  However, it created new locking issues. :(
> 
> In parallel to working the locking issue, I am also exploring enhanced
> backout code to handle all the needed cases.
> -- 
> Mike Kravetz