lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d0abf3db-0726-81b9-5a44-36cb1736a155@nvidia.com>
Date:   Wed, 18 May 2022 00:12:18 -0700
From:   John Hubbard <jhubbard@...dia.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>,
        Minchan Kim <minchan@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        syzbot <syzbot+acf65ca584991f3cc447@...kaller.appspotmail.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        llvm@...ts.linux.dev, nathan@...nel.org, ndesaulniers@...gle.com,
        syzkaller-bugs@...glegroups.com, trix@...hat.com,
        Matthew Wilcox <willy@...radead.org>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        David Hildenbrand <david@...hat.com>
Subject: Re: [syzbot] WARNING in follow_hugetlb_page

On 5/16/22 20:37, Mike Kravetz wrote:

> Later, that code was modified (for performance reasons) in commit  0fa5bc4023c18 to do a single try_grab_compound_page() instead of multiple
> calls to try_grab_page().  At the time it was not noticed that
> try_grab_compound_page had a check for 'pinnable' when try_grab_page did
> not.  So, I think this commit actually changed the behavior.

This makes me unhappy with the try_grab_page() situation: the name
doesn't give you any hint that it now has extra policy in there.
Furthermore, the policy is unaware of all of the call sites and
is already getting it wrong.

After looking through the call sites, I'm leaning slightly toward pulling
is_pinnable_page() out, and letting the call sites decide if they want
or need to apply that policy.

Just because try_grab_folio() is a choke point, does not mean that it
is the right place for the is_pinnable_page() checks. And while it's
nice to avoid scattering is_pinnable_page() all over the place, it's
more important to *not* have it when it is not correct.

There is a counter-argument, though, which goes something like,
"try_grab_folio(FOLL_PIN) is never ever correct if the page is not
pinnable". But that's not as clear cut as one might think. See below.

> 
>>
>> try_grab_folio()
>>      /*
>>       * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
>>       * right zone, so fail and let the caller fall back to the slow
>>       * path.
>>       */
>>      if (unlikely((flags & FOLL_LONGTERM) &&
>>               !is_pinnable_page(page))) /* which we just changed */
>>          return NULL;
>>
>> ...and now I'm starting to think that this warning might fire even with
>> the corrected check for MIGRATE_CMA || MIGRATE_ISOLATE. Because
>> try_grab_folio() didn't always have this early exit and it is starting
>> to look wrong.
>>
>> Simply attempting to pin a non-pinnable huge page would hit this
>> warning. Adding additional reasons that a page is not pinnable (which
>> the patch does) could make this more likely to fire.
> 
> Yes, that is correct.  One could easily allocate a hugetlb page from
> CMA and trigger this warning.
> 

Thanks for confirming that!

>>
>> I need to look at this a little more closely, it is making me wonder
>> whether the is_pinnable_page() check is a problem in this path. The
>> comment in try_grab_folio() indicates that the early return is a hack
>> (it assumes that the caller is in the gup fast path), and maybe the hack
>> is just wrong here--I think we're actually on the slow gup path. Not
>> good.
>>
>> Mike, any thoughts here?
>>
> 
> Do you know why try_grab_compound_page(now try_grab_folio) checks for
> pinnable when try_grab_page does not?

I think it's just an oversight. The CMA and migrate-out logic was sort of
retrofitted and I think it's just incomplete, yet.

> 
> Then I guess the next question is 'Should we allow pinning of hugetlb pages
> in these areas?'.  My first thought would be no.  But, recall it was 'allowed'
> until that commit which changed try_grab_page to try_grab_compound_page.
> In the 'common' case of compaction, we do not attempt to migrate/move hugetlb
> pages (last time I looked), so long term pinning should not be an issue.
> However, for things like memory offline or alloc_contig_range() we want to
> migrate hugetlb pages, so this would be an issue there.
> 
> At a minimum, I think the warning should go.

Agreed. That, and either a) bring try_grab_folio() and try_grab_page() into
the same behavior with respect to checking for pinnable, or b) lifting
is_pinnable_page() out of try_grab_folio() and letting the callers decide, as
mentioned above.

Your point above makes it seem like the flexibility of (b) might be better...

thanks,
-- 
John Hubbard
NVIDIA


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ