Message-ID: <20210107083622.GA13207@dhcp22.suse.cz>
Date:   Thu, 7 Jan 2021 09:36:22 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     Muchun Song <songmuchun@...edance.com>, akpm@...ux-foundation.org,
        n-horiguchi@...jp.nec.com, ak@...ux.intel.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated
 HugeTLB page

On Wed 06-01-21 13:07:40, Mike Kravetz wrote:
> On 1/6/21 12:02 PM, Michal Hocko wrote:
> > On Wed 06-01-21 11:30:25, Mike Kravetz wrote:
> >> On 1/6/21 8:35 AM, Michal Hocko wrote:
> >>> On Wed 06-01-21 16:47:35, Muchun Song wrote:
> >>>> Because we can only isolate an active page via isolate_huge_page()
> >>>> and hugetlbfs_fallocate() forgets to mark it as active, we cannot
> >>>> isolate and migrate those pages.
> >>>
> >>> I had a bit of a hard time understanding this initially and had to
> >>> dive into the code to make sense of it. I would consider the following
> >>> wording easier to grasp. Feel free to reuse it if you like.
> >>> "
> >>> If a new hugetlb page is allocated during fallocate it will not be
> >>> marked as active (set_page_huge_active) which will result in a later
> >>> isolate_huge_page failure when the page migration code would like to
> >>> move that page. Such a failure would be unexpected and wrong.
> >>> "
> >>>
> >>> Now to the fix. I believe that this patch shows that the
> >>> set_page_huge_active is just too subtle. Is there any reason why we
> >>> cannot make all freshly allocated huge pages active by default?
> >>
> >> I looked into that yesterday.  The primary issue is in the page fault
> >> code; hugetlb_no_page is an example.  If page_huge_active is set, then
> >> the page can be isolated for migration.  So, migration could race with
> >> the page fault, and the page could be migrated before being added to
> >> the page table of the faulting task.  This was an issue when
> >> hugetlb_no_page called set_page_huge_active right after allocating and
> >> clearing the huge page.  Commit cb6acd01e2e4 moved the
> >> set_page_huge_active call after adding the page to the page table to
> >> address this issue.
> > 
> > Thanks for the clarification. I was not aware of this subtlety. The
> > existing comment is not helping much TBH. I am still digesting the
> > suggested race. The page is new and exclusive and not visible via page
> > tables yet, so the only source of the migration would be pfn based
> > (hotplug, poisoning), right?
> 
> That is correct.
> 
> 
> > Btw. s@..._page_huge_active@..._page_huge_migrateable@ would help
> > readability IMHO. With a comment explaining that this _has_ to be called
> > after the page is fully initialized.
> 
> Agree, I will add that as a future enhancement.

Thanks!

-- 
Michal Hocko
SUSE Labs
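
The ordering constraint discussed in this thread can be illustrated with a
small user-space toy model in C. This is a minimal sketch, not kernel code:
struct toy_hpage and the toy_* helpers are hypothetical names made up for
the illustration, and the single boolean only stands in for the
page_huge_active state. It assumes, as described above, that isolation
succeeds only for pages that have been marked active.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a huge page; NOT the kernel's struct page. */
struct toy_hpage {
	bool initialized;	/* cleared and published (page table or page cache) */
	bool active;		/* models the page_huge_active state */
};

/* Models the check in isolate_huge_page(): inactive pages are refused. */
static inline bool toy_isolate(const struct toy_hpage *page)
{
	return page->active;
}

/* Models finishing the setup of a freshly allocated page. */
static inline void toy_finish_init(struct toy_hpage *page)
{
	page->initialized = true;
}

/*
 * Models set_page_huge_active(). The assert encodes the rule from the
 * thread: this must be called only after the page is fully initialized.
 */
static inline void toy_set_active(struct toy_hpage *page)
{
	assert(page->initialized);
	page->active = true;
}
```

In this model, a page that went through toy_finish_init() but never
toy_set_active() (the fallocate case the patch fixes) is refused by
toy_isolate(). Calling toy_set_active() before toy_finish_init() trips the
assert, which is the ordering the fault path has to respect.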
