lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LSU.2.11.2106020831590.6388@eggly.anvils>
Date:   Wed, 2 Jun 2021 08:57:25 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Yu Xu <xuyu@...ux.alibaba.com>
cc:     Hugh Dickins <hughd@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
        gavin.dg@...ux.alibaba.com, Greg Thelen <gthelen@...gle.com>,
        Wei Xu <weixugc@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Nicholas Piggin <npiggin@...il.com>,
        Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH] mm, thp: relax migration wait when failed to get tail
 page

On Wed, 2 Jun 2021, Yu Xu wrote:
> On 6/2/21 12:55 AM, Hugh Dickins wrote:
> > On Wed, 2 Jun 2021, Xu Yu wrote:
> > 
> > > We notice that hung task happens in a conner but practical scenario when
> > > CONFIG_PREEMPT_NONE is enabled, as follows.
> > > 
> > > Process 0                       Process 1                     Process
> > > 2..Inf
> > > split_huge_page_to_list
> > >      unmap_page
> > >          split_huge_pmd_address
> > >                                  __migration_entry_wait(head)
> > >                                                                __migration_entry_wait(tail)
> > >      remap_page (roll back)
> > >          remove_migration_ptes
> > >              rmap_walk_anon
> > >                  cond_resched
> > > 
> > > Where __migration_entry_wait(tail) is occurred in kernel space, e.g.,
> > > copy_to_user, which will immediately fault again without rescheduling,
> > > and thus occupy the cpu fully.
> > > 
> > > When there are too many processes performing __migration_entry_wait on
> > > tail page, remap_page will never be done after cond_resched.
> > > 
> > > This relaxes __migration_entry_wait on tail page, thus gives remap_page
> > > a chance to complete.
> > > 
> > > Signed-off-by: Gang Deng <gavin.dg@...ux.alibaba.com>
> > > Signed-off-by: Xu Yu <xuyu@...ux.alibaba.com>
> > 
> > Well caught: you're absolutely right that there's a bug there.
> > But isn't cond_resched() just papering over the real bug, and
> > what it should do is a "page = compound_head(page);" before the
> > get_page_unless_zero()? How does that work out in your testing?
> 
> compound_head works. The patched kernel is alive for hours under
> our reproducer, which usually makes the vanilla kernel hung after
> tens of minutes at most.

Oh, that's good news, thanks.

(It's still likely that a well-placed cond_resched() somewhere in
mm/gup.c would also be a good idea, but none of us have yet got
around to identifying where.)

> 
> If we use compound_head, the behavior of __migration_entry_wait(tail)
> changes from "retry fault" to "prevent THP from being split". Is that
> right?  Then which is preferred? If it were me, I would prefer "retry
> fault".

As Matthew remarked, you are asking very good questions, and split
migration entries are difficult to think about.  But I believe you'll
find it works out okay.

The point of *put_and_* wait_on_page_locked() is that it does drop
the page reference you acquired with get_page_unless_zero, as soon
as the page is on the wait queue, before actually waiting.

So splitting the THP is only prevented for a brief interval.  Now,
it's true that if there are very many tasks faulting on portions
of the huge page, in that interval between inserting the migration
entries and freezing the huge page's refcount to 0, they can reduce
the chance of splitting considerably.  But that's not an excuse for 
for doing get_page_unless_zero() on the wrong thing, as it was doing.

Hugh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ