Message-ID: <alpine.LSU.2.11.1811201710410.2061@eggly.anvils>
Date:   Tue, 20 Nov 2018 17:21:36 -0800 (PST)
From:   Hugh Dickins <hughd@...gle.com>
To:     Baoquan He <bhe@...hat.com>
cc:     Hugh Dickins <hughd@...gle.com>, Michal Hocko <mhocko@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>, pifang@...hat.com,
        David Hildenbrand <david@...hat.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
        aarcange@...hat.com, Mel Gorman <mgorman@...e.de>
Subject: Re: Memory hotplug softlock issue

On Tue, 20 Nov 2018, Baoquan He wrote:
> On 11/20/18 at 02:38pm, Vlastimil Babka wrote:
> > On 11/20/18 6:44 AM, Hugh Dickins wrote:
> > > [PATCH] mm: put_and_wait_on_page_locked() while page is migrated
> > > 
> > > We have all assumed that it is essential to hold a page reference while
> > > waiting on a page lock: partly to guarantee that there is still a struct
> > > page when MEMORY_HOTREMOVE is configured, but also to protect against
> > > reuse of the struct page going to someone who then holds the page locked
> > > indefinitely, when the waiter can reasonably expect timely unlocking.
> > > 
> > > But in fact, so long as wait_on_page_bit_common() does the put_page(),
> > > and is careful not to rely on struct page contents thereafter, there is
> > > no need to hold a reference to the page while waiting on it.  That does
> > 
> > So there's still a moment where refcount is elevated, but hopefully
> > short enough, right? Let's see if it survives Baoquan's stress testing.
> 
> Yes, I applied Hugh's patch 8 hours ago, then our QE Ping operated on
> that machine, after many times of hot removing/adding, the endless
> looping during migrating is not seen any more. The test result for
> Hugh's patch is positive. I even suggested Ping increasing the memory
> pressure to "stress -m 250", it still succeeded to offline and remove.
> 
> So I think this patch works to solve the issue. Thanks a lot for your
> help, all of you. 

Very good to hear, thanks a lot for your quick feedback.

> 
> Hugh, will you post a formal patch in a separate thread?

Yes, I promise that I shall do so in the next few days, but not today:
some other things have to take priority.

And Vlastimil has raised an excellent point about the interaction with
PSI "thrashing": I need to read up and decide which way to go on that
(and add Johannes to the Cc when I post).

I think I shall probably post it directly to Linus (lists and other
people Cc'ed of course): not because I think it should be rushed in
too quickly, nor to sidestep Andrew, but because Linus was very closely
involved in both the PG_waiters and WQ_FLAG_BOOKMARK discussions:
it is an area of special interest to him.

Hugh
