Message-ID: <CAD=FV=WLHZfNN5cGMUEnvv17obVK-MLmWHJHx=MV55Q1YxczOA@mail.gmail.com>
Date:   Tue, 2 May 2023 14:20:54 -0700
From:   Doug Anderson <dianders@...omium.org>
To:     Hillf Danton <hdanton@...a.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Matthew Wilcox <willy@...radead.org>,
        Yu Zhao <yuzhao@...gle.com>
Subject: Re: [PATCH v3] migrate_pages: Avoid blocking for IO in MIGRATE_SYNC_LIGHT

Hi,

On Sun, Apr 30, 2023 at 1:53 AM Hillf Danton <hdanton@...a.com> wrote:
>
> On 28 Apr 2023 13:54:38 -0700 Douglas Anderson <dianders@...omium.org>
> > The MIGRATE_SYNC_LIGHT mode is intended to block for things that will
> > finish quickly but not for things that will take a long time. Exactly
> > how long is too long is not well defined, but waits of tens of
> > milliseconds are likely non-ideal.
> >
> > When putting a Chromebook under memory pressure (opening over 90 tabs
> > on a 4GB machine), it was fairly easy to see delays of > 100 ms waiting
> > for some locks in the kcompactd code path. While the laptop wasn't
> > amazingly usable in this state, it was still limping along, and this
> > state isn't something artificial. Sometimes we simply end up with a
> > lot of memory pressure.
>
> Given a stall longer than 100 ms, this cannot be a correct fix if the
> hardware fails to do more than ten IOs a second.
>
> OTOH, given that some pages are reclaimed for compaction to make forward
> progress before kswapd wakes kcompactd up, this cannot be a fix without
> identifying the cause of the stall.

Right, the system is in pretty bad shape when this happens, and it's
not very effective at doing IO (or much of anything else) because it's
under heavy memory pressure.

My first thought is that, when this happens, a process holding the lock
gets preempted and doesn't get scheduled back in for a while. That
_should_ be possible, right? In the case where I'm reproducing this,
all the CPUs are super busy madly trying to compress / decompress
zram, so it doesn't surprise me that a process could get context
switched out for a while.

-Doug
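A small userspace sketch of the two ideas in this thread, assuming POSIX threads on Linux: a lock holder that stays off-CPU stalls any waiter that is willing to block, and a mode that only blocks "for things that will finish quickly" bounds that stall. This is not kernel code and is not how the patch works. The SKETCH_* modes are hypothetical names loosely modeled on MIGRATE_ASYNC / MIGRATE_SYNC_LIGHT / MIGRATE_SYNC, the 10 ms timed wait is a swapped-in stand-in for "quick", and the sleep in holder() stands in for the holder being preempted; both constants are assumptions.

```c
/* Userspace sketch, NOT kernel code. Build: cc -pthread sketch.c */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical modes, loosely named after the kernel's migrate modes. */
enum sketch_mode { SKETCH_ASYNC, SKETCH_SYNC_LIGHT, SKETCH_SYNC };

#define LIGHT_BUDGET_MS 10    /* assumption: what counts as "quick" */
#define HOLDER_OFF_CPU_MS 150 /* assumption: how long the holder is off-CPU */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ready_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;
static int holder_ready;

static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

static void sleep_ms(long ms) {
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L }, rem;
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem; /* resume after a signal interruption */
}

/* Returns 0 if the lock was taken, or EBUSY/ETIMEDOUT if we gave up. */
static int acquire(enum sketch_mode mode) {
    if (mode == SKETCH_ASYNC)
        return pthread_mutex_trylock(&lock); /* never block */
    if (mode == SKETCH_SYNC)
        return pthread_mutex_lock(&lock);    /* block as long as it takes */
    /* SYNC_LIGHT: block, but only up to a short budget. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline); /* timedlock uses CLOCK_REALTIME */
    deadline.tv_nsec += LIGHT_BUDGET_MS * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    return pthread_mutex_timedlock(&lock, &deadline);
}

/* Takes the lock, then stays "off-CPU": the sleep stands in for a holder
 * that was preempted and not scheduled back in while the CPUs are busy. */
static void *holder(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    pthread_mutex_lock(&ready_mu);
    holder_ready = 1;
    pthread_cond_signal(&ready_cv);
    pthread_mutex_unlock(&ready_mu);
    sleep_ms(HOLDER_OFF_CPU_MS);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* One trial: start a holder, wait until it owns the lock, then time how
 * long acquire() takes in the given mode. Stores acquire()'s result in *rc. */
static double trial(enum sketch_mode mode, int *rc) {
    pthread_t t;
    holder_ready = 0;
    int err = pthread_create(&t, NULL, holder, NULL);
    assert(err == 0);
    (void)err;
    pthread_mutex_lock(&ready_mu);
    while (!holder_ready)
        pthread_cond_wait(&ready_cv, &ready_mu);
    pthread_mutex_unlock(&ready_mu);

    double start = now_ms();
    *rc = acquire(mode);
    double waited = now_ms() - start;
    if (*rc == 0)
        pthread_mutex_unlock(&lock);
    pthread_join(t, NULL);
    return waited;
}
```

Calling trial() once per mode from a main() should show ASYNC giving up immediately with EBUSY, SYNC_LIGHT giving up after roughly the 10 ms budget with ETIMEDOUT, and SYNC getting the lock only once the holder comes back, close to 150 ms later. With a blocking policy, the waiter's latency is set by how long the holder stays off-CPU, not by how long its critical section actually needs.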
