linux-kernel - Re: Machine lockups on extreme memory pressure


Message-ID: <CALvZod7jvxEdbMzrmmt6Vrse=Ui4yhhVYyxPkPmmzWC5Z_6rtw@mail.gmail.com>
Date:   Tue, 22 Sep 2020 09:29:48 -0700
From:   Shakeel Butt <shakeelb@...gle.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Linux MM <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Roman Gushchin <guro@...com>,
        LKML <linux-kernel@...r.kernel.org>,
        Greg Thelen <gthelen@...gle.com>
Subject: Re: Machine lockups on extreme memory pressure

On Tue, Sep 22, 2020 at 8:16 AM Michal Hocko <mhocko@...e.com> wrote:
>
> On Tue 22-09-20 06:37:02, Shakeel Butt wrote:
> [...]
> > > I would recommend focusing on tracking down who is blocking
> > > further progress.
> >
> > I was able to find the CPU next in line for the list_lock from the
> > dump. I don't think anyone is blocking the progress as such but more
> > like the spinlock in the irq context is starving the spinlock in the
> > process context. This is a high traffic machine and there are tens of
> > thousands of potential network ACKs on the queue.
>
> So there is forward progress, but it is too slow for userspace to make
> any reasonable progress?

Yes.

>
> > I talked about this problem with Johannes at LPC 2019 and I think we
> > talked about two potential solutions. First was to somehow give memory
> > reserves to oomd and second was in-kernel PSI based oom-killer. I am
> > not sure the first one will work in this situation but the second one
> > might help.
>
> Why does your oomd depend on memory allocation?
>

It does not, but I think my concern was the potential allocations
during syscalls. Anyway, what do you think of the in-kernel PSI-based
oom-kill trigger? I think Johannes had a prototype as well.
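
To make the PSI signal under discussion concrete, here is a minimal
userspace sketch of the kind of check a PSI-based trigger could perform.
It is not the in-kernel prototype mentioned above. It only parses text in
the format the kernel exposes in /proc/pressure/memory. The function names,
the sample values, and the 40% "full avg10" limit are hypothetical and
chosen purely for illustration.

```python
# Hypothetical sketch: interpret PSI memory pressure text and decide
# whether an oom-kill policy should fire. Not taken from any kernel patch.

def parse_psi(text):
    """Parse /proc/pressure/memory style text into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        # Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def should_kill(psi, full_avg10_limit=40.0):
    """Return True when the 10s 'full' stall average reaches the limit.

    The 40.0 default is an arbitrary illustrative threshold, not a
    recommended value.
    """
    return psi.get("full", {}).get("avg10", 0.0) >= full_avg10_limit


# Made-up sample values in the real file format.
SAMPLE = (
    "some avg10=62.31 avg60=20.05 avg300=5.12 total=123456789\n"
    "full avg10=45.10 avg60=12.00 avg300=3.00 total=98765432\n"
)

if __name__ == "__main__":
    print(should_kill(parse_psi(SAMPLE)))
```

On a live system the input would come from reading /proc/pressure/memory.
A userspace reader like this still goes through syscalls, which is the
concern raised above, and is part of why an in-kernel trigger is
attractive.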