linux-kernel - Re: Machine lockups on extreme memory pressure

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CALvZod4Ww1vzxD90HFePVweFaQn+3WDwu8G-gHMA1AeiJGprBg@mail.gmail.com>
Date:   Fri, 30 Oct 2020 10:01:38 -0700
From:   Shakeel Butt <shakeelb@...gle.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Linux MM <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Roman Gushchin <guro@...com>,
        LKML <linux-kernel@...r.kernel.org>,
        Greg Thelen <gthelen@...gle.com>
Subject: Re: Machine lockups on extreme memory pressure

On Tue, Sep 22, 2020 at 10:01 AM Michal Hocko <mhocko@...e.com> wrote:
>
> On Tue 22-09-20 09:51:30, Shakeel Butt wrote:
> > On Tue, Sep 22, 2020 at 9:34 AM Michal Hocko <mhocko@...e.com> wrote:
> > >
> > > On Tue 22-09-20 09:29:48, Shakeel Butt wrote:
> [...]
> > > > Anyways, what do you think of the in-kernel PSI based
> > > > oom-kill trigger. I think Johannes had a prototype as well.
> > >
> > > We have talked about something like that in the past and established
> > > that auto tuning for oom killer based on PSI is almost impossible to get
> > > right for all potential workloads and that so this belongs to userspace.
> > > The kernel's oom killer is there as a last resort when system gets close
> > > to meltdown.
> >
> > The system is already in meltdown state from the users perspective. I
> > still think allowing the users to optionally set the oom-kill trigger
> > based on PSI makes sense. Something like 'if all processes on the
> > system are stuck for 60 sec, trigger oom-killer'.
>
> We already do have watchdogs for that no? If you cannot really schedule
> anything then soft lockup detector should fire. In a meltdown state like
> that the reboot is likely the best way forward anyway.

Yes, soft lockup detector can catch this situation but I still think
we can do better than panic/reboot.

Anyways, I think we now know the reason for this extreme pressure and
I just wanted to share if someone else might be facing a similar
situation.

There were several thousand TCP delayed ACKs queued on the system. The
system was under memory pressure and alloc_skb(GFP_ATOMIC) for delayed
ACKs were either stealing from reclaimers or failing. For the delayed
ACKs whose allocation failed, the kernel reschedules them infinitely.
So, these failing allocations for delayed ACKs were keeping the system
in this lockup state for hours. The commit a37c2134bed6 ("tcp: add
exponential backoff in __tcp_send_ack()") recently added the fix for
this situation.