Date:   Fri, 8 Apr 2022 07:26:07 -0400
From:   Nico Pache <npache@...hat.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Rafael Aquini <aquini@...hat.com>,
        Waiman Long <longman@...hat.com>, Baoquan He <bhe@...hat.com>,
        Christoph von Recklinghausen <crecklin@...hat.com>,
        Don Dutile <ddutile@...hat.com>,
        "Herton R . Krzesinski" <herton@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Davidlohr Bueso <dave@...olabs.net>,
        Ingo Molnar <mingo@...hat.com>,
        Joel Savitz <jsavitz@...hat.com>,
        Darren Hart <dvhart@...radead.org>, stable@...nel.org
Subject: Re: [PATCH v8] oom_kill.c: futex: Don't OOM reap the VMA containing
 the robust_list_head



On 4/8/22 06:51, Michal Hocko wrote:
> On Fri 08-04-22 06:36:40, Nico Pache wrote:
>>
>>
>> On 4/8/22 05:59, Michal Hocko wrote:
>>> On Fri 08-04-22 05:40:09, Nico Pache wrote:
>>>>
>>>>
>>>> On 4/8/22 05:36, Michal Hocko wrote:
>>>>> On Fri 08-04-22 04:52:33, Nico Pache wrote:
>>>>> [...]
>>>>>> In a heavily contended CPU with high memory pressure the delay may also
>>>>>> lead to other processes unnecessarily OOMing.
>>>>>
>>>>> Let me just comment on this part because there is likely a confusion
>>>>> involved. Delaying the oom_reaper _cannot_ lead to additional OOM killing
>>>>> because the oom killing is throttled by existence of a preexisting
>>>>> OOM victim. In other words as long as there is an alive victim no
>>>>> further victims are selected and the oom killer backs off. The
>>>>> oom_reaper will hide the alive oom victim after it is processed.
>>>>> The longer the delay will be the longer an oom victim can block a
>>>>> further progress but it cannot really cause unnecessary OOMing.
>>>> Is it not the case that if we delay an OOM, the amount of available memory stays
>>>> limited and other processes that are allocating memory can become OOM candidates?
>>>
>>> No. Have a look at oom_evaluate_task (tsk_is_oom_victim check).
>> Ok I see.
>>
>> Doesn't the delay then allow the system to run into the following case more easily?
>> pr_warn("Out of memory and no killable processes...\n");
>> panic("System is deadlocked on memory\n");
> 
> No. Aborting the oom victim search (above mentioned) will cause
> out_of_memory to bail out and return to the page allocator. 
Ok I see that now. I did my bit math incorrectly the first time around. I
thought abort led to the !oc->chosen case.
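
For reference, a minimal user-space sketch of the control flow discussed above. It assumes that the abort path in oom_evaluate_task leaves oc->chosen set to a non-NULL sentinel, so the !oc->chosen panic case is only reached when no candidate was found at all. The struct, its fields, and the function names are hypothetical stand-ins for illustration, not the kernel's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a task as seen by the OOM killer. */
struct task {
    bool is_oom_victim; /* models the tsk_is_oom_victim() check */
    bool oom_skip;      /* models "already processed by the oom_reaper" */
    bool unkillable;    /* models tasks that can never be selected */
    long badness;       /* higher means a better kill candidate */
};

/* Non-NULL sentinel meaning "abort, an alive victim already exists". */
static const struct task oom_abort_sentinel;
#define OOM_ABORT (&oom_abort_sentinel)

enum oom_result { OOM_PANIC, OOM_BACK_OFF, OOM_KILL };

/* Models the victim search: abort as soon as an alive victim is seen. */
static const struct task *select_victim(const struct task *tasks, size_t n)
{
    const struct task *chosen = NULL;

    for (size_t i = 0; i < n; i++) {
        const struct task *t = &tasks[i];

        if (t->unkillable)
            continue;
        if (t->is_oom_victim) {
            if (t->oom_skip)
                continue;     /* hidden after the reaper processed it */
            return OOM_ABORT; /* alive victim: stop searching */
        }
        if (!chosen || t->badness > chosen->badness)
            chosen = t;
    }
    return chosen;
}

/* Models the caller: only a NULL result reaches the panic case. */
static enum oom_result model_out_of_memory(const struct task *tasks, size_t n)
{
    const struct task *chosen = select_victim(tasks, n);

    if (!chosen)
        return OOM_PANIC;    /* no killable processes */
    if (chosen == OOM_ABORT)
        return OOM_BACK_OFF; /* return to the page allocator */
    return OOM_KILL;
}
```

In this model an alive, not yet reaped victim always yields OOM_BACK_OFF, so delaying the reaper prolongs the back-off but selects no additional victims and does not reach the panic branch.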

> The only problem with delaying the oom_reaper is that _iff_ the oom
> victim cannot terminate (because it is stuck somewhere in the kernel)
> on its own then the oom situation (be it global, cpuset or memcg) will
> take longer so allocating tasks will not be able to make forward
> progress.
Ok so if I understand that correctly, delaying can have some ugly effects and
kinda breaks the initial purpose of the OOM reaper?

I personally don't like the delay approach, especially if we have a better one
that we know is working and that doesn't add regressions.

If someone can prove to me the private lock case, I'd be more willing to bite.

Thanks for all the OOM context :)
-- Nico
