Date:   Wed, 30 Mar 2022 15:36:25 -0600
From:   Nico Pache <npache@...hat.com>
To:     Michal Hocko <mhocko@...e.com>,
        Thomas Gleixner <tglx@...utronix.de>
Cc:     Davidlohr Bueso <dave@...olabs.net>, linux-mm@...ck.org,
        Andrea Arcangeli <aarcange@...hat.com>,
        Joel Savitz <jsavitz@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, Rafael Aquini <aquini@...hat.com>,
        Waiman Long <longman@...hat.com>, Baoquan He <bhe@...hat.com>,
        Christoph von Recklinghausen <crecklin@...hat.com>,
        Don Dutile <ddutile@...hat.com>,
        "Herton R . Krzesinski" <herton@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Darren Hart <dvhart@...radead.org>,
        Andre Almeida <andrealmeid@...labora.com>,
        David Rientjes <rientjes@...gle.com>
Subject: Re: [PATCH v5] mm/oom_kill.c: futex: Close a race between do_exit and
 the oom_reaper



On 3/30/22 12:18, Nico Pache wrote:
> 
> 
> On 3/30/22 03:18, Michal Hocko wrote:
>> Nico,
>>
>> On Wed 23-03-22 10:17:29, Michal Hocko wrote:
>>> Let me skip over futex part which I need to digest and only focus on the
>>> oom side of the things for clarification.
>>>
>>> On Tue 22-03-22 23:43:18, Thomas Gleixner wrote:
>> [...]
>>>> You can easily validate that by doing:
>>>>
>>>> wake_oom_reaper(task)
>>>>    task->reap_time = jiffies + HZ;
>>>>    queue_task(task);
>>>>    wakeup(reaper);
>>>>
>>>> and then:
>>>>
>>>> oom_reap_task(task)
>>>>     now = READ_ONCE(jiffies);
>>>>     if (time_before(now, task->reap_time))
>>>>         schedule_timeout_idle(task->reap_time - now);
>>>>
>>>> before trying to actually reap the mm.
>>>>
>>>> That will prevent the enforced race in most cases and allow the exiting
>>>> and/or killed processes to clean up themselves. Not pretty, but it should
>>>> reduce the chance of the reaper to win the race with the exiting and/or
>>>> killed process significantly.
>>>>
>>>> It's not going to work when the problem is combined with a heavy VM
>>>> overload situation which keeps a guest (or one/some of its vCPUs) away
>>>> from being scheduled. See below for a discussion of guarantees.
>>>>
>>>> If it failed to do so when the sleep returns, then you still can reap
>>>> it.
>>>
>>> Yes, this is certainly an option. Please note that the oom_reaper is not
>>> the only way to trigger this. process_mrelease syscall performs the same
>>> operation from the userspace. Arguably process_mrelease could be used
>>> sanely/correctly because the userspace oom killer can do pre-cleanup
>>> steps before going to final SIGKILL & process_mrelease. One way would be
>>> to send SIGTERM in the first step and allow the victim to perform its
>>> cleanup.
>>
>> are you working on another version of the fix/workaround based on the
>> discussion so far?
> 
> We are indeed! Sorry for the delay; we've been taking the time to do our due
> diligence on some of the claims made. We are also spending time rewriting the
> reproducer to include more test cases that Thomas brought up.
> 
> I'll summarize here, and reply to the original emails in more detail....
> 
> Firstly, we have implemented & tested the VMA skipping... it does fix our case.
> Thomas brought up a few good points about the robust list head and the potential
> waiters being in different VMAs; however, I think it's a moot point, given that
> the locks will only be reaped if allocated as ((private|anon)|| !shared).

Sorry... not completely moot.

As Thomas pointed out, a robust list with the following structure will probably
fail to recover its waiters:

TLS (robust head, skip)* --> private lock (reaped) --> shared lock (not reaped)

We are working on getting a test case with multiple locks and mixed mapping
types to prove this.

Skipping the robust list head VMA will be beneficial in cases where the robust
list is full of shared locks:

TLS (robust head, skip)* --> shared lock(not reaped) --> shared lock(not reaped)
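
To make the difference between the two layouts above concrete, here is a
minimal userspace model. It is not kernel code and not our reproducer. It
assumes that a reaped private anonymous page reads back as zeros, so both the
owner TID and the next pointer stored in it are lost, while shared mappings and
the skipped robust head survive. The names sim_lock, sim_reap and sim_exit_walk
are made up for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one robust-list entry plus its futex word. */
struct sim_lock {
    struct sim_lock *next;  /* link to the next held lock */
    int owner_tid;          /* TID stored in the futex word */
    bool shared;            /* true: shared mapping, false: private anon */
    bool recovered;         /* set when the exit-time walk handles this lock */
};

/* Model the reaper: private anon memory is assumed to read back as zeros. */
static void sim_reap(struct sim_lock *locks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!locks[i].shared) {
            locks[i].next = NULL;
            locks[i].owner_tid = 0;
        }
    }
}

/*
 * Model the exit-time walk starting from the (skipped, intact) robust head.
 * A lock is recovered only if its futex word still names the dying task.
 * The walk ends as soon as a link reads back as NULL.
 */
static size_t sim_exit_walk(struct sim_lock *first, int tid)
{
    size_t recovered = 0;

    for (struct sim_lock *l = first; l != NULL; l = l->next) {
        if (l->owner_tid == tid) {
            l->recovered = true;
            recovered++;
        }
    }
    return recovered;
}
```

In this model, a private lock followed by a shared lock recovers nothing after
sim_reap(): the private entry no longer names the owner and its link is gone,
so the shared lock behind it is never reached. With two shared locks, both are
recovered.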

-- Nico
