Date:   Tue, 25 Jan 2022 14:57:52 -0800
From:   Shakeel Butt <shakeelb@...gle.com>
To:     David Rientjes <rientjes@...gle.com>
Cc:     Jens Axboe <axboe@...nel.dk>,
        Pavel Begunkov <asml.silence@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>, io-uring@...r.kernel.org
Subject: Re: [PATCH] mm: io_uring: allow oom-killer from io_uring_setup

On Tue, Jan 25, 2022 at 10:35 AM David Rientjes <rientjes@...gle.com> wrote:
>
> On Mon, 24 Jan 2022, Shakeel Butt wrote:
>
> > On an overcommitted system running multiple workloads of varying
> > priorities, it is preferable to trigger the oom-killer to kill a
> > low priority workload than to let the high priority workload
> > receive ENOMEMs. On our memory overcommitted systems, we are
> > seeing a lot of ENOMEMs instead of oom-kills because the
> > io_uring_setup() call chain uses the __GFP_NORETRY gfp flag, which
> > avoids the oom-killer. Let's remove it and allow the oom-killer to
> > kill a lower priority job.
> >
>
> What is the size of the allocations that io_mem_alloc() is doing?
>
> If get_order(size) > PAGE_ALLOC_COSTLY_ORDER, then this will fail even
> without __GFP_NORETRY.  To guarantee that workloads do not receive
> ENOMEM, it seems like we'd need to guarantee that allocations going
> through io_mem_alloc() are sufficiently small.
>
> (And if we're really serious about it, then even something like a
> BUILD_BUG_ON().)
>

The test case provided to me, for which the user was seeing ENOMEMs,
was io_uring_setup() with 64 entries (nothing else).
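
For context, here is a minimal sketch of that test case as I understand
it; the exact reproducer was not posted in this thread, so the code
below is my assumption of what "64 entries, nothing else" looks like:

	/* Hypothetical reproducer: plain io_uring_setup() with 64 SQ
	 * entries and default params; under memory pressure this call
	 * was returning ENOMEM rather than triggering the oom-killer. */
	#include <linux/io_uring.h>
	#include <string.h>
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(void)
	{
		struct io_uring_params p;

		memset(&p, 0, sizeof(p));
		int fd = syscall(__NR_io_uring_setup, 64, &p);
		if (fd < 0)
			perror("io_uring_setup");
		return fd < 0 ? 1 : 0;
	}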

If I understand the rings_size() calculations correctly, an order-0
allocation was being requested in io_mem_alloc().
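
For reference, io_mem_alloc() as I read it in fs/io_uring.c looks
roughly like the sketch below (treat the exact flag set as my
paraphrase rather than a verbatim copy); the patch simply drops
__GFP_NORETRY from it:

	/* Sketch of the current ring allocator. Dropping __GFP_NORETRY
	 * lets a failing allocation enter the oom-killer path instead
	 * of bubbling ENOMEM up to io_uring_setup(). */
	static void *io_mem_alloc(size_t size)
	{
		gfp_t gfp = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN |
			    __GFP_COMP | __GFP_ACCOUNT |
			    __GFP_NORETRY;	/* removed by this patch */

		return (void *) __get_free_pages(gfp, get_order(size));
	}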

For order > PAGE_ALLOC_COSTLY_ORDER, maybe we can use
__GFP_RETRY_MAYFAIL. It will at least do more aggressive reclaim,
though I think that is a separate discussion. For this issue, we are
seeing ENOMEMs even for order-0 allocations.
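
Something along these lines, as a rough, untested sketch of that
separate idea (the order check is my assumption of how it could be
wired up, not part of this patch):

	/* Keep small-order allocations oom-killable; for costly orders,
	 * __GFP_RETRY_MAYFAIL retries reclaim more aggressively but
	 * still fails without invoking the oom-killer. */
	static void *io_mem_alloc(size_t size)
	{
		gfp_t gfp = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN |
			    __GFP_COMP | __GFP_ACCOUNT;
		int order = get_order(size);

		if (order > PAGE_ALLOC_COSTLY_ORDER)
			gfp |= __GFP_RETRY_MAYFAIL;

		return (void *) __get_free_pages(gfp, order);
	}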
