Message-ID: <a3bf6a4a-96ef-755e-12d5-56f4d792a34f@google.com>
Date: Tue, 25 Jan 2022 17:42:01 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: Shakeel Butt <shakeelb@...gle.com>
cc: Jens Axboe <axboe@...nel.dk>,
Pavel Begunkov <asml.silence@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>, io-uring@...r.kernel.org
Subject: Re: [PATCH] mm: io_uring: allow oom-killer from io_uring_setup
On Tue, 25 Jan 2022, Shakeel Butt wrote:
> > > On an overcommitted system which is running multiple workloads of
> > > varying priorities, it is preferred to trigger an oom-killer to kill a
> > > low priority workload than to let the high priority workload receiving
> > > ENOMEMs. On our memory overcommitted systems, we are seeing a lot of
> > > ENOMEMs instead of oom-kills because io_uring_setup callchain is using
> > > __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and
> > > allow the oom-killer to kill a lower priority job.
> > >
> >
> > What is the size of the allocations that io_mem_alloc() is doing?
> >
> > If get_order(size) > PAGE_ALLOC_COSTLY_ORDER, then this will fail even
> > without the __GFP_NORETRY. To make the guarantee that workloads are not
> > receiving ENOMEM, it seems like we'd need to guarantee that allocations
> > going through io_mem_alloc() are sufficiently small.
> >
> > (And if we're really serious about it, then even something like a
> > BUILD_BUG_ON().)
> >
>
> The test case provided to me for which the user was seeing ENOMEMs was
> io_uring_setup() with 64 entries (nothing else).
>
> If I understand rings_size() calculations correctly then the 0 order
> allocation was requested in io_mem_alloc().
>
> For order > PAGE_ALLOC_COSTLY_ORDER, maybe we can use
> __GFP_RETRY_MAYFAIL. It will at least do more aggressive reclaim
> though I think that is a separate discussion. For this issue, we are
> seeing ENOMEMs even for order 0 allocations.
>
Ah, gotcha, thanks for the background. IIUC, io_uring_setup() can be called
by any task with CAP_SYS_NICE, so my only concern would be whether this
could be used maliciously on a system not using memcg. But in that case one
can already fork many small processes that consume all memory and oom kill
everything else on the system.
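
For reference, the order check discussed above can be sketched in userspace
roughly as follows. This is only an illustration, not the kernel's code:
sketch_get_order() and sketch_is_costly() are hypothetical names, a 4 KiB
page size is assumed, and the costly-order threshold is taken to be 3 to
match the kernel's PAGE_ALLOC_COSTLY_ORDER.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB pages; the real PAGE_SIZE depends on the architecture. */
#define SKETCH_PAGE_SIZE    4096UL
/* Mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER value of 3. */
#define SKETCH_COSTLY_ORDER 3

/*
 * Hypothetical stand-in for get_order(): the smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size. Assumes size is small enough
 * that the shift below cannot overflow.
 */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* True when a request of this size would be above the costly order. */
static bool sketch_is_costly(size_t size)
{
	return sketch_get_order(size) > SKETCH_COSTLY_ORDER;
}
```

Under these assumptions, any request that fits in a single page is order 0,
which is the case reported for the 64-entry test, and a request only becomes
costly once it exceeds 8 pages (32 KiB).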