lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 25 Jan 2022 10:35:42 -0800 (PST)
From:   David Rientjes <rientjes@...gle.com>
To:     Shakeel Butt <shakeelb@...gle.com>
cc:     Jens Axboe <axboe@...nel.dk>,
        Pavel Begunkov <asml.silence@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, io-uring@...r.kernel.org
Subject: Re: [PATCH] mm: io_uring: allow oom-killer from io_uring_setup

On Mon, 24 Jan 2022, Shakeel Butt wrote:

> On an overcommitted system which is running multiple workloads of
> varying priorities, it is preferred to trigger an oom-killer to kill a
> low priority workload than to let the high priority workload receiving
> ENOMEMs. On our memory overcommitted systems, we are seeing a lot of
> ENOMEMs instead of oom-kills because io_uring_setup callchain is using
> __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and
> allow the oom-killer to kill a lower priority job.
> 

What is the size of the allocations that io_mem_alloc() is doing?

If get_order(size) > PAGE_ALLOC_COSTLY_ORDER, then this will fail even 
without the __GFP_NORETRY.  To make the guarantee that workloads are not 
receiving ENOMEM, it seems like we'd need to guarantee that allocations 
going through io_mem_alloc() are sufficiently small.

(And if we're really serious about it, then even something like a 
BUILD_BUG_ON().)

> Signed-off-by: Shakeel Butt <shakeelb@...gle.com>
> ---
>  fs/io_uring.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index e54c4127422e..d9eeb202363c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8928,10 +8928,9 @@ static void io_mem_free(void *ptr)
>  
>  static void *io_mem_alloc(size_t size)
>  {
> -	gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP |
> -				__GFP_NORETRY | __GFP_ACCOUNT;
> +	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
>  
> -	return (void *) __get_free_pages(gfp_flags, get_order(size));
> +	return (void *) __get_free_pages(gfp, get_order(size));
>  }
>  
>  static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
> -- 
> 2.35.0.rc0.227.g00780c9af4-goog
> 
> 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ