Date: Thu, 23 May 2024 06:45:08 +0000
From: Naveen Mamindlapalli <naveenm@...vell.com>
To: Håkon Bugge <haakon.bugge@...cle.com>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "rds-devel@....oracle.com" <rds-devel@....oracle.com>
CC: Jason Gunthorpe <jgg@...pe.ca>, Leon Romanovsky <leon@...nel.org>,
        Saeed Mahameed <saeedm@...dia.com>, Tariq Toukan <tariqt@...dia.com>,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Tejun Heo <tj@...nel.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Allison Henderson <allison.henderson@...cle.com>,
        Manjunath Patil <manjunath.b.patil@...cle.com>,
        Mark Zhang <markzhang@...dia.com>,
        Chuck Lever <chuck.lever@...cle.com>,
        Shiraz Saleem <shiraz.saleem@...el.com>,
        Yang Li <yang.lee@...ux.alibaba.com>
Subject: RE: [PATCH v3 4/6] RDMA/cm: Brute force GFP_NOIO


> -----Original Message-----
> From: Håkon Bugge <haakon.bugge@...cle.com>
> Sent: Wednesday, May 22, 2024 7:25 PM
> To: linux-rdma@...r.kernel.org; linux-kernel@...r.kernel.org;
> netdev@...r.kernel.org; rds-devel@....oracle.com
> Cc: Jason Gunthorpe <jgg@...pe.ca>; Leon Romanovsky <leon@...nel.org>;
> Saeed Mahameed <saeedm@...dia.com>; Tariq Toukan <tariqt@...dia.com>;
> David S . Miller <davem@...emloft.net>; Eric Dumazet
> <edumazet@...gle.com>; Jakub Kicinski <kuba@...nel.org>; Paolo Abeni
> <pabeni@...hat.com>; Tejun Heo <tj@...nel.org>; Lai Jiangshan
> <jiangshanlai@...il.com>; Allison Henderson <allison.henderson@...cle.com>;
> Manjunath Patil <manjunath.b.patil@...cle.com>; Mark Zhang
> <markzhang@...dia.com>; Håkon Bugge <haakon.bugge@...cle.com>; Chuck
> Lever <chuck.lever@...cle.com>; Shiraz Saleem <shiraz.saleem@...el.com>;
> Yang Li <yang.lee@...ux.alibaba.com>
> Subject: [PATCH v3 4/6] RDMA/cm: Brute force GFP_NOIO
> 
> In ib_cm_init(), we call memalloc_noio_{save,restore} in a parenthetic fashion
> when enabled by the module parameter force_noio.
> 
> This is done to conditionally allow ib_cm to be used underneath block I/O devices.
> Any work queued later on work-queues created during module initialization will
> inherit the PF_MEMALLOC_{NOIO,NOFS} flag(s), due to commit ("workqueue:
> Inherit NOIO and NOFS alloc flags").
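The parenthetic save/restore described above works because the restore puts the NOIO bit back to whatever it was at save time, so such scopes can nest. Below is a minimal userspace sketch of those semantics. The `sim_` names and the bit value are hypothetical stand-ins for `current->flags` and `PF_MEMALLOC_NOIO`; the kernel's actual helpers in include/linux/sched/mm.h differ in detail between versions.

```c
#include <assert.h>

/* Hypothetical stand-ins: sim_task_flags plays the role of current->flags,
 * and the bit value is illustrative, not the kernel's PF_MEMALLOC_NOIO. */
#define SIM_PF_MEMALLOC_NOIO 0x2u

static unsigned int sim_task_flags;

/* Open a NOIO scope: remember whether NOIO was already set, then set it. */
static unsigned int sim_memalloc_noio_save(void)
{
	unsigned int old = sim_task_flags & SIM_PF_MEMALLOC_NOIO;

	sim_task_flags |= SIM_PF_MEMALLOC_NOIO;
	return old;
}

/* Close the scope: put the NOIO bit back to its state at save time. */
static void sim_memalloc_noio_restore(unsigned int old)
{
	sim_task_flags = (sim_task_flags & ~SIM_PF_MEMALLOC_NOIO) | old;
}

/* Nested scopes, starting from no scope: the inner restore must leave the
 * outer scope's bit set, and the outer restore must clear it. Returns 1 on
 * success. */
static int sim_nested_scopes_ok(void)
{
	unsigned int outer = sim_memalloc_noio_save();
	unsigned int inner = sim_memalloc_noio_save();
	int inner_kept_noio;

	sim_memalloc_noio_restore(inner);
	inner_kept_noio = (sim_task_flags & SIM_PF_MEMALLOC_NOIO) != 0;
	sim_memalloc_noio_restore(outer);

	return inner_kept_noio && sim_task_flags == 0;
}
```

The nesting property is what makes it safe for an entry point to open its own NOIO scope even when its caller is already inside one.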
> 
> We do this to allow ULPs using the RDMA stack to be used as a network
> block I/O device, in order to support a filesystem on top of a raw block
> device which uses said ULP(s) and the RDMA stack as the network transport
> layer.
> 
> Under intense memory pressure, memory reclaim kicks in. Assume the
> filesystem reclaims memory by issuing I/O to the raw block device, which
> calls into the ULP in question, which in turn calls the RDMA stack. Now, if
> regular GFP_KERNEL allocations in the ULP or the RDMA stack require reclaim
> to be fulfilled, we end up in a circular dependency.
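The dependency is broken because, inside a NOIO scope, the allocator narrows the caller's GFP mask (in the kernel, via current_gfp_context()), so a GFP_KERNEL request behaves like GFP_NOIO and its reclaim may neither start I/O nor call into the filesystem. A userspace sketch of that masking follows. The bit values and `sim_` names are hypothetical; only the relationships between the masks mirror the kernel, and the recursion check is a deliberate simplification.

```c
#include <assert.h>

/* Hypothetical bit values; only the relationships mirror the kernel's
 * GFP_KERNEL / GFP_NOFS / GFP_NOIO definitions. */
#define SIM_GFP_RECLAIM 0x1u
#define SIM_GFP_IO      0x2u
#define SIM_GFP_FS      0x4u
#define SIM_GFP_KERNEL  (SIM_GFP_RECLAIM | SIM_GFP_IO | SIM_GFP_FS)
#define SIM_GFP_NOFS    (SIM_GFP_RECLAIM | SIM_GFP_IO)
#define SIM_GFP_NOIO    (SIM_GFP_RECLAIM)

/* Hypothetical task scope flags. */
#define SIM_PF_MEMALLOC_NOFS 0x1u
#define SIM_PF_MEMALLOC_NOIO 0x2u

/* Narrow an allocation's GFP mask according to the task's scope flags,
 * following the same rules as the kernel's current_gfp_context(). */
static unsigned int sim_gfp_context(unsigned int task_flags, unsigned int gfp)
{
	if (task_flags & SIM_PF_MEMALLOC_NOIO)
		gfp &= ~(SIM_GFP_IO | SIM_GFP_FS);
	else if (task_flags & SIM_PF_MEMALLOC_NOFS)
		gfp &= ~SIM_GFP_FS;
	return gfp;
}

/* Simplified check: may reclaim for this allocation re-enter the filesystem
 * or the block layer (and thus the ULP and the RDMA stack)? */
static int sim_reclaim_may_reenter_fs_or_io(unsigned int gfp)
{
	return (gfp & SIM_GFP_RECLAIM) && (gfp & (SIM_GFP_IO | SIM_GFP_FS));
}
```

An unchanged GFP_KERNEL call site in the ULP therefore becomes safe without being edited, which is the point of the scoped approach over converting every allocation by hand.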
> 
> We break this circular dependency by:
> 
> 1. Force all allocations in the ULP and the relevant parts of the RDMA
>    stack to use GFP_NOIO, by means of a parenthetic use of
>    memalloc_noio_{save,restore} on all relevant entry points.
> 
> 2. Make sure work-queues inherit current->flags with respect to
>    PF_MEMALLOC_{NOIO,NOFS}, such that work executed on the
>    work-queue inherits the same flag(s).
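Item 2 above can be modeled as a queue that captures its creator's PF_MEMALLOC_{NOIO,NOFS} bits at creation and re-applies them around every work item. This is a hypothetical userspace sketch only; it is not the code of the referenced workqueue commit, and the `sim_` names and bit values are invented for illustration.

```c
#include <assert.h>

/* Hypothetical userspace model; names and bit values are illustrative. */
#define SIM_PF_MEMALLOC_NOFS 0x1u
#define SIM_PF_MEMALLOC_NOIO 0x2u
#define SIM_PF_MEMALLOC_MASK (SIM_PF_MEMALLOC_NOFS | SIM_PF_MEMALLOC_NOIO)

static unsigned int sim_current_flags; /* stands in for current->flags */

struct sim_wq {
	unsigned int alloc_flags; /* scope flags captured at creation time */
};

typedef void (*sim_work_fn)(void);

/* Capture the creator's NOIO/NOFS scope when the queue is created. */
static void sim_wq_init(struct sim_wq *wq)
{
	wq->alloc_flags = sim_current_flags & SIM_PF_MEMALLOC_MASK;
}

/* Run one work item with the captured flags in effect, then put the
 * worker's own flags back, like a save/restore pair around each item. */
static void sim_wq_run(const struct sim_wq *wq, sim_work_fn fn)
{
	unsigned int saved = sim_current_flags & SIM_PF_MEMALLOC_MASK;

	sim_current_flags |= wq->alloc_flags;
	fn();
	sim_current_flags = (sim_current_flags & ~SIM_PF_MEMALLOC_MASK) | saved;
}

static unsigned int sim_seen_flags; /* what the work item observed */

/* Example work item: record the scope flags it runs under. */
static void sim_record_flags(void)
{
	sim_seen_flags = sim_current_flags & SIM_PF_MEMALLOC_MASK;
}
```

Capturing the flags at creation is why ib_cm_init() opens its NOIO scope before the module's workqueues are allocated.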
> 
> Signed-off-by: Håkon Bugge <haakon.bugge@...cle.com>
> ---
>  drivers/infiniband/core/cm.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 07fb8d3c037f0..767eec38eb57d 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -22,6 +22,7 @@
>  #include <linux/workqueue.h>
>  #include <linux/kdev_t.h>
>  #include <linux/etherdevice.h>
> +#include <linux/sched/mm.h>
> 
>  #include <rdma/ib_cache.h>
>  #include <rdma/ib_cm.h>
> @@ -35,6 +36,11 @@ MODULE_DESCRIPTION("InfiniBand CM");
>  MODULE_LICENSE("Dual BSD/GPL");
> 
>  #define CM_DESTROY_ID_WAIT_TIMEOUT 10000 /* msecs */
> +
> +static bool cm_force_noio;
> +module_param_named(force_noio, cm_force_noio, bool, 0444);
> +MODULE_PARM_DESC(force_noio, "Force the use of GFP_NOIO (Y/N)");
> +
>  static const char * const ibcm_rej_reason_strs[] = {
>  	[IB_CM_REJ_NO_QP]			= "no QP",
>  	[IB_CM_REJ_NO_EEC]			= "no EEC",
> @@ -4504,6 +4510,10 @@ static void cm_remove_one(struct ib_device *ib_device, void *client_data)
>  static int __init ib_cm_init(void)
>  {
>  	int ret;
> +	unsigned int noio_flags;

minor: please follow reverse xmas tree order, i.e. declare noio_flags before ret (longest line first).

> +
> +	if (cm_force_noio)
> +		noio_flags = memalloc_noio_save();
> 
>  	INIT_LIST_HEAD(&cm.device_list);
>  	rwlock_init(&cm.device_lock);
> @@ -4527,10 +4537,13 @@ static int __init ib_cm_init(void)
>  	if (ret)
>  		goto error3;
> 
> -	return 0;
> +	goto error2;
>  error3:
>  	destroy_workqueue(cm.wq);
>  error2:
> +	if (cm_force_noio)
> +		memalloc_noio_restore(noio_flags);
> +
>  	return ret;
>  }
> 
> --
> 2.31.1
> 
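In the patched ib_cm_init(), the success path jumps to error2 instead of returning, so a single memalloc_noio_restore() covers success and failure alike. The sketch below models that control flow in userspace. The two steps, the `fail_at` knob and the `sim_` names are hypothetical and only loosely follow the labels visible in the diff. Unlike the patch, it initializes `noio_flags` to 0, which sidesteps a possible maybe-uninitialized compiler warning when force_noio is off.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the patched ib_cm_init() control flow; names,
 * steps and the bit value are illustrative, not the real cm.c code. */
#define SIM_PF_MEMALLOC_NOIO 0x2u

static unsigned int sim_task_flags; /* stands in for current->flags */
static int sim_wq_created;          /* stands in for cm.wq being allocated */

/* fail_at: 0 = all steps succeed, 1 = workqueue creation fails,
 * 2 = the later registration step fails. */
static int sim_init(int force_noio, int fail_at)
{
	unsigned int noio_flags = 0;
	int ret;

	if (force_noio) {
		/* memalloc_noio_save() */
		noio_flags = sim_task_flags & SIM_PF_MEMALLOC_NOIO;
		sim_task_flags |= SIM_PF_MEMALLOC_NOIO;
	}

	/* Step 1: allocate the workqueue. */
	if (fail_at == 1) {
		ret = -ENOMEM;
		goto error2;
	}
	sim_wq_created = 1;

	/* Step 2: a later registration step. */
	ret = (fail_at == 2) ? -ENOMEM : 0;
	if (ret)
		goto error3;

	goto error2; /* success also leaves through the restore path */
error3:
	sim_wq_created = 0; /* destroy_workqueue() */
error2:
	if (force_noio) /* memalloc_noio_restore() */
		sim_task_flags = (sim_task_flags & ~SIM_PF_MEMALLOC_NOIO) | noio_flags;
	return ret;
}
```

Funnelling every exit through error2 keeps the save and restore visibly paired, at the cost of a success path that passes through a label named for errors.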
