Message-ID: <SJ2PR18MB5635B31CDA691101A79ADA85A2F42@SJ2PR18MB5635.namprd18.prod.outlook.com>
Date: Thu, 23 May 2024 06:45:08 +0000
From: Naveen Mamindlapalli <naveenm@...vell.com>
To: Håkon Bugge <haakon.bugge@...cle.com>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"rds-devel@....oracle.com" <rds-devel@....oracle.com>
CC: Jason Gunthorpe <jgg@...pe.ca>, Leon Romanovsky <leon@...nel.org>,
Saeed Mahameed <saeedm@...dia.com>, Tariq Toukan <tariqt@...dia.com>,
"David S . Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Tejun Heo <tj@...nel.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
Allison Henderson <allison.henderson@...cle.com>,
Manjunath Patil <manjunath.b.patil@...cle.com>,
Mark Zhang <markzhang@...dia.com>,
Chuck Lever <chuck.lever@...cle.com>,
Shiraz Saleem <shiraz.saleem@...el.com>,
Yang Li <yang.lee@...ux.alibaba.com>
Subject: RE: [PATCH v3 4/6] RDMA/cm: Brute force GFP_NOIO
> -----Original Message-----
> From: Håkon Bugge <haakon.bugge@...cle.com>
> Sent: Wednesday, May 22, 2024 7:25 PM
> To: linux-rdma@...r.kernel.org; linux-kernel@...r.kernel.org;
> netdev@...r.kernel.org; rds-devel@....oracle.com
> Cc: Jason Gunthorpe <jgg@...pe.ca>; Leon Romanovsky <leon@...nel.org>;
> Saeed Mahameed <saeedm@...dia.com>; Tariq Toukan <tariqt@...dia.com>;
> David S . Miller <davem@...emloft.net>; Eric Dumazet
> <edumazet@...gle.com>; Jakub Kicinski <kuba@...nel.org>; Paolo Abeni
> <pabeni@...hat.com>; Tejun Heo <tj@...nel.org>; Lai Jiangshan
> <jiangshanlai@...il.com>; Allison Henderson <allison.henderson@...cle.com>;
> Manjunath Patil <manjunath.b.patil@...cle.com>; Mark Zhang
> <markzhang@...dia.com>; Håkon Bugge <haakon.bugge@...cle.com>; Chuck
> Lever <chuck.lever@...cle.com>; Shiraz Saleem <shiraz.saleem@...el.com>;
> Yang Li <yang.lee@...ux.alibaba.com>
> Subject: [PATCH v3 4/6] RDMA/cm: Brute force GFP_NOIO
>
> In ib_cm_init(), we call memalloc_noio_{save,restore} in a parenthetic fashion
> when enabled by the module parameter force_noio.
>
> This is done in order to conditionally enable ib_cm to work aligned with block I/O devices.
> Any work queued later on work-queues created during module initialization will
> inherit the PF_MEMALLOC_{NOIO,NOFS} flag(s), due to commit ("workqueue:
> Inherit NOIO and NOFS alloc flags").
>
> We do this in order to enable ULPs using the RDMA stack to be used as a
> network block I/O device. This is to support a filesystem on top of a raw block
> device which uses said ULP(s) and the RDMA stack as the network transport
> layer.
>
> Under intense memory pressure, we get memory reclaims. Assume the filesystem
> reclaims memory, goes to the raw block device, which calls into the ULP in
> question, which calls the RDMA stack. Now, if regular GFP_KERNEL allocations
> in ULP or the RDMA stack require reclaims to be fulfilled, we end up in a circular
> dependency.
>
> We break this circular dependency by:
>
> 1. Force all allocations in the ULP and the relevant RDMA stack to use
> GFP_NOIO, by means of a parenthetic use of
> memalloc_noio_{save,restore} on all relevant entry points.
>
> 2. Make sure work-queues inherit current->flags
> wrt. PF_MEMALLOC_{NOIO,NOFS}, such that work executed on the
> work-queue inherits the same flag(s).
>
> Signed-off-by: Håkon Bugge <haakon.bugge@...cle.com>
> ---
> drivers/infiniband/core/cm.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 07fb8d3c037f0..767eec38eb57d 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -22,6 +22,7 @@
> #include <linux/workqueue.h>
> #include <linux/kdev_t.h>
> #include <linux/etherdevice.h>
> +#include <linux/sched/mm.h>
>
> #include <rdma/ib_cache.h>
> #include <rdma/ib_cm.h>
> @@ -35,6 +36,11 @@ MODULE_DESCRIPTION("InfiniBand CM");
> MODULE_LICENSE("Dual BSD/GPL");
>
> #define CM_DESTROY_ID_WAIT_TIMEOUT 10000 /* msecs */
> +
> +static bool cm_force_noio;
> +module_param_named(force_noio, cm_force_noio, bool, 0444);
> +MODULE_PARM_DESC(force_noio, "Force the use of GFP_NOIO (Y/N)");
> +
> static const char * const ibcm_rej_reason_strs[] = {
> [IB_CM_REJ_NO_QP] = "no QP",
> [IB_CM_REJ_NO_EEC] = "no EEC",
> @@ -4504,6 +4510,10 @@ static void cm_remove_one(struct ib_device *ib_device, void *client_data)
> static int __init ib_cm_init(void)
> {
> int ret;
> + unsigned int noio_flags;
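
Minor: please follow reverse xmas tree order for the local variable
declarations, i.e. longest declaration line first, so
"unsigned int noio_flags;" goes above "int ret;".
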
> +
> + if (cm_force_noio)
> + noio_flags = memalloc_noio_save();
>
> INIT_LIST_HEAD(&cm.device_list);
> rwlock_init(&cm.device_lock);
> @@ -4527,10 +4537,13 @@ static int __init ib_cm_init(void)
> if (ret)
> goto error3;
>
> - return 0;
> + goto error2;
> error3:
> destroy_workqueue(cm.wq);
> error2:
> + if (cm_force_noio)
> + memalloc_noio_restore(noio_flags);
> +
> return ret;
> }
>
> --
> 2.31.1
>
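
For readers following the thread, the two mechanisms the commit message relies on (a parenthetic save/restore around module init, and work-queues inheriting the creator's NOIO state) can be sketched in userspace roughly as below. This is only a mock under stated assumptions: every mock_* name, the MOCK_PF_MEMALLOC_NOIO bit value, and the idea of a work-queue recording its creator's flags at creation time are hypothetical stand-ins for illustration, not the kernel's memalloc_noio_{save,restore} or workqueue implementation.

```c
/* Userspace mock of the scoped NOIO pattern. NOT the kernel API. */
#include <assert.h>
#include <stdbool.h>

#define MOCK_PF_MEMALLOC_NOIO 0x1u	/* hypothetical flag bit */

/* Stand-in for current->flags of the running task (single-threaded mock). */
static unsigned int mock_task_flags;

/* Set the NOIO bit and return its previous state, like a "save" call. */
static unsigned int mock_noio_save(void)
{
	unsigned int old = mock_task_flags & MOCK_PF_MEMALLOC_NOIO;

	mock_task_flags |= MOCK_PF_MEMALLOC_NOIO;
	return old;
}

/* Put the NOIO bit back to the state returned by mock_noio_save(). */
static void mock_noio_restore(unsigned int old)
{
	mock_task_flags = (mock_task_flags & ~MOCK_PF_MEMALLOC_NOIO) | old;
}

/* Mock work-queue that remembers its creator's NOIO state (assumption). */
struct mock_wq {
	unsigned int inherited_flags;
};

static void mock_wq_create(struct mock_wq *wq)
{
	wq->inherited_flags = mock_task_flags & MOCK_PF_MEMALLOC_NOIO;
}

/* Run fn as a worker would: apply the inherited flags around the work item. */
static bool mock_wq_run(const struct mock_wq *wq, bool (*fn)(void))
{
	unsigned int saved = mock_task_flags;
	bool result;

	mock_task_flags |= wq->inherited_flags;
	result = fn();
	mock_task_flags = saved;
	return result;
}

/* Work item: reports whether it ran with NOIO in effect. */
static bool work_sees_noio(void)
{
	return mock_task_flags & MOCK_PF_MEMALLOC_NOIO;
}

/* Same shape as ib_cm_init() in the patch: conditional save on entry,
 * one restore point on exit. */
static int mock_module_init(bool force_noio, struct mock_wq *wq)
{
	unsigned int noio_flags = 0;
	int ret = 0;

	if (force_noio)
		noio_flags = mock_noio_save();

	mock_wq_create(wq);	/* created inside the NOIO scope */

	if (force_noio)
		mock_noio_restore(noio_flags);

	return ret;
}
```

In this mock, the caller's flags are back to their original state once mock_module_init() returns, while work run later through the queue still sees the NOIO bit if the queue was created with force_noio set.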