Message-ID: <20240517173048.GA69273@ziepe.ca>
Date: Fri, 17 May 2024 14:30:48 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Haakon Bugge <haakon.bugge@...cle.com>
Cc: OFED mailing list <linux-rdma@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
netdev <netdev@...r.kernel.org>,
"rds-devel@....oracle.com" <rds-devel@....oracle.com>,
Leon Romanovsky <leon@...nel.org>,
Saeed Mahameed <saeedm@...dia.com>,
Tariq Toukan <tariqt@...dia.com>,
"David S . Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>,
Allison Henderson <allison.henderson@...cle.com>,
Manjunath Patil <manjunath.b.patil@...cle.com>,
Mark Zhang <markzhang@...dia.com>,
Chuck Lever III <chuck.lever@...cle.com>,
Shiraz Saleem <shiraz.saleem@...el.com>,
Yang Li <yang.lee@...ux.alibaba.com>
Subject: Re: [PATCH 0/6] rds: rdma: Add ability to force GFP_NOIO
On Tue, May 14, 2024 at 06:19:53PM +0000, Haakon Bugge wrote:
> Hi Jason,
>
>
> > On 14 May 2024, at 01:03, Jason Gunthorpe <jgg@...pe.ca> wrote:
> >
> > On Mon, May 13, 2024 at 02:53:40PM +0200, Håkon Bugge wrote:
> >> This series enables RDS and the RDMA stack to be used as a block I/O
> >> device. This is to support a filesystem on top of a raw block device
> >> which uses RDS and the RDMA stack as the network transport layer.
> >>
> >> Under intense memory pressure, we get memory reclaims. Assume the
> >> filesystem reclaims memory, goes to the raw block device, which calls
> >> into RDS, which calls the RDMA stack. Now, if regular GFP_KERNEL
> >> allocations in RDS or the RDMA stack require reclaims to be fulfilled,
> >> we end up in a circular dependency.
> >>
> >> We break this circular dependency by:
> >>
> >> 1. Force all allocations in RDS and the relevant RDMA stack to use
> >> GFP_NOIO, by means of a parenthetic use of
> >> memalloc_noio_{save,restore} on all relevant entry points.
> >
> > I didn't see an obvious explanation why each of these changes was
> > necessary. I expected this:
> >
> >> 2. Make sure work-queues inherit current->flags
> >> wrt. PF_MEMALLOC_{NOIO,NOFS}, such that work executed on the
> >> work-queue inherits the same flag(s).
>
> When the modules initialize, it does not help to have 2., unless
> PF_MEMALLOC_NOIO is set in current->flags. That is most probably not
> set, e.g. considering modprobe. That is why we have these steps in
> all the five modules. During module initialization, work queues are
> allocated in all mentioned modules. Therefore, the module
> initialization functions need the parenthetic use of
> memalloc_noio_{save,restore}.

And why would I need these work queues to have NOIO? They are never
called under a filesystem.

You need to explain, in every single case, how something in a NOIO
context becomes entangled with the unrelated thing you are tagging NOIO.

Historically, when we've tried to do this we gave up because the entire
subsystem ends up being NOIO.

> > And further, is there any validation of this? There is some lockdep
> > tracking of reclaim, I feel like it should be more robustly hooked up
> > in RDMA if we expect this to really work..
>
> Oracle is about to launch a product using this series, so the
> techniques used have been thoroughly validated, although on an
> older kernel version.

That doesn't really help keep it working. I want to see some kind of
lockdep scheme that enforces this and can validate it without ever
triggering reclaim.

Jason
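
As a rough illustration of the "parenthetic use of
memalloc_noio_{save,restore}" debated in this thread, the scoped-flag
pattern can be modeled in user space. This is only a sketch: every
model_* name and MODEL_* constant below is hypothetical, the flag values
are made up, and a single static variable stands in for current->flags.
It mirrors the shape of the kernel API (save returns the previous state,
restore puts it back, and allocations inside the scope lose their I/O and
FS reclaim bits), but it is not the kernel implementation and not code
from the patch series.

```c
#include <assert.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
#define MODEL_GFP_IO           0x1u
#define MODEL_GFP_FS           0x2u
#define MODEL_GFP_KERNEL       (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_PF_MEMALLOC_NOIO 0x100u

/* Models current->flags for one task. */
static unsigned int model_task_flags;

/* Enter a NOIO scope; return the previous state of the NOIO bit. */
static unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOIO;

	model_task_flags |= MODEL_PF_MEMALLOC_NOIO;
	return old;
}

/* Leave a NOIO scope; put the NOIO bit back to what save() returned. */
static void model_noio_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOIO) | old;
}

/* Mask the caller's allocation flags according to the task's scope. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_MEMALLOC_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	return gfp;
}

/*
 * Models a module init function wrapped in save/restore: the
 * "GFP_KERNEL" allocation made inside the scope is silently demoted.
 * Returns the flags that allocation would effectively use.
 */
static unsigned int model_module_init(void)
{
	unsigned int noio = model_noio_save();
	unsigned int gfp = model_effective_gfp(MODEL_GFP_KERNEL);

	model_noio_restore(noio);
	return gfp;
}
```

Because save returns the prior state, scopes nest: an inner restore
leaves an outer NOIO scope in force. The model also shows why wrapping a
whole init path is a blunt instrument: everything allocated inside the
scope is demoted, whether or not it can ever be reached from a reclaim
path.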