[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <443775e1-ed94-452e-8f06-eadb777bede4@oracle.com>
Date: Tue, 14 Nov 2023 15:53:47 -0800
From: junxiao.bi@...cle.com
To: Tejun Heo <tj@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
gregkh@...uxfoundation.org
Subject: Re: [PATCH] kernfs: support kernfs notify in memory recliam context
On 11/14/23 12:24 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Nov 14, 2023 at 12:09:19PM -0800, junxiao.bi@...cle.com wrote:
>> On 11/14/23 11:06 AM, Tejun Heo wrote:
>>> On Tue, Nov 14, 2023 at 10:59:47AM -0800, Junxiao Bi wrote:
>>>> kernfs notify is used in write path of md (md_write_start) to wake up
>>>> userspace daemon, like "mdmon" for updating md superblock of imsm raid,
>>>> md write will wait for that update done before issuing the write, if this
>>> How is forward progress guarnateed for that userspace daemon? This sounds
>>> like a really fragile setup.
>> For imsm raid, userspace daemon "mdmon" is responsible for updating raid
>> metadata, kernel will use kernfs_notify to wake up the daemon anywhere
>> metadata update is required. If the daemon can't move forward, write may
>> hung, but that will be a bug in the daemon?
> I see. That sounds very fragile and I'm not quite sure that can ever be made
> to work reliably. While memlocking everything needed and being really
> judicious about which syscalls to make will probably get you pretty far,
> there are things like task_work which gets scheduled and executed when a
> task is about to return to userspace. Those things are allowed to grab e.g.
> mutexes in the kernel and allocate memory and so on, and can deadlock under
> memory pressure.
>
> Even just looking at kernfs_notify, it down_reads()
> root->kernfs_supers_rwsem which might already be write-locked by somebody
> who's waiting on memory allocation.
>
> The patch you're proposing removes one link in the dependency chain but
> there are many on there. I'm not sure this is fixable. Nobody writes kernel
> code thinking that userspace code can be on the memory reclaim dependency
> chain.
Understood, thanks. Sound like depending on Userspace on memory reclaim
path is really bad idea and the only option for fixing it is to remove
that dependency, but i am not sure that is possible without breaking the
consistency of metadata.
Thanks,
Junxiao.
>
> Thanks.
>
Powered by blists - more mailing lists