Message-ID: <39f42a84-57d9-51bb-401d-b7ecf685bd78@hisilicon.com>
Date: Tue, 17 Dec 2024 14:09:11 +0800
From: Junxian Huang <huangjunxian6@...ilicon.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: <leon@...nel.org>, <linux-rdma@...r.kernel.org>, <linuxarm@...wei.com>,
<linux-kernel@...r.kernel.org>, <tangchengchang@...wei.com>
Subject: Re: [PATCH for-next] RDMA/hns: Support mmapping reset state to
userspace
On 2024/12/13 20:49, Jason Gunthorpe wrote:
> On Fri, Dec 13, 2024 at 05:37:58PM +0800, Junxian Huang wrote:
>>> But your reset flow partially disassociates the device, when the
>>> userspace goes back to sleep, or rearms the CQ, it should get a hard
>>> fail and do a full cleanup without relying on flushing.
>>
>> Not sure if I got your point. When you said "the userspace goes back to sleep",
>> did you mean the ibv_get_async_event() API? Are you suggesting that userspace
>> should call ibv_get_async_event() to monitor async events, and when it gets a
>> fatal event, it should stop polling CQs and clean up everything instead of
>> still waiting for the remaining CQEs?
>
> Yes, it should do that as well. This is what the device fatal event is
> for.
>
> I'm also saying that any kernel system calls, like sleeping for CQ
> events, should start failing too.
>
> Jason
Thanks. I took a cursory look at some open-source userspace projects:
UCX and SPDK handle the device fatal event properly by doing cleanup,
but Ceph doesn't seem to have any special handling apart from logging.
Junxian
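
For illustration only, the policy discussed in this thread (tear down on a
device fatal event, and also when a blocking call starts failing) could be
sketched roughly as below. This is a self-contained model, not libibverbs
code: ev_type, next_event_fn, struct script and the function names are all
hypothetical stand-ins. In a real application the event source would be
ibv_get_async_event(), the fatal case would be IBV_EVENT_DEVICE_FATAL, and
each event would be acknowledged with ibv_ack_async_event().

```c
#include <stddef.h>

/* Hypothetical stand-in for libibverbs' enum ibv_event_type. */
enum ev_type { EV_PORT_ACTIVE, EV_PORT_ERR, EV_DEVICE_FATAL };

/* What the application should do next. */
enum app_action { APP_CONTINUE, APP_TEARDOWN };

/* Hypothetical event source standing in for ibv_get_async_event():
 * returns 0 and fills *out on success, negative on failure. */
typedef int (*next_event_fn)(void *src, enum ev_type *out);

/* A device fatal event means: stop polling CQs and clean up,
 * rather than waiting for the remaining CQEs. */
enum app_action on_async_event(enum ev_type t)
{
    return t == EV_DEVICE_FATAL ? APP_TEARDOWN : APP_CONTINUE;
}

/* A failing blocking call (for example, sleeping for CQ events)
 * is treated as a hard failure as well. */
enum app_action on_wait_result(int rc)
{
    return rc < 0 ? APP_TEARDOWN : APP_CONTINUE;
}

/* Handle events until teardown is required; returns the number of
 * events that were handled without requiring teardown. */
int run_until_teardown(next_event_fn next, void *src)
{
    int handled = 0;
    enum ev_type t;

    for (;;) {
        if (on_wait_result(next(src, &t)) == APP_TEARDOWN)
            break;
        if (on_async_event(t) == APP_TEARDOWN)
            break;
        handled++;
    }
    return handled;
}

/* Scripted event source for demonstration: replays a fixed array. */
struct script { const enum ev_type *ev; size_t n, pos; };

int script_next(void *src, enum ev_type *out)
{
    struct script *s = src;

    if (s->pos >= s->n)
        return -1;              /* source exhausted: report failure */
    *out = s->ev[s->pos++];
    return 0;
}
```

The point of keeping the two checks separate is that either one alone is
enough to trigger cleanup: an application that never sees the fatal event
still stops once its blocking calls begin to fail.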