linux-kernel - Re: [RFC PATCH 05/14] nvmet: Send an AEN on CCR completion


Message-ID: <eaf73b5a-a438-4afe-a76e-148a7d9744ef@grimberg.me>
Date: Sat, 27 Dec 2025 11:48:49 +0200
From: Sagi Grimberg <sagi@...mberg.me>
To: Mohamed Khalfella <mkhalfella@...estorage.com>
Cc: Chaitanya Kulkarni <kch@...dia.com>, Christoph Hellwig <hch@....de>,
 Jens Axboe <axboe@...nel.dk>, Keith Busch <kbusch@...nel.org>,
 Aaron Dailey <adailey@...estorage.com>,
 Randy Jennings <randyj@...estorage.com>, John Meneghini
 <jmeneghi@...hat.com>, Hannes Reinecke <hare@...e.de>,
 linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 05/14] nvmet: Send an AEN on CCR completion



On 25/12/2025 20:13, Mohamed Khalfella wrote:
> On Thu 2025-12-25 15:23:51 +0200, Sagi Grimberg wrote:
>>
>> On 26/11/2025 4:11, Mohamed Khalfella wrote:
>>> Send an AEN to initiator when impacted controller exists. The
>>> notification points to CCR log page that initiator can read to check
>>> which CCR operation completed.
>>>
>>> Signed-off-by: Mohamed Khalfella <mkhalfella@...estorage.com>
>>> ---
>>>    drivers/nvme/target/core.c  | 27 +++++++++++++++++++++++----
>>>    drivers/nvme/target/nvmet.h |  3 ++-
>>>    include/linux/nvme.h        |  3 +++
>>>    3 files changed, 28 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>>> index 7dbe9255ff42..60173833c3eb 100644
>>> --- a/drivers/nvme/target/core.c
>>> +++ b/drivers/nvme/target/core.c
>>> @@ -202,7 +202,7 @@ static void nvmet_async_event_work(struct work_struct *work)
>>>    	nvmet_async_events_process(ctrl);
>>>    }
>>>    
>>> -void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
>>> +static void nvmet_add_async_event_locked(struct nvmet_ctrl *ctrl, u8 event_type,
>>>    		u8 event_info, u8 log_page)
>>>    {
>>>    	struct nvmet_async_event *aen;
>>> @@ -215,12 +215,17 @@ void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
>>>    	aen->event_info = event_info;
>>>    	aen->log_page = log_page;
>>>    
>>> -	mutex_lock(&ctrl->lock);
>>>    	list_add_tail(&aen->entry, &ctrl->async_events);
>>> -	mutex_unlock(&ctrl->lock);
>>>    
>>>    	queue_work(nvmet_wq, &ctrl->async_event_work);
>>>    }
>>> +void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
>>> +		u8 event_info, u8 log_page)
>>> +{
>>> +	mutex_lock(&ctrl->lock);
>>> +	nvmet_add_async_event_locked(ctrl, event_type, event_info, log_page);
>>> +	mutex_unlock(&ctrl->lock);
>>> +}
>>>    
>>>    static void nvmet_add_to_changed_ns_log(struct nvmet_ctrl *ctrl, __le32 nsid)
>>>    {
>>> @@ -1788,6 +1793,18 @@ struct nvmet_ctrl *nvmet_alloc_ctrl(struct nvmet_alloc_ctrl_args *args)
>>>    }
>>>    EXPORT_SYMBOL_GPL(nvmet_alloc_ctrl);
>>>    
>>> +static void nvmet_ctrl_notify_ccr(struct nvmet_ctrl *ctrl)
>>> +{
>>> +	lockdep_assert_held(&ctrl->lock);
>>> +
>>> +	if (nvmet_aen_bit_disabled(ctrl, NVME_AEN_BIT_CCR_COMPLETE))
>>> +		return;
>>> +
>>> +	nvmet_add_async_event_locked(ctrl, NVME_AER_NOTICE,
>>> +				     NVME_AER_NOTICE_CCR_COMPLETED,
>>> +				     NVME_LOG_CCR);
>>> +}
>>> +
>>>    static void nvmet_ctrl_complete_pending_ccr(struct nvmet_ctrl *ctrl)
>>>    {
>>>    	struct nvmet_subsys *subsys = ctrl->subsys;
>>> @@ -1801,8 +1818,10 @@ static void nvmet_ctrl_complete_pending_ccr(struct nvmet_ctrl *ctrl)
>>>    	list_for_each_entry(sctrl, &subsys->ctrls, subsys_entry) {
>>>    		mutex_lock(&sctrl->lock);
>>>    		list_for_each_entry(ccr, &sctrl->ccrs, entry) {
>>> -			if (ccr->ctrl == ctrl)
>>> +			if (ccr->ctrl == ctrl) {
>>> +				nvmet_ctrl_notify_ccr(sctrl);
>>>    				ccr->ctrl = NULL;
>>> +			}
>> Is this double loop necessary? Would you have more than one controller
>> cross resetting the same
> As it is implemented now CCRs are linked to sctrl. This decision can be
> revisited if found suboptimal. At some point I had CCRs linked to
> ctrl->subsys but that led to lock ordering issues. Double loop is
> necessary to find all CCRs in all controllers and mark them done.
> Yes, it is possible to have more than one sctrl resetting the same
> ictrl.

I'm more interested in simplifying.

>
>> controller? Won't it be better to install a callback+opaque that the
>> controller removal will call?
> Can you elaborate more on that? Better in what terms?
>
> nvmet_ctrl_complete_pending_ccr() is called from nvmet_ctrl_free() when
> we know that ctrl->ref is zero and no new CCRs will be added to this
> controller because nvmet_ctrl_find_get_ccr() will not be able to get it.

In nvmet, a controller serves a single host, so I am not sure I understand
how multiple source controllers would try to reset the same impacted
controller. If there is a 1-1 relationship between the source and impacted
controllers, I'd suggest simplifying: install a callback+opaque (e.g. void
*data) on the impacted controller instead of having it iterate and then
actually send the AEN from the impacted controller.
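
To make the suggestion concrete, here is a rough userspace sketch of the
callback+opaque idea, assuming the 1-1 relationship holds. Everything in it
is hypothetical: struct sketch_ctrl, sketch_install_ccr_cb(),
sketch_notify_ccr() and sketch_ctrl_free() are illustration-only names, not
existing nvmet symbols, and locking, refcounting and the lifetime of the
source controller are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nvmet_ctrl, not the kernel definition. */
struct sketch_ctrl {
	int cntlid;
	int pending_aens;		/* AENs queued towards this controller's host */
	void (*ccr_done)(void *data);	/* run once when this controller goes away */
	void *ccr_data;			/* opaque argument handed back to ccr_done */
};

/* Callback run on behalf of the source controller: queue one AEN on it. */
static void sketch_notify_ccr(void *data)
{
	struct sketch_ctrl *sctrl = data;

	sctrl->pending_aens++;
}

/*
 * Install the completion callback on the impacted controller.
 * Returns -1 if one is already installed (the assumed 1-1 relationship).
 */
static int sketch_install_ccr_cb(struct sketch_ctrl *ictrl,
				 void (*cb)(void *data), void *data)
{
	assert(cb != NULL);

	if (ictrl->ccr_done)
		return -1;

	ictrl->ccr_done = cb;
	ictrl->ccr_data = data;
	return 0;
}

/* Teardown of the impacted controller: fire the callback at most once. */
static void sketch_ctrl_free(struct sketch_ctrl *ictrl)
{
	void (*cb)(void *data) = ictrl->ccr_done;
	void *data = ictrl->ccr_data;

	ictrl->ccr_done = NULL;
	ictrl->ccr_data = NULL;

	if (cb)
		cb(data);
}
```

With this shape the removal path does not walk subsys->ctrls at all: the
source controller registers sketch_notify_ccr() with itself as the opaque
pointer, and freeing the impacted controller triggers exactly one
notification. If more than one source controller can really reset the same
impacted controller, a single slot like this is not enough and a list of
callbacks (or the existing double loop) would still be needed.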