Message-ID: <8aeea62b-c947-6414-bca1-3bd3f427cd56@gmail.com>
Date: Mon, 3 Oct 2022 10:48:31 -0700
From: James Smart <jsmart2021@...il.com>
To: duoming@....edu.cn
Cc: linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
    james.smart@...adcom.com, kbusch@...nel.org, axboe@...com, hch@....de,
    sagi@...mberg.me
Subject: Re: [PATCH] nvme-fc: fix sleep-in-atomic-context bug caused by nvme_fc_rcv_ls_req

On 10/2/2022 7:56 PM, James Smart wrote:
> On 10/2/2022 6:50 PM, duoming@....edu.cn wrote:
>> Hello,
>>
>> On Sun, 2 Oct 2022 10:12:15 -0700 James Smart wrote:
>>
>>> On 10/1/2022 5:19 PM, Duoming Zhou wrote:
>>>> The function lpfc_poll_timeout() is a timer handler that runs in an
>>>> atomic context, but it calls "kzalloc(.., GFP_KERNEL)" that may sleep.
>>>> As a result, the sleep-in-atomic-context bug will happen. The processes
>>>> is shown below:
>>>>
>>>> lpfc_poll_timeout()
>>>>  lpfc_sli_handle_fast_ring_event()
>>>>   lpfc_sli_process_unsol_iocb()
>>>>    lpfc_complete_unsol_iocb()
>>>>     lpfc_nvme_unsol_ls_handler()
>>>>      lpfc_nvme_handle_lsreq()
>>>>       nvme_fc_rcv_ls_req()
>>>>        kzalloc(sizeof(.., GFP_KERNEL) //may sleep
>>>>
>>>> This patch changes the gfp_t parameter of kzalloc() from GFP_KERNEL to
>>>> GFP_ATOMIC in order to mitigate the bug.
>>>>
>>>> Fixes: 14fd1e98afaf ("nvme-fc: Add Disconnect Association Rcv support")
>>>> Signed-off-by: Duoming Zhou <duoming@....edu.cn>
>>>> ---
>>>>  drivers/nvme/host/fc.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
>>>> index 127abaf9ba5..36698dfc8b3 100644
>>>> --- a/drivers/nvme/host/fc.c
>>>> +++ b/drivers/nvme/host/fc.c
>>>> @@ -1754,7 +1754,7 @@ nvme_fc_rcv_ls_req(struct nvme_fc_remote_port
>>>> *portptr,
>>>>      lsop = kzalloc(sizeof(*lsop) +
>>>>              sizeof(union nvmefc_ls_requests) +
>>>>              sizeof(union nvmefc_ls_responses),
>>>> -            GFP_KERNEL);
>>>> +            GFP_ATOMIC);
>>>>      if (!lsop) {
>>>>          dev_info(lport->dev,
>>>>              "RCV %s LS failed: No memory\n",
>>>
>>> I would prefer this was fixed within lpfc rather than introducing atomic
>>> allocations (1st in either host or target transport). It was introduced
>>> by lpfc change in irq handling style.
>>
>> Thank your for your reply and suggestions!
>>
>> Do you think change the lpfc_poll_timeout() to a delayed_work is better?
>>
>> Best regards,
>> Duoming Zhou
>
> as a minimum: the lpfc_complete_unsol_iocb handler should be passing off
> the iocb to a work queue routine - so that the context changes so that
> either nvme host or nvmet ls callback routines can be called. If
> possible, it should do the axchg alloc - to avoid a GFP_ATOMIC there as
> well...
>
> It's usually best for these nvme LS's and ELS's to be done in a slow
> path thread/work queue element. That may mean segmenting a little
> earlier in the path.
>
> -- james
>

looking further... lpfc_poll_timeout() should only be used on an SLI-3
adapter. The existing SLI-3 adapters don't support NVMe. So I'm a little
confused by this stack trace.

Can you describe what the system config/software setup is and
specifically what lpfc adapter is being used (dmesg attachment logs are
sufficient, or lspci output).

-- james
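
The approach suggested in the thread (have the unsolicited-IOCB completion
handler hand the frame to a work queue, so that the LS callback runs in a
context where sleeping is allowed) can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: every
name below (ls_ring, timer_handler, worker_drain, alloc_may_sleep) is
hypothetical, nothing here is lpfc or nvme-fc code, and the in_atomic flag
merely stands in for the kernel's notion of atomic context.

```c
/* User-space model only: none of these names are kernel or lpfc APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SLOTS 8

struct ls_frame {              /* hypothetical stand-in for a received LS request */
    int id;
};

struct ls_ring {               /* fixed-size queue: enqueueing never allocates */
    struct ls_frame slots[RING_SLOTS];
    size_t head;               /* next slot to drain */
    size_t tail;               /* next free slot */
};

static bool in_atomic;         /* models "running inside a timer handler" */
static int handled;            /* number of LS requests fully processed */

/* Models kzalloc(..., GFP_KERNEL): only legal where sleeping is allowed. */
static void *alloc_may_sleep(size_t size)
{
    assert(!in_atomic);        /* models the sleep-in-atomic-context check */
    return calloc(1, size);
}

/* Slow path: runs in "process context", so a sleeping allocation is fine. */
static void ls_slow_path(const struct ls_frame *frame)
{
    void *lsop = alloc_may_sleep(64);

    if (!lsop)
        return;                /* allocation failed: drop the request */
    (void)frame;               /* a real handler would parse the frame here */
    handled++;
    free(lsop);
}

/* Fast path: only records the frame; no allocation, nothing that can sleep. */
static bool ls_fast_path(struct ls_ring *ring, int id)
{
    if (ring->tail - ring->head == RING_SLOTS)
        return false;          /* ring full: caller must drop or retry */
    ring->slots[ring->tail % RING_SLOTS].id = id;
    ring->tail++;
    return true;
}

/* Models the timer handler: atomic for its whole duration. */
static void timer_handler(struct ls_ring *ring, int id)
{
    in_atomic = true;
    ls_fast_path(ring, id);
    in_atomic = false;
}

/* Models the work queue routine that later drains the deferred frames. */
static void worker_drain(struct ls_ring *ring)
{
    while (ring->head != ring->tail) {
        ls_slow_path(&ring->slots[ring->head % RING_SLOTS]);
        ring->head++;
    }
}
```

In this model, calling timer_handler() twice on a zero-initialized ls_ring
and then worker_drain() processes both frames without tripping the
assertion, because the allocation happens only after the atomic section has
ended. Calling ls_slow_path() directly from timer_handler() would trip it,
which is the situation the original patch tried to work around with
GFP_ATOMIC.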