Message-ID: <118cdb2e.1067d5.183a2b07fa7.Coremail.duoming@zju.edu.cn>
Date: Tue, 4 Oct 2022 19:10:19 +0800 (GMT+08:00)
From: duoming@....edu.cn
To: "James Smart" <jsmart2021@...il.com>
Cc: linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
	james.smart@...adcom.com, kbusch@...nel.org, axboe@...com, hch@....de,
	sagi@...mberg.me
Subject: Re: [PATCH] nvme-fc: fix sleep-in-atomic-context bug caused by nvme_fc_rcv_ls_req

Hello,

On Mon, 3 Oct 2022 10:48:31 -0700 James Smart wrote:

> On 10/2/2022 7:56 PM, James Smart wrote:
> > On 10/2/2022 6:50 PM, duoming@....edu.cn wrote:
> >> Hello,
> >>
> >> On Sun, 2 Oct 2022 10:12:15 -0700 James Smart wrote:
> >>
> >>> On 10/1/2022 5:19 PM, Duoming Zhou wrote:
> >>>> The function lpfc_poll_timeout() is a timer handler that runs in an
> >>>> atomic context, but it calls "kzalloc(.., GFP_KERNEL)" that may sleep.
> >>>> As a result, the sleep-in-atomic-context bug will happen. The processes
> >>>> is shown below:
> >>>>
> >>>> lpfc_poll_timeout()
> >>>>  lpfc_sli_handle_fast_ring_event()
> >>>>   lpfc_sli_process_unsol_iocb()
> >>>>    lpfc_complete_unsol_iocb()
> >>>>     lpfc_nvme_unsol_ls_handler()
> >>>>      lpfc_nvme_handle_lsreq()
> >>>>       nvme_fc_rcv_ls_req()
> >>>>        kzalloc(sizeof(.., GFP_KERNEL) //may sleep
> >>>>
> >>>> This patch changes the gfp_t parameter of kzalloc() from GFP_KERNEL to
> >>>> GFP_ATOMIC in order to mitigate the bug.
> >>>>
> >>>> Fixes: 14fd1e98afaf ("nvme-fc: Add Disconnect Association Rcv support")
> >>>> Signed-off-by: Duoming Zhou <duoming@....edu.cn>
> >>>> ---
> >>>>  drivers/nvme/host/fc.c | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> >>>> index 127abaf9ba5..36698dfc8b3 100644
> >>>> --- a/drivers/nvme/host/fc.c
> >>>> +++ b/drivers/nvme/host/fc.c
> >>>> @@ -1754,7 +1754,7 @@ nvme_fc_rcv_ls_req(struct nvme_fc_remote_port
> >>>> *portptr,
> >>>>  	lsop = kzalloc(sizeof(*lsop) +
> >>>>  			sizeof(union nvmefc_ls_requests) +
> >>>>  			sizeof(union nvmefc_ls_responses),
> >>>> -			GFP_KERNEL);
> >>>> +			GFP_ATOMIC);
> >>>>  	if (!lsop) {
> >>>>  		dev_info(lport->dev,
> >>>>  			"RCV %s LS failed: No memory\n",
> >>>
> >>> I would prefer this was fixed within lpfc rather than introducing atomic
> >>> allocations (1st in either host or target transport). It was introduced
> >>> by lpfc change in irq handling style.
> >>
> >> Thank your for your reply and suggestions!
> >>
> >> Do you think change the lpfc_poll_timeout() to a delayed_work is better?
> >>
> >> Best regards,
> >> Duoming Zhou
> >
> > as a minimum: the lpfc_complete_unsol_iocb handler should be passing off
> > the iocb to a work queue routine - so that the context changes so that
> > either nvme host or nvmet ls callback routines can be called. If
> > possible, it should do the axchg alloc - to avoid a GFP_ATOMIC there as
> > well...
> >
> > It's usually best for these nvme LS's and ELS's to be done in a slow
> > path thread/work queue element. That may mean segmenting a little
> > earlier in the path.
> >
> > -- james
> >
>
> looking further... lpfc_poll_timeout() should only be used on an SLI-3
> adapter. The existing SLI-3 adapters don't support NVMe. So I'm a
> little confused by this stack trace.

I found this problem with a static analysis tool that I wrote myself, not
on real hardware. I think an attacker could simulate the hardware to
trigger this stack trace, so I sent the patch to correct the problem.

> Can you describe what the system config/software setup is and
> specifically what lpfc adapter is being used (dmesg attachment logs are
> sufficient, or lspci output).

Best regards,
Duoming Zhou
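
The alternative James suggests in the quoted text, handing the received LS off to a work queue so that the sleeping allocation runs in process context instead of switching to GFP_ATOMIC, can be modelled in plain userspace C. This is only a minimal sketch under stated assumptions: it is not kernel or lpfc code, and every name in it (in_atomic_ctx, alloc_may_sleep, model_timer_handler, drain_in_process_ctx, QUEUE_SLOTS) is hypothetical. The flag in_atomic_ctx stands in for "running in a timer handler", and the assert() in alloc_may_sleep() stands in for the sleep-in-atomic-context check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only: none of these names exist in the kernel or in lpfc. */

#define QUEUE_SLOTS 4

static bool in_atomic_ctx;        /* models "we are inside a timer handler" */
static int queue[QUEUE_SLOTS];    /* preallocated hand-off ring */
static size_t q_head, q_tail, q_len;

/* Models kzalloc(..., GFP_KERNEL): only legal where sleeping is allowed. */
void *alloc_may_sleep(size_t size)
{
	assert(!in_atomic_ctx);   /* a sleep-in-atomic bug would trip here */
	return calloc(1, size);
}

/*
 * Models the timer handler: it performs no allocation at all and only
 * records the frame in a preallocated slot. Returns false if the ring
 * is full and the frame had to be dropped.
 */
bool model_timer_handler(int frame_id)
{
	bool queued = false;

	in_atomic_ctx = true;
	if (q_len < QUEUE_SLOTS) {
		queue[q_tail] = frame_id;
		q_tail = (q_tail + 1) % QUEUE_SLOTS;
		q_len++;
		queued = true;
	}
	in_atomic_ctx = false;
	return queued;
}

/*
 * Models the work queue routine: it runs outside atomic context, so the
 * sleeping allocation is allowed. Returns the number of frames handled.
 */
int drain_in_process_ctx(void)
{
	int handled = 0;

	assert(!in_atomic_ctx);
	while (q_len > 0) {
		int *ctx = alloc_may_sleep(sizeof(*ctx));

		if (!ctx)
			break;    /* leave the frame queued for a later retry */
		*ctx = queue[q_head];
		q_head = (q_head + 1) % QUEUE_SLOTS;
		q_len--;
		handled++;
		free(ctx);
	}
	return handled;
}
```

Calling model_timer_handler() a few times and then drain_in_process_ctx() once handles every queued frame without the assert firing. Calling alloc_may_sleep() from inside the handler would abort instead, which is the situation the patch description warns about.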