linux-kernel - Re: [PATCH] nvme-fc: fix sleep-in-atomic-context bug caused by nvme_fc_rcv_ls

Message-ID: <7bd9e071.1063f1.1839b89cefa.Coremail.duoming@zju.edu.cn>
Date:   Mon, 3 Oct 2022 09:50:43 +0800 (GMT+08:00)
From:   duoming@....edu.cn
To:     "James Smart" <jsmart2021@...il.com>
Cc:     linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
        james.smart@...adcom.com, kbusch@...nel.org, axboe@...com,
        hch@....de, sagi@...mberg.me
Subject: Re: [PATCH] nvme-fc: fix sleep-in-atomic-context bug caused by
 nvme_fc_rcv_ls_req

Hello,

On Sun, 2 Oct 2022 10:12:15 -0700 James Smart wrote:

> On 10/1/2022 5:19 PM, Duoming Zhou wrote:
> > The function lpfc_poll_timeout() is a timer handler that runs in an
> > atomic context, but it calls "kzalloc(.., GFP_KERNEL)" that may sleep.
> > As a result, a sleep-in-atomic-context bug can occur. The call chain
> > is shown below:
> > 
> > lpfc_poll_timeout()
> >   lpfc_sli_handle_fast_ring_event()
> >    lpfc_sli_process_unsol_iocb()
> >     lpfc_complete_unsol_iocb()
> >      lpfc_nvme_unsol_ls_handler()
> >       lpfc_nvme_handle_lsreq()
> >        nvme_fc_rcv_ls_req()
> >         kzalloc(.., GFP_KERNEL) //may sleep
> > 
> > This patch changes the gfp_t parameter of kzalloc() from GFP_KERNEL to
> > GFP_ATOMIC in order to mitigate the bug.
> > 
> > Fixes: 14fd1e98afaf ("nvme-fc: Add Disconnect Association Rcv support")
> > Signed-off-by: Duoming Zhou <duoming@....edu.cn>
> > ---
> >   drivers/nvme/host/fc.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> > index 127abaf9ba5..36698dfc8b3 100644
> > --- a/drivers/nvme/host/fc.c
> > +++ b/drivers/nvme/host/fc.c
> > @@ -1754,7 +1754,7 @@ nvme_fc_rcv_ls_req(struct nvme_fc_remote_port *portptr,
> >   	lsop = kzalloc(sizeof(*lsop) +
> >   			sizeof(union nvmefc_ls_requests) +
> >   			sizeof(union nvmefc_ls_responses),
> > -			GFP_KERNEL);
> > +			GFP_ATOMIC);
> >   	if (!lsop) {
> >   		dev_info(lport->dev,
> >   			"RCV %s LS failed: No memory\n",
> 
> I would prefer this was fixed within lpfc rather than introducing atomic 
> allocations (1st in either host or target transport).  It was introduced 
> by lpfc change in irq handling style.

Thank you for your reply and suggestions!

Do you think changing lpfc_poll_timeout() to a delayed_work would be better?
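
A minimal sketch of what such a conversion might look like is below. It is
not lpfc code: struct demo_hba and every demo_* function are hypothetical
names used only for illustration, and the 64-byte allocation is a
placeholder. The point is only that a delayed_work handler runs in process
context, where a GFP_KERNEL allocation is allowed to sleep.

```c
/* Hypothetical sketch, not lpfc code: all demo_* names are invented. */
#include <linux/jiffies.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct demo_hba {
	struct delayed_work poll_work;	/* takes the place of a timer_list */
	unsigned int poll_interval_ms;
};

/* Runs on a kernel worker thread (process context), so it may sleep. */
static void demo_poll_work_fn(struct work_struct *work)
{
	struct demo_hba *hba = container_of(to_delayed_work(work),
					    struct demo_hba, poll_work);
	void *buf;

	buf = kzalloc(64, GFP_KERNEL);	/* allowed here: not atomic context */
	if (buf) {
		/* ... handle the polled event ... */
		kfree(buf);
	}

	/* Re-arm, as a periodic timer handler would do with mod_timer(). */
	schedule_delayed_work(&hba->poll_work,
			      msecs_to_jiffies(hba->poll_interval_ms));
}

static void demo_poll_start(struct demo_hba *hba)
{
	INIT_DELAYED_WORK(&hba->poll_work, demo_poll_work_fn);
	schedule_delayed_work(&hba->poll_work,
			      msecs_to_jiffies(hba->poll_interval_ms));
}

static void demo_poll_stop(struct demo_hba *hba)
{
	/* Waits for a running handler and cancels any pending re-arm. */
	cancel_delayed_work_sync(&hba->poll_work);
}
```

Whether this fits lpfc depends on the other callers of the same path, which
the sketch does not cover.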

Best regards,
Duoming Zhou