linux-kernel - Re: [PATCH] scsi: pm80xx: Remove msleep() loop from pm8001_dev_gone


Message-ID: <CAL54JgdupgdfeBQwETPv3jCh8iYROqA_DthLQ8cJf7Th1XSV_g@mail.gmail.com>
Date: Mon, 18 Nov 2024 10:00:09 -0800
From: TJ Adams <tadamsjr@...gle.com>
To: John Garry <john.g.garry@...cle.com>
Cc: Jack Wang <jinpu.wang@...ud.ionos.com>, 
	"James E . J . Bottomley" <James.Bottomley@...senpartnership.com>, 
	"Martin K . Petersen" <martin.petersen@...cle.com>, linux-scsi@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Igor Pylypiv <ipylypiv@...gle.com>
Subject: Re: [PATCH] scsi: pm80xx: Remove msleep() loop from pm8001_dev_gone_notify()

Sorry for the late response.

> > It's possible to end up in a state where pm8001_dev->running_req never
> > reaches zero.
>
> Is that a driver bug then?

I haven't seen this except when creating the situation artificially.
This is a preventative change rather than a fix for a specific issue
we observed.

> > In that state we will be sleeping forever.
> >
> > sas_execute_internal_abort_dev() can wait for a response for
> > up to 60 seconds (3 retries x 20 seconds). 60 seconds should be enough
> > for pm8001_dev->running_req to get to zero.

> May I suggest you drop running_req at some stage, and use other methods
> to find how many IOs are active?

I haven't given much thought to better ways of tracking active IOs,
so that will have to come later, but it's definitely noted!

On Tue, Jul 9, 2024 at 9:09 AM John Garry <john.g.garry@...cle.com> wrote:
>
> On 09/07/2024 17:00, TJ Adams wrote:
> > From: Igor Pylypiv <ipylypiv@...gle.com>
> >
> > It's possible to end up in a state where pm8001_dev->running_req never
> > reaches zero.
>
> Is that a driver bug then?
>
> > In that state we will be sleeping forever.
> >
> > sas_execute_internal_abort_dev() can wait for a response for
> > up to 60 seconds (3 retries x 20 seconds). 60 seconds should be enough
> > for pm8001_dev->running_req to get to zero.
>
> May I suggest you drop running_req at some stage, and use other methods
> to find how many IOs are active?
>
> >
> > Signed-off-by: Igor Pylypiv <ipylypiv@...gle.com>
> > Signed-off-by: TJ Adams <tadamsjr@...gle.com>
> > ---
> >   drivers/scsi/pm8001/pm8001_sas.c | 7 +++++--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
> > index a5a31dfa4512..513e9a49838c 100644
> > --- a/drivers/scsi/pm8001/pm8001_sas.c
> > +++ b/drivers/scsi/pm8001/pm8001_sas.c
> > @@ -712,8 +712,11 @@ static void pm8001_dev_gone_notify(struct domain_device *dev)
> >               if (atomic_read(&pm8001_dev->running_req)) {
> >                       spin_unlock_irqrestore(&pm8001_ha->lock, flags);
> >                       sas_execute_internal_abort_dev(dev, 0, NULL);
> > -                     while (atomic_read(&pm8001_dev->running_req))
> > -                             msleep(20);
> > +                     if (atomic_read(&pm8001_dev->running_req)) {
> > +                             pm8001_dbg(pm8001_ha, FAIL,
> > +                                        "device_id: %u: Failed to abort %d requests!\n",
> > +                                        device_id, atomic_read(&pm8001_dev->running_req));
> > +                     }
> >                       spin_lock_irqsave(&pm8001_ha->lock, flags);
> >               }
> >               PM8001_CHIP_DISP->dereg_dev_req(pm8001_ha, device_id);
>