linux-kernel - Re: [PATCH] PCI/DOE: Remove asynchronous task support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <Y3oyQhHUbYZGA80M@iweiny-mobl>
Date:   Sun, 20 Nov 2022 05:57:22 -0800
From:   Ira Weiny <ira.weiny@...el.com>
To:     Hillf Danton <hdanton@...a.com>
CC:     Dan Williams <dan.j.williams@...el.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Li Ming <ming4.li@...el.com>,
        Vishal Verma <vishal.l.verma@...el.com>,
        Lukas Wunner <lukas@...ner.de>,
        Alison Schofield <alison.schofield@...el.com>,
        <linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>
Subject: Re: [PATCH] PCI/DOE: Remove asynchronous task support

On Sun, Nov 20, 2022 at 10:27:35AM +0800, Hillf Danton wrote:
> On Sat, 19 Nov 2022 14:25:27 -0800 Ira Weiny <ira.weiny@...el.com>
> > @@ -529,8 +492,18 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
> >  		return -EIO;
> >  
> >  	task->doe_mb = doe_mb;
> > -	INIT_WORK(&task->work, doe_statemachine_work);
> > -	queue_work(doe_mb->work_queue, &task->work);
> > +
> > +again:
> > +	if (!mutex_trylock(&doe_mb->exec_lock)) {
> > +		if (wait_event_timeout(task->doe_mb->wq,
> > +				test_bit(PCI_DOE_FLAG_CANCEL, &doe_mb->flags),
> > +				PCI_DOE_POLL_INTERVAL))
> > +			return -EIO;
> 
> Is EIO worth a line of pr_warn()?

Maybe but I'm not sure it is worth it.  This was paralleling the original code
which called pci_doe_flush_mb() to shut down the mailbox.  So this is likely to
never happen.  The callers could print something if needed.

> 
> > +		goto again;
> > +	}
> > +	exec_task(task);
> > +	mutex_unlock(&doe_mb->exec_lock);
> > +
> 
> If it is likely to take two minutes to acquire the exec_lock after
> rounds of trying again, trylock + wait timeout barely make sense given EIO.

I'm not sure where 2 minutes come from?

#define PCI_DOE_TIMEOUT HZ
#define PCI_DOE_POLL_INTERVAL   (PCI_DOE_TIMEOUT / 128)

It is also not anticipated that more than 1 task is being given to the mailbox
but the protection needs to be there because exec_task() will get confused if
more than 1 thread submits at the same time.

All this said I've now convinced myself that there is a race in the use of
PCI_DOE_FLAG_CANCEL even with the existing code.

I believe that if the pci device goes away the doe_mb structure may get free'ed
prior to other threads having a chance to check doe_mb->flags.  Worse yet the
work queue itself (doe_mb->wq) may become invalid...

I don't believe this can currently happen because anyone using the doe_mb
structure has a reference to the pci device.

With this patch I think all the doe_mb->flags and the wait queue can go away.
pci_doe_wait() can be replaced with a simple msleep_interruptible().

Let me work through that a bit.

Ira

> 
> Hillf
> 
> /**
>  * wait_event_timeout - sleep until a condition gets true or a timeout elapses
>  * @wq_head: the waitqueue to wait on
>  * @condition: a C expression for the event to wait for
>  * @timeout: timeout, in jiffies
>  *
>  * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
>  * @condition evaluates to true. The @condition is checked each time
>  * the waitqueue @wq_head is woken up.
>  *
>  * wake_up() has to be called after changing any variable that could
>  * change the result of the wait condition.
>  *
>  * Returns:
>  * 0 if the @condition evaluated to %false after the @timeout elapsed,
>  * 1 if the @condition evaluated to %true after the @timeout elapsed,
>  * or the remaining jiffies (at least 1) if the @condition evaluated
>  * to %true before the @timeout elapsed.
>  */