linux-kernel - Re: [PATCH v3 1/4] PCI: Clean up pci_scan

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8225cf74d6d81c3ce149d677b5bf47689d8e3fd2.camel@linux.ibm.com>
Date:   Thu, 21 Apr 2022 11:27:42 +0200
From:   Niklas Schnelle <schnelle@...ux.ibm.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Bjorn Helgaas <bhelgaas@...gle.com>,
        Jan Kiszka <jan.kiszka@...mens.com>,
        Matthew Rosato <mjrosato@...ux.ibm.com>,
        Pierre Morel <pmorel@...ux.ibm.com>,
        linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org,
        linux-s390@...r.kernel.org, linux-pci@...r.kernel.org
Subject: Re: [PATCH v3 1/4] PCI: Clean up pci_scan_slot()

On Wed, 2022-04-20 at 21:14 -0500, Bjorn Helgaas wrote:
> Hi Niklas,
> 
> I'm sure this makes good sense, but I need a little more hand-holding.
> Sorry this is long and rambling.
> 
> On Tue, Apr 19, 2022 at 12:28:00PM +0200, Niklas Schnelle wrote:
> > While determining the next PCI function is factored out of
> > pci_scan_slot() into next_fn() the former still handles the first
> > function as a special case duplicating the code from the scan loop and
> > splitting the condition that the first function exits from it being
> > multifunction which is tested in next_fn().
> > 
> > Furthermore the non ARI branch of next_fn() mixes the case that
> > multifunction devices may have non-contiguous function ranges and dev
> > may thus be NULL with the multifunction requirement. It also signals
> > that no further functions need to be scanned by returning 0 which is
> > a valid function number.
> > 
> > Improve upon this by moving all conditions for having to scan for more
> > functions into next_fn() and make them obvious and commented.
> > 
> > By changing next_fn() to return -ENODEV instead of 0 when there is no
> > next function we can then handle the initial function inside the loop
> > and deduplicate the shared handling.
> > 
> > No functional change is intended.
> > 
> > Signed-off-by: Niklas Schnelle <schnelle@...ux.ibm.com>
> > ---
> >  drivers/pci/probe.c | 41 +++++++++++++++++++----------------------
> >  1 file changed, 19 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 17a969942d37..389aa1f9cb2c 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -2579,33 +2579,35 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
> >  }
> >  EXPORT_SYMBOL(pci_scan_single_device);
> >  
> > -static unsigned int next_fn(struct pci_bus *bus, struct pci_dev *dev,
> > -			    unsigned int fn)
> > +static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
> >  {
> >  	int pos;
> >  	u16 cap = 0;
> >  	unsigned int next_fn;
> >  
> > -	if (pci_ari_enabled(bus)) {
> > -		if (!dev)
> > -			return 0;
> > +	if (dev && pci_ari_enabled(bus)) {
> 
> I think this would be easier to verify if we kept the explicit error
> return, e.g.,
> 
>   if (pci_ari_enabled(bus)) {
>     if (!dev)
>       return -ENODEV;
>     pos = pci_find_ext_capability(...);
> 
> Otherwise we have to sort through the !dev cases below.  I guess
> -ENODEV would come from either the "!fn && !dev" case or the "fn > 6"
> case, but it's not obvious to me that those are equivalent to the
> previous code.

We could keep this the same for this patch but I think for jailhouse
(patch 2) we need the "!dev" case not to fail here such that we can
handle the missing function 0 below even if ARI is enabled. For s390
this doesn't currently matter because pci_ari_enabled(bus) is always
false but I assumed that this isn't necessarily so for jailhouse. I
sent a follow up mail on a slight behavior change I can think of for
this case for v2 but forgot to send it also for v3. Quoted below:

"This part here theoretically changes the behavior slightly. If the ARI
information is wrong/lands us in a "hole" we may look for more
functions via the non-ARI path. Not sure if that is relevant though as
in the worst case we might find functions that we otherwise wouldn't
have seen. Seems rather obsure to me but I might be wrong, we currently
don't see the ARI capability in Linux on IBM Z so I have less
experience with this. I did of course boot test on my x86_64
workstation."

> 
> >  		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ARI);
> >  		if (!pos)
> > -			return 0;
> > +			return -ENODEV;
> >  
> >  		pci_read_config_word(dev, pos + PCI_ARI_CAP, &cap);
> >  		next_fn = PCI_ARI_CAP_NFN(cap);
> >  		if (next_fn <= fn)
> > -			return 0;	/* protect against malformed list */
> > +			return -ENODEV;	/* protect against malformed list */
> >  
> >  		return next_fn;
> >  	}
> >  
> > -	/* dev may be NULL for non-contiguous multifunction devices */
> > -	if (!dev || dev->multifunction)
> > -		return (fn + 1) % 8;
> > -
> > -	return 0;
> > +	/* only multifunction devices may have more functions */
> > +	if (dev && !dev->multifunction)
> > +		return -ENODEV;
> 
> I don't understand why the "!dev || dev->multifunction" test needs to
> change.  Isn't that valid even in the hypervisor case?  IIUC, you want
> to return success in some cases that currently return failure, so this
> case that was already success should be fine as it was.

This isn't a change to the test. It's the negation of the logical
condition *and* a switch of the branches i.e. keeps the overall
behavior exactly the same. The equivalence is !(!A || B) == (A && !B).
There are two reasons I did this.

1. I find (!dev || dev->multifunction) to be much harder to grasp than
(dev && !dev->multifunction).

2. The whole next_fn() in my opinion becomes easier to read if it bails
for all bad cases early and the "this is the next fn" is the final
return if we didn't bail. This becomes even more true as another
condition is added in patch 2.

> 
> Is this because "(fn + 1) % 8" may be zero, which previously
> terminated the loop, but now it doesn't because "fn == 0" is the
> *first* execution of the loop?

Yes with function 0 handled in the loop we can't use 0 as the
termination indication. Also I find it generally weird to use a wrap
around for this.

> 
> If so, I wonder if we could avoid that case by adding:
> 
>   if (fn >= 7)
>     return -ENODEV;
> 
> at the very beginning.  Maybe that would allow a more trivial patch
> that just changed the error return from 0 to -ENODEV, i.e., leaving
> all the logic in next_fn() unchanged?

I think this is equivalent to the ternary at the return. Both return
-ENODEV for fn >= 7. I do like your idea better though as it keeps with
the scheme of my point 2 above and ternaries are ever so slightly
harder to read.

> 
> I'm wondering if this could end up like:
> 
>     if (fn >= 7)
>       return -ENODEV;
> 
>     if (pci_ari_enabled(bus)) {
>       if (!dev)
> 	return -ENODEV;
>       ...
>       return next_fn;
>     }
> 
>     if (!dev || dev->multifunction)
>       return (fn + 1) % 8;
> 
>  +  if (hypervisor_isolated_pci_functions())
>  +    return (fn + 1) % 8;
> 
>     return -ENODEV;
> 
> (The hypervisor part being added in a subsequent patch, and I'm not
> sure exactly what logic you need there -- the point being that it's
> just an additional success case.)

Yes pretty much only that by negating the success case and switching
the branches we end up with a list of fail/bail checks and a single
success return even with the hyperisor check added. Also not sure if
the "fn >= 7" check should rather go after the ARI path to keep them
separate doesn't really matter of course.

> 
> The "% 8" seems possibly superfluous then, since previously that
> caused a zero return that terminated the loop.  If we're using -ENODEV
> to terminate the loop, we probably don't care about the mod 8.

Yes

> 
> > +	/*
> > +	 * A function 0 is required but multifunction devices may
> > +	 * be non-contiguous so dev can be NULL otherwise.
> 
> I understood the original "dev may be NULL ..." comment, but I can't
> quite parse this.  "dev can be NULL" for non-zero functions?  That's
> basically what it said before, but it's not clear what "otherwise"
> refers to.

I agree this can probably be improved. I'm trying to say that dev can
be NULL if it is not function 0 which must exist. Maybe:

"dev may be NULL as multifunction devices may be non-contiguous but a
function 0 is required"

> 
> > +	 */
> > +	if (!fn && !dev)
> > +		return -ENODEV;
> 
> This part isn't obvious to me yet, partly because of the "!fn && !dev"
> construction.  The negatives make it hard to parse.
> 
> Since "fn" isn't a boolean or a pointer, I think "fn == 0" is easier
> to read than "!fn".  I would test "dev" first since it logically
> precedes "fn".

I agree about the "fn == 0", I only used "!fn" because I remember
getting checkpatch warnings for "foo == 0" in the past. I'll change to
fn == 0. As for the order see below.

> 
> IIUC !dev means we haven't found a function at this device number yet.
> So this:
> 
>   if (!dev && fn == 0)
>     return -ENODEV;
> 
> means we called pci_scan_single_device(bus, devfn + 0) the first time
> through the loop, and it didn't find a device so it returned NULL.

Yes. This is "dev may be NULL unless we're looking at function 0". The
fn came before dev because I wrote it as "function 0 must not be NULL"
but it could also be "dev is NULL and we're looking at function 0",
I have no clear preference.

This is also the case that gets changed by patch 2 to become:

"function 0 must not be NULL unless we have isolated PCI functions"

or with the order switched:

"dev is NULL and we're looking at function 0 and don't have isolated
PCI functions"

> 
> > +	return (fn <= 6) ? fn + 1 : -ENODEV;
> >  }
> >  
> >  static int only_one_child(struct pci_bus *bus)
> > @@ -2643,24 +2645,19 @@ static int only_one_child(struct pci_bus *bus)
> >   */
> >  int pci_scan_slot(struct pci_bus *bus, int devfn)
> >  {
> > -	unsigned int fn, nr = 0;
> > -	struct pci_dev *dev;
> > +	int fn, nr = 0;
> > +	struct pci_dev *dev = NULL;
> >  
> >  	if (only_one_child(bus) && (devfn > 0))
> >  		return 0; /* Already scanned the entire slot */
> >  
> > -	dev = pci_scan_single_device(bus, devfn);
> > -	if (!dev)
> > -		return 0;
> > -	if (!pci_dev_is_added(dev))
> > -		nr++;
> > -
> > -	for (fn = next_fn(bus, dev, 0); fn > 0; fn = next_fn(bus, dev, fn)) {
> > +	for (fn = 0; fn >= 0; fn = next_fn(bus, dev, fn)) {
> >  		dev = pci_scan_single_device(bus, devfn + fn);
> 
> "devfn + fn" (in the existing, unchanged code) is a little bit weird.
> In almost all cases, devfn is the result of "PCI_DEVFN(slot, 0)", so
> we could make the interface:
> 
>   pci_scan_slot(struct pci_bus *bus, int dev)
> 
> where "dev" is 0-31.
> 
> The only exceptions are a couple hotplug drivers where the fn probably
> is or should be 0, too, but I haven't verified that.
> 
> But this would be scope creep, so possibly something we could consider
> in the future, but not for this series.

Hmm, I see your point. It makes little sense to have a devfn that isn't
from PCI_DEVFN(slot, 0) and not use pci_scan_single_device() instead.

> 
> >  		if (dev) {
> >  			if (!pci_dev_is_added(dev))
> >  				nr++;
> > -			dev->multifunction = 1;
> > +			if (nr > 1)
> > +				dev->multifunction = 1;
> >  		}
> >  	}
> >  
> > -- 
> > 2.32.0
> > 


Download attachment "signature.asc" of type "application/pgp-signature" (229 bytes)