[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <195F50E8-2E4D-416B-A24A-4602321FABB7@vmware.com>
Date: Thu, 20 Jan 2022 17:31:40 +0000
From: Vikash Bansal <bvikas@...are.com>
To: Greg KH <gregkh@...uxfoundation.org>
CC: "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Srivatsa Bhat <srivatsab@...are.com>,
"srivatsa@...il.mit.edu" <srivatsa@...il.mit.edu>,
Alexey Makhalov <amakhalov@...are.com>,
Srinidhi Rao <srinidhir@...are.com>,
Anish Swaminathan <anishs@...are.com>,
Vasavi Sirnapalli <vsirnapalli@...are.com>,
Ajay Kaher <akaher@...are.com>
Subject: Re: [PATCH] PCI: Speed up device init by parsing capabilities all at
once
On 20/01/22, 11:56 AM, "Greg KH" <gregkh@...uxfoundation.org> wrote:
Hi Greg,
Thanks for the comments
>On Tue, Jan 18, 2022 at 09:16:01AM -0800, Vikash Bansal wrote:
>> In the current implementation, the PCI capability list is parsed from
>> the beginning to find each capability, which results in a large number
>> of redundant PCI reads.
>>
>> Instead, we can parse the complete list just once, store it in the
>> pci_dev structure, and get the offset of each capability directly from
>> the pci_dev structure.
>>
>> This implementation improves pci devices initialization time by ~2-3% in
>> case of bare metal and 7-8% in case of VM running on ESXi.
>
>What is that in terms of "wall clock" time? % is hard to know here, and
>of course it will depend on the PCI bus speed, right?
>
In terms of "wall clock" time:
For bare-metal it reduced from 270ms to 261ms
And for VM it reduced from 201ms to 184ms.
>> It also adds a memory overhead of 20bytes (value of PCI_CAP_ID_MAX) per
>> PCI device.
>>
>> Signed-off-by: Vikash Bansal <bvikas@...are.com>
>> ---
>> drivers/pci/pci.c | 43 ++++++++++++++++++++++++++++++++++++-------
>> drivers/pci/probe.c | 5 +++++
>> include/linux/pci.h | 2 ++
>> 3 files changed, 43 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 3d2fb394986a..8e024db30262 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -468,6 +468,41 @@ static u8 __pci_bus_find_cap_start(struct pci_bus *bus,
>> return 0;
>> }
>>
>> +
>> +/**
>> + * pci_find_all_capabilities - Read all capabilities
>> + * @dev: the PCI device
>> + *
>> + * Read all capabilities and store offsets in cap_off
>> + * array in pci_dev structure.
>> + */
>> +void pci_find_all_capabilities(struct pci_dev *dev)
>> +{
>> + int ttl = PCI_FIND_CAP_TTL;
>> + u16 ent;
>> + u8 pos;
>> + u8 id;
>> +
>> + pos = __pci_bus_find_cap_start(dev->bus, dev->devfn, dev->hdr_type);
>> + if (!pos)
>> + return;
>> + pci_bus_read_config_byte(dev->bus, dev->devfn, pos, &pos);
>> + while (ttl--) {
>> + if (pos < 0x40)
>
>What is this magic value of 0x40?
>
0x40 is the start address of capability list. This code is copied from function __pci_find_next_cap_ttl
>> + break;
>> + pos &= ~3;
>
>Why ~3?
>
Capability start address is 4 byte aligned. This code is also copied from __pci_find_next_cap_ttl.
>> + pci_bus_read_config_word(dev->bus, dev->devfn, pos, &ent);
>> + id = ent & 0xff;
>
>Do you really need the & if you are truncating it?
>
Yes, this is not really required. But again, this code is copied from __pci_find_next_cap_ttl.
>> + if (id == 0xff)
>> + break;
>> +
>> + /* Read first instance of capability */
>> + if (!(dev->cap_off[id]))
>> + dev->cap_off[id] = pos;
>
>Shouldn't you have checked this before you read the value?
>
Yes, will move this code
>> + pos = (ent >> 8);
>
>What about walking the list using __pci_find_next_cap() like before?
>Why is this somehow the same as the old function?
>
__pci_find_next_cap() is used to find a given capability,
It can't be used to walk through the list in this case.
>> + }
>> +}
>> +
>> /**
>> * pci_find_capability - query for devices' capabilities
>> * @dev: PCI device to query
>> @@ -489,13 +524,7 @@ static u8 __pci_bus_find_cap_start(struct pci_bus *bus,
>> */
>> u8 pci_find_capability(struct pci_dev *dev, int cap)
>> {
>> - u8 pos;
>> -
> - pos = __pci_bus_find_cap_start(dev->bus, dev->devfn, dev->hdr_type);
>> - if (pos)
>> - pos = __pci_find_next_cap(dev->bus, dev->devfn, pos, cap);
>> -
>> - return pos;
>> + return dev->cap_off[cap];
>> }
>> EXPORT_SYMBOL(pci_find_capability);
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 087d3658f75c..bacab12cedbb 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1839,6 +1839,11 @@ int pci_setup_device(struct pci_dev *dev)
>> dev->hdr_type = hdr_type & 0x7f;
>> dev->multifunction = !!(hdr_type & 0x80);
>> dev->error_state = pci_channel_io_normal;
>> + /*
>> + * Read all capabilities and store offsets in cap_off
>> + * array in pci_dev structure.
>> + */
>
>Comment is not needed if the function name is descriptive.
>
ok
>> + pci_find_all_capabilities(dev);
>
>And it is, so no need for the comment.
>
>> set_pcie_port_type(dev);
>>
>> pci_set_of_node(dev);
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 18a75c8e615c..d221c73e67f8 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -326,6 +326,7 @@ struct pci_dev {
>> unsigned int class; /* 3 bytes: (base,sub,prog-if) */
>> u8 revision; /* PCI revision, low byte of class word */
>> u8 hdr_type; /* PCI header type (`multi' flag masked out) */
>> + u8 cap_off[PCI_CAP_ID_MAX]; /* Offsets of all pci capabilities */
>
>Did you run 'pahole' to ensure you are not adding extra padding bytes
>here?
>
>> #ifdef CONFIG_PCIEAER
>> u16 aer_cap; /* AER capability offset */
>> struct aer_stats *aer_stats; /* AER stats for this device */
>> @@ -1128,6 +1129,7 @@ void pci_sort_breadthfirst(void);
>>
>> u8 pci_bus_find_capability(struct pci_bus *bus, unsigned int devfn, int cap);
>> u8 pci_find_capability(struct pci_dev *dev, int cap);
>> +void pci_find_all_capabilities(struct pci_dev *dev);
>
>Why is this now a global function and not one just local to the pci
>core? Who else would ever need to call it?
Will make pci_find_all_capabilitie local and move it to probe.c
>
>thanks,
>
>greg k-h
Powered by blists - more mailing lists