Message-ID: <3e6b9d7d-5d18-645e-5ef9-6b8a77fa62e9@ozlabs.ru>
Date: Thu, 30 May 2019 13:39:47 +1000
From: Alexey Kardashevskiy <aik@...abs.ru>
To: Shawn Anastasio <shawn@...stas.io>, Oliver <oohall@...il.com>
Cc: Sam Bobroff <sbobroff@...ux.ibm.com>, linux-pci@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
rppt@...ux.ibm.com, Paul Mackerras <paulus@...ba.org>,
Bjorn Helgaas <bhelgaas@...gle.com>, xyjxie@...ux.vnet.ibm.com,
linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>
Subject: Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 28/05/2019 17:39, Shawn Anastasio wrote:
>
>
> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:
>>
>>
>> On 28/05/2019 15:36, Oliver wrote:
>>> On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@...stas.io> wrote:
>>>>
>>>> Introduce a new pcibios function pcibios_ignore_alignment_request
>>>> which allows the PCI core to defer to platform-specific code to
>>>> determine whether or not to ignore alignment requests for PCI
>>>> resources.
>>>>
>>>> The existing behavior is to simply ignore alignment requests when
>>>> PCI_PROBE_ONLY is set. This behavior is maintained by the
>>>> default implementation of pcibios_ignore_alignment_request.
>>>>
>>>> Signed-off-by: Shawn Anastasio <shawn@...stas.io>
>>>> ---
>>>> drivers/pci/pci.c | 9 +++++++--
>>>> include/linux/pci.h | 1 +
>>>> 2 files changed, 8 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>> index 8abc843b1615..8207a09085d1 100644
>>>> --- a/drivers/pci/pci.c
>>>> +++ b/drivers/pci/pci.c
>>>> @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
>>>>  	return 0;
>>>>  }
>>>>
>>>> +int __weak pcibios_ignore_alignment_request(void)
>>>> +{
>>>> +	return pci_has_flag(PCI_PROBE_ONLY);
>>>> +}
>>>> +
>>>>  #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>>>>  static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>>>>  static DEFINE_SPINLOCK(resource_alignment_lock);
>>>> @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
>>>>  	p = resource_alignment_param;
>>>>  	if (!*p && !align)
>>>>  		goto out;
>>>> -	if (pci_has_flag(PCI_PROBE_ONLY)) {
>>>> +	if (pcibios_ignore_alignment_request()) {
>>>>  		align = 0;
>>>> -		pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
>>>> +		pr_info_once("PCI: Ignoring requested alignments\n");
>>>>  		goto out;
>>>>  	}
>>>
>>> I think the logic here is questionable to begin with. If the user has
>>> explicitly requested re-aligning a resource via the command line then
>>> we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
>>> they get to keep the pieces.
>>>
>>> That said, the real issue here is that PCI_PROBE_ONLY probably
>>> shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
>>> hotplugged devices are configured by firmware before they're passed to
>>> the guest and we need to keep the FW assignments otherwise things
>>> break. QEMU however doesn't do any BAR assignments and relies on that
>>> being handled by the guest. At boot time this is done by SLOF, but
>>> Linux only keeps SLOF around until it's extracted the device-tree.
>>> Once that's done SLOF gets blown away and the kernel needs to do its
>>> own BAR assignments. I'm guessing there's a hack in there to make it
>>> work today, but it's a little surprising that it works at all...
>>
>>
>> The hack is to run a modified, qemu-aware "/usr/sbin/rtas_errd" in the
>> guest. It receives an event from qemu (RAS_EPOW in /proc/interrupts),
>> fetches device tree chunks (as I understand it, they come with BARs
>> assigned from phyp but without them from qemu) and writes "1" to
>> "/sys/bus/pci/rescan", which eventually calls pci_assign_resource():
>
> Interesting. Does this mean that the PHYP hotplug path doesn't
> call pci_assign_resource?

Under phyp I'd expect dlpar_add_slot() to be called, and eventually
pci_device_add(), which (I think) may or may not trigger reassignment later.

> If so it means the patch may not
> break that platform after all, though it still may not be
> the correct way of doing things.

We should probably stop enforcing the PCI_PROBE_ONLY flag. It seems
that (unless resource_alignment= is used) the pseries guest should just
walk through all allocated resources and leave them unchanged.

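To make the intended policy concrete, here is a rough userspace sketch of "leave firmware assignments alone unless an alignment was requested". It is only an illustration under stated assumptions: struct bar_model and mark_for_reassignment() are hypothetical names, and this is not how the PCI core implements resource alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a BAR; not the kernel's struct resource. */
struct bar_model {
	uint64_t start;        /* address assigned by firmware */
	uint64_t size;         /* BAR size in bytes */
	bool needs_reassign;   /* set when the guest should move this BAR */
};

/*
 * Walk all BARs. With no requested alignment (user_align == 0) every
 * firmware assignment is left unchanged. Otherwise, BARs whose start
 * address is not a multiple of user_align are marked for reassignment.
 * Returns the number of BARs marked.
 */
size_t mark_for_reassignment(struct bar_model *bars, size_t n,
			     uint64_t user_align)
{
	size_t marked = 0;

	assert(bars != NULL || n == 0);
	if (user_align == 0)
		return 0;

	/* The sketch only handles power-of-two alignments. */
	assert((user_align & (user_align - 1)) == 0);

	for (size_t i = 0; i < n; i++) {
		if (bars[i].start & (user_align - 1)) {
			bars[i].needs_reassign = true;
			marked++;
		}
	}
	return marked;
}
```

With user_align set to 0 the walk is a no-op, which is the behaviour I would expect for a pseries guest booted without resource_alignment=.
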
>> [c000000006e6f960] [c0000000005f62d4] pci_assign_resource+0x44/0x360
>> [c000000006e6fa10] [c0000000005f8b54] assign_requested_resources_sorted+0x84/0x110
>> [c000000006e6fa60] [c0000000005f9540] __assign_resources_sorted+0xd0/0x750
>> [c000000006e6fb40] [c0000000005fb2e0] __pci_bus_assign_resources+0x80/0x280
>> [c000000006e6fc00] [c0000000005fb95c] pci_assign_unassigned_bus_resources+0xbc/0x100
>> [c000000006e6fc60] [c0000000005e3d74] pci_rescan_bus+0x34/0x60
>> [c000000006e6fc90] [c0000000005f1ef4] rescan_store+0x84/0xc0
>> [c000000006e6fcd0] [c00000000068060c] bus_attr_store+0x3c/0x60
>> [c000000006e6fcf0] [c00000000037853c] sysfs_kf_write+0x5c/0x80
>>
>>>
>>> IIRC Sam Bobroff was looking at hotplug under pseries recently so he
>>> might have something to add. He's sick at the moment, but I'll ask him
>>> to take a look at this once he's back among the living
>>>
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index 4a5a84d7bdd4..47471dcdbaf9 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
>>>> int pcibios_alloc_irq(struct pci_dev *dev);
>>>> void pcibios_free_irq(struct pci_dev *dev);
>>>> resource_size_t pcibios_default_alignment(void);
>>>> +int pcibios_ignore_alignment_request(void);
>>>>
>>>> #ifdef CONFIG_HIBERNATE_CALLBACKS
>>>> extern struct dev_pm_ops pcibios_pm_ops;
>>>> --
>>>> 2.20.1
>>>>
>>
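
For anyone following along, the weak-default mechanism the quoted patch relies on can be sketched in userspace, assuming GCC or Clang (which provide __attribute__((weak))). All model_* names below are hypothetical stand-ins, not kernel symbols. The default ignores alignment requests only when the probe-only flag is set, and a platform could override it by linking in a non-weak definition with the same name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state of the PCI_PROBE_ONLY flag. */
bool model_probe_only = false;

/*
 * Weak default, mirroring the shape of the patch: ignore alignment
 * requests only when the probe-only flag is set. Another object file
 * may supply a non-weak function with this name to replace it at
 * link time.
 */
__attribute__((weak)) int model_ignore_alignment_request(void)
{
	return model_probe_only;
}

/*
 * Caller side: returns the alignment that will be honoured, where 0
 * means no alignment was requested or the request was ignored.
 */
uint64_t model_specified_alignment(uint64_t requested)
{
	if (requested == 0)
		return 0;

	/* The sketch only handles power-of-two alignments. */
	assert((requested & (requested - 1)) == 0);

	if (model_ignore_alignment_request())
		return 0;

	return requested;
}
```

A platform file that wanted requests honoured even with the flag set would define model_ignore_alignment_request() without the weak attribute and return 0 from it.
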
--
Alexey