Message-ID: <275e859a-0ddd-ea7f-a681-67e42a5233fe@amd.com>
Date: Thu, 23 Nov 2017 09:11:02 +0100
From: Christian König <christian.koenig@....com>
To: Boris Ostrovsky <boris.ostrovsky@...cle.com>, helgaas@...nel.org,
linux-pci@...r.kernel.org, dri-devel@...ts.freedesktop.org,
linux-kernel@...r.kernel.org, amd-gfx@...ts.freedesktop.org,
xen-devel <xen-devel@...ts.xen.org>
Subject: Re: [PATCH v9 4/5] x86/PCI: Enable a 64bit BAR on AMD Family 15h
(Models 30h-3fh) Processors v5
On 22.11.2017 at 18:27, Boris Ostrovsky wrote:
> On 11/22/2017 11:54 AM, Christian König wrote:
>> On 22.11.2017 at 17:24, Boris Ostrovsky wrote:
>>> On 11/22/2017 05:09 AM, Christian König wrote:
>>>> On 21.11.2017 at 23:26, Boris Ostrovsky wrote:
>>>>> On 11/21/2017 08:34 AM, Christian König wrote:
>>>>>> Hi Boris,
>>>>>>
>>>>>> attached are two patches.
>>>>>>
>>>>>> The first one is a trivial fix for the infinite loop issue; it now
>>>>>> correctly aborts the fixup when it can't find address space for the
>>>>>> root window.
>>>>>>
>>>>>> The second is a workaround for your board. It simply checks that
>>>>>> there is exactly one Processor Function before applying the fix.
>>>>>>
>>>>>> Both are based on Linus' current master branch. Please test whether
>>>>>> they fix your issue.
>>>>> Yes, they do fix it, but that's because the feature is disabled.
>>>>>
>>>>> Do you know what the actual problem was (on Xen)?
>>>> I still haven't understood what you actually did with Xen.
>>>>
>>>> If you used PCI passthrough with those devices, then you have made a
>>>> major configuration error.
>>>>
>>>> If the problem happened on dom0, then the explanation is most likely
>>>> that some PCI device ended up in the configured space, but the routing
>>>> was only set up correctly on one CPU socket.
>>> The problem is that dom0 can be (and was, in my case) booted with less
>>> than full physical memory, and so the "rest" of the host memory is not
>>> necessarily reflected in iomem. Your patch then tried to configure that
>>> memory for MMIO and the system hung.
>>>
>>> And so my guess is that this patch will break dom0 on a single-socket
>>> system as well.
>> Oh, thanks!
>>
>> I've thought about that possibility before, but wasn't able to find a
>> system which actually does that.
>>
>> May I ask why the rest of the memory isn't reported to the OS?
> That memory doesn't belong to the OS (dom0); it is owned by the hypervisor.
>
>> Sounds like I can't trust Linux resource management and probably need
>> to read the DRAM config to figure things out after all.
>
> My question is whether what you are trying to do should ever be done for
> a guest at all (any guest, not necessarily Xen).
The issue is probably that I don't know enough about Xen: What exactly
is dom0? My understanding was that dom0 is the hypervisor, but that
seems to be incorrect.

What I do know is that a virtualized guest must under no circumstances
*EVER* have access to the PCI devices marked as "Processor Function" on
AMD platforms. Otherwise it is trivial to break out of the virtualization.

If dom0 is something like the system domain with all hardware access,
then the approach seems legitimate, but in that case the hypervisor
should report the stolen memory to the OS through the e820 table.
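
If Xen told dom0 which ranges it kept for itself, the dom0 kernel could
patch its own memory map early during boot. Completely untested sketch;
the function name is made up and start/size are placeholders that would
have to come from the hypervisor:

#include <linux/init.h>
#include <linux/types.h>
#include <asm/e820/api.h>

/*
 * Sketch: mark hypervisor-owned memory as reserved, so that Linux
 * resource management never hands it out as MMIO space later on.
 */
static void __init reserve_stolen_memory(u64 start, u64 size)
{
        e820__range_add(start, size, E820_TYPE_RESERVED);
        e820__update_table(e820_table);
}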

When the hypervisor doesn't do that, the Linux kernel isn't aware that
there is memory at a given location, and mapping PCI space there will
obviously crash the hypervisor.
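
To illustrate: the fixup essentially just asks the resource tree for a
free block, and since the stolen memory doesn't show up in
iomem_resource it looks like perfectly usable address space. Strongly
simplified sketch of that allocation (not the actual patch code, and
the function name is made up):

#include <linux/ioport.h>
#include <linux/slab.h>

static void alloc_root_window_sketch(void)
{
        struct resource *res = kzalloc(sizeof(*res), GFP_KERNEL);

        if (!res)
                return;

        res->name  = "PCI Bus 0000:00";
        res->flags = IORESOURCE_MEM | IORESOURCE_MEM_64 | IORESOURCE_WINDOW;

        /*
         * Ask for a free 256GB block above the 32bit address space.
         * Memory the hypervisor kept for itself is not part of
         * iomem_resource, so this can happily hand out exactly that
         * range.
         */
        if (allocate_resource(&iomem_resource, res, 0x4000000000ULL,
                              0x100000000ULL, 0xfd00000000ULL - 1,
                              0x4000000000ULL, NULL, NULL)) {
                kfree(res);
                return; /* no address space found, abort the fixup */
        }

        /* ... program the host bridge window with res->start/end ... */
}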

Possible solutions, as far as I can see:

1) Disable this feature when we detect that we are running as Xen dom0
   (see the sketch below).
2) Scan the DRAM configuration to fix up Linux resource handling.
3) Fix Xen to report the stolen memory to the dom0 OS as reserved.
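
For 1) the check itself would be a one-liner. A sketch, assuming the
fixup function from the patch and using xen_initial_domain() from
include/xen/xen.h (which evaluates to 0 when CONFIG_XEN_DOM0 is off):

#include <linux/pci.h>
#include <xen/xen.h>

static void pci_amd_enable_64bit_bar(struct pci_dev *dev)
{
        /*
         * As Xen dom0 we don't see the full host memory map, so we
         * can't safely pick a free window ourselves.
         */
        if (xen_initial_domain())
                return;

        /* ... rest of the fixup as before ... */
}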
Opinions?
Thanks,
Christian.
>
> -boris
>