Message-ID: <aWoPY1OahFOF9r-C@Mac.lan>
Date: Fri, 16 Jan 2026 11:13:55 +0100
From: Roger Pau Monné <roger.pau@...rix.com>
To: James Dingwall <james@...gwall.me.uk>
Cc: linux-kernel@...r.kernel.org
Subject: Re: xen pci passthrough stops working after xen/x86: fix initial memory balloon target

On Fri, Jan 16, 2026 at 09:27:15AM +0000, James Dingwall wrote:
> On Thu, Jan 15, 2026 at 06:55:15PM +0100, Roger Pau Monné wrote:
> > On Thu, Jan 15, 2026 at 02:50:12PM +0000, James Dingwall wrote:
> > > On Thu, Jan 15, 2026 at 01:03:49PM +0100, Roger Pau Monné wrote:
> > > > On Thu, Jan 15, 2026 at 11:23:37AM +0000, James Dingwall wrote:
> > > > > Hi,
> > > > >
> > > > > We have encountered a regression with pci passthrough since the
> > > > > Ubuntu 6.8.0-91.92 which included this commit:
> > > >
> > > > Hello,
> > > >
> > > > Thanks for the report. Could you also send me your kernel Kconfig, to
> > > > see which combination of options are you using?
> > > >
> > >
> > Can you confirm that the config used to build the non-working kernel
> > also has CONFIG_XEN_UNPOPULATED_ALLOC=y?
>
> The config is the same for both builds and CONFIG_XEN_UNPOPULATED_ALLOC=y
> is always set.
>
> > Can you also provide the output of `cat /proc/iomem` for both the
> > working and non-working kernels?
>
> non-working Ubuntu-6.8.0-100.100:
>
> 00000000-00000fff : Reserved
> 00001000-0009ffff : System RAM
> 000a0000-000fffff : Reserved
>   000f0000-000fffff : System ROM
> 00100000-2007ffff : System RAM
>   01000000-025fffff : Kernel code
>   02600000-033bcfff : Kernel rodata
>   03400000-0385613f : Kernel data
>   03d54000-041fffff : Kernel bss
> 20081000-73b57fff : Unusable memory
> 76c58000-76d76fff : ACPI Tables
> 76d77000-76ea0fff : ACPI Non-volatile Storage
> 77fff000-77ffffff : Unusable memory
> 80000000-87ffffff : System RAM
> 88000000-8fffffff : Xen scratch
> 100000000-103f7ffff : System RAM
> 4000200000-400021ffff : 0000:01:00.0
> 4000220000-400023ffff : 0000:01:00.0
> 4000240000-400025ffff : 0000:01:00.1
> 4000260000-400027ffff : 0000:01:00.1
>
>
> working Ubuntu-6.8.0-100.100:
>
> 00000000-00000fff : Reserved
> 00001000-0009ffff : System RAM
> 000a0000-000fffff : Reserved
>   000f0000-000fffff : System ROM
> 00100000-2007ffff : System RAM
>   01000000-025fffff : Kernel code
>   02600000-033bcfff : Kernel rodata
>   03400000-0385613f : Kernel data
>   03d54000-041fffff : Kernel bss
> 20081000-73b57fff : Unusable memory
> 76c58000-76d76fff : ACPI Tables
> 76d77000-76ea0fff : ACPI Non-volatile Storage
> 77fff000-77ffffff : Unusable memory
> 81100000-811fffff : 0000:01:00.1
>   81100000-811fffff : igb
> 81200000-812fffff : 0000:01:00.0
>   81200000-812fffff : igb
> 81300000-8137ffff : 0000:01:00.1
> 81380000-813fffff : 0000:01:00.0
> 81400000-81403fff : 0000:01:00.1
>   81400000-81403fff : igb
> 81404000-81407fff : 0000:01:00.0
>   81404000-81407fff : igb
> 81500000-815fffff : 0000:03:00.0
> 81600000-816fffff : 0000:03:00.0
>   81600000-816fffff : igc
> 81700000-81703fff : 0000:03:00.0
>   81700000-81703fff : igc
> 88000000-8fffffff : Xen scratch
> 100000000-103f7ffff : System RAM
> 4000200000-400021ffff : 0000:01:00.0
> 4000220000-400023ffff : 0000:01:00.0
> 4000240000-400025ffff : 0000:01:00.1
> 4000260000-400027ffff : 0000:01:00.1
For some reason (which I still haven't figured out), the fictitious PFN memory layout
created by Linux ends up placing a RAM region over the MMIO BAR space
used by the igb and igc devices. The difference is:
81200000-812fffff : 0000:01:00.0
  81200000-812fffff : igb
81300000-8137ffff : 0000:01:00.1
81380000-813fffff : 0000:01:00.0
81400000-81403fff : 0000:01:00.1
  81400000-81403fff : igb
81404000-81407fff : 0000:01:00.0
  81404000-81407fff : igb
81500000-815fffff : 0000:03:00.0
81600000-816fffff : 0000:03:00.0
  81600000-816fffff : igc
81700000-81703fff : 0000:03:00.0
  81700000-81703fff : igc
88000000-8fffffff : Xen scratch
100000000-103f7ffff : System RAM

VS

80000000-87ffffff : System RAM
88000000-8fffffff : Xen scratch
100000000-103f7ffff : System RAM
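As an illustration only (not part of the original exchange), the changed lines between two /proc/iomem snapshots can be extracted mechanically with Python's standard difflib module. The function name iomem_diff is hypothetical. It treats each snapshot as plain text, so nesting indentation is compared literally.

```python
import difflib


def iomem_diff(working: str, broken: str) -> list[str]:
    """Return only the changed lines between two /proc/iomem snapshots.

    Lines present only in the working snapshot are prefixed with '-',
    lines present only in the non-working snapshot with '+'.
    """
    diff = difflib.unified_diff(
        working.splitlines(),
        broken.splitlines(),
        fromfile="working",
        tofile="non-working",
        lineterm="",
        n=0,  # no context lines, just the differences
    )
    # Drop the '---'/'+++' file headers and '@@' hunk markers.
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
```

Feeding it the two listings quoted above (with the "> " quote prefix stripped) should reproduce the excerpt shown here. The device and driver lines appear with "-", and the 80000000-87ffffff System RAM line appears with "+".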
In the non-working case there's a chunk of RAM in the range that
covers the device MMIO BARs. I fear my balloon accounting "fix" has
instead introduced a mis-accounting in the balloon driver. Linux then
attempts to balloon up memory and ends up instantiating a hotplug
memory region over the device MMIO BARs.
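The overlap described above can be checked with a short sketch, offered as an illustration rather than as a tool used in this thread. It assumes that the PCI BAR addresses are the same in both boots, because in the non-working case the BARs no longer appear in /proc/iomem at all. It takes the "System RAM" ranges from the non-working snapshot and the PCI-named ranges (e.g. "0000:01:00.0") from the working one. All function and constant names are hypothetical.

```python
import re

# One /proc/iomem line: optional nesting indent, then "start-end : name".
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")
# Resource names that look like a PCI address, e.g. "0000:01:00.0".
# Driver-claimed children such as "igb" duplicate their parent's range
# and are deliberately not matched.
PCI_NAME = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def parse_iomem(text):
    """Return (start, end, name) tuples; ranges are inclusive."""
    entries = []
    for line in text.splitlines():
        m = IOMEM_LINE.match(line)
        if m:
            entries.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3).strip()))
    return entries


def ram_over_bars(broken, working):
    """Find System RAM in `broken` overlapping PCI ranges seen in `working`.

    Returns (pci_name, bar_start, bar_end, ram_start, ram_end) tuples.
    """
    ram = [(s, e) for s, e, n in parse_iomem(broken) if n == "System RAM"]
    bars = [(s, e, n) for s, e, n in parse_iomem(working) if PCI_NAME.fullmatch(n)]
    hits = []
    for ram_start, ram_end in ram:
        for bar_start, bar_end, name in bars:
            # Two inclusive ranges overlap iff each starts before the other ends.
            if ram_start <= bar_end and bar_start <= ram_end:
                hits.append((name, bar_start, bar_end, ram_start, ram_end))
    return hits
```

Run on the two listings quoted above (with the "> " prefix stripped), this should flag the 01:00.x (igb) and 03:00.0 (igc) ranges between 81100000 and 81703fff. All of them fall inside the non-working kernel's 80000000-87ffffff System RAM region. The 4000200000+ BARs should not be flagged.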
I'm still confused as to how the change in balloon_add_regions() has
an effect when CONFIG_XEN_UNPOPULATED_ALLOC=y, as it should become a
no-op in that case, but I will debug this myself.
>
> Just for completeness the working build also reverts "xen/x86: fix initial
> memory balloon target" because of a conflict in drivers/xen/balloon.c.
Hm, but that's the same commit you mentioned in your first email,
where you said that reverting:
commit 74287971dbb3fe322bb316afd9e7fb5807e23bee
Author: Roger Pau Monne <roger.pau@...rix.com>
Date:   Wed May 14 10:04:26 2025 +0200

    xen/x86: fix initial memory balloon target

fixes the issue. Is there an additional commit that also needs
reverting to fix the issue? That would make more sense, as IMO that
commit should be a no-op given your Kconfig.
I don't think I will be able to get into this until Monday, sorry. In
the meantime, does disabling the balloon driver mitigate the issue?
Thanks, Roger.