Message-ID: <aWohGZYEVhU3q8Ji@Mac.lan>
Date: Fri, 16 Jan 2026 12:29:29 +0100
From: Roger Pau Monné <roger.pau@...rix.com>
To: James Dingwall <james@...gwall.me.uk>
Cc: linux-kernel@...r.kernel.org
Subject: Re: xen pci passthrough stops working after xen/x86: fix initial memory balloon target

On Fri, Jan 16, 2026 at 10:34:14AM +0000, James Dingwall wrote:
> On Fri, Jan 16, 2026 at 11:13:55AM +0100, Roger Pau Monné wrote:
> > On Fri, Jan 16, 2026 at 09:27:15AM +0000, James Dingwall wrote:
> > > On Thu, Jan 15, 2026 at 06:55:15PM +0100, Roger Pau Monné wrote:
> > > > On Thu, Jan 15, 2026 at 02:50:12PM +0000, James Dingwall wrote:
> > > > > On Thu, Jan 15, 2026 at 01:03:49PM +0100, Roger Pau Monné wrote:
> > > > > > On Thu, Jan 15, 2026 at 11:23:37AM +0000, James Dingwall wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > We have encountered a regression with pci passthrough since
> > > > > > > Ubuntu 6.8.0-91.92, which included this commit:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > Thanks for the report. Could you also send me your kernel Kconfig, to
> > > > > > see which combination of options you are using?
> > > > > >
> > > > >
> > > > Can you confirm that the config used to build the non-working kernel
> > > > also has CONFIG_XEN_UNPOPULATED_ALLOC=y?
> > >
> > > The config is the same for both builds and CONFIG_XEN_UNPOPULATED_ALLOC=y
> > > is always set.
> > >
> > > > Can you also provide the output of `cat /proc/iomem` for both the
> > > > working and non-working kernels?
> > >
> > > non-working Ubuntu-6.8.0-100.100:
> > >
> > > 00000000-00000fff : Reserved
> > > 00001000-0009ffff : System RAM
> > > 000a0000-000fffff : Reserved
> > > 000f0000-000fffff : System ROM
> > > 00100000-2007ffff : System RAM
> > > 01000000-025fffff : Kernel code
> > > 02600000-033bcfff : Kernel rodata
> > > 03400000-0385613f : Kernel data
> > > 03d54000-041fffff : Kernel bss
> > > 20081000-73b57fff : Unusable memory
> > > 76c58000-76d76fff : ACPI Tables
> > > 76d77000-76ea0fff : ACPI Non-volatile Storage
> > > 77fff000-77ffffff : Unusable memory
> > > 80000000-87ffffff : System RAM
> > > 88000000-8fffffff : Xen scratch
> > > 100000000-103f7ffff : System RAM
> > > 4000200000-400021ffff : 0000:01:00.0
> > > 4000220000-400023ffff : 0000:01:00.0
> > > 4000240000-400025ffff : 0000:01:00.1
> > > 4000260000-400027ffff : 0000:01:00.1
> > >
> > >
> > > working Ubuntu-6.8.0-100.100:
> > >
> > > 00000000-00000fff : Reserved
> > > 00001000-0009ffff : System RAM
> > > 000a0000-000fffff : Reserved
> > > 000f0000-000fffff : System ROM
> > > 00100000-2007ffff : System RAM
> > > 01000000-025fffff : Kernel code
> > > 02600000-033bcfff : Kernel rodata
> > > 03400000-0385613f : Kernel data
> > > 03d54000-041fffff : Kernel bss
> > > 20081000-73b57fff : Unusable memory
> > > 76c58000-76d76fff : ACPI Tables
> > > 76d77000-76ea0fff : ACPI Non-volatile Storage
> > > 77fff000-77ffffff : Unusable memory
> > > 81100000-811fffff : 0000:01:00.1
> > > 81100000-811fffff : igb
> > > 81200000-812fffff : 0000:01:00.0
> > > 81200000-812fffff : igb
> > > 81300000-8137ffff : 0000:01:00.1
> > > 81380000-813fffff : 0000:01:00.0
> > > 81400000-81403fff : 0000:01:00.1
> > > 81400000-81403fff : igb
> > > 81404000-81407fff : 0000:01:00.0
> > > 81404000-81407fff : igb
> > > 81500000-815fffff : 0000:03:00.0
> > > 81600000-816fffff : 0000:03:00.0
> > > 81600000-816fffff : igc
> > > 81700000-81703fff : 0000:03:00.0
> > > 81700000-81703fff : igc
> > > 88000000-8fffffff : Xen scratch
> > > 100000000-103f7ffff : System RAM
> > > 4000200000-400021ffff : 0000:01:00.0
> > > 4000220000-400023ffff : 0000:01:00.0
> > > 4000240000-400025ffff : 0000:01:00.1
> > > 4000260000-400027ffff : 0000:01:00.1
> >
> > For some reason (which I still haven't figured out), the fictitious
> > PFN memory layout created by Linux ends up placing a RAM region over
> > the BAR MMIO space used by igc. The difference:
> >
> > 81200000-812fffff : 0000:01:00.0
> > 81200000-812fffff : igb
> > 81300000-8137ffff : 0000:01:00.1
> > 81380000-813fffff : 0000:01:00.0
> > 81400000-81403fff : 0000:01:00.1
> > 81400000-81403fff : igb
> > 81404000-81407fff : 0000:01:00.0
> > 81404000-81407fff : igb
> > 81500000-815fffff : 0000:03:00.0
> > 81600000-816fffff : 0000:03:00.0
> > 81600000-816fffff : igc
> > 81700000-81703fff : 0000:03:00.0
> > 81700000-81703fff : igc
> > 88000000-8fffffff : Xen scratch
> > 100000000-103f7ffff : System RAM
> >
> > VS
> >
> > 80000000-87ffffff : System RAM
> > 88000000-8fffffff : Xen scratch
> > 100000000-103f7ffff : System RAM
> >
> > In the non-working case there's a chunk of RAM in the space that
> > covers the device MMIO BARs. I fear my balloon accounting "fix" has
> > instead introduced a misaccounting in the balloon driver that causes
> > Linux to attempt to balloon up memory, and it ends up instantiating a
> > hotplug memory region over the device MMIO BARs.
> >
> > I'm still confused as to how the change in balloon_add_regions() has
> > an effect when CONFIG_XEN_UNPOPULATED_ALLOC=y, as it should become a
> > no-op in that case, but I will debug this myself.
> >
> > >
> > > Just for completeness the working build also reverts "xen/x86: fix initial
> > > memory balloon target" because of a conflict in drivers/xen/balloon.c.
> >
> > Hm, but that's the same commit that you mentioned in the first email,
> > there you said reverting:
> >
> > commit 74287971dbb3fe322bb316afd9e7fb5807e23bee
> > Author: Roger Pau Monne <roger.pau@...rix.com>
> > Date: Wed May 14 10:04:26 2025 +0200
> >
> > xen/x86: fix initial memory balloon target
> >
> > fixes the issue. Is there an additional commit that also needs
> > reverting to fix the issue? That would make more sense, as IMO that
> > commit should be a no-op given your Kconfig.
>
> Argh! This is my mistake reading the two reverts in the wrong order. The
> bisect landed on "x86/xen: fix balloon target initialization for PVH dom0"
> but "xen/x86: fix initial memory balloon target" had to be reverted first.
> I'm sorry if that left you scratching your head.
>
> > I don't think I will be able to get into this until Monday, sorry. In
> > the meantime, does disabling the balloon driver mitigate the issue?
>
> I can try this if it could still be relevant?

I'm positive it will work around the issue, by simply not enabling the
balloon driver. I don't think it will especially help me debug it, but
it should get you going without needing a custom-built kernel (if you
don't care about ballooning).

Hopefully I will be able to get back to you on Monday with a fix for
the issue. Sorry for the inconvenience this might have caused, and
thanks for bisecting it.
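
As a rough illustration of the overlap described in the quoted analysis
(a System RAM range in the non-working layout sitting on top of the
device BARs seen in the working layout), the comparison can be sketched
in Python. This is only a sketch under stated assumptions: the helper
names parse_iomem() and ram_over_bars() are hypothetical, the input is
assumed to be raw `cat /proc/iomem` text read as root (without mail
quote prefixes), and the sample lines are a subset copied from the
listings in this thread.

```python
import re

# "start-end : name" as printed by /proc/iomem; child entries are indented.
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")
# PCI device address (domain:bus:device.function), e.g. 0000:03:00.0
BDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def parse_iomem(text):
    """Return (start, end, name) tuples for every parsable line."""
    regions = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions


def ram_over_bars(broken_text, working_text):
    """List PCI BARs from the working layout that fall inside a
    'System RAM' range of the non-working layout."""
    ram = [r for r in parse_iomem(broken_text) if r[2] == "System RAM"]
    bars = [r for r in parse_iomem(working_text) if BDF_RE.match(r[2])]
    hits = []
    for ram_start, ram_end, _ in ram:
        for bar_start, bar_end, dev in bars:
            # Two inclusive ranges overlap when each starts before the other ends.
            if ram_start <= bar_end and bar_start <= ram_end:
                hits.append((dev, bar_start, bar_end, ram_start, ram_end))
    return hits


# Sample lines taken from the two listings quoted in this thread.
BROKEN = """\
80000000-87ffffff : System RAM
88000000-8fffffff : Xen scratch
100000000-103f7ffff : System RAM
"""
WORKING = """\
81200000-812fffff : 0000:01:00.0
81600000-816fffff : 0000:03:00.0
  81600000-816fffff : igc
4000200000-400021ffff : 0000:01:00.0
"""

for dev, bar_start, bar_end, ram_start, ram_end in ram_over_bars(BROKEN, WORKING):
    print(f"{dev} BAR {bar_start:x}-{bar_end:x} lies under RAM {ram_start:x}-{ram_end:x}")
```

The BARs are taken from the working listing because in the non-working
one they do not show up in the 0x81xxxxxx range at all, so only a
cross-comparison of the two outputs reveals the conflict.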

Regards, Roger.