Message-ID: <aWoPY1OahFOF9r-C@Mac.lan>
Date: Fri, 16 Jan 2026 11:13:55 +0100
From: Roger Pau Monné <roger.pau@...rix.com>
To: James Dingwall <james@...gwall.me.uk>
Cc: linux-kernel@...r.kernel.org
Subject: Re: xen pci passthrough stops working after xen/x86: fix initial
 memory balloon target

On Fri, Jan 16, 2026 at 09:27:15AM +0000, James Dingwall wrote:
> On Thu, Jan 15, 2026 at 06:55:15PM +0100, Roger Pau Monné wrote:
> > On Thu, Jan 15, 2026 at 02:50:12PM +0000, James Dingwall wrote:
> > > On Thu, Jan 15, 2026 at 01:03:49PM +0100, Roger Pau Monné wrote:
> > > > On Thu, Jan 15, 2026 at 11:23:37AM +0000, James Dingwall wrote:
> > > > > Hi,
> > > > > 
> > > > > We have encountered a regression with pci passthrough since the
> > > > > Ubuntu 6.8.0-91.92 which included this commit:
> > > > 
> > > > Hello,
> > > > 
> > > > Thanks for the report.  Could you also send me your kernel Kconfig, to
> > > > see which combination of options you are using?
> > > > 
> > > 
> > Can you confirm that the config used to build the non-working kernel
> > also has CONFIG_XEN_UNPOPULATED_ALLOC=y?
> 
> The config is the same for both builds and CONFIG_XEN_UNPOPULATED_ALLOC=y
> is always set. 
> 
> > Can you also provide the output of `cat /proc/iomem` for both the
> > working and non-working kernels?
> 
> non-working Ubuntu-6.8.0-100.100:
> 
> 00000000-00000fff : Reserved
> 00001000-0009ffff : System RAM
> 000a0000-000fffff : Reserved
>   000f0000-000fffff : System ROM
> 00100000-2007ffff : System RAM
>   01000000-025fffff : Kernel code
>   02600000-033bcfff : Kernel rodata
>   03400000-0385613f : Kernel data
>   03d54000-041fffff : Kernel bss
> 20081000-73b57fff : Unusable memory
> 76c58000-76d76fff : ACPI Tables
> 76d77000-76ea0fff : ACPI Non-volatile Storage
> 77fff000-77ffffff : Unusable memory
> 80000000-87ffffff : System RAM
> 88000000-8fffffff : Xen scratch
> 100000000-103f7ffff : System RAM
> 4000200000-400021ffff : 0000:01:00.0
> 4000220000-400023ffff : 0000:01:00.0
> 4000240000-400025ffff : 0000:01:00.1
> 4000260000-400027ffff : 0000:01:00.1
> 
> 
> working Ubuntu-6.8.0-100.100:
> 
> 00000000-00000fff : Reserved
> 00001000-0009ffff : System RAM
> 000a0000-000fffff : Reserved
>   000f0000-000fffff : System ROM
> 00100000-2007ffff : System RAM
>   01000000-025fffff : Kernel code
>   02600000-033bcfff : Kernel rodata
>   03400000-0385613f : Kernel data
>   03d54000-041fffff : Kernel bss
> 20081000-73b57fff : Unusable memory
> 76c58000-76d76fff : ACPI Tables
> 76d77000-76ea0fff : ACPI Non-volatile Storage
> 77fff000-77ffffff : Unusable memory
> 81100000-811fffff : 0000:01:00.1
>   81100000-811fffff : igb
> 81200000-812fffff : 0000:01:00.0
>   81200000-812fffff : igb
> 81300000-8137ffff : 0000:01:00.1
> 81380000-813fffff : 0000:01:00.0
> 81400000-81403fff : 0000:01:00.1
>   81400000-81403fff : igb
> 81404000-81407fff : 0000:01:00.0
>   81404000-81407fff : igb
> 81500000-815fffff : 0000:03:00.0
> 81600000-816fffff : 0000:03:00.0
>   81600000-816fffff : igc
> 81700000-81703fff : 0000:03:00.0
>   81700000-81703fff : igc
> 88000000-8fffffff : Xen scratch
> 100000000-103f7ffff : System RAM
> 4000200000-400021ffff : 0000:01:00.0
> 4000220000-400023ffff : 0000:01:00.0
> 4000240000-400025ffff : 0000:01:00.1
> 4000260000-400027ffff : 0000:01:00.1

For some reason (which I still haven't figured out), the fictitious
PFN memory layout created by Linux ends up placing a RAM region over
the BAR MMIO space used by igc.  The difference:

81200000-812fffff : 0000:01:00.0
  81200000-812fffff : igb
81300000-8137ffff : 0000:01:00.1
81380000-813fffff : 0000:01:00.0
81400000-81403fff : 0000:01:00.1
  81400000-81403fff : igb
81404000-81407fff : 0000:01:00.0
  81404000-81407fff : igb
81500000-815fffff : 0000:03:00.0
81600000-816fffff : 0000:03:00.0
  81600000-816fffff : igc
81700000-81703fff : 0000:03:00.0
  81700000-81703fff : igc
88000000-8fffffff : Xen scratch
100000000-103f7ffff : System RAM

VS

80000000-87ffffff : System RAM
88000000-8fffffff : Xen scratch
100000000-103f7ffff : System RAM

In the non-working case there's a chunk of RAM in the space that
covers the device MMIO BARs.  I fear my balloon accounting "fix" has
instead introduced a misaccounting in the balloon driver that causes
Linux to attempt to balloon up memory, and it ends up instantiating a
hotplug memory region over the device MMIO BARs.
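
The overlap can be checked mechanically.  What follows is only a
minimal sketch: it takes the System RAM entries from the non-working
listing and the BAR entries from the working listing above, and the
helper names (parse_iomem, overlaps) are hypothetical, not part of any
kernel or Xen tooling.

```python
import re

# Matches "start-end : name" lines as printed by /proc/iomem.
LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")

def parse_iomem(text):
    """Return (start, end, name) tuples from /proc/iomem-style text."""
    regions = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions

def overlaps(ram_regions, bar_regions):
    """Return (ram, bar) pairs whose inclusive address ranges intersect."""
    return [(r, b) for r in ram_regions for b in bar_regions
            if r[0] <= b[1] and b[0] <= r[1]]

# Excerpt of the non-working layout quoted above.
NON_WORKING = """\
80000000-87ffffff : System RAM
88000000-8fffffff : Xen scratch
100000000-103f7ffff : System RAM
"""

# Top-level BAR entries from the working layout quoted above.
WORKING_BARS = """\
81100000-811fffff : 0000:01:00.1
81200000-812fffff : 0000:01:00.0
81300000-8137ffff : 0000:01:00.1
81380000-813fffff : 0000:01:00.0
81400000-81403fff : 0000:01:00.1
81404000-81407fff : 0000:01:00.0
81500000-815fffff : 0000:03:00.0
81600000-816fffff : 0000:03:00.0
81700000-81703fff : 0000:03:00.0
"""

ram = [r for r in parse_iomem(NON_WORKING) if r[2] == "System RAM"]
bars = parse_iomem(WORKING_BARS)
hits = overlaps(ram, bars)
for r, b in hits:
    print(f"{b[2]} {b[0]:x}-{b[1]:x} lies under RAM {r[0]:x}-{r[1]:x}")
```

With these inputs every listed BAR falls inside the 80000000-87ffffff
System RAM range, while the RAM region starting at 100000000 is not
involved.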

I'm still confused as to how the change in balloon_add_regions() has
an effect when CONFIG_XEN_UNPOPULATED_ALLOC=y, as it should become a
no-op in that case, but I will debug this myself.

> 
> Just for completeness the working build also reverts "xen/x86: fix initial
> memory balloon target" because of a conflict in drivers/xen/balloon.c.

Hm, but that's the same commit that you mentioned in the first email,
where you said that reverting:

commit 74287971dbb3fe322bb316afd9e7fb5807e23bee
Author: Roger Pau Monne <roger.pau@...rix.com>
Date:   Wed May 14 10:04:26 2025 +0200

    xen/x86: fix initial memory balloon target

Fixes the issue.  Is there an additional commit that also needs
reverting to fix the issue?  That would make more sense, as IMO that
commit should be a no-op given your Kconfig.

I don't think I will be able to get into this until Monday, sorry.  In
the meantime, does disabling the balloon driver mitigate the issue?

Thanks, Roger.
