Message-ID: <20120727175408.GG17427@andromeda.dapyr.net>
Date:	Fri, 27 Jul 2012 13:54:08 -0400
From:	Konrad Rzeszutek Wilk <konrad@...nok.org>
To:	Jan Beulich <JBeulich@...e.com>
Cc:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	linux-kernel@...r.kernel.org, xen-devel <xen-devel@...ts.xen.org>
Subject: Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.

On Fri, Jul 27, 2012 at 08:27:39AM +0100, Jan Beulich wrote:
> >>> On 26.07.12 at 22:43, Konrad Rzeszutek Wilk <konrad.wilk@...cle.com> wrote:
> > If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> > gets turned on:
> > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at 
> > [ffff8800fb43d000-ffff8800ff43cfff]
> > 
> > which is OK if we had PCI devices, but not if we did not. In a PV
> > guest the SWIOTLB ends up asking the hypervisor for precious lowmem
> > memory - and 64MB of it per guest. On a 32GB machine, this limits the
> > amount of guests that are 4GB to start due to lowmem exhaustion.
> > 
> > What we do is detect whether the user supplied e820_hole=1
> > parameter, which is used to construct an E820 that is similar to
> > the machine  - so that the PCI regions do not overlap with RAM regions.
> > We check for that by looking at the E820 and seeing if it diverges
> > from the standard - and if so (and if iommu=soft was not turned on),
> > we disable the check pci_swiotlb_detect_4gb code.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
> > ---
> >  arch/x86/xen/pci-swiotlb-xen.c |   26 ++++++++++++++++++++++++++
> >  1 files changed, 26 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> > index 967633a..56f373e 100644
> > --- a/arch/x86/xen/pci-swiotlb-xen.c
> > +++ b/arch/x86/xen/pci-swiotlb-xen.c
> > @@ -8,6 +8,10 @@
> >  #include <xen/xen.h>
> >  #include <asm/iommu_table.h>
> >  
> > +#include <asm/e820.h>
> > +#include <asm/dma.h>
> > +#include <asm/iommu.h>
> > +
> >  int xen_swiotlb __read_mostly;
> >  
> >  static struct dma_map_ops xen_swiotlb_dma_ops = {
> > @@ -24,7 +28,19 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
> >  	.unmap_page = xen_swiotlb_unmap_page,
> >  	.dma_supported = xen_swiotlb_dma_supported,
> >  };
> > +bool __init e820_has_acpi(void)
> > +{
> > +	int i;
> >  
> > +	/* Check if the user supplied the e820_hole parameter
> > +	 * which would create a machine looking E820 region. */
> > +	for (i = 0; i < e820.nr_map; i++) {
> > +		if ((e820.map[i].type == E820_ACPI) ||
> > +		    (e820.map[i].type == E820_NVS))
> > +			return true;
> 
> Tying this decision to the presence of ACPI regions in E820 is
> problematic for two reasons imo: For one, it precludes cleaning
> up this (bogus!) construct where it gets produced (PV DomU-s
> really shouldn't ever see such E820 entries, they should get
> converted to simple reserved entries, to wipe any notion of
> ACPI presence). And second it ties you to running on systems
> that actually have ACPI, whereas it is my rudimentary
> understanding that systems with e.g. SFI would not have any
> ACPI).

Right. The other idea was to check the XenBus for the existence
of a vpci backend. But XenBus is not up yet at this stage.

Perhaps what I should check for is the existence of two E820_RSV
and two E820_RAM regions - and that would be a normal PV guest.
Anything that is outside of that scope would be considered
a PCI PV guest?
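
For illustration only, the two-RSV/two-RAM idea could look roughly like
the user-space sketch below. This is not kernel code: the mock_ types and
mock_e820_is_plain_pv() are hypothetical stand-ins for the real e820 map,
and it assumes a plain PV guest has exactly two RAM and two reserved
entries and nothing else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel's E820 entry types. */
enum mock_e820_type {
	MOCK_E820_RAM  = 1,
	MOCK_E820_RSV  = 2,
	MOCK_E820_ACPI = 3,
	MOCK_E820_NVS  = 4,
};

/* Hypothetical stand-in for one E820 map entry. */
struct mock_e820_entry {
	unsigned long long addr;
	unsigned long long size;
	enum mock_e820_type type;
};

/*
 * Return true if the map looks like a "normal" PV guest under the
 * assumption stated above: exactly two RAM entries, exactly two
 * reserved entries, and no entry of any other type.
 */
bool mock_e820_is_plain_pv(const struct mock_e820_entry *map, size_t nr)
{
	size_t i, ram = 0, rsv = 0;

	assert(map != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		switch (map[i].type) {
		case MOCK_E820_RAM:
			ram++;
			break;
		case MOCK_E820_RSV:
			rsv++;
			break;
		default:
			/* ACPI, NVS or anything else: not a plain PV map. */
			return false;
		}
	}

	return ram == 2 && rsv == 2;
}
```

With this, a map that carries any other entry type, or a different
number of RAM/reserved entries, would be treated as a PCI PV guest.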

The other thought I had was to skip this check altogether and
either do:
1). Initialize SWIOTLB when xen-pcifront starts up and detects
    that it has devices (so later initialization - similar to
    how IA64 does it) - but I am not sure how the PCI-DMA works
    with these late bloomers (especially as one could just make
    xen-pcifront a module).
2). If xen-pcifront starts and does not detect any backends,
    it calls swiotlb_free. But that also requires the PCI-DMA
    to swap in the dma_ops, and I am not entirely sure how
    that would work out.
3). Have an "early_init" xen-pcifront component that does a
    quick XenBus init (similar to how hvmloader checks for
    DMI overwrites) and if it finds vpci then declare it's
    time to turn SWIOTLB on.
4). The other thing is to wrap this code with something like
    this:

#ifdef CONFIG_SWIOTLB
#ifdef CONFIG_XEN_PCI_FRONTEND
	if (.. blah blah) do the check as outlined in 3).
#else /* PCI_FRONTEND is not present, so we won't need SWIOTLB */
	swiotlb = 0;
	iommu = 1;
#endif
#endif

That would take care of the built-in issues.
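
To make the intended decision a bit more concrete, here is a rough
user-space sketch of how options 3) and 4) could combine. Everything in
it is hypothetical: MOCK_CONFIG_XEN_PCI_FRONTEND stands in for the real
Kconfig symbol, vpci_found stands in for the result of the early XenBus
probe from 3), and mock_want_swiotlb() is not an existing kernel
function. It assumes that an explicit iommu=soft always turns SWIOTLB on.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_XEN_PCI_FRONTEND (1 = built in). */
#ifndef MOCK_CONFIG_XEN_PCI_FRONTEND
#define MOCK_CONFIG_XEN_PCI_FRONTEND 1
#endif

/*
 * Decide whether SWIOTLB should be enabled.
 *
 * iommu_soft: the user passed iommu=soft on the command line.
 * vpci_found: the (hypothetical) early XenBus probe saw a vpci backend.
 */
bool mock_want_swiotlb(bool iommu_soft, bool vpci_found)
{
	/* An explicit request from the user is always honoured. */
	if (iommu_soft)
		return true;

#if MOCK_CONFIG_XEN_PCI_FRONTEND
	/* The frontend is present: enable only if a vpci backend exists. */
	return vpci_found;
#else
	/* No PCI frontend compiled in, so no PCI devices can show up. */
	(void)vpci_found;
	return false;
#endif
}
```

The point of the split is that the compile-time check handles kernels
without the frontend for free, and only kernels that do have it pay for
the early probe.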
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
