Message-ID: <1351084618.18035.27.camel@zakaz.uk.xensource.com>
Date: Wed, 24 Oct 2012 14:16:58 +0100
From: Ian Campbell <Ian.Campbell@...rix.com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
"xen-devel@...ts.xen.org" <xen-devel@...ts.xen.org>
Subject: Re: [PATCH] net: allow configuration of the size of page in
__netdev_alloc_frag

On Wed, 2012-10-24 at 13:28 +0100, Eric Dumazet wrote:
> On Wed, 2012-10-24 at 12:42 +0100, Ian Campbell wrote:
> > The commit 69b08f62e174 "net: use bigger pages in __netdev_alloc_frag"
> > led to 70%+ packet loss under Xen when transmitting from physical (as
> > opposed to virtual) network devices.
> >
> > This is because under Xen pages which are contiguous in the physical
> > address space may not be contiguous in the DMA space, in fact it is
> > very likely that they are not. I think there are other architectures
> > where this is true, although perhaps not quite so aggressive as to
> > have this property at a per-order-0-page granularity.
> >
> > The real underlying bug here most likely lies in the swiotlb not
> > correctly handling compound pages, and Konrad is investigating this.
> > However even with the swiotlb issue fixed the current arrangement
> > seems likely to result in a lot of bounce buffering which seems likely
> > to more than offset any benefit from the use of larger pages.
> >
> > Therefore make NETDEV_FRAG_PAGE_MAX_ORDER configurable at runtime and
> > use this to request order-0 frags under Xen. Also expose this setting
> > via sysctl.
> >
> > Signed-off-by: Ian Campbell <ian.campbell@...rix.com>
> > Cc: Eric Dumazet <edumazet@...gle.com>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
> > Cc: netdev@...r.kernel.org
> > Cc: xen-devel@...ts.xen.org
> > ---
>
> I understand your concern, but this seems a quick/dirty hack at this
> moment. After setting the sysctl to 0, some tasks may still have some
> order-3 pages in their cache.

Right, the sysctl thing might be overkill; I just figured it was useful
for debugging. When booting in a Xen VM the patch sets it to zero very
early on, during setup_arch(), which is before any tasks even exist.

> Your driver must already cope with skb->head being split on several
> pages.
>
> So what fundamental difference exists with frags?

The issue here is with drivers for physical network devices when running
under Xen, not with the Xen paravirtualised network drivers (AKA
netback/netfront).

The problem is that pages which are contiguous in the physical address
space may not be contiguous in the DMA address space. With order>0 pages
this becomes a problem when you poke down the DMA address and length of
a compound page into the hardware registers. The DMA address will be
right for the head of the page but once the hardware steps off the end
of that it'll get the wrong page.
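
To make the failure mode concrete, here is a minimal userspace sketch in C.
It is not kernel code: the p2m table, its frame numbers, and the helper
names phys_to_dma() and dma_contiguous() are all made up for illustration.
The table maps each (pseudo-)physical frame to a machine frame, roughly as
a Xen guest sees it, and the check shows why a DMA address computed for the
head page cannot simply be extended across a compound page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)

/*
 * Hypothetical physical-frame -> machine-frame table. Frames 0-1 and 2-3
 * happen to be adjacent in machine (DMA) space, but 1 -> 2 is not.
 */
static const uint64_t p2m[] = { 0x100, 0x101, 0x2a7, 0x2a8 };
#define P2M_FRAMES (sizeof(p2m) / sizeof(p2m[0]))

/* Translate one physical address; only valid within a single page. */
static uint64_t phys_to_dma(uint64_t phys)
{
    uint64_t pfn = phys >> PAGE_SHIFT;

    assert(pfn < P2M_FRAMES);
    return (p2m[pfn] << PAGE_SHIFT) | (phys & (PAGE_SIZE - 1));
}

/*
 * True if [phys, phys + len) is also one contiguous range in DMA space,
 * i.e. the device could be given phys_to_dma(phys) and len directly.
 */
static bool dma_contiguous(uint64_t phys, size_t len)
{
    uint64_t first, last, pfn;

    if (len == 0)
        return true;

    first = phys >> PAGE_SHIFT;
    last = (phys + len - 1) >> PAGE_SHIFT;
    if (last >= P2M_FRAMES)
        return false;

    /* Every page boundary crossed must also be adjacent in DMA space. */
    for (pfn = first; pfn < last; pfn++) {
        if (p2m[pfn + 1] != p2m[pfn] + 1)
            return false;
    }
    return true;
}
```

With this table, an order-0 buffer always passes the check, an order-1
buffer starting at frame 0 passes only by luck, and an order-2 buffer
starting at frame 0 fails at the 1 -> 2 boundary, which is where the
hardware would step onto the wrong machine page.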
I don't think this non-contiguousness between physical and DMA addresses
is specific to Xen, although it is more frequent under Xen than on any real
hardware platform. (Xen has often been a good canary for these sorts of
issues, which turn out later on to impact other arches too.)

In theory this could be fixed in all the drivers for physical network
devices, but that would be a lot of effort (and probably a fair bit of
ugliness in the drivers) for a gain which was only relevant to Xen.

Ian.