linux-kernel - Re: [PATCH 2/2] virtio_net: Improve the recv buffer allocation scheme


Message-ID: <20081009153035.GA21542@gondor.apana.org.au>
Date:	Thu, 9 Oct 2008 23:30:35 +0800
From:	Herbert Xu <herbert@...dor.apana.org.au>
To:	Rusty Russell <rusty@...tcorp.com.au>
Cc:	Mark McLoughlin <markmc@...hat.com>, linux-kernel@...r.kernel.org,
	virtualization@...ts.osdl.org, netdev@...r.kernel.org
Subject: Re: [PATCH 2/2] virtio_net: Improve the recv buffer allocation
	scheme

On Thu, Oct 09, 2008 at 11:55:59AM +1100, Rusty Russell wrote:
>
> There are three approaches we should investigate before adding yet another feature.
> Obviously, we can simply increase the number of ring entries.

That's not going to work so well as you need to increase the ring
size by MAX_SKB_FRAGS times to achieve the same level of effect.

Basically the current scheme is either going to suck at non-TSO
traffic or it's going to chew up too many resources.
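
A rough sketch of the ring arithmetic behind this point, not taken from the
patch. It assumes 4 KiB pages, MAX_SKB_FRAGS defined as 65536/PAGE_SIZE + 2
(the "+2" Rusty refers to), two extra descriptors per receive skb for the
header and skb->data, and a 256-entry ring. All SKETCH_/sketch_ names are
hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Assumed values for illustration only; none of these come from the patch. */
#define SKETCH_PAGE_SIZE     4096u
#define SKETCH_MAX_SKB_FRAGS (65536u / SKETCH_PAGE_SIZE + 2u)  /* 18 with 4 KiB pages */
#define SKETCH_DESC_PER_SKB  (SKETCH_MAX_SKB_FRAGS + 2u)       /* header + skb->data + frags */

/* How many receive skbs (one per incoming packet) fit in a ring. */
static unsigned int sketch_packets_per_ring(unsigned int ring_entries)
{
    return ring_entries / SKETCH_DESC_PER_SKB;
}

/* Ring size needed to keep `packets` receive skbs posted at once. */
static unsigned int sketch_ring_entries_for(unsigned int packets)
{
    assert(packets <= UINT_MAX / SKETCH_DESC_PER_SKB);  /* guard against overflow */
    return packets * SKETCH_DESC_PER_SKB;
}
```

Under these assumptions a 256-entry ring holds only 12 posted receive skbs,
however small the packets are, and keeping 256 of them posted would take a
ring of 5120 entries.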

> Secondly, we can put the virtio_net_hdr at the head of the skb data (this is 
> also worth considering for xmit I think if we have headroom) and drop 
> MAX_SKB_FRAGS which contains a gratuitous +2.

That's fine but having skb->data in the ring still means two
different kinds of memory in there and it sucks when you only
have 1500-byte packets.

> Thirdly, we can try to coalesce contiguous buffers.  The page caching scheme 
> we have might help here, I don't know.  Maybe we should be explicitly trying 
> to allocate higher orders.

That's not really the key problem here.  The problem here is
that the scheme we're currently using in virtio-net is simply
broken when it comes to 1500-byte sized packets.  Most of the
entries on the ring buffer go to waste.

We need a scheme that handles both 1500-byte packets and
64K-byte ones, without holding down 16M of memory
per guest.
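
One way to arrive at a figure of that order, offered as an assumption rather
than as the derivation actually used here: 256 posted receive buffers, each
sized for a 64K packet. The helper below is hypothetical and only does the
multiplication.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes for illustration; the buffer count is a guess, not from the mail. */
#define SKETCH_BUFFERS   256u
#define SKETCH_BUF_64K   (64u * 1024u)
#define SKETCH_BUF_PAGE  4096u

/* Memory held down while `buffers` receive buffers of the given size stay posted. */
static size_t sketch_pinned_bytes(size_t buffers, size_t bytes_per_buffer)
{
    /* Guard against size_t overflow in the product. */
    assert(bytes_per_buffer == 0 || buffers <= SIZE_MAX / bytes_per_buffer);
    return buffers * bytes_per_buffer;
}
```

With these numbers, sketch_pinned_bytes(SKETCH_BUFFERS, SKETCH_BUF_64K) is
16 MiB, while the same count of single-page buffers pins 1 MiB.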

> > The size of the logical buffer is
> > returned to the guest rather than the size of the individual smaller
> > buffers.
> 
> That's a virtio transport breakage: can you use the standard virtio mechanism, 
> just put the extended length or number of extra buffers inside the 
> virtio_net_hdr?

Sure that sounds reasonable.
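
A minimal sketch of what carrying the count inside the header could look
like, assuming page-sized receive buffers with the header sharing the first
one. The struct name, the num_extra_buffers field and the helper are
hypothetical; only the first six fields mirror the existing virtio_net_hdr.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed buffer size: one 4 KiB page */

/* Hypothetical layout: the existing header fields plus a buffer count. */
struct sketch_net_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
    uint16_t num_extra_buffers;  /* hypothetical extension field */
};

/*
 * Number of page-sized buffers beyond the first that a packet occupies
 * when the header is placed at the start of the first buffer.
 * Assumes payload_len is small enough that the sum does not overflow.
 */
static uint16_t sketch_extra_buffers(uint32_t payload_len)
{
    uint32_t total = (uint32_t)sizeof(struct sketch_net_hdr) + payload_len;
    uint32_t buffers = (total + SKETCH_PAGE_SIZE - 1u) / SKETCH_PAGE_SIZE;

    assert(buffers >= 1u);  /* the header alone always needs one buffer */
    return (uint16_t)(buffers - 1u);
}
```

A 1500-byte packet then needs no extra buffers, and the guest can learn how
many buffers a large packet spans from the header alone, without the
transport reporting a merged length.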

> > Make use of this support by supplying single page receive buffers to
> > the host. On receive, we extract the virtio_net_hdr, copy 128 bytes of
> > the payload to the skb's linear data buffer and adjust the fragment
> > offset to point to the remaining data. This ensures proper alignment
> > and allows us to not use any paged data for small packets. If the
> > payload occupies multiple pages, we simply append those pages as
> > fragments and free the associated skbs.
> 
> > +		char *p = page_address(skb_shinfo(skb)->frags[0].page);
> ...
> > +		memcpy(hdr, p, sizeof(*hdr));
> > +		p += sizeof(*hdr);
> 
> I think you need kmap_atomic() here to access the page.  And yes, that will 
> affect performance :(

No, we don't.  kmap would only be necessary for highmem, which we
did not request.
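
For reference, the receive-side split described in the quoted patch text can
be modelled in user space roughly as follows. This is a sketch under
assumptions, not the patch code: the page is an ordinary directly addressable
buffer (the analogue of a lowmem page reached via page_address()), the header
is a 10-byte stand-in, and every sketch_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_COPY_LEN 128u  /* bytes copied to the linear area, per the patch text */

struct sketch_hdr { unsigned char bytes[10]; };  /* stand-in for virtio_net_hdr */

struct sketch_rx {
    struct sketch_hdr hdr;                  /* header extracted from the page */
    unsigned char linear[SKETCH_COPY_LEN];  /* models the skb linear data */
    size_t linear_len;                      /* bytes valid in linear[] */
    size_t frag_offset;                     /* where the rest starts in the page */
    size_t frag_len;                        /* bytes left in the page fragment */
};

/* `len` is the number of bytes the host wrote into the first page. */
static void sketch_split_first_page(const unsigned char *page, size_t len,
                                    struct sketch_rx *rx)
{
    const unsigned char *p = page;
    size_t payload, copy;

    assert(len >= sizeof(rx->hdr));
    memcpy(&rx->hdr, p, sizeof(rx->hdr));  /* extract the header */
    p += sizeof(rx->hdr);

    payload = len - sizeof(rx->hdr);
    copy = payload < SKETCH_COPY_LEN ? payload : SKETCH_COPY_LEN;
    memcpy(rx->linear, p, copy);           /* head of the payload goes linear */

    rx->linear_len = copy;
    rx->frag_offset = sizeof(rx->hdr) + copy;  /* fragment now skips hdr + copy */
    rx->frag_len = payload - copy;             /* zero for small packets */
}
```

A packet with at most 128 bytes of payload ends up entirely in the linear
area with frag_len of zero, which matches the stated goal of not using paged
data for small packets.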

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@...dor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt