lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 10 Nov 2013 23:40:33 +0000
From:	Thomas Graf <tgraf@...g.ch>
To:	Ben Hutchings <bhutchings@...arflare.com>
Cc:	jesse@...ira.com, davem@...emloft.net, dev@...nvswitch.org,
	netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH 2/2 net-next] openvswitch: Use skb_zerocopy() for upcall

On 11/09/13 at 10:02pm, Ben Hutchings wrote:
> On Fri, 2013-11-08 at 10:15 +0100, Thomas Graf wrote:
> > Use of skb_zerocopy() avoids the expensive call to memcpy() when
> > copying the packet data into the Netlink skb. Completes checksum
> > through skb_checksum_help() if needed.
> > 
> > Netlink messaged must be properly padded and aligned to meet
> > sanity checks of the user space counterpart.
> > 
> > Cost of memcpy is significantly reduced from:
> > +   7.48%       vhost-8471  [k] memcpy
> > +   5.57%     ovs-vswitchd  [k] memcpy
> > +   2.81%       vhost-8471  [k] csum_partial_copy_generic
> > 
> > to:
> > +   5.72%     ovs-vswitchd  [k] memcpy
> > +   3.32%       vhost-5153  [k] memcpy
> > +   0.68%       vhost-5153  [k] skb_zerocopy
> > 
> > (megaflows disabled)
> > 
> > Signed-off-by: Thomas Graf <tgraf@...g.ch>
> > ---
> >  net/openvswitch/datapath.c | 52 +++++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 45 insertions(+), 7 deletions(-)
> > 
> > diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> > index 1408adc..3f170e3 100644
> > --- a/net/openvswitch/datapath.c
> > +++ b/net/openvswitch/datapath.c
> [...]
> > @@ -441,13 +449,43 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
> >  			  nla_len(upcall_info->userdata),
> >  			  nla_data(upcall_info->userdata));
> >  
> > -	nla = __nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, skb->len);
> > +	/* Only reserve room for attribute header, packet data is added
> > +	 * in skb_zerocopy() */
> > +	if (!(nla = nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, 0)))
> > +		goto out;
> > +	nla->nla_len = nla_attr_size(skb->len);
> >  
> > -	skb_copy_and_csum_dev(skb, nla_data(nla));
> > +	skb_zerocopy(user_skb, skb, skb->len, hlen);
> >  
> > -	genlmsg_end(user_skb, upcall);
> > -	err = genlmsg_unicast(net, user_skb, upcall_info->portid);
> > +	/* OVS user space expects the size of the message to be aligned to
> > +	 * NLA_ALIGNTO. Aligning nlmsg_len is not enough, the actual bytes
> > +	 * read must match nlmsg_len.
> > +	 */
> > +	plen = NLA_ALIGN(user_skb->len) - user_skb->len;
> > +	if (plen > 0) {
> > +		int nr_frags = skb_shinfo(user_skb)->nr_frags;
> > +
> > +		if (nr_frags) {
> > +			skb_frag_t *frag;
> > +
> > +			frag = &skb_shinfo(user_skb)->frags[nr_frags -1];
> > +			skb_frag_size_add(frag, plen);
> 
> It looks like this is effectively padding with whatever happens to
> follow the original packet content.  This could result in a small
> information leak.  If the fragment has non-zero offset and already
> extends to the end of a page, this could result in a segfault as the
> next page may be unmapped.
> 
> Perhaps you could add the padding as an extra fragment pointing to a
> preallocated zero page.  If the skb already has the maximum number of
> fragments, you would have to copy the last fragment in order to add
> padding.

You are right and thanks for the review Ben.

Realizing how complex this becomes I'm leaning towards avoiding
padding alltogether by fixing OVS user space to no longer enforce
it, signal this capability via a flag to the kernel and only
perform zerocopy for enabled OVS user space counterparts.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ