lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20160929121004.1db2d543@jkicinski-Precision-T1700>
Date:   Thu, 29 Sep 2016 12:10:04 +0100
From:   Jakub Kicinski <kubakici@...pl>
To:     John Fastabend <john.fastabend@...il.com>
Cc:     Jakub Kicinski <jakub.kicinski@...ronome.com>,
        "Samudrala, Sridhar" <sridhar.samudrala@...el.com>,
        Jiri Benc <jbenc@...hat.com>, Jiri Pirko <jiri@...nulli.us>,
        netdev@...r.kernel.org, Thomas Graf <tgraf@...g.ch>,
        Roopa Prabhu <roopa@...ulusnetworks.com>,
        ogerlitz@...lanox.com, ast@...nel.org, daniel@...earbox.net,
        simon.horman@...ronome.com, Paolo Abeni <pabeni@...hat.com>,
        Pravin B Shelar <pshelar@...ira.com>,
        hannes@...essinduktion.org
Subject: Re: [RFC] net: store port/representative id in metadata_dst

On Fri, 23 Sep 2016 14:20:40 -0700, John Fastabend wrote:
> On 16-09-23 01:45 PM, Jakub Kicinski wrote:
> > On Fri, 23 Sep 2016 13:25:10 -0700, John Fastabend wrote:  
> >> On 16-09-23 01:17 PM, Jakub Kicinski wrote:  
> >>> On Fri, 23 Sep 2016 10:22:59 -0700, Samudrala, Sridhar wrote:    
> >>>> On 9/23/2016 8:29 AM, Jakub Kicinski wrote:    
> >>  [...]  
> >>  [...]    
> >>>>
> >>>> The 'accel' parameter in dev_queue_xmit_accel() is currently only passed
> >>>> to ndo_select_queue() via netdev_pick_tx() and is used to select the tx 
> >>>> queue.
> >>>> Also, it is not passed all the way to the driver specific xmit routine.  
> >>>> Doesn't it require
> >>>> changing all the driver xmit routines if we want to pass this parameter?
> >>>>    
> >>  [...]    
> >>>>
> >>>> Yes.  The VFPR netdevs don't have any HW queues associated with them and 
> >>>> we would like
> >>>> to use the PF queues for the xmit.
> >>>> I was also looking into some way of passing the port id via skb 
> >>>> parameter to the
> >>>> dev_queue_xmit() call so that the PF xmit routine can do a directed 
> >>>> transmit to a specifc VF.
> >>>> Is skb->cb an option to pass this info?
> >>>> dst_metadata approach would work  too if it is acceptable.    
> >>>
> >>> I don't think we can trust skb->cb to be set to anything meaningful
> >>> when the skb is received by the lower device.   
> >>
> >> Agreed. I wouldn't recommend using skb->cb. How about passing it through
> >> dev_queue_xmit_accel() through to the driver?
> >>
> >> If you pass the metadata through the dev_queue_xmit_accel() handle tx
> >> queue  selection would work using normal mechanisms (xps, select_queue,
> >> cls  hook, etc.). If you wanted to pick some specific queue based on
> >> policy the policy could be loaded into one of those hooks.  
> > 
> > Do you mean without extending how accel is handled by
> > dev_queue_xmit_accel() today?  If my goal is to not have extra HW
> > queues then I don't see how I could mux in the lower dev without extra
> > locking (as I tried to explain two emails ago).  Sorry for being slow
> > here :(
> >   
> 
> Not slow here I think I was overly optimistic...
> 
> Yeh let me try this, roughly the current flow is,
> 
>    dev_queue_xmit_accel(struct sk_buff *skb, void *accel_priv);
>    __dev_queue_xmit(skb, accel_priv);
>    netdev_pick_tx(dev, skb, accel_priv);
> 	ndo_select_queue(dev, skb, accel_priv, ...);
>    [...]
>    q->enqueue();
>    [...]
>    dev_hard_start_xmit();
>    [...]
>     <driver code here>
> 
> So in this flow the VFR netdev driver handles its xmit routine by
> calling dev_queue_xmit_accel after setting skb->dev to the physical
> device and passing a cookie via accel that the select_queue() routine
> can use to pick a tx queue. The rest of the stack q->enqueue() and
> friends will ensure that locking and qdisc is handled correctly.
> 
> But accel_priv was lost at queue selection and so its not being passed
> down to the driver so no way to set your descriptor bits or whatever
> needed to push to the VF. I was sort of thinking we could map it from
> the select_queue routine but I can't figure out how to do that either.
> 
> The metadata idea doesn't seem that bad now that I've spent some more
> time going through it. Either that or hijack some field in the skb but
> I think that might be worse than the proposal here.
> 
> I'm trying to think up some other alternative now and will let you know
> if I think of anything clever but got nothing at the moment.
	
Cool, I'm happy to discuss this further at netdev but it seems like
there is no strong opposition so far?

FWIW in the example I gave I didn't do refcounting on the dst but I
think that's incorrect since we don't have control over lifetime of
redirected/stolen skbs.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ