[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161102073502.GB1713@nanopsycho.orion>
Date:   Wed, 2 Nov 2016 08:35:02 +0100
From:   Jiri Pirko <jiri@...nulli.us>
To:     David Miller <davem@...emloft.net>
Cc:     roopa@...ulusnetworks.com, eric.dumazet@...il.com,
        idosch@...sch.org, netdev@...r.kernel.org, jiri@...lanox.com,
        mlxsw@...lanox.com, dsa@...ulusnetworks.com,
        nikolay@...ulusnetworks.com, andy@...yhouse.net,
        vivien.didelot@...oirfairelinux.com, andrew@...n.ch,
        f.fainelli@...il.com, alexander.h.duyck@...el.com,
        kuznet@....inr.ac.ru, jmorris@...ei.org, yoshfuji@...ux-ipv6.org,
        kaber@...sh.net, idosch@...lanox.com
Subject: Re: [PATCH net-next v2] ipv4: fib: Replay events when registering
 FIB notifier
Tue, Nov 01, 2016 at 04:36:50PM CET, davem@...emloft.net wrote:
>From: Roopa Prabhu <roopa@...ulusnetworks.com>
>Date: Tue, 01 Nov 2016 08:14:14 -0700
>
>> On 11/1/16, 7:19 AM, Eric Dumazet wrote:
>>> On Tue, 2016-11-01 at 00:57 +0200, Ido Schimmel wrote:
>>>> On Mon, Oct 31, 2016 at 02:24:06PM -0700, Eric Dumazet wrote:
>>>>> How well will this work for large FIB tables ?
>>>>>
>>>>> Holding rtnl while sending thousands of skb will prevent consumers to
>>>>> make progress ?
>>>> Can you please clarify what do you mean by "while sending thousands of
>>>> skb"? This patch doesn't generate notifications to user space, but
>>>> instead invokes notification routines inside the kernel. I probably
>>>> misunderstood you.
>>>>
>>>> Are you suggesting this be done using RCU instead? Well, there are a
>>>> couple of reasons why I took RTNL here:
>>>>
>>> No, I do not believe RCU is wanted here, in control path where we might
>>> sleep anyway.
>>>
>>>> 1) The FIB notification chain is blocking, so listeners are expected to
>>>> be able to sleep. This isn't possible if we use RCU. Note that this
>>>> chain is mainly useful for drivers that reflect the FIB table into a
>>>> capable device and hardware operations usually involve sleeping.
>>>>
>>>> 2) The insertion of a single route is done with RTNL held. I didn't want
>>>> to differentiate between both cases. This property is really useful for
>>>> listeners, as they don't need to worry about locking in writer-side.
>>>> Access to data structs is serialized by RTNL.
>>> My concern was that for large iterations, you might hold RTNL and/or
>>> current cpu for hundred of ms or even seconds...
>>>
>> I have the same concern as Eric here.
>> 
>> I understand why you need it, but can the driver request for an initial dump and that
>> dump be made more efficient somehow ie not hold rtnl for the whole dump ?.
>> instead of making the fib notifier registration code doing it.
>> 
>> these routing table sizes can be huge and an analogy for this in user-space:
>> We do request a netlink dump of  routing tables at initialization (on driver starts or resets)...
>> but, existing netlink routing table dumps for that scale don't hold rtnl for the whole dump.
>> The dump is split into multiple responses to the user and hence it does not starve other rtnl users.
>> 
>> In-fact I don't think netlink routing table dumps from user-space hold rtnl_lock for the whole dump.
>> IIRC this was done to allow route add/dels to be allowed in parallel for performance reasons.
>> (I will need to double check to confirm this).
>
>I've always had some reservations about using notifiers for getting
>the FIB entries down to the offloaded device.
Yeah, me too. But there is no really other way to do it. I thought about
it quite long. But maybe I missed something.
>
>And this problem is just another symptom that it is the wrong
>mechanism for propagating this information.
>
>As suggested by Roopa here, perhaps we're looking at the problem from
>the wrong direction.  We tend to design NDO ops and notifiers, to
>"push" things to the driver, but maybe something more like a push+pull
>model is better.
How do you imagine this mode should looks like? Could you draw me some
example?
Thanks!
Powered by blists - more mailing lists
 
