netdev - Re: [PATCH net-next v2] ipv4: fib: Replay events when registering FIB notifier

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <20161101170345.pq2ewecw35mrurkp@splinter>
Date:   Tue, 1 Nov 2016 19:03:45 +0200
From:   Ido Schimmel <idosch@...sch.org>
To:     Roopa Prabhu <roopa@...ulusnetworks.com>
Cc:     Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org,
        davem@...emloft.net, jiri@...lanox.com, mlxsw@...lanox.com,
        dsa@...ulusnetworks.com, nikolay@...ulusnetworks.com,
        andy@...yhouse.net, vivien.didelot@...oirfairelinux.com,
        andrew@...n.ch, f.fainelli@...il.com, alexander.h.duyck@...el.com,
        kuznet@....inr.ac.ru, jmorris@...ei.org, yoshfuji@...ux-ipv6.org,
        kaber@...sh.net, Ido Schimmel <idosch@...lanox.com>
Subject: Re: [PATCH net-next v2] ipv4: fib: Replay events when registering
 FIB notifier

Hi Roopa,

On Tue, Nov 01, 2016 at 08:14:14AM -0700, Roopa Prabhu wrote:
> On 11/1/16, 7:19 AM, Eric Dumazet wrote:
> > On Tue, 2016-11-01 at 00:57 +0200, Ido Schimmel wrote:
> >> On Mon, Oct 31, 2016 at 02:24:06PM -0700, Eric Dumazet wrote:
> >>> How well will this work for large FIB tables ?
> >>>
> > >>> Holding rtnl while sending thousands of skbs will prevent consumers from
> > >>> making progress?
> > >> Can you please clarify what you mean by "while sending thousands of
> >> skb"? This patch doesn't generate notifications to user space, but
> >> instead invokes notification routines inside the kernel. I probably
> >> misunderstood you.
> >>
> >> Are you suggesting this be done using RCU instead? Well, there are a
> >> couple of reasons why I took RTNL here:
> >>
> > No, I do not believe RCU is wanted here, in control path where we might
> > sleep anyway.
> >
> >> 1) The FIB notification chain is blocking, so listeners are expected to
> >> be able to sleep. This isn't possible if we use RCU. Note that this
> >> chain is mainly useful for drivers that reflect the FIB table into a
> >> capable device and hardware operations usually involve sleeping.
> >>
> > >> 2) The insertion of a single route is done with RTNL held. I didn't want
> > >> to differentiate between the two cases. This property is really useful for
> > >> listeners, as they don't need to worry about locking on the writer side:
> > >> access to data structs is serialized by RTNL.
> > My concern was that for large iterations, you might hold RTNL and/or
> > the current cpu for hundreds of ms or even seconds...
> >
> I have the same concern as Eric here.
> 
> I understand why you need it, but can the driver request an initial dump, and can that
> dump be made more efficient somehow, i.e. not hold rtnl for the whole dump,
> instead of having the fib notifier registration code do it?

We can do what we suggested in the last bi-weekly meeting, which is to
keep holding rtnl but move the hardware operation to delayed work.
This is possible because the upper layers always assume the operation was
successful, and the driver is responsible for invoking its abort mechanism
in case of failure.
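
A minimal user-space sketch of the deferred approach described above, assuming a driver that only queues FIB events from the notifier callback (where RTNL would be held) and programs the hardware later from a work item. All names (`drv_fib_event()`, `hw_program()`, `drv_fib_work()`, the `aborted` flag) are hypothetical, the "hardware" is just a counter with a fixed capacity, and the sketch is single-threaded, so it leaves out whatever locking a real work item would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical queued FIB event (route addition only). */
struct fib_event {
	unsigned int dst;        /* IPv4 destination, host byte order */
	unsigned int prefix_len;
	struct fib_event *next;
};

/* Hypothetical driver state: pending events, fake hardware, abort flag. */
struct drv {
	struct fib_event *head, *tail;
	unsigned int programmed;   /* routes written to "hardware" */
	unsigned int hw_capacity;  /* writes beyond this fail */
	bool aborted;
};

/*
 * Notifier-side handler: would run with RTNL held, so it only queues the
 * event and returns quickly. The caller treats the route as installed.
 */
static int drv_fib_event(struct drv *d, unsigned int dst, unsigned int prefix_len)
{
	struct fib_event *ev;

	assert(prefix_len <= 32);
	ev = malloc(sizeof(*ev));
	if (!ev)
		return -1;
	ev->dst = dst;
	ev->prefix_len = prefix_len;
	ev->next = NULL;
	if (d->tail)
		d->tail->next = ev;
	else
		d->head = ev;
	d->tail = ev;
	return 0;
}

/* Stand-in for a slow hardware write that may sleep; fails when full. */
static int hw_program(struct drv *d, const struct fib_event *ev)
{
	(void)ev;
	if (d->programmed >= d->hw_capacity)
		return -1;
	d->programmed++;
	return 0;
}

/*
 * Deferred work: drain the queue in order. On the first failure the driver
 * enters its abort state and stops programming further routes.
 */
static void drv_fib_work(struct drv *d)
{
	while (d->head) {
		struct fib_event *ev = d->head;

		d->head = ev->next;
		if (!d->head)
			d->tail = NULL;
		if (!d->aborted && hw_program(d, ev) != 0)
			d->aborted = true;
		free(ev);
	}
}
```

For example, queueing three routes against a two-entry table touches no hardware until `drv_fib_work()` runs; the third write then fails and trips the abort flag, while every notifier call has already returned success.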

> These routing tables can be huge. An analogy for this in user space:
> we do request a netlink dump of routing tables at initialization (when the driver starts or resets),
> but existing netlink routing table dumps at that scale don't hold rtnl for the whole dump.
> The dump is split into multiple responses to the user, and hence it does not starve other rtnl users.
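
The split dump described here can be sketched as a cursor that survives between chunks, with the lock taken once per chunk rather than once for the whole table. This is a hypothetical user-space model: `table_lock()`/`table_unlock()` only stand in for rtnl and record how often it was taken, and `dump_chunk()` is not a kernel or netlink API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route entry. */
struct route {
	unsigned int dst;
	unsigned int prefix_len;
};

/* Stand-ins for rtnl_lock()/rtnl_unlock(); they only count acquisitions. */
static unsigned int lock_acquisitions;
static int lock_held;

static void table_lock(void)
{
	assert(!lock_held);
	lock_held = 1;
	lock_acquisitions++;
}

static void table_unlock(void)
{
	assert(lock_held);
	lock_held = 0;
}

/*
 * Copy at most 'budget' routes starting at '*cursor' into 'out' and advance
 * the cursor. Returns the number copied; 0 means the dump is complete.
 * The lock is held for one chunk only, so other users can run in between.
 */
static size_t dump_chunk(const struct route *table, size_t n_routes,
			 size_t *cursor, struct route *out, size_t budget)
{
	size_t copied = 0;

	table_lock();
	while (*cursor < n_routes && copied < budget)
		out[copied++] = table[(*cursor)++];
	table_unlock();
	return copied;
}
```

Dumping five routes with a budget of two takes the lock four times (2 + 2 + 1 routes, then an empty call that ends the dump). Because the lock is dropped between chunks, the table can change mid-dump; netlink has an `NLM_F_DUMP_INTR` flag for reporting that a dump may be inconsistent, which a real implementation of this idea would have to take into account.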

In my reply to Eric I mentioned that when we register with and unregister
from this chain, the tables aren't really huge, but quite small.
I understand your concerns, but I don't wish to make things more
complicated than they need to be only to address concerns that aren't
really realistic.

I believe the current patch is quite simple and also consistent with other
notification chains in the kernel, such as the netdevice chain, where rtnl is
held during replay of events:
http://lxr.free-electrons.com/source/net/core/dev.c#L1535
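
A minimal user-space model of the replay-on-registration scheme, assuming a single lock (standing in for RTNL) that serializes both route insertion and listener registration. Because the listener is added and the existing routes are replayed under that lock, a listener that joins late ends up with the same view as one that was present from the start. All names here (`listener_register()`, `fib_add()`, `rtnl_lock_sim()`, `count_routes()`) are hypothetical and are not the kernel's actual FIB notifier API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROUTES 16

/* Hypothetical route entry; only additions are modelled here. */
struct route {
	unsigned int dst;
	unsigned int prefix_len;
};

/* Hypothetical listener, loosely modelled on struct notifier_block. */
struct listener {
	void (*call)(struct listener *l, const struct route *rt);
	struct listener *next;
	unsigned int n_routes; /* routes this listener has been told about */
};

static struct route fib[MAX_ROUTES];
static size_t fib_len;
static struct listener *chain;
static int rtnl_held; /* stand-in for RTNL: no real mutual exclusion */

static void rtnl_lock_sim(void)
{
	assert(!rtnl_held);
	rtnl_held = 1;
}

static void rtnl_unlock_sim(void)
{
	assert(rtnl_held);
	rtnl_held = 0;
}

/* Insert a route and notify every listener; caller must hold the lock. */
static void fib_add(unsigned int dst, unsigned int prefix_len)
{
	assert(rtnl_held && fib_len < MAX_ROUTES);
	fib[fib_len] = (struct route){ dst, prefix_len };
	for (struct listener *l = chain; l; l = l->next)
		l->call(l, &fib[fib_len]);
	fib_len++;
}

/*
 * Add the listener to the chain and replay existing routes to it while the
 * lock is held, so no insertion can happen between the two steps.
 */
static void listener_register(struct listener *l)
{
	rtnl_lock_sim();
	l->next = chain;
	chain = l;
	for (size_t i = 0; i < fib_len; i++)
		l->call(l, &fib[i]);
	rtnl_unlock_sim();
}

/* Example callback: just count what it has been told about. */
static void count_routes(struct listener *l, const struct route *rt)
{
	(void)rt;
	l->n_routes++;
}
```

The replay loop is O(n) in the number of routes with the lock held throughout, which is exactly the cost Eric and Roopa raise. `register_netdevice_notifier()` in net/core/dev.c follows the same pattern, replaying `NETDEV_REGISTER` (and `NETDEV_UP` for devices that are up) for existing devices with rtnl held.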