lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 17 Nov 2016 15:36:48 +0100
From:   Hannes Frederic Sowa <hannes@...essinduktion.org>
To:     Ido Schimmel <idosch@...sch.org>
Cc:     Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org,
        davem@...emloft.net, idosch@...lanox.com, eladr@...lanox.com,
        yotamg@...lanox.com, nogahf@...lanox.com, arkadis@...lanox.com,
        ogerlitz@...lanox.com, roopa@...ulusnetworks.com,
        dsa@...ulusnetworks.com, nikolay@...ulusnetworks.com,
        andy@...yhouse.net, vivien.didelot@...oirfairelinux.com,
        andrew@...n.ch, f.fainelli@...il.com, alexander.h.duyck@...el.com,
        kuznet@....inr.ac.ru, jmorris@...ei.org, yoshfuji@...ux-ipv6.org,
        kaber@...sh.net
Subject: Re: [patch net-next 6/8] ipv4: fib: Add an API to request a FIB dump

On 17.11.2016 14:10, Ido Schimmel wrote:
> Hi Hannes,
> 
> On Wed, Nov 16, 2016 at 08:43:25PM +0100, Hannes Frederic Sowa wrote:
>> On 16.11.2016 19:51, Ido Schimmel wrote:
>>> On Wed, Nov 16, 2016 at 06:35:45PM +0100, Hannes Frederic Sowa wrote:
>>>> On 16.11.2016 16:18, Ido Schimmel wrote:
>>>>> On Wed, Nov 16, 2016 at 03:51:01PM +0100, Hannes Frederic Sowa wrote:
>>>>>> On 16.11.2016 15:09, Jiri Pirko wrote:
>>>>>>> From: Ido Schimmel <idosch@...lanox.com>
>>>>>>>
>>>>>>> Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
>>>>>>> introduced a new notification chain to notify listeners (f.e., switchdev
>>>>>>> drivers) about addition and deletion of routes.
>>>>>>>
>>>>>>> However, upon registration to the chain the FIB tables can already be
>>>>>>> populated, which means potential listeners will have an incomplete view
>>>>>>> of the tables.
>>>>>>>
>>>>>>> Solve that by adding an API to request a FIB dump. The dump itself it
>>>>>>> done using RCU in order not to starve consumers that need RTNL to make
>>>>>>> progress.
>>>>>>>
>>>>>>> Signed-off-by: Ido Schimmel <idosch@...lanox.com>
>>>>>>> Signed-off-by: Jiri Pirko <jiri@...lanox.com>
>>>>>>
>>>>>> Have you looked at potential inconsistencies resulting of RCU walking
>>>>>> the table and having concurrent inserts?
>>>>>
>>>>> Yes. I did try to think about situations in which this approach will
>>>>> fail, but I could only find problems with concurrent removals, which I
>>>>> addressed in 5/8. In case of concurrent insertions, even if you missed
>>>>> the node, you would still get the ENTRY_ADD event to your listener.
>>>>
>>>> Theoretically a node could still be installed while the deletion event
>>>> fired before registering the notifier. E.g. a synchronize_net before
>>>> dumping could help here?
>>>
>>> If the deletion event fired for some fib alias, then by 5/8 we are
>>> guaranteed that it was already unlinked from the fib alias list in the
>>> leaf in which it was contained. So, while it's possible we didn't
>>> register our listener in time for the deletion event, we won't traverse
>>> this fib alias while dumping the trie anyway. Did I understand you
>>> correctly?
>>>
>>
>> Theoretically we can have the same problem for insertion:
>>
>> You receive a delete event from the notifier that is queued up first but
>> the dump will still see the entry in the fib due to being managed by RCU
>> (the notifier running on another CPU).
>>
>> The problem is that the fib_remove_alias->hlist_del_rcu->WRITE_ONCE is
>> still not strongly ordered against the local fib dump trie walk.
> 
> OK, so I believe my analysis in 5/8 was wrong. Despite CPU A invoking
> fib_remove_alias() in t0 for fa1, it's possible for CPU B doing the fib
> dump to see fa1 in t1, which will lead to fa1 being permanently present
> in the listener's table.

Yep. :(

Also the delayed work queue is only partial ordered (only on one CPU),
so you don't know about when an event is processed from the dump and the
notifier. I think you need to create your own workqueue for doing so.

> Given the above, I think Dave's suggestion is the only applicable
> solution. Do you agree? Any other suggestions?

As I wrote before the problem with seqcounter is, that with a quagga
running on top and pushing updates you end up never getting a stable
view, so maybe some logic to abort in case it cannot be loaded after a
few tries would be fine.

The other way is the journal idea I had, which uses an rb-tree with
timestamps as keys (can be lamport timestamps). You insert into the tree
until the dump is finished and use it as queue later to shuffle stuff
into the hardware.

Because you should be able to hold refs to the fa_info or other structs
directly, I don't expect gigantic memory overhead, just for a cell to
store timetamp and pointer (rb_node).

Bye,
Hannes

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ