netdev - Re: [PATCH 1/6] bridge: learn dst metadata in FDB

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <88c29511-ec61-8eba-3d37-8f467cef42eb@cumulusnetworks.com>
Date:   Thu, 17 Aug 2017 16:04:20 +0300
From:   Nikolay Aleksandrov <nikolay@...ulusnetworks.com>
To:     David Lamparter <equinox@...c24.net>
Cc:     netdev@...r.kernel.org, amine.kherbouche@...nd.com,
        roopa@...ulusnetworks.com, stephen@...workplumber.org,
        "bridge@...ts.linux-foundation.org" 
        <bridge@...ts.linux-foundation.org>
Subject: Re: [PATCH 1/6] bridge: learn dst metadata in FDB

On 17/08/17 15:45, David Lamparter wrote:
> On Thu, Aug 17, 2017 at 02:39:43PM +0300, Nikolay Aleksandrov wrote:
>> On 17/08/17 14:03, David Lamparter wrote:
>>> On Wed, Aug 16, 2017 at 11:38:06PM +0300, Nikolay Aleksandrov wrote:
>>>> On 16/08/17 20:01, David Lamparter wrote:
>>>> and hitting the fast path for everyone in a few different places for a
>>>> feature that the majority will not use does not sound acceptable to
>>>> me. We've been trying hard to optimize it, trying to avoid additional
>>>> cache lines, removing tests and keeping special cases to a minimum. 
>>>
>>> skb->dst is on the same cacheline as skb->len.
>>> fdb->md_dst is on the same cacheline as fdb->dst.
>>> Both will be 0 in a lot of cases, so this should be two null checks on
>>> data that is hot in the cache.  Are you sure this is an actual problem?
>>>
>>
>> Sure - no, I haven't benchmarked it, but I don't see skb->len being on
>> the same cache line as _skb_refdst assuming 64 byte cache lines.
>> But again any special cases, in my opinion, should be handled on their own,
>> it is both about the fast path and the code complexity that they bring in.
> 
> (separate thread)
> 
> [cut]
>>> I really hope you're not suggesting the entire MDB with IPv4 & IPv6
>>> snooping be duplicated into both VPLS and mac80211?
>>
>> Code can always be shared if there are more users, no need to stuff
>> everything in the bridge,
> 
> The MDB code is far from trivial, has several configuration knobs, and
> even sends its own queries if configured to do so.  It can also use
> quite a bit of memory of there's a nontrivial number of multicast
> groups.  I *really* think it shouldn't be duplicated.
> 
>> but I'm not that familiar with this case, once patches are out I can
>> comment further.
> 
> I've pushed my hacks to:
> https://github.com/eqvinox/vpls-linux-kernel/commits/mdb-hack
> (top two commits)
> 
> THIS IS ABSOLUTELY A PROOF OF CONCEPT.  It doesn't un-learn dst
> metadata, it probably leaks buckets, and it may kill your cat.
> (I haven't pushed my attempts at mac80211, because I haven't gotten
> anywhere useful there just yet.)
> 
>>>> As you've noted this is only an RFC so I will not point out every issue, but there seems
>>>> to be a major problem with br_fdb_update(), note that it runs without any locks except RCU.
>>>
>>> Right, Thanks! ... I only thought about concurrent access, forgetting
>>> about concurrent modification...  I'll replace it with an xchg I think.
>>> (No need for a lock that way)
>>
>> I think you can still lose references to a dst that way, what if someone changes the
>> dst you read before the xchg and you xchg it ?
> 
> The dst to be released is the return from (atomic) xchg, not the value
> read earlier for comparison.  This can happen in parallel, but apart
> from a little extra churn in the update case it has no ill effects.
> 
> If someone changes it in the meantime, they have new dst information for
> the fdb entry, and so do we.  With xchg'ing it, either one of the
> updates will stick and the other will be properly released.  Considering
> that there is no correct ordering here (either packet could be processed
> a nanosecond later or earlier), this is perfectly fine as an outcome.

Yep right you are, my bad.

> 
> 
> -David
>