netdev - Re: [PATCH v2 net-next 0/5] Make SWITCHDEV_FDB_{ADD,DEL}_TO

Message-ID: <fbfce8d7-5bd8-723a-8ab3-0ce5bc6b073a@nvidia.com>
Date:   Sun, 22 Aug 2021 12:12:02 +0300
From:   Nikolay Aleksandrov <nikolay@...dia.com>
To:     Ido Schimmel <idosch@...sch.org>
Cc:     Vladimir Oltean <olteanv@...il.com>,
        Vladimir Oltean <vladimir.oltean@....com>,
        netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Roopa Prabhu <roopa@...dia.com>, Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Vadym Kochan <vkochan@...vell.com>,
        Taras Chornyi <tchornyi@...vell.com>,
        Jiri Pirko <jiri@...dia.com>, Ido Schimmel <idosch@...dia.com>,
        UNGLinuxDriver@...rochip.com,
        Grygorii Strashko <grygorii.strashko@...com>,
        Marek Behun <kabel@...ckhole.sk>,
        DENG Qingfang <dqfext@...il.com>,
        Kurt Kanzenbach <kurt@...utronix.de>,
        Hauke Mehrtens <hauke@...ke-m.de>,
        Woojung Huh <woojung.huh@...rochip.com>,
        Sean Wang <sean.wang@...iatek.com>,
        Landen Chao <Landen.Chao@...iatek.com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        Alexandre Belloni <alexandre.belloni@...tlin.com>,
        George McCollister <george.mccollister@...il.com>,
        Ioana Ciornei <ioana.ciornei@....com>,
        Saeed Mahameed <saeedm@...dia.com>,
        Leon Romanovsky <leon@...nel.org>,
        Lars Povlsen <lars.povlsen@...rochip.com>,
        Steen Hegelund <Steen.Hegelund@...rochip.com>,
        Julian Wiedmann <jwi@...ux.ibm.com>,
        Karsten Graul <kgraul@...ux.ibm.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Ivan Vecera <ivecera@...hat.com>,
        Vlad Buslov <vladbu@...dia.com>,
        Jianbo Liu <jianbol@...dia.com>,
        Mark Bloch <mbloch@...dia.com>, Roi Dayan <roid@...dia.com>,
        Tobias Waldekranz <tobias@...dekranz.com>,
        Vignesh Raghavendra <vigneshr@...com>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>
Subject: Re: [PATCH v2 net-next 0/5] Make SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
 blocking

On 22/08/2021 09:48, Ido Schimmel wrote:
> On Sat, Aug 21, 2021 at 02:36:26AM +0300, Nikolay Aleksandrov wrote:
>> On 20/08/2021 20:06, Vladimir Oltean wrote:
>>> On Fri, Aug 20, 2021 at 07:09:18PM +0300, Ido Schimmel wrote:
>>>> On Fri, Aug 20, 2021 at 12:37:23PM +0300, Vladimir Oltean wrote:
>>>>> On Fri, Aug 20, 2021 at 12:16:10PM +0300, Ido Schimmel wrote:
>>>>>> On Thu, Aug 19, 2021 at 07:07:18PM +0300, Vladimir Oltean wrote:
>>>>>>> Problem statement:
>>>>>>>
>>>>>>> Any time a driver needs to create a private association between a bridge
>>>>>>> upper interface and use that association within its
>>>>>>> SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE handler, we have an issue with FDB
>>>>>>> entries deleted by the bridge when the port leaves. The issue is that
>>>>>>> all switchdev drivers schedule a work item to have sleepable context,
>>>>>>> and that work item can be actually scheduled after the port has left the
>>>>>>> bridge, which means the association might have already been broken by
>>>>>>> the time the scheduled FDB work item attempts to use it.
>>>>>>
>>>>>> This is handled in mlxsw by telling the device to flush the FDB entries
>>>>>> pointing to the {port, FID} when the VLAN is deleted (synchronously).
>>>>>
>>>>> Again, central solution vs mlxsw solution.
>>>>
>>>> Again, a solution is forced on everyone regardless if it benefits them
>>>> or not. List is bombarded with version after version until patches are
>>>> applied. *EXHAUSTING*.
>>>
>>> So if I replace "bombarded" with a more neutral word, isn't that how
>>> it's done though? What would you do if you wanted to achieve something
>>> but the framework stood in your way? Would you work around it to avoid
>>> bombarding the list?
>>>
>>>> With these patches, except DSA, everyone gets another queue_work() for
>>>> each FDB entry. In some cases, it completely misses the purpose of the
>>>> patchset.
>>>
>>> I also fail to see the point. Patch 3 will have to make things worse
>>> before they get better. It is like that in DSA too, and made more
>>> reasonable only in the last patch from the series.
>>>
>>> If I saw any middle-ground way, like keeping the notifiers on the atomic
>>> chain for unconverted drivers, I would have done it. But what do you do
>>> if more than one driver listens for one event, one driver wants it
>>> blocking, the other wants it atomic. Do you make the bridge emit it
>>> twice? That's even worse than having one useless queue_work() in some
>>> drivers.
>>>
>>> So if you think I can avoid that please tell me how.
>>>
>>
>> Hi,
>> I don't like the double-queuing of each fdb for everyone either; it forces them
>> to rework it asap due to inefficiency even though that shouldn't be necessary. In the
>> long run I hope everyone will migrate to such a scheme, but perhaps we can do it gradually.
> 
> The fundamental problem is that these operations need to be deferred in
> the first place. It would have been much better if user space could get
> a synchronous feedback.
> 
> It all stems from the fact that control plane operations need to be done
> under a spin lock because the shared databases (e.g., FDB, MDB) or
> states (e.g., STP) that they are updating can also be updated from the
> data plane in softIRQ.
> 

Right, but changing that, as you've noted below, would require moving
the delaying to the bridge, and I'd like to avoid that.

> I don't have a clean solution to this problem without doing surgery in
> the bridge driver: deferring updates from the data plane using a work
> queue and converting the spin locks to mutexes. This will also allow us
> to emit netlink notifications from process context and convert
> GFP_ATOMIC to GFP_KERNEL.
> 
> Is that something you consider as acceptable? Does anybody have a better
> idea?
> 

Moving the delays to the bridge for this purpose does not sound like a good solution.
I'd prefer the delaying to be done by the interested third party, as in this case, rather
than by the bridge. If there's a solution that avoids delaying and doesn't hurt the software
fast-path then of course I'll be ok with that.
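
To make the driver-side deferral concrete, here is a minimal userspace sketch
of the pattern and of the race from the problem statement quoted above. It is
an illustration only, not code from this series or from any driver: struct
port, struct fdb_work, fdb_event_atomic(), run_pending_work() and
port_leave_bridge() are all hypothetical names, and a plain array stands in
for the kernel workqueue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
struct port {
	bool in_bridge;     /* the driver's private port <-> bridge association */
	int hw_fdb_entries; /* entries programmed into the simulated hardware */
};

struct fdb_work {
	struct port *port;
	bool add;           /* true: FDB add, false: FDB delete */
};

#define MAX_PENDING 8
static struct fdb_work pending[MAX_PENDING];
static size_t n_pending;

/* Atomic context: must not sleep, so it only records the request. */
bool fdb_event_atomic(struct port *p, bool add)
{
	if (n_pending == MAX_PENDING)
		return false;
	pending[n_pending].port = p;
	pending[n_pending].add = add;
	n_pending++;
	return true;
}

/* Simulates the port leaving the bridge before the deferred work runs. */
void port_leave_bridge(struct port *p)
{
	p->in_bridge = false;
}

/*
 * Sleepable context: runs some time later. The association may be gone
 * by now, so each item re-checks it. Returns how many items were dropped.
 */
int run_pending_work(void)
{
	int dropped = 0;
	size_t i;

	for (i = 0; i < n_pending; i++) {
		struct fdb_work *w = &pending[i];

		if (!w->port->in_bridge) {
			dropped++; /* stale work item: nothing safe to do */
			continue;
		}
		w->port->hw_fdb_entries += w->add ? 1 : -1;
	}
	n_pending = 0;
	return dropped;
}
```

If a delete is queued with fdb_event_atomic() and port_leave_bridge() runs
before run_pending_work(), the delete is dropped and the entry stays behind
in the simulated hardware. A driver doing its own deferral has to handle that
window itself, for example with a synchronous flush on leave as mlxsw does.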
 
>> For most drivers this is introducing more work (as in processing) rather than helping
>> them right now. Either give them the option to convert to it of their own accord, or bite
>> the bullet and convert everyone so the change won't affect them. It holds rtnl and it is
>> blocking, so I don't see why not convert everyone to just execute their otherwise queued work.
>> I'm sure driver maintainers would appreciate such help and would test and review it. You're
>> halfway there already.
>>
>> Cheers,
>>  Nik