Date:   Mon, 23 Aug 2021 19:11:57 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Ido Schimmel <idosch@...sch.org>
Cc:     Nikolay Aleksandrov <nikolay@...dia.com>,
        Vladimir Oltean <vladimir.oltean@....com>,
        netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Roopa Prabhu <roopa@...dia.com>, Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Vadym Kochan <vkochan@...vell.com>,
        Taras Chornyi <tchornyi@...vell.com>,
        Jiri Pirko <jiri@...dia.com>, Ido Schimmel <idosch@...dia.com>,
        UNGLinuxDriver@...rochip.com,
        Grygorii Strashko <grygorii.strashko@...com>,
        Marek Behun <kabel@...ckhole.sk>,
        DENG Qingfang <dqfext@...il.com>,
        Kurt Kanzenbach <kurt@...utronix.de>,
        Hauke Mehrtens <hauke@...ke-m.de>,
        Woojung Huh <woojung.huh@...rochip.com>,
        Sean Wang <sean.wang@...iatek.com>,
        Landen Chao <Landen.Chao@...iatek.com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        Alexandre Belloni <alexandre.belloni@...tlin.com>,
        George McCollister <george.mccollister@...il.com>,
        Ioana Ciornei <ioana.ciornei@....com>,
        Saeed Mahameed <saeedm@...dia.com>,
        Leon Romanovsky <leon@...nel.org>,
        Lars Povlsen <lars.povlsen@...rochip.com>,
        Steen Hegelund <Steen.Hegelund@...rochip.com>,
        Julian Wiedmann <jwi@...ux.ibm.com>,
        Karsten Graul <kgraul@...ux.ibm.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Ivan Vecera <ivecera@...hat.com>,
        Vlad Buslov <vladbu@...dia.com>,
        Jianbo Liu <jianbol@...dia.com>,
        Mark Bloch <mbloch@...dia.com>, Roi Dayan <roid@...dia.com>,
        Tobias Waldekranz <tobias@...dekranz.com>,
        Vignesh Raghavendra <vigneshr@...com>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>
Subject: Re: [PATCH v2 net-next 0/5] Make SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
 blocking

On Mon, Aug 23, 2021 at 07:02:15PM +0300, Ido Schimmel wrote:
> > > Inside the work item you would do something like:
> > >
> > > spin_lock_bh()
> > > list_splice_init()
> > > spin_unlock_bh()
> > >
> > > mutex_lock() // rtnl or preferably private lock
> > > list_for_each_entry_safe()
> > > 	// process entry
> > > 	cond_resched()
> > > mutex_unlock()
> >
> > When is the work item scheduled in your proposal?
>
> Calling queue_work() whenever you get a notification. The work item
> might already be queued, which is fine.
>
> > I assume not only when SWITCHDEV_FDB_FLUSH_TO_DEVICE is emitted. Is
> > there some sort of timer to allow for some batching to occur?
>
> You can add an hysteresis timer if you want, but I don't think it's
> necessary. Assuming user space is programming entries at a high rate,
> then by the time you finish a batch, you will have a new one enqueued.

With the current model, nothing stops a driver from doing that if it so
wishes. No switchdev or bridge changes are needed. This async model
already gives us maximum flexibility. Yet it just so happens that no one
is exploiting it, and instead the existing options are poorly utilized
by most drivers.
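
For illustration only, the splice-then-process pattern quoted above could
look roughly like the following userspace C sketch. It is not kernel code
and not taken from any driver: a pthread mutex stands in for
spin_lock_bh(), a hand-rolled singly linked list stands in for
list_splice_init(), and the names fdb_event, fdb_queue, fdb_enqueue and
fdb_work are hypothetical. MAX_BATCH reuses the 64-entry figure quoted for
mlxsw; the "transaction" is only counted, not sent anywhere.

```c
/* Userspace sketch of "splice under a short lock, then process in batches". */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BATCH 64 /* entries per device transaction (figure quoted for mlxsw) */

struct fdb_event {              /* hypothetical pending FDB notification */
	struct fdb_event *next;
	unsigned char addr[6];
	unsigned short vid;
};

struct fdb_queue {
	pthread_mutex_t lock;   /* stands in for spin_lock_bh() */
	struct fdb_event *head, *tail;
};

static void fdb_queue_init(struct fdb_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
}

/* Notifier side: append one event; a driver would queue_work() afterwards. */
static int fdb_enqueue(struct fdb_queue *q, const unsigned char addr[6],
		       unsigned short vid)
{
	struct fdb_event *ev = malloc(sizeof(*ev));

	if (!ev)
		return -1;
	memcpy(ev->addr, addr, sizeof(ev->addr));
	ev->vid = vid;
	ev->next = NULL;

	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = ev;
	else
		q->head = ev;
	q->tail = ev;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/*
 * Work item side: detach the whole pending list under the lock, then
 * process it without holding that lock. Returns the number of batches
 * (simulated device transactions); *processed is incremented per entry.
 */
static size_t fdb_work(struct fdb_queue *q, size_t *processed)
{
	struct fdb_event *ev, *next;
	size_t in_batch = 0, transactions = 0;

	pthread_mutex_lock(&q->lock);
	ev = q->head;                   /* the "splice" step */
	q->head = q->tail = NULL;
	pthread_mutex_unlock(&q->lock);

	for (; ev; ev = next) {
		next = ev->next;
		/* A real driver would stage the entry for the hardware here. */
		free(ev);
		(*processed)++;
		if (++in_batch == MAX_BATCH || !next) {
			assert(in_batch <= MAX_BATCH);
			transactions++; /* one simulated transaction */
			in_batch = 0;
		}
	}
	return transactions;
}
```

Events that arrive while fdb_work() is busy simply land on the now-empty
queue and are picked up by the next run, which is the batching effect
described in the quoted text.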

> > > In del_nbp(), after br_fdb_delete_by_port(), the bridge will emit some
> > > new blocking event (e.g., SWITCHDEV_FDB_FLUSH_TO_DEVICE) that will
> > > instruct the driver to flush all its pending FDB notifications. You
> > > don't strictly need this notification because of the
> > > netdev_upper_dev_unlink() that follows, but it helps in making things
> > > more structured.
> > >
> > > Pros:
> > >
> > > 1. Solves your problem?
> > > 2. Pattern is not worse than what we currently have
> > > 3. Does not force RTNL
> > > 4. Allows for batching. For example, mlxsw has the ability to program up
> > > to 64 entries in one transaction with the device. I assume other devices
> > > in the same grade have similar capabilities
> > >
> > > Cons:
> > >
> > > 1. Asynchronous
> > > 2. Pattern we will see in multiple drivers? Can consider migrating it
> > > into switchdev itself at some point
> >
> > I can already flush_workqueue(dsa_owq) in dsa_port_pre_bridge_leave()
> > and this will solve the problem in the same way, will it not?
>
> Problem is that you will deadlock if your work item tries to take RTNL.

I think we agreed that the rtnl_lock could be dropped from driver FDB
work items. I have not tried that yet, though.

> > It's not that I don't have driver-level solutions and hook points.
> > My concern is that there are way too many moving parts and the entrance
> > barrier for a new switchdev driver is getting higher and higher to
> > achieve even basic stuff.
>
> I understand the frustration, but that's my best proposal at the moment.
> IMO, it doesn't make things worse and has some nice advantages.

Reconsidering my options, I don't want to reduce the optimizations
available to other switchdev drivers in the name of a simpler baseline.
I am also not smart enough to rework the bridge data path.
I will probably do something like flush_workqueue() in the PRECHANGEUPPER
handler, see what other common patterns there might be, and try to
synthesize them into library code (a la switchdev_handle_*) that can be
used by drivers that wish to, and ignored by drivers that don't.
