[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <562130EE.3040000@gmail.com>
Date: Fri, 16 Oct 2015 10:16:30 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Scott Feldman <sfeldma@...il.com>, Jiri Pirko <jiri@...nulli.us>
CC: Netdev <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Ido Schimmel <idosch@...lanox.com>,
Elad Raz <eladr@...lanox.com>,
Florian Fainelli <f.fainelli@...il.com>,
Guenter Roeck <linux@...ck-us.net>,
Vivien Didelot <vivien.didelot@...oirfairelinux.com>,
"andrew@...n.ch" <andrew@...n.ch>,
David Laight <David.Laight@...lab.com>,
"stephen@...workplumber.org" <stephen@...workplumber.org>
Subject: Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly
request attr_set as deferred
On 15-10-16 09:20 AM, Scott Feldman wrote:
> On Fri, Oct 16, 2015 at 1:23 AM, Jiri Pirko <jiri@...nulli.us> wrote:
>> Thu, Oct 15, 2015 at 05:21:22PM CEST, john.fastabend@...il.com wrote:
>>> On 15-10-14 10:40 AM, Jiri Pirko wrote:
>>>> From: Jiri Pirko <jiri@...lanox.com>
>>>>
>>>> Caller should know if he can call attr_set directly (when holding RTNL)
>>>> or if he has to defer the att_set processing for later.
>>>>
>>>> This also allows drivers to sleep inside attr_set and report operation
>>>> status back to switchdev core. Switchdev core then warns if status is
>>>> not ok, instead of silent errors happening in drivers.
>>>>
>>>> Benefit from newly introduced switchdev deferred ops infrastructure.
>>>>
>>>
>>> A nit but the patch description should note your setting the defer bit
>>> on the bridge set state.
>>>
>>>> Signed-off-by: Jiri Pirko <jiri@...lanox.com>
>>>> ---
>>>> include/net/switchdev.h | 1 +
>>>> net/bridge/br_stp.c | 3 +-
>>>> net/switchdev/switchdev.c | 108 ++++++++++++++++++----------------------------
>>>> 3 files changed, 46 insertions(+), 66 deletions(-)
>>>>
>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>> index d1c7f90..f7de6f8 100644
>>>> --- a/include/net/switchdev.h
>>>> +++ b/include/net/switchdev.h
>>>> @@ -17,6 +17,7 @@
>>>>
>>>> #define SWITCHDEV_F_NO_RECURSE BIT(0)
>>>> #define SWITCHDEV_F_SKIP_EOPNOTSUPP BIT(1)
>>>> +#define SWITCHDEV_F_DEFER BIT(2)
>>>>
>>>> struct switchdev_trans_item {
>>>> struct list_head list;
>>>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>>>> index db6d243de..80c34d7 100644
>>>> --- a/net/bridge/br_stp.c
>>>> +++ b/net/bridge/br_stp.c
>>>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>>> {
>>>> struct switchdev_attr attr = {
>>>> .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>>>> + .flags = SWITCHDEV_F_DEFER,
>>>> .u.stp_state = state,
>>>> };
>>>
>>>
>>> This creates a possible race (with 6/8) I think, please check!
>>
>> Wait. This patch does not change the previous behaviour. Patch 6 does,
>> so I don't understand why you are asking here. Confusing.
>>
>>
>>>
>>> In del_nbp() we call br_stp_disable_port() to set the port state
>>> to BR_STATE_DISABLE and disabling learning events. But with this
>>> patch it can be deferred. Also note the STP agent may be in userspace
>>> which actually seems more likely the case because you likely want to
>>> run some more modern variant of STP than the kernel supports.
>>>
>>> So at some point in the future the driver will turn off learning. At
>>> the same time we call br_fdb_delete_by_port which calls a deferred
>>> set of fdb deletes.
>>>
>>> I don't see how you guarantee learning is off before you start doing
>>> the deletes here and possibly learning new addresses after the software
>>> side believes the port is down.
>>>
>>> So
>>>
>>> br_stp_disable_port
>>> br_fdb_delete_by_port
>>> {fdb_del_external_learn}
>>> [hw learns a fdb]
>>> [hw disables learning]
>>>
>>> What stops this from happening?
>>
>> Okay. This behaviour is the same as without the patchset. What would
>> resolve the issue it to put switchdev_deferred_process() after
>> br_stp_disable_port() and before br_fdb_delete_by_port() call.
>> That would enforce stp change to happen in hw before fdbs are explicitly
>> deleted. Sound good to you?
>
> Doesn't HW already see things in the right order since items are
> dequeued from the deferred list in the order queued?
>
I just noticed there is the rtnl lock around the deferred process so
this should mean only a single deferred task is run at a time. OK sort
of a big hammer though.
+static void switchdev_deferred_process_work(struct work_struct *work)
+{
+ rtnl_lock();
+ switchdev_deferred_process();
+ rtnl_unlock();
+}
But I don't see how this block of code works,
spin_lock_bh(&br->lock);
br_stp_disable_port(p);
spin_unlock_bh(&br->lock);
br_ifinfo_notify(RTM_DELLINK, p);
list_del_rcu(&p->list);
nbp_vlan_flush(p);
br_fdb_delete_by_port(br, p, 0, 1);
switchdev_deferred_process();
If you defer disabling learning then setup a bunch of deletes also
deferred, what guarantees that learning was disabled before you setup
the deferred fdb delete list?
It seems you may call rocker_port_fdb_learn_work() after the
br_fdb_delete_by_port call above but before the deferred learn disable
call which would have disabled it. Also the rocker_port_fdb_learn_work
queue is on a lw->work and doesn't use the same locking?
Its sort of an interesting hieararchy of work queues being built up
here. I'm being a bit overly paranoid at this point because I've seen
some really ugly bugs around these types of things that are difficult
to spot because they are correctness bugs instead of hard crashes.
.John
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists