Message-ID: <20220819001355.7kw6rm5bf257huc2@skbuf>
Date: Fri, 19 Aug 2022 03:13:55 +0300
From: Vladimir Oltean <olteanv@...il.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: Vladimir Oltean <vladimir.oltean@....com>, netdev@...r.kernel.org,
Vivien Didelot <vivien.didelot@...il.com>,
Florian Fainelli <f.fainelli@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Kevin Hilman <khilman@...nel.org>,
Ulf Hansson <ulf.hansson@...aro.org>,
Len Brown <len.brown@...el.com>, Pavel Machek <pavel@....cz>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [RFC PATCH net-next 00/10] Use robust notifiers in DSA
On Fri, Aug 19, 2022 at 12:35:07AM +0200, Andrew Lunn wrote:
> > So you think that rollback at the cross-chip notifier layer is a new
> > problem we need to tackle, because we don't have enough transactional
> > layering in the code?
>
> No, i don't think it is a new problem, but it might help explain why
> you don't feel quite right about it. Some errors we simply don't care
> about because we cannot do anything about it. Other errors we should
> try to rollback, and hence need robust notifiers for those errors.
So most of the actual errors I've had to handle in the kernel were
caused by half the code (the callee) expecting one thing, and half the
code (the caller) providing another. That doesn't fit well in either of
your categories, but it's more like how to best treat the unexpected.
And I'm not talking unexpected as in
switchdev                                dsa
----------------------------------------------------------------------
- Please add MAC 00:01:02:03:04:05
  to the FDB
                                         - Whoa, after all these years, I
                                           never knew you could speak!
but rather
switchdev                                dsa
----------------------------------------------------------------------
- Please add MAC 00:01:02:03:04:05
  to the FDB
                                         - Sure thing, man!
- Please delete MAC 00:01:02:03:04:05
  from the FDB
                                         - Aye!
- Please delete MAC 00:01:02:03:04:05
  from the FDB
                                         - Wait, what MAC 00:01:02:03:04:05?
                                           I have no such thing!
- Wha?
                                         - Wha?
There's nothing to do about that except to wait for Mr Developer to come
and debug, and the severity of the problem might be low even though the
problem is just as intractable programmatically as a hardware I/O error.
Nonetheless it's still indicative of a problem worth propagating as high
as possible, because one side of the code had expectations of what the
other side could do that were clearly violated, so their models of the
other side are wrong.
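
To make the second exchange concrete, here is a minimal userspace sketch
of it. Everything in it (struct toy_fdb, toy_fdb_add(), toy_fdb_del())
is a hypothetical stand-in, not the DSA or switchdev code; the only
point is that the second delete of the same MAC comes back as -ENOENT,
which is the "I have no such thing!" reply that the caller had no reason
to expect.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_FDB_SIZE 8
#define TOY_MAC_LEN  6

/* Hypothetical toy FDB; not the data structures DSA actually uses. */
struct toy_fdb {
	unsigned char mac[TOY_FDB_SIZE][TOY_MAC_LEN];
	int used[TOY_FDB_SIZE];
};

/* Return the slot index holding @mac, or -1 if it is not present. */
int toy_fdb_find(const struct toy_fdb *fdb, const unsigned char *mac)
{
	for (int i = 0; i < TOY_FDB_SIZE; i++)
		if (fdb->used[i] && !memcmp(fdb->mac[i], mac, TOY_MAC_LEN))
			return i;
	return -1;
}

/* Add @mac; -EEXIST if already present, -ENOSPC if the table is full. */
int toy_fdb_add(struct toy_fdb *fdb, const unsigned char *mac)
{
	assert(fdb && mac);
	if (toy_fdb_find(fdb, mac) >= 0)
		return -EEXIST;
	for (int i = 0; i < TOY_FDB_SIZE; i++) {
		if (!fdb->used[i]) {
			memcpy(fdb->mac[i], mac, TOY_MAC_LEN);
			fdb->used[i] = 1;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Delete @mac; -ENOENT means the caller's model of the table is wrong. */
int toy_fdb_del(struct toy_fdb *fdb, const unsigned char *mac)
{
	int i;

	assert(fdb && mac);
	i = toy_fdb_find(fdb, mac);
	if (i < 0)
		return -ENOENT;
	fdb->used[i] = 0;
	return 0;
}
```

Calling toy_fdb_add() once and toy_fdb_del() twice on the same address
reproduces the dialogue: 0, 0, then -ENOENT. Nothing can be rolled back
at that point, but the return code is the only evidence that the two
sides disagree.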
This patch set makes that worse for Mr Developer who gets to debug,
because it makes dsa_port_fdb_del() return void, and the errors will now
get reported to the console at the level of dsa_port_notify() and then
suppressed. dsa_port_notify(), being at the low cross-chip notifier
level, won't print all the gory details of the FDB entry that failed to
be deleted and on what port, it will just say that the operation failed
and return void.
That's what felt wrong to me while doing this conversion.
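
A rough userspace sketch of the difference, in case it helps. All names
here (toy_switch_fdb_del(), toy_port_fdb_del_void(), toy_port_fdb_del(),
toy_describe_fdb_error()) are made up for illustration and are not the
functions touched by the patch set. Variant A mimics a void return with
a generic message printed at the low level; variant B hands the error
code back so that the caller, which still knows the port and the MAC,
can say what actually failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical low-level operation: 0 on success, -ENOENT if missing. */
int toy_switch_fdb_del(int entry_present)
{
	return entry_present ? 0 : -ENOENT;
}

/* Variant A: void return. The failure is logged generically, then lost. */
void toy_port_fdb_del_void(int port, const unsigned char *mac,
			   int entry_present)
{
	int err = toy_switch_fdb_del(entry_present);

	/* The low level does not use port/MAC in its message. */
	(void)port;
	(void)mac;
	if (err)
		fprintf(stderr, "notifier: operation failed: %d\n", err);
}

/* Variant B: the error code is returned so the caller can add context. */
int toy_port_fdb_del(int port, const unsigned char *mac, int entry_present)
{
	assert(mac);
	(void)port;
	return toy_switch_fdb_del(entry_present);
}

/* Caller-side helper: describe the failure with the details it knows. */
int toy_describe_fdb_error(char *buf, size_t len, int port,
			   const unsigned char *mac, int err)
{
	return snprintf(buf, len,
			"port %d: failed to delete %02x:%02x:%02x:%02x:%02x:%02x: %s",
			port, mac[0], mac[1], mac[2], mac[3], mac[4], mac[5],
			strerror(-err));
}
```

With variant A the caller cannot tell success from failure at all. With
variant B it can check the return value and pass it, together with the
port number and address, to toy_describe_fdb_error() to get a message
that names the entry that could not be deleted.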