[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YRLCUc8NZnRZFUFs@shredder>
Date: Tue, 10 Aug 2021 21:15:45 +0300
From: Ido Schimmel <idosch@...sch.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org,
davem@...emloft.net, mkubecek@...e.cz, pali@...nel.org,
vadimp@...dia.com, mlxsw@...dia.com,
Ido Schimmel <idosch@...dia.com>
Subject: Re: [RFC PATCH net-next 2/8] ethtool: Add ability to reset
transceiver modules
On Tue, Aug 10, 2021 at 06:54:23AM -0700, Jakub Kicinski wrote:
> On Tue, 10 Aug 2021 16:05:07 +0300 Ido Schimmel wrote:
> > > Again, i'm wondering, why is user space doing the reset? Can you think
> > > of any other piece of hardware where Linux relies on user space
> > > performing a reset before the kernel can properly use it?
> > >
> > > How long does a reset take? Table 10-1 says the reset pulse must be
> > > 10uS and table 10-2 says the reset should not take longer than
> > > 2000ms.
> >
> > Takes about 1.5ms to get an ACK on the reset request and another few
> > seconds to ensure module is in a valid operational state (will remove
> > RTNL in next version).
>
> Hm. RTNL-lock-less ethtool ops are a little concerning. The devlink
> locking was much complicated by the unclear locking rules. Can we keep
> ethtool simple and put this functionality in a different API or make
> the reset async?
I thought there are already RTNL-lock-less ethtool ops, but maybe I
imagined it. Given that we also want to support firmware update on
modules and that user space might want to update several modules
simultaneously, do you have a suggestion on how to handle it from
locking perspective? The ethtool netlink backend has parallel ops, but
RTNL is a problem. Firmware flashing is currently synchronous in both
ethtool and devlink, but the latter does not hold RTNL.
>
> > > So maybe reset it on ifup if it is in a bad state?
> >
> > We can have multiple ports (split) using the same module and in CMIS
> > each data path is controlled by a different state machine. Given the
> > complexity of these modules and possible faults, it is possible to
> > imagine a situation in which a few ports are fine and the rest are
> > unable to obtain a carrier.
> >
> > Resetting the module on ifup of swp1s0 is not intuitive and it shouldn't
> > affect other split ports (e.g., swp1s1). With the dedicated reset
> > command we have the ability to enforce all the required restrictions
> > from the start instead of changing the behavior of existing commands.
>
> Sounds similar to what ethtool --reset was trying to address (upper
> 16 bits). Could we reuse that? Whether its a SFP or other part of the
> port being reset may not be entirely important to the user, so perhaps
> it's not a bad idea to abstract that away and stick to "reset levels"?
Wasn't aware of this API. Looks like it is only implemented by a few
drivers, but man page says "phy Transceiver/PHY", so I think we can
reuse it.
What do you mean by "reset levels"? The split between shared /
dedicated?
Just to make sure I understand, you suggest the following semantics?
# ethtool --reset swp1s0 phy
Error since module is used by several ports (split)
# ethtool --reset swp1s0 phy-shared
OK
# ethtool --reset swp1 phy
OK (no split)
# ethtool --reset swp1 phy-shared
OK
Powered by blists - more mailing lists