lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 10 Aug 2021 21:15:45 +0300
From:   Ido Schimmel <idosch@...sch.org>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org,
        davem@...emloft.net, mkubecek@...e.cz, pali@...nel.org,
        vadimp@...dia.com, mlxsw@...dia.com,
        Ido Schimmel <idosch@...dia.com>
Subject: Re: [RFC PATCH net-next 2/8] ethtool: Add ability to reset
 transceiver modules

On Tue, Aug 10, 2021 at 06:54:23AM -0700, Jakub Kicinski wrote:
> On Tue, 10 Aug 2021 16:05:07 +0300 Ido Schimmel wrote:
> > > Again, i'm wondering, why is user space doing the reset? Can you think
> > > of any other piece of hardware where Linux relies on user space
> > > performing a reset before the kernel can properly use it?
> > > 
> > > How long does a reset take? Table 10-1 says the reset pulse must be
> > > 10uS and table 10-2 says the reset should not take longer than
> > > 2000ms.  
> > 
> > Takes about 1.5ms to get an ACK on the reset request and another few
> > seconds to ensure module is in a valid operational state (will remove
> > RTNL in next version).
> 
> Hm. RTNL-lock-less ethtool ops are a little concerning. The devlink
> locking was much complicated by the unclear locking rules. Can we keep
> ethtool simple and put this functionality in a different API or make
> the reset async?

I thought there are already RTNL-lock-less ethtool ops, but maybe I
imagined it. Given that we also want to support firmware update on
modules and that user space might want to update several modules
simultaneously, do you have a suggestion on how to handle it from
locking perspective? The ethtool netlink backend has parallel ops, but
RTNL is a problem. Firmware flashing is currently synchronous in both
ethtool and devlink, but the latter does not hold RTNL.

> 
> > > So maybe reset it on ifup if it is in a bad state?  
> > 
> > We can have multiple ports (split) using the same module and in CMIS
> > each data path is controlled by a different state machine. Given the
> > complexity of these modules and possible faults, it is possible to
> > imagine a situation in which a few ports are fine and the rest are
> > unable to obtain a carrier.
> > 
> > Resetting the module on ifup of swp1s0 is not intuitive and it shouldn't
> > affect other split ports (e.g., swp1s1). With the dedicated reset
> > command we have the ability to enforce all the required restrictions
> > from the start instead of changing the behavior of existing commands.
> 
> Sounds similar to what ethtool --reset was trying to address (upper
> 16 bits). Could we reuse that? Whether its a SFP or other part of the
> port being reset may not be entirely important to the user, so perhaps
> it's not a bad idea to abstract that away and stick to "reset levels"?

Wasn't aware of this API. Looks like it is only implemented by a few
drivers, but man page says "phy    Transceiver/PHY", so I think we can
reuse it.

What do you mean by "reset levels"? The split between shared /
dedicated?

Just to make sure I understand, you suggest the following semantics?

# ethtool --reset swp1s0 phy
Error since module is used by several ports (split)

# ethtool --reset swp1s0 phy-shared
OK

# ethtool --reset swp1 phy
OK (no split)

# ethtool --reset swp1 phy-shared
OK

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ