netdev - Re: [PATCH net v6 3/3] net: dsa: microchip: monitor potential faults in half-duplex mode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <3d5e9b4d-d87c-4376-b698-4cccdbfb81ff@savoirfairelinux.com>
Date: Fri, 14 Jun 2024 17:14:48 +0200
From: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@...oirfairelinux.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: netdev@...r.kernel.org, hkallweit1@...il.com, linux@...linux.org.uk,
 woojung.huh@...rochip.com, UNGLinuxDriver@...rochip.com, horms@...nel.org,
 Tristram.Ha@...rochip.com, Arun.Ramadoss@...rochip.com
Subject: Re: [PATCH net v6 3/3] net: dsa: microchip: monitor potential faults
 in half-duplex mode

On 14/06/2024 15:37, Andrew Lunn wrote:
> On Fri, Jun 14, 2024 at 09:46:42AM +0000, Enguerrand de Ribaucourt wrote:
>> The errata DS80000754 recommends monitoring potential faults in
>> half-duplex mode for the KSZ9477 family.
>>
>> half-duplex is not very common so I just added a critical message
>> when the fault conditions are detected. The switch can be expected
>> to be unable to communicate anymore in these states and a software
>> reset of the switch would be required which I did not implement.
> 
> If i'm reading this code correctly, every 30 seconds it will test to
> see if the link is half duplex, and is so, log onetime this could lead
> to problems. Also, every 30 seconds, if the statistics counts indicate
> there has been a late collision, it will log a rate limited
> message. Given the 30 second poll interval, rate limiting is probably
> pointless, and every one will get logged. The last print, i have no
> idea what resource you are talking about. Will it also likely print
> once every 30 seconds?

The MIB statistics are read every 5 seconds. It is defined in:
	dev->mib_read_interval = msecs_to_jiffies(5000);
So indeed, _ratelimit having a 5 second back-off, it is redundant.

The second print is about two resources:
  - Packet Memory Available Block Count (PMAVBC)
  - TX Queue Blocks Used Count (TXQBU/QM_TX_CNT)
According to the errata, they are leaky in half duplex mode. Once 
depleted, packets can't be exchanged.

Like the late collision counter, these are not going to go back to 
normal values without a reset. So yes, the critical message will keeping 
being displayed. Do you think a dev_crit_once would be more appropriate?

> 
> Is the idea here, we want to notice when this is happening, and get an
> idea if it is worth implementing the software reset? Do we want to add
> a "Please report if you see this" to the commit message or the log
> messages themselves?

I'm fine with just monitoring and printing in my use case. My devices 
are unlikely to encounter half-duplex peers and the app can recover. I 
simply need to guarantee I can detect this condition. For my use case, 
implementing the software reset wasn't worth the effort. One of the main 
obstacles is that the VLAN table would need to be reconfigured, so it 
would require remembering it from the driver. Also transmission would 
inevitably be stopped for a few seconds.

For generic users, I think the warning is necessary since it indicates 
the switch will stop functioning after prolonged use in half-duplex. We 
could elaborate the warning message to point to the datasheet. That way 
users could understand why their configuration is affected, what are the 
risks, and possible avoidance or recovery mechanisms. If this errata is 
critical to someone who can't avoid it, then we could implement the 
software reset.

Proposal for a v7:
  - dev_crit_ratelimit -> dev_crit_once
  - dev_warn_once -> point to datasheet + add "report if needed"

What is your opinion on this?

Thanks,

> 
> 	Andrew

-- 
Savoir-faire Linux
Enguerrand de Ribaucourt