netdev - Re: [PATCH net-next v4 08/11] net: dsa: realtek: clean user_mii

Message-ID: <CAJq09z7iBht6KP3XRKzCkWHZwMMOR3O5UHO3hvPaSHRNjX67Ug@mail.gmail.com>
Date: Tue, 30 Jan 2024 15:17:50 -0300
From: Luiz Angelo Daros de Luca <luizluca@...il.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: Arınç ÜNAL <arinc.unal@...nc9.com>, 
	Florian Fainelli <f.fainelli@...il.com>, Vladimir Oltean <olteanv@...il.com>, netdev@...r.kernel.org, 
	linus.walleij@...aro.org, alsi@...g-olufsen.dk, davem@...emloft.net, 
	edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, ansuelsmth@...il.com
Subject: Re: [PATCH net-next v4 08/11] net: dsa: realtek: clean user_mii_bus setup

> > > >  From other discussions I've had, there seems to be interest in quite the
> > > > opposite thing, in fact. Reboot the SoC running Linux, but do not
> > > > disturb traffic flowing through the switch, and somehow pick up the
> > > > state from where the previous kernel left it.
> > >
> > > > Yes, this is actually a use case that is very dear to the users of DSA
> > > > in an airplane. The entertainment system in the seat in front of you
> > > > typically has a left, CPU/display and right set of switch ports. Across
> > > > the 300+ units in the plane, each entertainment system runs STP to avoid
> > > > loops being created when one of the display units goes bad. Occasionally
> > > > cabin crew members will have to swap those units out since they tend to
> > > > wear out. When they do, the switch operates in a headless mode, and it
> > > > would be unfortunate if plugging a display unit into the network again
> > > > disrupted existing traffic. I have seen out of tree patches doing that,
> > > > but there was not a good way to make them upstream quality.
> >
> > This piqued my interest. I'm trying to understand how exactly plugging in a
> > display unit into the network would disrupt the traffic flow. Is this about
> > all network interfaces attached to the bridge interface being blocked when
> > a new link is established to relearn the changed topology?
>
> The hardware is split into two parts, a cradle and the display
> unit. The switch itself is in the cradle embedded in the seat
> back. The display unit contains the CPU, GPU, storage etc. There is a
> single Ethernet interface between the display unit and the cradle,
> along with MDIO, power, audio cables for the headphone jack etc.
>
> When you take out the display unit, you disconnect the switch's
> management plane. The CPU has gone, and it's the CPU running STP,
> sending and receiving BPDUs, etc. But the switch is still powered, and
> switching packets, keeping the network going, at least for a while.
>
> When you plug in a display unit, it boots. As typical for any computer
> booting, it assumes the hardware is in an unknown state, and hits the
> switch with a reset. That then kills the local networking, and it
> takes a little while for the devices around it to swap to a redundant
> path. The move from STP to RSTP has been made, which speeds this all
> up, but you do get some disruption.
>
> It can take a while for the display unit to boot into user space and
> reconfigure the switch. Only when that is complete can the switch
> rejoin the network.
>
> Rather than hit the switch with a reset, it would be better to somehow
> suck the current configuration out of the switch and prime the Linux
> network stack with that configuration. But that is a totally alien
> concept to Linux.

This is quite a particular case. You'll need to update userland config
from the kernel state and the kernel state from the HW state. It's
upside down from what we normally see.
Anyway, we are far from that point in the realtek DSA drivers. The DSA
driver actually resets the switch twice (HW and then SW) during setup.
Even the vendor driver/lib states that without the initialization steps,
the switch behavior is undefined. And those steps would probably mess
with any existing switch state. Parsing the reg values into kernel
state is quite a complex task if it needs to be usable for many
scenarios. And we still have opaque jam tables in the driver that I
would love to get rid of.

Now, back to the point I raised:

1) Should we continue to HW reset the switch when the driver stops
controlling it?
2) If that is a yes, should we do that both for shutdown and poweroff?
3) Should we take additional precautions to lock the switch before
resetting it (as HW reset might be missing or misconfigured)?
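
To make question 3 concrete, the ordering could be modelled roughly as
below. This is only a user-space sketch of the idea, not code from the
realtek driver: struct sw_model and the sw_* helpers are hypothetical
names, and a real implementation would go through the driver's own
register accessors and reset handling instead.

```c
#include <stdbool.h>

#define SW_NUM_PORTS 5	/* assumed port count, for illustration only */

/* Hypothetical model of a switch; not the driver's real data structure. */
struct sw_model {
	bool has_reset_line;	/* is a HW reset actually wired and described? */
	bool reset_asserted;
	bool port_forwarding[SW_NUM_PORTS];
};

/* "Lock" the switch in software: stop forwarding on every port. */
static void sw_lock_ports(struct sw_model *sw)
{
	for (int i = 0; i < SW_NUM_PORTS; i++)
		sw->port_forwarding[i] = false;
}

/* Assert the HW reset if one exists; report whether it was asserted. */
static bool sw_assert_reset(struct sw_model *sw)
{
	if (!sw->has_reset_line)
		return false;

	sw->reset_asserted = true;
	return true;
}

/* Count the ports that are still forwarding. */
static int sw_forwarding_ports(const struct sw_model *sw)
{
	int n = 0;

	for (int i = 0; i < SW_NUM_PORTS; i++)
		if (sw->port_forwarding[i])
			n++;

	return n;
}

/*
 * Teardown path: lock first, then reset. If the HW reset is missing or
 * misconfigured, the ports are still left isolated by the software lock.
 */
static bool sw_teardown(struct sw_model *sw)
{
	sw_lock_ports(sw);
	return sw_assert_reset(sw);
}
```

The point of locking first is that the outcome no longer depends on the
reset line being present: with or without it, no port keeps forwarding
once the driver lets go of the switch.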

I'll probably send v5 without touching this topic, but it is an
interesting point to think about, at least when it comes to asserting
the reset during shutdown.

Regards,

Luiz