netdev - Re: [PATCH net-next v14 5/7] net: dsa: mv88e6xxx: rmu: Add functionality to get RMON

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220922114820.hexazc2do5yytsu2@skbuf>
Date:   Thu, 22 Sep 2022 14:48:20 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     Mattias Forsblad <mattias.forsblad@...il.com>,
        netdev@...r.kernel.org, Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, linux@...linux.org.uk,
        ansuelsmth@...il.com
Subject: Re: [PATCH net-next v14 5/7] net: dsa: mv88e6xxx: rmu: Add
 functionality to get RMON

On Tue, Sep 20, 2022 at 11:04:07PM +0200, Andrew Lunn wrote:
> On Tue, Sep 20, 2022 at 04:10:53PM +0300, Vladimir Oltean wrote:
> > On Tue, Sep 20, 2022 at 02:26:22PM +0200, Mattias Forsblad wrote:
> > > This whole shebang was a suggestion from Andrew. I had a solution with
> > > mv88e6xxx_rmu_available in mv88e6xxx_get_ethtool_stats which he wasn't fond of.
> > > The mv88e6xxx_bus_ops is declared const and how am I to change the get_rmon
> > > member? I'm not really sure on how to solve this in a better way?
> > > Suggestions any? Maybe I've misunderstood his suggestion.
> > 
> > Can you point me to the beginning of that exact suggestion? I've removed
> > everything older than v10 from my inbox, since the flow of patches was
> > preventing me from seeing other emails.
> 
> What i want to do is avoid code like:
> 
>      if (have_rmu())
>      	foo()
>      else
> 	bar()
> 
> There is nothing common in the MDIO MIB code and the RMU MIB code,
> just the table of statistics. When we get to dumping the ATU, i also
> expect there will be little in common between the MDIO and the RMU
> functions.

Sorry, I don't understand what it is about "if (have_rmu()) foo() else bar()"
that you don't like. Isn't the indirection via bus_ops the exact same
thing, but expressed as an indirect function call rather than an if/else?

Or is it that the if/else structure precludes the calling of bar() when
foo() fails? The bus_ops will suffer from the same problem.

> Doing MIB via RMU is a big gain, but i would also like normal register
> read and write to go via RMU, probably with some level of
> combining. Multiple writes can be combined into one RMU operation
> ending with a read. That should give us an mv88e6xxx_bus_ops which
> does RMU, and we can swap the bootstrap MDIO bus_ops for the RMU
> bus_ops.

At what level would the combining be done? I think the mv88e6xxx doesn't
really make use of bulk operations, C45 MDIO reads with post-increment,
that sort of thing. I could be wrong. And at some higher level, the
register read/write code should not diverge (too much), even if the
operation may be done over Ethernet or MDIO. So we need to find places
which actually make useful sense of bulk reads.

> But how do we mix RMU MIB and ATU dumps into this? My idea was to make
> them additional members of mv88e6xxx_bus_ops. The MDIO bus_ops
> structures would end up call the mv88e6xxx_ops method for MIB or
> ATU. The rmu bus_ops and directly call an RMU function to do it.
> 
> What is messy at the moment is that we don't have register read/write
> via RMU, so we have some horrible hybrid. We should probably just
> implement simple read and write, without combining, so we can skip
> this hybrid.

I see in the manual that there are some registers which aren't
available or not recommended over RMU, for example the PHY registers if
the PPU is disabled, or phylink methods for the upstream facing ports.
There are also less obvious things, like accessing the PTP clock
gettime(). This will surely change the delay characteristic that phc2sys
sees, I'm not sure if for the better or for the worse, but the SMI code
at least had some tuning made to it, so people might care. So bottom
line, I'm not sure whether we do enough to prevent these pitfalls by
simply creating blanket replacements for all register reads/writes and
not inspecting whether the code is fine after we do that.

On the other hand, having RMU operations might also bring subtle
benefits to phc2sys. I think there were some issues with the PTP code
having trouble getting MDIO bus access (because of bridge fdb dumps or
some other things happening in the background). This may also be an
opportunity to have parallel access to independent IP blocks within the
switch, one goes through MDIO and the other through Ethernet.

But then, Mattias' code structure becomes inadequate. Currently we
serialize mv88e6xxx_master_state_change() with respect to bus accesses
via mv88e6xxx_reg_lock(). But if we permit RMU to run in parallel with
MDIO, we need a rwlock, such that multiple 'readers' of the conceptual
have_rmu() function can run in parallel with each other, and just
serialize with the RMU state changes (the 'writers').

> I am assuming here that RMU is reliable. The QCA8K driver currently
> falls back to MDIO if its inband function is attempted but fails.  I
> want to stress this part, lots of data packets and see if the RMU
> frames get dropped, or delayed too much causing failures.

I don't think you even have to stress it too much. Nothing prevents the
user from putting a policer on the DSA master which will randomly drop
responses. Or a shaper that will delay requests beyond the timeout.

> If we do see
> failures, is a couple of retires enough? Or do we need to fallback to
> MDIO which should always work? If we do need to fallback, this
> structure is not going to work too well.

Consider our previous discussion about the switchdev prepare/commit
transactional structure, where the commit stage is not supposed to fail
even if it writes to hardware. You said that the writes are supposed to
be reliable, or else. Either they all work or none do. Not sure how that
is going to hold up with a transport such as Ethernet which has such a
wide arsenal of foot guns. I think that leaving the MDIO fallback in is
inevitable.

In terms of retries, I'm not sure. With the qca8k code structure:

	if (have_rmu()) {
		ret = foo();
		if (ret == 0)
			return 0;
	}

	return bar();

we won't have retries for the _current_ operation, but all further
operations will still try to use Ethernet first. So here, it seems to me
that the timeout needs to be tuned such that everything does not grind
down to a halt even if we have a lossy Ethernet channel.

Otherwise, we need to build some more (perhaps too advanced) logic. Have
"if (have_rmu() && rmu_works())", periodically re-check if RMU works
after a failure, etc.