Date:   Thu, 22 Sep 2022 16:04:52 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     Mattias Forsblad <mattias.forsblad@...il.com>,
        netdev@...r.kernel.org, Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, linux@...linux.org.uk,
        ansuelsmth@...il.com
Subject: Re: [PATCH net-next v14 5/7] net: dsa: mv88e6xxx: rmu: Add
 functionality to get RMON

On Thu, Sep 22, 2022 at 02:45:34PM +0200, Andrew Lunn wrote:
> > > Doing MIB via RMU is a big gain, but I would also like normal register
> > > read and write to go via RMU, probably with some level of
> > > combining. Multiple writes can be combined into one RMU operation
> > > ending with a read. That should give us an mv88e6xxx_bus_ops which
> > > does RMU, and we can swap the bootstrap MDIO bus_ops for the RMU
> > > bus_ops.
> > 
> > At what level would the combining be done? I think the mv88e6xxx doesn't
> > really make use of bulk operations (C45 MDIO reads with post-increment,
> > that sort of thing). I could be wrong. And at some higher level, the
> > register read/write code should not diverge (too much), whether the
> > operation is done over Ethernet or over MDIO. So we need to find places
> > where bulk reads actually make useful sense.
> 
> I was thinking within mv88e6xxx_read() and mv88e6xxx_write(). Keep a
> buffer for building requests. Each write call appends the write to the
> buffer and returns 0. A read call gets appended to the buffer and then
> executes the RMU. We probably also need to wrap the reg mutex, so that
> when it is released, any buffered writes get executed. If the RMU
> fails, we have all the information needed to do the same via MDIO.

Ah, so you want to turn mv88e6xxx_reg_unlock() into an implicit write
barrier.

That could work, but the trouble seems to be error propagation.
mv88e6xxx_write() will always return 0 because the actual operation is
delayed until the unlock, and mv88e6xxx_reg_unlock() does not return an
error code (why would it?).
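
Roughly speaking, the buffering you describe could look like the sketch
below (struct rmu_txn, rmu_xfer() and rmu_flush() are made-up names, not
actual mv88e6xxx code). Note how the flush in the unlock path has
nowhere to report a failure to:

/* Hypothetical sketch of write combining, not actual mv88e6xxx code */
struct rmu_txn {
	struct rmu_write { int addr; int reg; u16 val; } writes[16];
	int n;
};

static int chip_write(struct mv88e6xxx_chip *chip, int addr, int reg,
		      u16 val)
{
	struct rmu_txn *txn = &chip->txn;

	if (txn->n == ARRAY_SIZE(txn->writes))
		rmu_flush(chip, txn); /* this error code gets lost too */

	/* Just queue the write, it only reaches the hardware later */
	txn->writes[txn->n++] = (struct rmu_write){ addr, reg, val };
	return 0; /* always "succeeds", errors only surface at flush time */
}

static int chip_read(struct mv88e6xxx_chip *chip, int addr, int reg,
		     u16 *val)
{
	/* A read pushes the queued writes plus this read out in a
	 * single RMU frame, so here we do get an error code back
	 */
	return rmu_xfer(chip, &chip->txn, addr, reg, val);
}

static void chip_reg_unlock(struct mv88e6xxx_chip *chip)
{
	/* Implicit write barrier: flush whatever is still queued... */
	int err = rmu_flush(chip, &chip->txn);

	/* ...but the caller never sees err, the return type is void */
	WARN_ON(err);
	mutex_unlock(&chip->reg_lock);
}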

> > But then, Mattias' code structure becomes inadequate. Currently we
> > serialize mv88e6xxx_master_state_change() with respect to bus accesses
> > via mv88e6xxx_reg_lock(). But if we permit RMU to run in parallel with
> > MDIO, we need a rwlock, such that multiple 'readers' of the conceptual
> > have_rmu() function can run in parallel with each other, and just
> > serialize with the RMU state changes (the 'writers').
> 
> I don't think we can allow RMU to run in parallel to MDIO. The reg
> lock will probably prevent that anyway.

Well, I was thinking the locking could be rearchitected, but it seems
you have bigger plans for it, so it becomes even more ingrained in the
driver :)
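
For the record, what I had in mind was roughly this (hypothetical
sketch; neither rmu_lock nor rmu_master_up exists in the driver):

/* Hypothetical sketch of the rwlock idea, now moot given the above */
static bool have_rmu(struct mv88e6xxx_chip *chip)
{
	bool up;

	/* Any number of register accessors may check this concurrently */
	read_lock(&chip->rmu_lock);
	up = chip->rmu_master_up;
	read_unlock(&chip->rmu_lock);

	return up;
}

static void rmu_master_state_change(struct mv88e6xxx_chip *chip, bool up)
{
	/* Only the master state change needs to exclude the readers */
	write_lock(&chip->rmu_lock);
	chip->rmu_master_up = up;
	write_unlock(&chip->rmu_lock);
}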

> > > I am assuming here that RMU is reliable. The QCA8K driver currently
> > > falls back to MDIO if its inband function is attempted but fails. I
> > > want to stress test this part with lots of data packets and see if
> > > the RMU frames get dropped, or delayed so much that failures result.
> > 
> > I don't think you even have to stress it too much. Nothing prevents the
> > user from putting a policer on the DSA master which will randomly drop
> > responses. Or a shaper that will delay requests beyond the timeout.
> 
> That would be a self-inflicted problem. But you are correct, we need
> to fall back to MDIO.

Here's one variation which is really not self-inflicted. You have a 10G
CPU port and 1G user ports. You use flow control on the DSA master to
avoid packet loss due to the 10G->1G rate adaptation. So the DSA master
periodically goes through states of TX congestion and holds back frames
until the congestion clears. This creates latency for packets in the TX
queues, including RMU requests, even if the RMU messages don't go to the
including RMU requests, even if the RMU messages don't go to the
external ports. And even with a high skb->priority, you'd still need PFC
to avoid this problem. This can trip up the timeout timers we have for
RMU responses.

> This is one area we can experiment with. Maybe we can retry the
> operation via RMU a few times? Two retries for MIBs is still going to
> be a lot faster, if successful, compared to all the MDIO transactions
> for all the statistics. We can also add some fallback tracking
> logic. If RMU has failed N times in a row, stop using it for 60
> seconds, etc. That might be something we can put into the DSA core,
> since it seems like a generic problem.

Or the driver might have a worker which periodically sends the GetID
message and tracks whether the switch responded. Maybe that worker's
rescheduling interval is dynamically adjusted based on feedback from
timeouts or successes of register reads/writes. In any case, now we're
starting to talk about really complex logic. And it's not clear how
effective any of these mechanisms would be against random, sporadic
timeouts as opposed to persistent failures.
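
Just to illustrate the order of complexity: your "N failures in a row
=> stop using RMU for 60 seconds" policy alone would be something like
the sketch below (all names made up), and it doesn't even begin to
address the sporadic case:

/* Hypothetical sketch of "back off after N failures", names made up */
#define RMU_MAX_FAILS	3
#define RMU_HOLDOFF	(60 * HZ)

struct rmu_health {
	unsigned int fails;	/* consecutive RMU failures */
	unsigned long resume;	/* jiffies when RMU may be retried */
};

static bool rmu_usable(struct rmu_health *h)
{
	if (h->fails < RMU_MAX_FAILS)
		return true;

	/* Too many failures in a row: stay on MDIO until the holdoff
	 * expires, then give RMU another chance
	 */
	if (time_after(jiffies, h->resume)) {
		h->fails = 0;
		return true;
	}

	return false;
}

static void rmu_record_result(struct rmu_health *h, int err)
{
	if (!err) {
		h->fails = 0;
		return;
	}

	if (++h->fails == RMU_MAX_FAILS)
		h->resume = jiffies + RMU_HOLDOFF;
}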
