netdev - Re: [PATCH v3 net-next 2/4] net: dsa: Link aggregation support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <878sa663m2.fsf@waldekranz.com>
Date:   Wed, 09 Dec 2020 23:01:25 +0100
From:   Tobias Waldekranz <tobias@...dekranz.com>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     davem@...emloft.net, kuba@...nel.org, andrew@...n.ch,
        vivien.didelot@...il.com, f.fainelli@...il.com,
        j.vosburgh@...il.com, vfalico@...il.com, andy@...yhouse.net,
        netdev@...r.kernel.org
Subject: Re: [PATCH v3 net-next 2/4] net: dsa: Link aggregation support

On Wed, Dec 09, 2020 at 18:04, Vladimir Oltean <olteanv@...il.com> wrote:
> On Wed, Dec 09, 2020 at 03:11:41PM +0100, Tobias Waldekranz wrote:
>> On Wed, Dec 09, 2020 at 12:53, Vladimir Oltean <olteanv@...il.com> wrote:
>> > And I think that .port_lag_change passes more arguments than needed to
>> > the driver.
>> 
>> You mean the `struct netdev_lag_lower_state_info`? Fine by me, it was
>> mostly to avoid hiding state from the driver if anyone needed it.
>
> There are two approaches really.
> Either you extract the useful information from it, which you already do,
> and in that case you don't need to provide the structure from the netdev
> notifier to the driver, or you let the driver pick it up. Either way is
> fine with me, but having both seems redundant.

Consider it cut.

>> >> > I don't think the DSA switch tree is private to anyone.
>> >>
>> >> Well I need somewhere to store the association from LAG netdev to LAG
>> >> ID. These IDs are shared by all chips in the tree. It could be
>> >> replicated on each ds of course, but that does not feel quite right.
>> >
>> > The LAG ID does not have significance beyond the mv88e6xxx driver, does
>> > it? And even there, it's just a number. You could recalculate all IDs
>> > dynamically upon every join/leave, and they would be consistent by
>> > virtue of the fact that you use a formula which ensures consistency
>> > without storing the LAG ID anywhere. Like, say, the LAG ID is to be
>> > determined by the first struct dsa_port in the DSA switch tree that has
>> > dp->bond == lag_dev. The ID itself can be equal to (dp->ds->index *
>> > MAX_NUM_PORTS + dp->index). All switches will agree on what is the first
>> > dp in dst, since they iterate in the same way, and any LAG join/leave
>> > will notify all of them. It has to be like this anyway.
>> 
>> This will not work for mv88e6xxx. The ID is not just an internal number
>> used by the driver. If that was the case we could just as well use the
>> LAG netdev pointer for this purpose. This ID is configured in hardware,
>> and it is shared between blocks in the switch, we can not just
>> dynamically change them. Neither can we use your formula since this is a
>> 4-bit field.
>> 
>> Another issue is how we are going to handle this in the tagger now,
>> since we can no longer call dsa_lag_dev_by_id. I.e. with `struct
>> dsa_lag` we could resolve the LAG ID (which is the only source
>> information we have in the tag) to the corresponding netdev. This
>> information is now only available in mv88e6xxx driver. I am not sure how
>> I am supposed to conjure it up. Ideas?
>
> Yeah, ok, I get your point. I was going to say that:
> (a) accessing the bonding interface in the fast path seems to be a
>     mv88e6xxx problem
> (b) the LAG IDs that you are making DSA fabricate are not necessarily
>     the ones that other drivers will use, due to various other
>     constraints (e.g. ocelot)

It is not the Fibonacci sequence or anything, it is an integer in the
range 0..num_lags-1. I realize that some hardware probably allocate IDs
from some shared (and thus possibly non-contiguous) pool. Maybe ocelot
works like that. But it seems reasonable to think that at least some
other drivers could make use of a linear range.

> so basically my point was that I think you are adding a lot of infra in
> core DSA that only mv88e6xxx will use.

Message received.

> My practical proposal would have been to keep a 16-entry bonding pointer
> array private in mv88e6xxx, which you could retrieve via:
>
> 	struct dsa_port *cpu_dp = netdev->dsa_ptr;
> 	struct dsa_switch *ds = cpu_dp->ds;
> 	struct mv88e6xxx_chip *chip = ds->priv;
>
> 	skb->dev = chip->lags[source_port];
>
> This is just how I would do it. It would not be the first tagger that
> accesses driver private data prior to knowing the net device. It would,
> however, mean that we know for sure that the mv88e6xxx device will
> always be top-of-tree in a cascaded setup. And this is in conflict with
> what I initially said, that the dst is not private to anyone.

Indeed. I am trying to keep up.

> I think that there is a low practical risk that the assumption will not
> hold true basically forever. But I also see why you might like your
> approach more. Maybe Vivien, Andrew, Florian could also chime in and we
> can see if struct dsa_lag "bothers" anybody else except me (bothers in
> the sense that it's an unnecessary complication to hold in DSA). We
> could, of course, also take the middle ground, which would be to keep
> the 16-entry array of bonding net_device pointers in DSA, and you could
> still call your dsa_lag_dev_by_id() and pretend it's generic, and that
> would just look up that table. Even with this middle ground, we are
> getting rid of the port lists and of the reference counting, which is
> still a welcome simplification in my book.

Yeah I agree that we can trim it down to just the array. Going beyond
that point, i.e. doing something like how sja1105 works, is more painful
but possible if Andrew can live with it.