lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <871qyykai9.fsf@waldekranz.com>
Date:   Sat, 19 Mar 2022 02:15:58 +0100
From:   Tobias Waldekranz <tobias@...dekranz.com>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     Marek BehĂșn <kabel@...nel.org>,
        Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org
Subject: Re: mv88e6xxx broken on 6176 with "Disentangle STU from VTU"

On Sat, Mar 19, 2022 at 01:02, Vladimir Oltean <olteanv@...il.com> wrote:
> On Fri, Mar 18, 2022 at 10:16:55PM +0100, Tobias Waldekranz wrote:
>> On Fri, Mar 18, 2022 at 22:18, Vladimir Oltean <olteanv@...il.com> wrote:
>> > Hello Tobias,
>> >
>> > On Fri, Mar 18, 2022 at 08:20:33PM +0100, Tobias Waldekranz wrote:
>> >> On Fri, Mar 18, 2022 at 18:28, Marek BehĂșn <kabel@...nel.org> wrote:
>> >> > Hello Tobias,
>> >> >
>> >> > mv88e6xxx fails to probe in net-next on Turris Omnia, bisect leads to
>> >> > commit
>> >> >   49c98c1dc7d9 ("net: dsa: mv88e6xxx: Disentangle STU from VTU")
>> >> 
>> >> Oh wow, really sorry about that! I have it reproduced, and I understand
>> >> the issue.
>> >> 
>> >> > Trace:
>> >> >   mv88e6xxx_setup
>> >> >     mv88e6xxx_setup_port
>> >> >       mv88e6xxx_port_vlan_join(MV88E6XXX_VID_STANDALONE) OK
>> >> >       mv88e6xxx_port_vlan_join(MV88E6XXX_VID_BRIDGED) -EOPNOTSUPP
>> >> >
>> >> 
>> >> Thanks, that make it easy to find. There is a mismatch between what the
>> >> family-info struct says and what the chip-specific ops struct supports.
>> >> 
>> >> I'll try to send a fix ASAP.
>> >
>> > I've seen your patches, but I don't understand the problem they fix.
>> > For switches like 6190 indeed this is a problem. It has max_stu = 63 but
>> > mv88e6190_ops has no stu_getnext or stu_loadpurge. That I understand.
>> >
>> > But Marek reported the problem on 6176. There, max_sid is 0, so
>> > mv88e6xxx_has_stu() should already return false. Where is the
>> > -EOPNOTSUPP returned from?
>> 
>> Somewhat surprisingly, it is from mv88e6xxx_broadcast_setup.
>
> Sorry for the delay, I didn't notice the email because I was busy
> gathering my jaw from the floor after relistening some of Marc Martel's
> Queen covers.

Wow. He somehow manages to channel Freddie while still having his own
vibe too.

> This one looks a lot more plausible, let me see if I get it right below.
>
>> Ok, I'll go out on a limb and say that _now_ I know what the problem
>> is. If I uncomment .max_sid and .stu_{loadpurge,getnext} from my 6352
>> (which, like the 6176, is also of the Agate-family) I can reproduce the
>> same issue.
>> 
>> It seems like this family does not like to load VTU entries who's SID
>> points to an invalid STU entry. Since .max_sid == 0, we never run
>> stu_setup, which takes care of loading a valid STU entry for SID 0;
>> therefore when we read back MV88E6XXX_VID_BRIDGED in
>> mv88e6xxx_port_db_load_purge it is reported as invalid.
>> 
>> This still doesn't explain why we're able to load
>> MV88E6XXX_VID_STANDALONE though...
>
> Why doesn't it explain it? MV88E6XXX_VID_STANDALONE is 0, we have this
> code so it falls in the branch that doesn't call mv88e6xxx_vtu_get():
>
> 	if (vid == 0) {
> 		fid = MV88E6XXX_FID_BRIDGED;
> 	} else {
> 		err = mv88e6xxx_vtu_get(chip, vid, &vlan);
> 		if (err)
> 			return err;
>
> 		/* switchdev expects -EOPNOTSUPP to honor software VLANs */
> 		if (!vlan.valid)
> 			return -EOPNOTSUPP;
>
> 		fid = vlan.fid;
> 	}

Ah, yes, of course. We should really change that to

    if (vid == MV88E6XXX_VID_BRIDGED)

I guess we can also add an exception for MV88E6XXX_VID_STANDALONE now so
that we save a roundtrip to the VTU in those cases too. Anyway...

>> Vladimir, any advise on how to proceed here? I took a very conservative
>> approach to filling in the STU ops, only enabling it on HW that I could
>> test. I could study some datasheets and make an educated guess about the
>> full range of chips that we could enable this on, and which version of
>> the ops to use. Does that sound reasonable?
>
> Before, MV88E6XXX_G1_VTU_OP_STU_LOAD_PURGE was done from 2 places:
>
> mv88e6352_g1_vtu_loadpurge()
> mv88e6085_ops, mv88e6097_ops, mv88e6123_ops, mv88e6141_ops, mv88e6161_ops,
> mv88e6165_ops, mv88e6171_ops, mv88e6172_ops, mv88e6175_ops, mv88e6176_ops,
> mv88e6240_ops, mv88e6341_ops, mv88e6350_ops, mv88e6351_ops, mv88e6352_ops
>
> mv88e6390_g1_vtu_loadpurge()
> mv88e6190_ops, mv88e6190x_ops, mv88e6191_ops, mv88e6290_ops, mv88e6390_ops,
> mv88e6390x_ops, mv88e6393x_ops
>
> After the change, MV88E6XXX_G1_VTU_OP_STU_LOAD_PURGE is done only from the ops
> that have stu_loadpurge:
>
> mv88e6352_g1_stu_loadpurge()
> mv88e6097_ops, mv88e6352_ops
>
> mv88e6390_g1_stu_loadpurge()
> mv88e6390_ops, mv88e6390x_ops, mv88e6393x_ops
>
> So if I understand correctly, we have this regression for all families that are
> in the first group but not in the second group. I.e. a lot of families.

I don't have the hardware to test it, but I have now gone through the
the functional spec for each of these devices.

I have confirmed that all those using mv88e6352_g1_vtu_loadpurge support
the same STU operations and SID pool size (63).

Ditto for those using mv88e6390_g1_vtu_loadpurge.

> There's nothing wrong with being conservative, as long as you're a
> correct conservative. In this case, I believe that the switch families
> where you couldn't test MSTP should at least have a max_sid of 1, to
> allow SID 0 to be loaded. So you don't have to claim untested MSTP
> support. But then you may need to refine the guarding that allows for
> MSTP support, to check for > 1 instead of > 0.

Having now done the research, which I should have just done from the
beginning, I think the best approach is to just enable the regular MST
offloading for all supported chips, i.e. all chips with separated
VTU/STU.

Fair?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ