Message-ID: <CACRpkdbq03ZXcB-TaBp5Udo3M47rb-o+LfkEkC-gA1+=x1Zd-g@mail.gmail.com>
Date:   Sun, 29 Oct 2023 23:15:58 +0100
From:   Linus Walleij <linus.walleij@...aro.org>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     DENG Qingfang <dqfext@...il.com>,
        Mauri Sandberg <sandberg@...lfence.com>,
        Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] dsa: tag_rtl4_a: Bump min packet size

Hi Vladimir,

first: THANKS, because you ALWAYS ask the *right* questions,
and I quickly get closer to the real solution!

On Sun, Oct 29, 2023 at 12:04 AM Vladimir Oltean <olteanv@...il.com> wrote:
> On Fri, Oct 27, 2023 at 10:21:39PM +0200, Linus Walleij wrote:

> > It was reported that the "LuCI" web UI was not working properly
> > with a device using the RTL8366RB switch. Disabling the egress
> > port tagging code made the switch work again, but this is not
> > a good solution as we want to be able to direct traffic to a
> > certain port.
> >
> > It turns out that sometimes, but not always, small packets are
> > dropped by the switch for no reason.
>
> "For no reason" is a technical statement which means "an unspecific/inconclusive
> drop reason in the ethtool -S output on the conduit interface (which also
> shows the hardware counters of the CPU port)", or is it just a figure of
> speech? If just a figure of speech, could you please determine which
> counter gets incremented when the switch drops packets?

I ran ethtool -S on the switch port before and after trying to access the web
UI (LuCI) on the router with the chip, i.e. putting http://192.168.1.1 into
the address bar of a browser and hitting enter three times.

Then I ran diff -ur on the two outputs (before and after) and got this
disturbing and symmetric stat (three attempts, three discards):

-     Dot1dTpPortInDiscards: 0
+     Dot1dTpPortInDiscards: 3

(Counters defined in drivers/net/dsa/realtek/rtl8366rb.c)
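
For anyone who wants to repeat the before/after comparison without
eyeballing a diff, here is a minimal sketch. It assumes the usual
"name: value" layout that ethtool -S prints; the function names are
made up for illustration and are not part of any tool. The only sample
data is the single counter quoted above.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines from `ethtool -S` output into a dict."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon so the numeric value ends up on the right.
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        value = value.strip()
        # Skips the "NIC statistics:" header and any non-numeric lines.
        if value.lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


def changed_counters(before, after):
    """Return {counter: delta} for counters whose value differs."""
    old = parse_ethtool_stats(before)
    new = parse_ethtool_stats(after)
    return {k: new[k] - old[k] for k in new if k in old and new[k] != old[k]}


# Only the counter quoted in this mail; a real dump has many more lines.
before = "NIC statistics:\n     Dot1dTpPortInDiscards: 0\n"
after = "NIC statistics:\n     Dot1dTpPortInDiscards: 3\n"
print(changed_counters(before, after))  # {'Dot1dTpPortInDiscards': 3}
```

Counters that stay the same are filtered out, so only the ones that
moved during the test show up.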

> What user port is being targeted when the switch drops packets? Any user
> port, or just specific ones?

I tried on lan0, lan1, lan2 and lan3 (DSA ports 0,1,2,3): same result.
On each of the ports, the same Dot1dTpPortInDiscards counter
goes up.

> What protocol headers do those packets that are dropped have?

HTTP

> Is it size
> that they have in common, I wonder (given that you say that small
> packets are not always dropped), or is it something else?

I got it wrong: it's big packets getting dropped, not small ones... :(

Some tcpdump:ing shows that the problem is that the
httpd is sending 1500 byte packets. Anything over 1496 fails
in ping tests.

1496 is suspiciously close to 1500 - DSA tag size. However the
MTU of the parent ethernet is bumped nicely to 1504 and the
device MTU is set up to accommodate it as well.
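
The arithmetic behind that suspicion, as a small sketch. The 4-byte
tag length is an assumption inferred from 1500 - 1496 (and from
1504 - 1500), and treating 1496 as a payload length is also an
assumption. The function names are hypothetical, and exceeds_limit()
only models the behaviour seen in the ping tests, not what the switch
does internally.

```python
ETH_DATA_LEN = 1500  # standard Ethernet payload MTU
TAG_LEN = 4          # assumed tag size, inferred from 1500 - 1496


def largest_working_payload(mtu=ETH_DATA_LEN, tag_len=TAG_LEN):
    """Largest payload that still fits once the tag is counted against the MTU."""
    return mtu - tag_len


def conduit_mtu(user_mtu=ETH_DATA_LEN, tag_len=TAG_LEN):
    """MTU the parent ethernet needs so a full-size frame plus tag fits."""
    return user_mtu + tag_len


def exceeds_limit(payload_len, mtu=ETH_DATA_LEN, tag_len=TAG_LEN):
    """True if the payload is above the size that worked in the ping tests."""
    return payload_len > largest_working_payload(mtu, tag_len)


print(largest_working_payload())  # 1496
print(conduit_mtu())              # 1504
print(exceeds_limit(1500))        # True
```

If the tag were being counted against a 1500 byte limit somewhere,
this is exactly the cutoff one would expect, even though the parent
MTU of 1504 should already leave room for it.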

Modifying the patch to just pad out packets >= 1496 bytes
solves the problem in a better way, but maybe that is not the
last thing we try here...

I'll resend the patch with more elaborate commit log and test
description in the commit.

> > Fixes: 0e90dfa7a8d8 ("net: dsa: tag_rtl4_a: Fix egress tags")
>
> Have you actually checked out this sha1sum and confirmed that the packet
> drop can be reproduced there? Ideally you could also go back to a bit
> earlier, to commit 9eb8bc593a5e ("net: dsa: tag_rtl4_a: fix egress tags")
> (this is a different commit from Qingfang with the same description) and
> test on user port 0 only?

Yes it should be an earlier commit indeed. I'll fix!

Yours,
Linus Walleij
