Message-ID: <40e73820-6e67-5cde-b492-bfbcba64caeb@aoifes.com>
Date:   Tue, 4 Oct 2016 17:37:13 +0200
From:   Jose Antonio Delgado Alfonso <jose.delgado@...fes.com>
To:     netdev@...r.kernel.org
Subject: [ISSUE: mv88e6xxx]: Down/Up link and not forwarding

We are working on an ARMv7 embedded system running kernel 4.1, with
patches that upgrade dsa/mv88e6xxx to the kernel 4.3 version
(5acf4d0, Wed, 27 May 2015 15:32:15 -0700) "[PATCH] blk: rq_data_dir()
should not return a boolean."

This is a diagram of the system:

 +-------------------+ eth0
 |                   +--+
 |                   |  |
 | Embedded system   +--+
 |                   |
 |      ARMv7        |
 |                   | Marvell 88E8057(sky2)     +-------------+
 |                   +--+                     +--+             +--+ eth1
 |                   |  +---------------------+  |             |  +------+
 |                   +--+      CPU port       +--+  mv88e6176  +--+
 +------+--+---------+                           |             |
emulated|  |                                     |             |
GPIO    +--+                                  +--+             +--+ eth2
MDIO      +-----------------------------------+  |             |  +------+
                              MDIO            +--+             +--+
                                                 +-------------+

There is a bridge (br-lan) which includes eth0/eth1/eth2.

From time to time, we see the link go down and come back up within
about 1 s. These are the messages the kernel logs:

[  312.769399] dsa dsa@0 eth2: Link is Down
[  312.773372] br-lan: port 3(eth2) entered disabled state
[  312.947274] dsa dsa@0 eth2: link up, 100 Mb/s, full duplex, flow control disabled
[  312.963807] br-lan: port 3(eth2) entered forwarding state
[  312.969276] br-lan: port 3(eth2) entered forwarding state
[  313.777815] dsa dsa@0 eth2: Link is Up - 100Mbps/Full - flow control rx/tx
[  314.966277] br-lan: port 3(eth2) entered forwarding state
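
To put a number on the outage, the log above can be parsed for
"Link is Down" / "Link is Up" pairs. The following is only a minimal
sketch: the function name link_flaps and the choice to pair only those
two messages (ignoring the lowercase "link up" line) are assumptions
made for illustration, not part of the driver or of our setup.

```python
import re

# Kernel messages copied from the post (wrapped lines joined).
LOG = """\
[  312.769399] dsa dsa@0 eth2: Link is Down
[  312.773372] br-lan: port 3(eth2) entered disabled state
[  312.947274] dsa dsa@0 eth2: link up, 100 Mb/s, full duplex, flow control disabled
[  312.963807] br-lan: port 3(eth2) entered forwarding state
[  312.969276] br-lan: port 3(eth2) entered forwarding state
[  313.777815] dsa dsa@0 eth2: Link is Up - 100Mbps/Full - flow control rx/tx
[  314.966277] br-lan: port 3(eth2) entered forwarding state
"""

# "[  312.769399] message" -> timestamp in seconds, message text
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def link_flaps(log_text, iface):
    """Return a list of (down_ts, up_ts, duration) for one interface.

    Hypothetical helper: pairs each "Link is Down" with the next
    "Link is Up" message for the same interface.
    """
    flaps = []
    down_at = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        ts, msg = float(match.group(1)), match.group(2)
        if f"{iface}: Link is Down" in msg:
            down_at = ts
        elif down_at is not None and f"{iface}: Link is Up" in msg:
            flaps.append((down_at, ts, ts - down_at))
            down_at = None
    return flaps


if __name__ == "__main__":
    for down, up, duration in link_flaps(LOG, "eth2"):
        print(f"eth2 down at {down:.6f}, up at {up:.6f}, {duration:.3f} s")
```

Running the same function over a longer dmesg capture would show how
often the flaps occur and whether their duration is constant.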

Moreover, we ran a reboot loop test that boots the system, pings the
unit and, if it responds, reboots it again. After many reboots, the
bridge stops forwarding packets.
Looking at the 88e6176 registers, we saw the following:

    GLOBAL GLOBAL2   0    1    2    3    4    5    6 
 0:  c820       0  de0f 5d0f 500f 500f 500f 4e07 4007
 1:     3       0    3e    3    3    3    3    3    3
 2:     0    ffff     0    0    0    0    0    0    0
 3:     0    ffff  1761 1761 1761 1761 1761 1761 1761
 4:  6000     258  373f  433  430  433  433  433  433
 5:  1000    c12f     0    0    0    0    0    0    0
 6:  c000    1f0f  101e 3005 3003 4001 5001 6001 7001
 7:     0    707f     0    0    0    0    0    0    0
 8:     0    7800  2480 2480 2480 2480 2480 2480 2480
 9:     0    1600     1    1    1    1    1    1    1
 a:   148       0     0    0    0    0    0    0    0
 b:  6000    1000     1    2    4    8   10   20   40
 c:     0      22     0    0    0    0    0    0    0
 d:  ffff     507     0    0    0    0    0    0    0
 e:  ffff      36     0    0    0    0    0    0    0
 f:  ffff     f00  dada dada dada dada dada dada dada
10:     0       0     0    0    0    0    0    0    0
11:     0       0     0    0    0    0    0    0    0
12:  5555       0     0    0    0    0    0    0    0
13:  5555       0   34d 8b18  54d    0    0    0    0
14:  aaaa     400     0    0    0    0    0    0    0
15:  aaaa       0     0    0    0    0    0    0    0
16:  ffff       0    33   33   33   33   33   33    0
17:  ffff       0     0    0    0    0    0    0    0
18:  fa41    1884  3210 3210 3210 3210 3210 3210 3210
19:     0     5e1  7654 7654 7654 7654 7654 7654 7654
1a:     0       0     0    0    0    0    0    0    0
1b:   1fc    f869  8000 8000 8000 8000 8000 8000 8000
1c:     0    4c00     0    0    0    0    0    0    0
1d:  5ce0       0     0    0    0    0    0    0    0
1e:     0       0     0    0    0    0    0    0    0
1f:     0       0     0    0    0    0    0    0    0

The main difference is in GLOBAL2 register 5. Right after
initialization, the driver sets this register to 00ff; when the issue
happens, its value is c12f.
We have a patch that lets us set register values. If we change c12f
back to 00ff, ping works; otherwise it does not. We do not know what is
changing the register value. The driver does not appear to do it.
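
The check described above can be automated by parsing a dump in the
format shown earlier and comparing GLOBAL2 register 5 with the value
seen after a clean init. This is a minimal sketch: parse_dump,
global2_reg5_ok and EXPECTED_GLOBAL2_REG5 are hypothetical names, and
only three rows of the dump are reproduced to keep it short.

```python
# Three rows copied from the register dump in this post.
DUMP = """\
    GLOBAL GLOBAL2   0    1    2    3    4    5    6
 4:  6000     258  373f  433  430  433  433  433  433
 5:  1000    c12f     0    0    0    0    0    0    0
 6:  c000    1f0f  101e 3005 3003 4001 5001 6001 7001
"""

# Value the post reports right after driver initialization.
EXPECTED_GLOBAL2_REG5 = 0x00FF


def parse_dump(text):
    """Parse the dump into {column_name: {register_index: value}}.

    Register indices and values are hexadecimal in the dump.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    columns = lines[0].split()
    regs = {col: {} for col in columns}
    for ln in lines[1:]:
        addr, _, rest = ln.partition(":")
        for col, val in zip(columns, rest.split()):
            regs[col][int(addr, 16)] = int(val, 16)
    return regs


def global2_reg5_ok(regs):
    """True if GLOBAL2 register 5 still holds the post-init value."""
    return regs["GLOBAL2"][5] == EXPECTED_GLOBAL2_REG5


if __name__ == "__main__":
    regs = parse_dump(DUMP)
    print(f"GLOBAL2[5] = {regs['GLOBAL2'][5]:04x}, ok = {global2_reg5_ok(regs)}")
```

Run on each boot of the reboot loop test, a check like this would flag
the first boot on which the register changes.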

Even weirder, sometimes GLOBAL2 register 5 is set to 00ff and the
bridge still does not forward packets. We have not worked out which
other register is involved.

Finally, the weirdest behaviour we are seeing is that the unit does not
detect a link change: register 0 of ports 1 and 2 does not update its
status.
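
For reference, port register 0 values like those in the dump can be
decoded as below. This is a sketch under an assumption: the bit
positions (link = bit 11, duplex = bit 10, speed = bits 9:8) follow my
reading of the PORT_STATUS_* definitions in the mv88e6xxx driver and
should be checked against the 88E6176 datasheet. decode_port_status is
a hypothetical helper.

```python
# Assumed bit layout of port register 0 (port status); verify against
# the datasheet before relying on it.
PORT_STATUS_LINK = 1 << 11
PORT_STATUS_DUPLEX = 1 << 10
PORT_STATUS_SPEED_MASK = 0x0300
SPEED_MBPS = {0x0000: 10, 0x0100: 100, 0x0200: 1000}

# Register 0 of ports 0..6, copied from the dump in this post.
PORT_REG0 = [0xDE0F, 0x5D0F, 0x500F, 0x500F, 0x500F, 0x4E07, 0x4007]


def decode_port_status(value):
    """Decode link, duplex and speed from a port register 0 value."""
    return {
        "link": bool(value & PORT_STATUS_LINK),
        "full_duplex": bool(value & PORT_STATUS_DUPLEX),
        # None if the speed field holds a value not listed above.
        "speed_mbps": SPEED_MBPS.get(value & PORT_STATUS_SPEED_MASK),
    }


if __name__ == "__main__":
    for port, value in enumerate(PORT_REG0):
        print(port, f"{value:04x}", decode_port_status(value))
```

Polling this register while unplugging and replugging a cable would
show whether the link bit really stays stale.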

Have you experienced a similar issue on your side?

Is it possible that those micro-outages are the cause of the bad
setting in GLOBAL2 register 5?

Has this issue been fixed in a newer Linux kernel version?

Thanks in advance.

