lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20091223200948.GF760@caffeine.csclub.uwaterloo.ca>
Date:	Wed, 23 Dec 2009 15:09:48 -0500
From:	lsorense@...lub.uwaterloo.ca (Lennart Sorensen)
To:	Anton Vorontsov <avorontsov@...mvista.com>
Cc:	Lennart Sorensen <lsorense@...lub.uwaterloo.ca>,
	linux-kernel@...r.kernel.org, linuxppc-dev@...abs.org,
	netdev@...r.kernel.org, leoli@...escale.com
Subject: Re: ucc_geth broken in 2.6.32 by
	864fdf884e82bacbe8ca5e93bd43393a61d2e2b4

On Wed, Dec 23, 2009 at 09:04:15PM +0300, Anton Vorontsov wrote:
> On Wed, Dec 23, 2009 at 12:40:19PM -0500, Lennart Sorensen wrote:
> > We use the ucc_geth for 6 ports (4 100Mbit and 2 Gbit ports) on an
> > mpc8360e.  Up to 2.6.31 this worked fine.  2.6.32 on the other hand
> > crashes very quickly after boot.
> 
> Hm. Just curious, what CPU revision you use? So that I could try
> to reproduce the issue... I have MPC8360E-MDS boards, rev 2.0 and
> rev rev 2.1 CPUs, Marvell PHYs. I also have MPC8360E-RDK (rev 2.1).
> And I didn't see any issues with this patch.

Hmm, I have an MDS around, so I could probably try and see if it is
reproduceable there.

CPU:   e300c1, MPC8360EA, Rev: 2.1 at 533.333 MHz, CSB: 266.667 MHz
BOARD: 12-86-0016-001 [00000000]                                   
UPMA:  Configured for Compact Flash PIO Mode                        
I2C:   ready                                                       
SPI:   ready                                                       
DRAM:  512 MB                                                      
MTEST: Testing data bus                                            
MTEST: Testing address bus                                         
MTEST: PASS                                                        
BCSR:  Ver A000B                                                   
FLASH: 32 MB

The PHY we use is KSZ8001LS for the 100Mbit ports, and on the port
that usually fails first is a marvel 88E1111 running fixed gigabit link
(converts to serdes before connecting to a switch chip).

I am actually surprised this code should matter at all given out of the
6 ethernet ports only two actually go to external ports, the rest are
fixed link.

If relevant here are some chunks from the dtb related to ethernet:

        aliases {
                ethernet0 = &enet0;
                ethernet1 = &enet1;
                ethernet2 = &enet2;
                ethernet3 = &enet3;
                ethernet4 = &enet4;
                ethernet5 = &enet5;
                serial0 = &serial0;
                serial1 = &serial1;
        };

                mdio {
                        compatible = "virtual,mdio-gpio";
                        #address-cells = <1>;
                        #size-cells = <0>;
                        /* MDC = PG6 MDIO = PG7 */
                        gpios = <&qe_pio_g 6 0 &qe_pio_g 7 0>;

                        phy0: ethernet-phy@00 {
                                reg = <0x18>;
                                device_type = "ethernet-phy";
                        };
                        phy1: ethernet-phy@01 {
                                reg = <0x06>;
                                device_type = "ethernet-phy";
                        };
                        phy2: ethernet-phy@02 {
                                reg = <0x0F>;
                                device_type = "ethernet-phy";
                        };
                        phy3: ethernet-phy@03 {
                                reg = <0x17>;
                                device_type = "ethernet-phy";
                        };
                };

                /* UCC5 Linux fe-cm-1 */
                enet0: ethernet@...0 {
                        device_type = "network";
                        compatible = "ucc_geth";
                        cell-index = <0x5>;
                        reg = <0x2400 0x200>;
                        interrupts = <0x28>;
                        interrupt-parent = <&qeic>;
                        local-mac-address = [ 00 00 00 00 00 00 ]; /* [U-BOOT] */
                        rx-clock-name = "clk16";
                        tx-clock-name = "clk14";
                        phy-handle = <&phy2>;
                        phy-connection-type = "mii";
                        pio-handle = <&ucc5>;
                };
                /* UCC 6 Linux fe-em-1 */
                enet1: ethernet@...0 {
                        device_type = "network";
                        compatible = "ucc_geth";
                        cell-index = <0x6>;
                        reg = <0x3400 0x200>;
                        interrupts = <0x29>;
                        interrupt-parent = <&qeic>;
                        local-mac-address = [ 00 00 00 00 00 00 ]; /* [U-BOOT] */
                        rx-clock-name = "clk7";
                        tx-clock-name = "clk8";
                        phy-handle = <&phy3>;
                        phy-connection-type = "mii";
                        pio-handle = <&ucc6>;
                };
                /* UCC 1 & 3  1000BT Data Plane dp0 */
                enet2: ethernet@...0 {
                        device_type = "network";
                        compatible = "ucc_geth";
                        cell-index = <0x1>;
                        reg = <0x2000 0x200>;
                        interrupts = <0x20>;
                        interrupt-parent = <&qeic>;
                        local-mac-address = [ 00 00 00 00 00 00 ]; /* [U-BOOT] */
                        rx-clock-name = "none";
                        tx-clock-name = "clk9";
                        fixed-link = <0x03 1 1000 0 0>; /* P3 on data plane switch on SM */
                        phy-connection-type = "gmii";
                        pio-handle = <&ucc1>;
                };
                /* UCC 7 100BT Data Plane dp1 */
                enet3: ethernet@...0 {
                        device_type = "network";
                        compatible = "ucc_geth";
                        cell-index = <0x7>;
                        reg = <0x2600 0x200>;
                        interrupts = <0x2a>;
                        interrupt-parent = <&qeic>;
                        local-mac-address = [ 00 00 00 00 00 00 ]; /* [U-BOOT] */
                        rx-clock-name = "clk20";
                        tx-clock-name = "clk19";
                        fixed-link = <0x02 1 100 0 0>; /* P2 on data-plane switch on SM */
                        phy-connection-type = "mii";
                        pio-handle = <&ucc7>;
                };
                /* UCC 2 & 4 1000 BT Control Plane cp0 */
                enet4: ethernet@...0 {
                        device_type = "network";
                        compatible = "ucc_geth";
                        cell-index = <0x2>;
                        reg = <0x3000 0x200>;
                        interrupts = <0x21>;
                        interrupt-parent = <&qeic>;
                        local-mac-address = [ 00 00 00 00 00 00 ]; /* [U-BOOT] */
                        rx-clock-name = "none";
                        tx-clock-name = "clk4";
                        tx-thread-num = <2>; /* Reduced to allow all enet to come up */
                        fixed-link = <10 1 1000 0 0>; /* P10 on control-plane switch on CM */
                        phy-connection-type = "gmii";
                        pio-handle = <&ucc2>;
                };
                /* UCC 8 100 BT Control Plane cp1 */
                enet5: ethernet@...0 {
                        device_type = "network";
                        compatible = "ucc_geth";
                        cell-index = <0x8>;
                        reg = <0x3600 0x200>;
                        interrupts = <0x2b>;
                        interrupt-parent = <&qeic>;
                        local-mac-address = [ 00 00 00 00 00 00 ]; /* [U-BOOT] */
                        rx-clock-name = "clk5";
                        tx-clock-name = "clk21";
                        fixed-link = <9 1 100 0 0>; /* P9 on control-plane switch on CM */
                        phy-connection-type = "mii";
                        pio-handle = <&ucc8>;
                };

> I don't see any point in reverting, because it will surely break my boards.
> So, we need to fix this, not just hide.

I agree with that.

> > unless someone can fix it.
> 
> Well, I'm ready to help you with debugging.

Excellent.

> > Now I must add that I run with the xenomai/adeos-ipipe patches as well,
> > which do change interrupt handling a little,
> 
> It could be that it takes too long to stop the UCC, and xenomai
> makes the timings worse, so the watchdog triggers.
> 
> Can you try the following patch?

Sure thing.

> diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
> index afaf088..2f73e3f 100644
> --- a/drivers/net/ucc_geth.c
> +++ b/drivers/net/ucc_geth.c
> @@ -1563,6 +1563,8 @@ static int ugeth_disable(struct ucc_geth_private *ugeth, enum comm_dir mode)
>  
>  static void ugeth_quiesce(struct ucc_geth_private *ugeth)
>  {
> +	netif_device_detach(ugeth->ndev);
> +
>  	/* Wait for and prevent any further xmits. */
>  	netif_tx_disable(ugeth->ndev);
>  
> @@ -1577,7 +1579,7 @@ static void ugeth_activate(struct ucc_geth_private *ugeth)
>  {
>  	napi_enable(&ugeth->napi);
>  	enable_irq(ugeth->ug_info->uf_info.irq);
> -	netif_tx_wake_all_queues(ugeth->ndev);
> +	netif_device_attach(ugeth->ndev);
>  }
>  
>  /* Called every time the controller might need to be made

So there result is:

Unable to handle kernel paging request for data at address 0x00000058
Faulting instruction address: 0xc024f2fc
Oops: Kernel access of bad area, sig: 11 [#1]
RC8360 CM
Modules linked in: rclibapi xeno_native max6369_wdt ucc_geth_driver spi_mpc8xxx ltc4215 lm75
NIP: c024f2fc LR: e30aa0a4 CTR: c024f2e8
REGS: df857ca0 TRAP: 0300   Not tainted  (2.6.32-trunk-8360e)
MSR: 00009032 <EE,ME,IR,DR>  CR: 44042088  XER: 00000000
DAR: 00000058, DSISR: 20000000
TASK = df848c90[4] 'events/0' THREAD: df856000
GPR00: e30aa0a4 df857d50 df848c90 00000000 00000640 00000001 c0428df4 dfa40b80
GPR08: 000000c8 e30ad2b8 df084360 c024f2e8 44042082 1001af90 e30ad2b8 00000000
GPR16: 00000048 00000001 00000000 00000000 df08436c df08440c 00000190 df08455c
GPR24: df0844ec df0842c0 df084000 180005ea dfa40b80 00000000 df0842c0 00000000
NIP [c024f2fc] skb_recycle_check+0x14/0x100
LR [e30aa0a4] ucc_geth_poll+0xd8/0x4e0 [ucc_geth_driver]
Call Trace:
[df857d50] [c000b03c] __ipipe_grab_irq+0x3c/0xa4 (unreliable)
[df857d60] [e30aa0a4] ucc_geth_poll+0xd8/0x4e0 [ucc_geth_driver]
[df857dd0] [c0258cf8] net_rx_action+0xf8/0x1b8
[df857e10] [c0032a38] __do_softirq+0xb8/0x13c
[df857e60] [c00065cc] do_softirq+0xa0/0xac
[df857e70] [c003280c] irq_exit+0x7c/0x9c
[df857e80] [c000b1f4] __ipipe_do_IRQ+0x50/0x68
[df857ea0] [c0064098] __ipipe_sync_stage+0x21c/0x24c
[df857ee0] [c0064a4c] __ipipe_unstall_root+0x5c/0x60
[df857ef0] [c005fb4c] enable_irq+0x88/0xc4
[df857f10] [e30a73d0] adjust_link+0x1fc/0x2f0 [ucc_geth_driver]
[df857f40] [c0209c0c] phy_state_machine+0x3a4/0x610
[df857f60] [c003fec4] worker_thread+0x124/0x1a4
[df857fc0] [c0044310] kthread+0x78/0x7c
[df857ff0] [c0013c30] kernel_thread+0x4c/0x68
Instruction dump:
4e800020 4800fcb5 4bffff50 4802782d 4bffffc8 48076dd9 4bffff6c 9421fff0
7c0802a6 93e1000c 7c7f1b78 90010014 <80030058> 2f800000 409e00cc 81630068
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 180 seconds..

So it seems as soon as it touched the phy settings, it blew up.

Now perhaps the ipipe interrupt handling code changes the behaviour
enough that it makes the spinlock_irqsave necesary and that is why it
worked before.  Maybe ipipe is doing something wrong, given ipipe only
supports 2.6.30 for powerpc and I personally ported it to 2.6.31 and
then 2.6.32.  Everything else works fine though as long as I revert this
particular patch.

-- 
Len Sorensen
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ