Message-ID: <1289944619.2677.22.camel@giskard.codewiz.org>
Date:	Tue, 16 Nov 2010 16:56:59 -0500
From:	Bernie Innocenti <bernie@...ewiz.org>
To:	Krzysztof Halasa <khc@...waw.pl>
Cc:	Ward Vandewege <ward@....org>, lkml <linux-kernel@...r.kernel.org>
Subject: Re: pc300too on a modern kernel?

[cc += lkml]

On Sun, 2010-11-14 at 22:33 +0100, Krzysztof Halasa wrote:
> Well... could be a hardware problem. I'd make sure the card sits well in
> the PCI slot.

Nope, the card works perfectly in the old box and the bug triggers in
any PCI slot of the new box (running the new kernel with the pc300too
driver).


> Also... it's rather improbable, but I'd look at the SCA-II chip. There
> were certain chips with a hardware bug which could cause such problems.
> Chips with Hitachi logo and "R" letter after the lot code were ok, and
> all later chips made by Renesas (either missing any logo or with
> Renesas' - no "R" letter there) were ok.
> 
> The faulty chips were marked with Hitachi logo and were missing the "R"
> letter after the lot code. I think Hitachi fixed it in 1999 or so.
> I'm not sure if this bug could manifest itself when only one SCA channel
> was in use. The app note doesn't say a word about it, but I think I only
> experienced the problem (with an older card, not PC300) when both
> channels were simultaneously in use.

Looks like we've hit this bug! Here's a photo of the board to confirm
it's the bogus chip:

 http://people.sugarlabs.org/bernie/pc300too-photo.jpg


> It could also be some PLX PCI9050 oddity. Can you show me output of
> "lspci -vv" command (may be limited to the PC300 device)?

Attached as pc300too-lspci.out

> Also, please say something about the machine (CPU, motherboard). What
> speed are you trying to use it with?

Attached as pc300too-dmidecode.out

The most interesting thing is probably the dmesg output:


[   59.175900] bernie: stat=0x80, desc_address=ffffc900111003a8, port->chan=0
[   59.176639] bernie: cp=3b4, bp=1ef18, len=56, unused=12
[   67.159314] bernie: stat=0x80, desc_address=ffffc90011100390, port->chan=0
[   67.163214] bernie: cp=39c, bp=1e298, len=56, unused=12
[   68.425601] bernie: stat=0x80, desc_address=ffffc90011100390, port->chan=0
[   68.426123] bernie: cp=39c, bp=1e298, len=77, unused=12
[   70.312068] bernie: stat=0x80, desc_address=ffffc900111003b4, port->chan=0
[   70.314393] bernie: cp=3c0, bp=1f558, len=1504, unused=12

So it seems that the controller sometimes fails to clear the EOM
(0x80) status bit after transmitting a frame. The size and contents of
the packet don't seem to matter. We're using a single T1 channel.
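
For anyone who wants to sift a long dmesg for the affected descriptors,
here is a rough userspace sketch in C. It is not driver code: the names
ST_TX_EOM_BIT, parse_stat() and eom_still_set() are made up for this
example, and the 0x80 value is simply the EOM bit as it shows up in the
output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 0x80 is the EOM status bit, as seen in the log above. */
#define ST_TX_EOM_BIT 0x80u

/*
 * Extract the "stat=0x.." value from one log line.
 * Returns 1 if a value was found and parsed, 0 otherwise.
 */
static int parse_stat(const char *line, unsigned int *stat)
{
	const char *p;

	assert(line != NULL && stat != NULL);

	p = strstr(line, "stat=");
	if (p == NULL)
		return 0;

	/* %x accepts an optional "0x" prefix. */
	return sscanf(p, "stat=%x", stat) == 1;
}

/* Non-zero if the descriptor still carries the EOM bit. */
static int eom_still_set(unsigned int stat)
{
	return (stat & ST_TX_EOM_BIT) != 0;
}
```

Feeding each "bernie: stat=..." line to parse_stat() and then to
eom_still_set() tells you whether that descriptor was left with EOM set;
the "cp=..." lines are skipped because they carry no stat field.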

To obtain this debug output, I modified the driver as follows:

--- linux-2.6.36.orig/drivers/net/wan/hd64572.c	2010-10-20 16:30:22.000000000 -0400
+++ linux-2.6.36/drivers/net/wan/hd64572.c	2010-11-12 20:48:03.000000000 -0500
@@ -567,11 +567,20 @@ static netdev_tx_t sca_xmit(struct sk_bu
 	card_t *card = port->card;
 	pkt_desc __iomem *desc;
 	u32 buff, len;
+	uint8_t stat;
 
 	spin_lock_irq(&port->lock);
 
 	desc = desc_address(port, port->txin + 1, 1);
-	BUG_ON(readb(&desc->stat)); /* previous xmit should stop queue */
+
+	//BUG_ON(readb(&desc->stat)); /* previous xmit should stop queue */
+	stat = readb(&desc->stat); /* previous xmit should stop queue */
+	if (stat) {
+		printk(KERN_EMERG "bernie: stat=0x%02x, desc_address=%p, port->chan=%d\n", stat, desc, port->chan);
+		printk(KERN_EMERG "bernie: cp=%x, bp=%x, len=%d, unused=%x\n", readw(&desc->cp), readl(&desc->bp), readw(&desc->len), readb(&desc->unused));
+		printk(KERN_EMERG "bernie: %s TX(%i):", dev->name, skb->len);
+		debug_frame(skb);
+	}
 
 #ifdef DEBUG_PKT
 	printk(KERN_DEBUG "%s TX(%i):", dev->name, skb->len);


With this patch applied, our system doesn't crash any more and
communication works both ways with negligible packet loss.

Shall we submit a patch replacing the BUG_ON() with a KERN_ERR message
that reports the problem only once?
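
To make the "only once" part concrete, here is a minimal userspace
sketch of the latching logic. It is only an illustration: check_tx_desc()
and the two counters are hypothetical names, not anything in hd64572.c,
and in the kernel the latch would presumably come from printk_once() or
WARN_ONCE() rather than a hand-rolled flag.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static int warned;           /* set after the first report */
static unsigned int reports; /* number of messages actually printed */

/*
 * Userspace stand-in for the check at the top of the transmit path.
 * Returns 1 if the descriptor status was unexpectedly non-zero,
 * 0 if it was clean. The message is printed at most once.
 */
static int check_tx_desc(uint8_t stat)
{
	if (!stat)
		return 0;

	if (!warned) {
		warned = 1;
		reports++;
		fprintf(stderr, "tx descriptor not cleared: stat=0x%02x\n",
			(unsigned int)stat);
	}

	/* Invariant of the warn-once latch. */
	assert(reports <= 1);
	return 1;
}
```

The caller can still act on the return value for every stale descriptor
(for example by bumping an error counter), while the log only gets the
first occurrence.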

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/

View attachment "pc300too-dmesg.out" of type "text/plain" (55749 bytes)

View attachment "pc300too-lspci.out" of type "text/plain" (21584 bytes)

View attachment "pc300too-dmidecode.out" of type "text/plain" (19738 bytes)

Download attachment "pc300too-photo.jpg" of type "image/jpeg" (170494 bytes)
