Message-ID: <513FCBD3.4@am.sony.com>
Date:	Tue, 12 Mar 2013 17:44:03 -0700
From:	Frank Rowand <frank.rowand@...sony.com>
To:	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
CC:	Thomas Gleixner <tglx@...utronix.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-rt-users@...r.kernel.org" <linux-rt-users@...r.kernel.org>
Subject: Re: linux-3.6.11-rt30 smoke test on ARM

On 03/11/13 10:34, Sebastian Andrzej Siewior wrote:
> * Frank Rowand | 2013-03-07 20:03:18 [-0800]:
> 
>> Panda boot often fails due to a USB timeout while sending a command on
>> behalf of the smsc95xx Ethernet driver.
>>
>> This patch is a temporary hack to force a retry when the timeout occurs.
> 
> It looks like you overrun the chip for some reason. Can you reproduce it
> on mainline? They added a few delays on register read(); that might do
> the trick.

Yes, I can reproduce it on mainline.

Here is the current state of my debugging:

The problem usually occurs within three boot attempts, but it has also
taken eight boot attempts to appear.  I do not know the maximum number
of boots required to see the problem, so I cannot state with certainty
that a given kernel version does not have the problem.  If the boot
fails, then I can state with certainty that the given kernel version
has the problem.

Given that level of uncertainty, I know:

  v3.5      does not appear to have the problem
  v3.6-rc1  has the problem
  v3.6      has the problem
  v3.7      has the problem
  v3.8      does not appear to have the problem
  v3.9-rc1  fails to build

I thought I had bisected the problem to a specific commit, but wanting
to be sure of it, I did extra boots of what should have been the last
good commit.  On the 7th boot, that kernel version had the problem.

I'll probably redo the bisect, but have not had time to do so yet.

The problem manifests as a timeout from at least two different locations
in drivers/net/usb/smsc95xx.c:


 656 static int smsc95xx_set_mac_address(struct usbnet *dev)
 657 {
 ...
 663         ret = smsc95xx_write_reg(dev, ADDRL, addr_lo);
 664         if (ret < 0) {
 665                 netdev_warn(dev->net, "Failed to write ADDRL: %d\n", ret);
 666                 return ret;
 667         }


 751 static int smsc95xx_reset(struct usbnet *dev)
 752 {
 ...
 783         write_buf = PM_CTL_PHY_RST_;
 784         ret = smsc95xx_write_reg(dev, PM_CTRL, write_buf);
 785         if (ret < 0) {
 786                 netdev_warn(dev->net, "Failed to write PM_CTRL: %d\n", ret);
 787                 return ret;
 788         }


Some of the other smsc95xx_write_reg() calls in smsc95xx_reset() are
protected by timeout checks, with up to 100 retries.  I do not know
whether this one should have the same protection.

-Frank
