Date:	Tue, 16 Jan 2007 17:42:19 +0900
From:	Kenzo Iwami <k-iwami@...jp.nec.com>
To:	Auke Kok <auke-jan.h.kok@...el.com>
CC:	Jesse Brandeburg <jesse.brandeburg@...el.com>,
	"Ronciak, John" <john.ronciak@...el.com>,
	Shaw Vrana <shaw@...nix.com>, netdev@...r.kernel.org
Subject: Re: watchdog timeout panic in e1000 driver

Hi,

Thank you for your comment.

> thanks for staying patient while most of us were out or busy. Apart from acknowledging 
> that you might have fixed a problem with your patch, we're very reluctant to merge such 
> a huge change in our driver that touches much more cases then the one that seems to be 
> giving you problems.
> 
> I've thought up a much more elegant solution that prevents the driver from asserting the 
> swfw semaphore during normal operations by checking the mac LU (link up) register in the 
> watchdog. This allows the watchdog task to bypass all PHY checking in case all link 
> statuses are OK, and thus removes the big problem that you are seeing.
> 
> Attached a version that should apply against most current trees. Please give it a try 
> and let us know if this also fixes the problem for you. I will most likely push this 
> patch to the netdev tree in any case.

I tried your patch. Unfortunately, the system still panicked with the
same symptom.

In your patch, e1000_watchdog() still calls e1000_update_stats(),
which in turn calls e1000_read_phy_reg().
The interrupt handler therefore still tries to acquire the semaphore,
and the same problem occurs.

To fix this problem, the interrupt handler must not call
e1000_read_phy_reg() while the interrupted code is holding the semaphore.

My patch may seem like a huge change, but in essence the change is
pretty simple.

In my patch, the interrupt handler checks whether the interrupted
code is holding the swfw semaphore. If it is held, the watchdog function
is deferred until the swfw semaphore is released.
The changes touch only the code that holds the semaphore when it is
interrupted, and the interrupt handler, so all of them are directly
related to this problem.
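
A minimal sketch of the deferral idea, written as a single-threaded
user-space simulation and not as the actual patch. All names here
(adapter_sim, swfw_acquire, swfw_release, watchdog_from_irq,
watchdog_body) are hypothetical and do not exist in the e1000 driver.
Running the deferred watchdog at release time is an assumption made
only to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; not the real e1000 adapter. */
struct adapter_sim {
    bool swfw_held;         /* true while interruptible code owns the semaphore */
    bool watchdog_deferred; /* watchdog was requested while the semaphore was held */
    int  watchdog_runs;     /* number of times the watchdog body actually ran */
};

/* Stands in for the watchdog work that ends up in e1000_read_phy_reg(). */
static void watchdog_body(struct adapter_sim *a)
{
    a->watchdog_runs++;
}

/* Interruptible code takes the semaphore before touching the PHY. */
static void swfw_acquire(struct adapter_sim *a)
{
    assert(!a->swfw_held);
    a->swfw_held = true;
}

/* On release, run a watchdog that was deferred in the meantime
 * (assumption: the real code may reschedule it differently). */
static void swfw_release(struct adapter_sim *a)
{
    assert(a->swfw_held);
    a->swfw_held = false;
    if (a->watchdog_deferred) {
        a->watchdog_deferred = false;
        watchdog_body(a);
    }
}

/* Interrupt path: never wait for the semaphore, defer instead. */
static void watchdog_from_irq(struct adapter_sim *a)
{
    if (a->swfw_held) {
        a->watchdog_deferred = true;
        return;
    }
    watchdog_body(a);
}
```

If the simulated interrupt arrives between swfw_acquire() and
swfw_release(), the watchdog body does not run in the interrupt path.
It runs once the semaphore has been released, so the interrupt
handler never has to wait for a semaphore held by the code it
interrupted.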

I will try to add some comments to my code to make it more readable.
--
  Kenzo Iwami (k-iwami@...jp.nec.com)
