Date:	Thu, 8 Sep 2011 10:43:24 -0700
From:	Simon Kirby <sim@...tway.ca>
To:	drbd-dev@...ts.linbit.com, linux-kernel@...r.kernel.org,
	xfs@....sgi.com
Subject: Re: [Drbd-dev] [3.1-rc4] XFS+DRBD hangs

On Thu, Sep 08, 2011 at 05:13:05PM +0200, Lars Ellenberg wrote:

> Sorry for double posting on drbd-dev, I managed to strip the other lists from Cc.
> 
> > We upgraded from 2.6.36 which seemed to have a page leak (file pages left
> > on the LRU) and so would eventually perform very poorly. 2.6.37 and
> > 2.6.38 seemed to have some unix socket issue that caused heartbeat to
> > wedge. Shall we enable lock debugging or something here?
> 
> That could help us understand that stack trace.
> 
> It looks like cpu 1 blocks in
> 
> > [ 1532.427149]  [<ffffffff8103d512>] ? try_to_wake_up+0xc2/0x270
> > [ 1532.427149]  <<EOE>>  <IRQ>  [<ffffffff8103d6cd>] default_wake_function+0xd/0x10
> 
> Which does not make sense to me at all.

Well, good news, I think. I believe this may be related to
"PCI: Set PCI-E Max Payload Size on fabric", added by b03e7495a862b02829.
3.1-rc5 is now running with a patch that basically disables those
changes, and it has been stable for 12 hours. Before, it usually hung
within a few minutes.

The XFS people say it was very likely not 58d84c4ee0389ddeb86238d5,
which is the only other change between these versions that appears to
be in the hang path at all.

Also, when the machine hangs, it stops responding to pings immediately,
and with the PCI-E Max Payload Size change active, the device that
raises a bus error is actually the PCI-E to PCI-X bridge chip used to
support the BCM5708 NICs, so it all seems related.
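Some background on the commit named above: each PCI-E device advertises the largest transaction payload it supports (the Max_Payload_Size Supported field, where an encoded value n means 128 << n bytes). If the fabric is configured with a payload size larger than some device on the path can handle, that device can report errors. The Python sketch below is a simplified model, not the kernel's implementation. It picks the largest payload size that every device on one path supports and lists the devices a larger setting would overrun. The device names and capability values are hypothetical, and nothing here establishes that this is exactly what went wrong on this machine.

```python
from dataclasses import dataclass


@dataclass
class PcieDevice:
    name: str
    mpss: int  # hypothetical encoded Max_Payload_Size Supported value, 0..5


def payload_bytes(encoded: int) -> int:
    """Decode an MPS field value: 0 -> 128 bytes, ..., 5 -> 4096 bytes."""
    if not 0 <= encoded <= 5:
        raise ValueError(f"reserved MPS encoding: {encoded}")
    return 128 << encoded


def safe_fabric_mps(devices: list[PcieDevice]) -> int:
    """Largest payload size (in bytes) that every device on the path supports."""
    if not devices:
        raise ValueError("empty device path")
    return payload_bytes(min(d.mpss for d in devices))


def overrun_devices(devices: list[PcieDevice], chosen_bytes: int) -> list[str]:
    """Names of devices whose supported maximum is below the chosen size."""
    return [d.name for d in devices if payload_bytes(d.mpss) < chosen_bytes]


# Hypothetical path: a root port supporting 512 bytes above a bridge
# that only supports 128 bytes.
path = [
    PcieDevice("root_port", mpss=2),
    PcieDevice("pcie_to_pcix_bridge", mpss=0),
]
print(safe_fabric_mps(path))       # 128
print(overrun_devices(path, 512))  # ['pcie_to_pcix_bridge']
```

Taking the minimum over the whole path is the conservative choice. It gives up some throughput on faster links but never sends a payload larger than the weakest device can accept.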

Simon-
