linux-kernel - Re: raid1 with nbd member hangs MD on SLES10 and RHEL5




Message-ID: <4671E85E.30100@steeleye.com>
Date:	Thu, 14 Jun 2007 21:16:14 -0400
From:	Paul Clements <paul.clements@...eleye.com>
To:	Mike Snitzer <snitzer@...il.com>
CC:	Bill Davidsen <davidsen@....com>, Neil Brown <neilb@...e.de>,
	linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
	nbd-general@...ts.sourceforge.net,
	Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: raid1 with nbd member hangs MD on SLES10 and RHEL5

Mike Snitzer wrote:
> On 6/14/07, Paul Clements <paul.clements@...eleye.com> wrote:
>> Mike Snitzer wrote:
>>
>> > Here are the steps to reproduce reliably on SLES10 SP1:
>> > 1) establish a raid1 mirror (md0) using one local member (sdc1) and
>> > one remote member (nbd0)
> >> > 2) power off the remote machine, thereby severing nbd0's connection
> >> > 3) perform IO to the filesystem that is on the md0 device to induce
> >> > the MD layer to mark the nbd device as "faulty"
> >> > 4) cat /proc/mdstat hangs; a sysrq trace was collected
>>
>> That's working as designed. NBD works over TCP. You're going to have to
>> wait for TCP to time out before an error occurs. Until then I/O will 
>> hang.
> 
> With kernel.org 2.6.15.7 (uni-processor) I've not seen NBD hang in the
> kernel the way it does with RHEL5 and SLES10.  This hang (TCP timeout) is
> indefinite on RHEL5 and ~5min on SLES10.
> 
> Should/can I be playing with TCP timeout values?  Why was this not a
> concern with kernel.org 2.6.15.7?  There I was able to "feel" the nbd
> connection break immediately: no MD superblock update hangs, no
> long-winded (or indefinite) TCP timeout.

I don't know. I've never seen nbd immediately start returning I/O
errors. Perhaps something was different about the configuration?
If the other machine rebooted quickly, for instance, you'd get a
connection reset, which would kill the nbd connection.

--
Paul
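Paul's point that nbd I/O simply hangs until TCP gives up can be made concrete with a rough estimate of how long Linux TCP keeps retransmitting before it aborts a connection. This sketch is not from the thread and is only an approximation under stated assumptions: the initial retransmission timeout (RTO) sits at the 200 ms minimum, it doubles on every retry, it is capped at 120 s, and the `net.ipv4.tcp_retries2` sysctl has its usual default of 15. It ignores the RTO a real connection derives from its measured round-trip time, and it does not attempt to explain the different hang durations reported on RHEL5 and SLES10. The function name `tcp_give_up_seconds` is hypothetical.

```python
# Rough estimate of how long Linux TCP retransmits before aborting a connection.
# Assumptions (not from the thread): initial RTO = 200 ms (TCP_RTO_MIN),
# RTO doubles on each retry, RTO is capped at 120 s (TCP_RTO_MAX),
# and net.ipv4.tcp_retries2 defaults to 15.

RTO_MIN_S = 0.2    # assumed initial retransmission timeout, in seconds
RTO_MAX_S = 120.0  # assumed upper bound on the retransmission timeout


def tcp_give_up_seconds(retries2: int = 15,
                        rto_min: float = RTO_MIN_S,
                        rto_max: float = RTO_MAX_S) -> float:
    """Sum the backed-off RTO intervals: one wait after the original
    transmission plus one wait after each of `retries2` retransmissions."""
    total = 0.0
    rto = rto_min
    for _ in range(retries2 + 1):
        total += min(rto, rto_max)  # exponential backoff, capped at rto_max
        rto *= 2
    return total


if __name__ == "__main__":
    for retries in (15, 8, 5):
        print(f"tcp_retries2={retries:2d}: ~{tcp_give_up_seconds(retries):.1f} s")
```

With the default of 15 retries the intervals add up to about 924.6 seconds (roughly 15 minutes), the figure Linux's ip-sysctl documentation gives as a lower bound on the effective timeout. Lowering `tcp_retries2` would shorten the wait, but it is a system-wide setting that also makes every other TCP connection give up sooner during transient network outages.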