[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4671E85E.30100@steeleye.com>
Date: Thu, 14 Jun 2007 21:16:14 -0400
From: Paul Clements <paul.clements@...eleye.com>
To: Mike Snitzer <snitzer@...il.com>
CC: Bill Davidsen <davidsen@....com>, Neil Brown <neilb@...e.de>,
linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
nbd-general@...ts.sourceforge.net,
Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: raid1 with nbd member hangs MD on SLES10 and RHEL5
Mike Snitzer wrote:
> On 6/14/07, Paul Clements <paul.clements@...eleye.com> wrote:
>> Mike Snitzer wrote:
>>
>> > Here are the steps to reproduce reliably on SLES10 SP1:
>> > 1) establish a raid1 mirror (md0) using one local member (sdc1) and
>> > one remote member (nbd0)
>> > 2) power off the remote machine, whereby severing nbd0's connection
>> > 3) perform IO to the filesystem that is on the md0 device to enduce
>> > the MD layer to mark the nbd device as "faulty"
>> > 4) cat /proc/mdstat hangs, sysrq trace was collected
>>
>> That's working as designed. NBD works over TCP. You're going to have to
>> wait for TCP to time out before an error occurs. Until then I/O will
>> hang.
>
> With kernel.org 2.6.15.7 (uni-processor) I've not seen NBD hang in the
> kernel like I am with RHEL5 and SLES10. This hang (tcp timeout) is
> indefinite oh RHEL5 and ~5min on SLES10.
>
> Should/can I be playing with TCP timeout values? Why was this not a
> concern with kernel.org 2.6.15.7; I was able to "feel" the nbd
> connection break immediately; no MD superblock update hangs, no
> longwinded (or indefinite) TCP timeout.
I don't know. I've never seen nbd immediately start returning I/O
errors. Perhaps something was different about the configuration?
If the other other machine rebooted quickly, for instance, you'd get a
connection reset, which would kill the nbd connection.
--
Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists