[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <170fa0d20706141810x39cf0c48v645a8292f84a9eb7@mail.gmail.com>
Date: Thu, 14 Jun 2007 21:10:46 -0400
From: "Mike Snitzer" <snitzer@...il.com>
To: "Paul Clements" <paul.clements@...eleye.com>
Cc: "Bill Davidsen" <davidsen@....com>, "Neil Brown" <neilb@...e.de>,
linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
nbd-general@...ts.sourceforge.net,
"Herbert Xu" <herbert@...dor.apana.org.au>
Subject: Re: raid1 with nbd member hangs MD on SLES10 and RHEL5
On 6/14/07, Paul Clements <paul.clements@...eleye.com> wrote:
> Mike Snitzer wrote:
>
> > Here are the steps to reproduce reliably on SLES10 SP1:
> > 1) establish a raid1 mirror (md0) using one local member (sdc1) and
> > one remote member (nbd0)
> > 2) power off the remote machine, whereby severing nbd0's connection
> > 3) perform IO to the filesystem that is on the md0 device to enduce
> > the MD layer to mark the nbd device as "faulty"
> > 4) cat /proc/mdstat hangs, sysrq trace was collected
>
> That's working as designed. NBD works over TCP. You're going to have to
> wait for TCP to time out before an error occurs. Until then I/O will hang.
With kernel.org 2.6.15.7 (uni-processor) I've not seen NBD hang in the
kernel like I am with RHEL5 and SLES10. This hang (tcp timeout) is
indefinite oh RHEL5 and ~5min on SLES10.
Should/can I be playing with TCP timeout values? Why was this not a
concern with kernel.org 2.6.15.7; I was able to "feel" the nbd
connection break immediately; no MD superblock update hangs, no
longwinded (or indefinite) TCP timeout.
regards,
Mike
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists