Message-ID: <170fa0d20706141801u6d6effd9ub362f3ae397f3d32@mail.gmail.com>
Date:	Thu, 14 Jun 2007 21:01:19 -0400
From:	"Mike Snitzer" <snitzer@...il.com>
To:	"Paul Clements" <paul.clements@...eleye.com>
Cc:	"Bill Davidsen" <davidsen@....com>, "Neil Brown" <neilb@...e.de>,
	linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
	nbd-general@...ts.sourceforge.net,
	"Herbert Xu" <herbert@...dor.apana.org.au>
Subject: Re: raid1 with nbd member hangs MD on SLES10 and RHEL5

On 6/14/07, Paul Clements <paul.clements@...eleye.com> wrote:
> Bill Davidsen wrote:
>
> > Second, AFAIK nbd hasn't worked in a while. I haven't tried it in ages,
> > but was told it wouldn't work with smp and I kind of lost interest. If
> > Neil thinks it should work in 2.6.21 or later I'll test it, since I have
> > a machine which wants a fresh install soon, and is both backed up and
> > available.
>
> Please stop this. nbd is working perfectly fine, AFAIK. I use it every
> day, and so do 100s of our customers. What exactly is it that's not
> working? If there's a problem, please send the bug report.

Paul,

This thread details what I've experienced using MD (raid1) with 2
devices: one is a local SCSI device and the other is an NBD device.
I've yet to put effort into pinpointing the problem in a kernel.org
kernel; however, both SLES10 and RHEL5 kernels appear to be hanging in
either 1) nbd or 2) the socket layer.

Here are the steps to reproduce reliably on SLES10 SP1:
1) establish a raid1 mirror (md0) using one local member (sdc1) and
one remote member (nbd0)
2) power off the remote machine, thereby severing nbd0's connection
3) perform IO to the filesystem that is on the md0 device to induce
the MD layer to mark the nbd device as "faulty"
4) run 'cat /proc/mdstat', which hangs; a sysrq trace was collected

To be clear, the MD superblock update hangs indefinitely on RHEL5.
But with SLES10 it eventually succeeds after ~5min (and MD marks the
nbd0 member faulty); and the other tasks that were blocking waiting
for the MD lock (e.g. 'cat /proc/mdstat') then complete immediately.
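
For anyone scripting this test, here is a minimal sketch of checking
whether MD has marked a member faulty. It only parses text that has
already been captured from /proc/mdstat, since reading that file is
exactly what blocks during the hang. The helper name faulty_members
and the SAMPLE text are hypothetical illustrations, not output from
the affected machines; the sketch assumes the usual "name[slot](F)"
marking for a failed member.

```python
import re

# One member entry on an mdstat array line, e.g. "nbd0[1](F)" or "sdc1[0]".
MEMBER_RE = re.compile(r"(\S+?)\[(\d+)\]((?:\([A-Z]\))*)")


def faulty_members(mdstat_text, array="md0"):
    """Return the names of members flagged (F) for the given array."""
    for line in mdstat_text.splitlines():
        if line.startswith(array + " :"):
            return [
                name
                for name, _slot, flags in MEMBER_RE.findall(line)
                if "(F)" in flags
            ]
    return []


# Hypothetical mdstat text for illustration only (not from the real hosts).
SAMPLE = """Personalities : [raid1]
md0 : active raid1 nbd0[1](F) sdc1[0]
      1048512 blocks [2/1] [U_]

unused devices: <none>
"""

print(faulty_members(SAMPLE))
```

With the sample text above this reports nbd0 as the only faulty
member; an array name that is not present yields an empty list.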

If you look back in this thread you'll see traces for md0_raid1 for
both SLES10 and RHEL5.  I hope to try to reproduce this issue on
kernel.org 2.6.16.46 (the basis for SLES10).  If I can, I'll then git
bisect back to try to pinpoint the regression; I obviously need to
verify that 2.6.16 works in this situation on SMP.

Mike