Message-ID: <4671AD7C.4010109@tmr.com>
Date: Thu, 14 Jun 2007 17:05:00 -0400
From: Bill Davidsen <davidsen@....com>
To: Mike Snitzer <snitzer@...il.com>
CC: Neil Brown <neilb@...e.de>, linux-raid@...r.kernel.org,
linux-kernel@...r.kernel.org, nbd-general@...ts.sourceforge.net,
Herbert Xu <herbert@...dor.apana.org.au>,
Paul Clements <Paul.Clements@...eleye.com>
Subject: Re: raid1 with nbd member hangs MD on SLES10 and RHEL5
Mike Snitzer wrote:
> On 6/13/07, Mike Snitzer <snitzer@...il.com> wrote:
>> On 6/13/07, Mike Snitzer <snitzer@...il.com> wrote:
>> > On 6/12/07, Neil Brown <neilb@...e.de> wrote:
>> ...
>> > > > > On 6/12/07, Neil Brown <neilb@...e.de> wrote:
>> > > > > > On Tuesday June 12, snitzer@...il.com wrote:
>> > > > > > >
>> > > > > > > I can provided more detailed information; please just ask.
>> > > > > > >
>> > > > > >
>> > > > > > A complete sysrq trace (all processes) might help.
>>
>> Bringing this back to a wider audience. I provided the full sysrq
>> trace of the RHEL5 kernel to Neil; in it we saw that md0_raid1 had the
>> following trace:
>>
>> md0_raid1 D ffff810026183ce0 5368 31663 11 3822 29488 (L-TLB)
>> ffff810026183ce0 ffff810031e9b5f8 0000000000000008 000000000000000a
>> ffff810037eef040 ffff810037e17100 00043e64d2983c1f 0000000000004c7f
>> ffff810037eef210 0000000100000001 000000081c506640 00000000ffffffff
>> Call Trace:
>> [<ffffffff8003e371>] keventd_create_kthread+0x0/0x61
>> [<ffffffff801b9364>] md_super_wait+0xa8/0xbc
>> [<ffffffff8003e711>] autoremove_wake_function+0x0/0x2e
>> [<ffffffff801b9adb>] md_update_sb+0x1dd/0x23a
>> [<ffffffff801bed2a>] md_check_recovery+0x15f/0x449
>> [<ffffffff882a1af3>] :raid1:raid1d+0x27/0xc1e
>> [<ffffffff80233209>] thread_return+0x0/0xde
>> [<ffffffff8023279c>] __sched_text_start+0xc/0xa79
>> [<ffffffff8003e371>] keventd_create_kthread+0x0/0x61
>> [<ffffffff80233a9f>] schedule_timeout+0x1e/0xad
>> [<ffffffff8003e371>] keventd_create_kthread+0x0/0x61
>> [<ffffffff801bd06c>] md_thread+0xf8/0x10e
>> [<ffffffff8003e711>] autoremove_wake_function+0x0/0x2e
>> [<ffffffff801bcf74>] md_thread+0x0/0x10e
>> [<ffffffff8003e5e7>] kthread+0xd4/0x109
>> [<ffffffff8000a505>] child_rip+0xa/0x11
>> [<ffffffff8003e371>] keventd_create_kthread+0x0/0x61
>> [<ffffffff8003e513>] kthread+0x0/0x109
>> [<ffffffff8000a4fb>] child_rip+0x0/0x11
>>
>> To which Neil had the following to say:
>>
>> > > md0_raid1 is holding the lock on the array and trying to write out
>> > > the superblocks for some reason, and the write isn't completing.
>> > > As it is holding the locks, mdadm and /proc/mdstat are hanging.
> ...
>
>> > We're using MD+NBD for disaster recovery (one local scsi device, one
>> > remote via nbd). The nbd-server is not contributing to md0. The
>> > nbd-server is connected to a remote machine that is running a raid1
>> > remotely.
>>
>> To take this further I've now collected a full sysrq trace of this
>> hang on a SLES10 SP1 RC5 2.6.16.46-0.12-smp kernel; the relevant
>> md0_raid1 trace is comparable to the RHEL5 trace above:
>>
>> md0_raid1 D ffff810001089780 0 8583 51 8952 8260 (L-TLB)
>> ffff810812393ca8 0000000000000046 ffff8107b7fbac00 000000000000000a
>> ffff81081f3c6a18 ffff81081f3c67d0 ffff8104ffe8f100 000044819ddcd5e2
>> 000000000000eb8b 00000007028009c7
>> Call Trace: <ffffffff801e1f94>{generic_make_request+501}
>> <ffffffff8026946c>{md_super_wait+168}
>> <ffffffff80145aa2>{autoremove_wake_function+0}
>> <ffffffff8026f056>{write_page+128}
>> <ffffffff80269ac7>{md_update_sb+220}
>> <ffffffff8026bda5>{md_check_recovery+361}
>> <ffffffff883a97f5>{:raid1:raid1d+38}
>> <ffffffff8013ad8f>{lock_timer_base+27}
>> <ffffffff8013ae01>{try_to_del_timer_sync+81}
>> <ffffffff8013ae16>{del_timer_sync+12}
>> <ffffffff802d9adf>{schedule_timeout+146}
>> <ffffffff801456a9>{keventd_create_kthread+0}
>> <ffffffff8026d5d8>{md_thread+248}
>> <ffffffff80145aa2>{autoremove_wake_function+0}
>> <ffffffff8026d4e0>{md_thread+0}
>> <ffffffff80145965>{kthread+236} <ffffffff8010bdce>{child_rip+8}
>> <ffffffff801456a9>{keventd_create_kthread+0}
>> <ffffffff80145879>{kthread+0}
>> <ffffffff8010bdc6>{child_rip+0}
>>
>> Taking a step back, here is what was done to reproduce on SLES10 (a
>> command-level sketch follows these steps):
>> 1) establish a raid1 mirror (md0) using one local member (sdc1) and
>> one remote member (nbd0)
>> 2) power off the remote machine, thereby severing nbd0's connection
>> 3) perform IO to the filesystem that is on the md0 device to induce
>> the MD layer to mark the nbd device as "faulty"
>> 4) cat /proc/mdstat hangs, sysrq trace was collected and showed the
>> above md0_raid1 trace.
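>>
>> For reference, here is a command-level sketch of those steps (the
>> device names, port, and host are assumptions from my setup, not
>> prescriptive):
>>
>> # on the remote machine: export a spare device over nbd
>> nbd-server 2000 /dev/sdb1
>>
>> # on the local machine: attach the export and build the mirror
>> nbd-client remotehost 2000 /dev/nbd0
>> mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdc1 /dev/nbd0
>> mkfs.ext3 /dev/md0 && mount /dev/md0 /mnt/test
>>
>> # power off the remote machine, then drive IO at the filesystem
>> dd if=/dev/zero of=/mnt/test/file bs=1M count=100; sync
>>
>> # this is what hangs once MD tries to update the superblocks
>> cat /proc/mdstat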
>>
>> To be clear, the MD superblock update hangs indefinitely on RHEL5.
>> But with SLES10 it eventually succeeds (and MD marks the nbd0 member
>> faulty); the other tasks that were blocked waiting for the MD lock
>> (e.g. 'cat /proc/mdstat') then complete immediately.
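>>
>> (One quick way to confirm the writer is stuck rather than merely slow
>> is to watch for the uninterruptible "D" state; a sketch, using the
>> thread name from the traces above:
>>
>> ps -eo pid,stat,wchan:30,comm | grep md0_raid1
>>
>> a persistent "D" in STAT with a wchan of md_super_wait matches the
>> traces collected here.)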
>>
>> It should be noted that this MD+NBD configuration has worked
>> flawlessly using a stock kernel.org 2.6.15.7 kernel (on top of a
>> RHEL4U4 distro). I have not tried to reproduce with 2.6.15.7 on
>> SLES10; it may be worth pursuing, but I'll defer to others to suggest
>> I do so.
>>
>> 2.6.15.7 does not have the SMP race fixes that went into 2.6.16,
>> whereas both the SLES10 and RHEL5 kernels do:
>> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4b2f0260c74324abca76ccaa42d426af163125e7
>>
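>> (To check whether a given tree carries that fix, git can report the
>> first release containing the commit, assuming a local clone of Linus'
>> tree:
>>
>> git describe --contains 4b2f0260c74324abca76ccaa42d426af163125e7
>>
>> though vendor kernels may have backported the patch separately, so
>> their changelogs are the more reliable check.)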
>>
>> If not this specific NBD change, something appears to have changed in
>> how NBD behaves when its connection to the server is lost. It is as
>> though the MD superblock write destined for nbd0 is blocking inside
>> nbd or the network layer because of a network timeout issue?
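>>
>> If that is what's happening, one would expect that manually tearing
>> down the dead connection from userspace errors out the pending write
>> and unblocks MD; a sketch (assuming nbd0 is the stuck member):
>>
>> nbd-client -d /dev/nbd0 # asks the kernel to disconnect nbd0
>> cat /proc/mdstat # should now show nbd0 marked faulty (F)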
>
> Just a quick update; it is looking more and more like there is an
> issue with the nbd kernel driver. I booted the SLES10
> 2.6.16.46-0.12-smp kernel with maxcpus=1 to test the theory that the
> nbd SMP fix that went into 2.6.16 was somehow causing this MD/NBD
> hang. But it _still_ occurs with the 4-step process I outlined above.
>
First, running an smp kernel with maxcpus=1 is not the same as running a
uni kernel, nor is using the nosmp option. The code is different.
Second, AFAIK nbd hasn't worked in a while. I haven't tried it in ages,
but was told it wouldn't work with smp, and I kind of lost interest. If
Neil thinks it should work in 2.6.21 or later I'll test it, since I have
a machine which wants a fresh install soon and is both backed up and
available.
> The nbd0 device _should_ see an NBD_DISCONNECT because the nbd-server
> is no longer running (the node it was running on was powered off)...
> however the nbd-client is still connected to the kernel (meaning the
> kernel never returned an error back to userspace).
> Also, MD is still blocked waiting to write the superblock (presumably
> to nbd0).
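>
> A possible explanation for why no error ever comes back: the server
> node was powered off, so no FIN or RST was ever sent, and the client
> side of the TCP connection can sit in ESTABLISHED until the
> retransmission timeout finally fires. A sketch, with the server port
> from the setup above assumed:
>
> netstat -tn | grep 2000 # socket likely still ESTABLISHED
> cat /proc/sys/net/ipv4/tcp_retries2 # default 15 (roughly 15-30 min)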
--
bill davidsen <davidsen@....com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979