Message-Id: <20070212000042.M73586@liquid-nexus.net>
Date:	Mon, 12 Feb 2007 08:03:57 +0800
From:	"Marc Marais" <marcm@...uid-nexus.net>
To:	Neil Brown <neilb@...e.de>
Cc:	linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: md: md6_raid5 crash 2.6.20

On Mon, 12 Feb 2007 09:02:33 +1100, Neil Brown wrote
> On Sunday February 11, marcm@...uid-nexus.net wrote:
> > Greetings,
> > 
> > I've been running md on my server for some time now and a few days ago one of
> > the (3) drives in the raid5 array started giving read errors. The result was
> > usually system hangs and this was with kernel 2.6.17.13. I upgraded to the
> > latest production 2.6.20 kernel and experienced the same behaviour.
> 
> System hangs suggest a problem with the drive controller.  However
> this "kernel BUG" is something newly introduced in 2.6.20 which 
> should be fixed in 2.6.20.1.  Patch is below.
> 
> If you still get hangs with this patch installed, then please report
> detail, and probably copy to linux-ide@...r.kernel.org.
> 
> NeilBrown
> 
> Fix various bugs with aligned reads in RAID5.
> 
> It is possible for raid5 to be sent a bio that is too big
> for an underlying device.  So if it is a READ that we
> pass straight down to a device, it will fail and confuse
> RAID5.
> 
> So in 'chunk_aligned_read' we check that the bio fits within the
> parameters for the target device and if it doesn't fit, fall back
> on reading through the stripe cache and making lots of one-page
> requests.
> 
> Note that this is the earliest time we can check against the device
> because earlier we don't have a lock on the device, so it could 
> change underneath us.
> 
> Also, the code for handling a retry through the cache when a read
> fails has not been tested and was badly broken.  This patch fixes 
> that code.
> 
> Signed-off-by: Neil Brown <neilb@...e.de>
> 

Thanks for the quick response, Neil. Unfortunately, the kernel doesn't build
with this patch due to a missing symbol:

WARNING: "blk_recount_segments" [drivers/md/raid456.ko] undefined!

Is that in another file that needs patching or within raid5.c?

Marc
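The check described in the quoted patch text (make sure a read fits the target device's limits before passing it straight down, otherwise fall back to one-page requests through the stripe cache) can be sketched in plain userspace C. This is a minimal illustration only, not the raid5 code: the structs `dev_limits` and `read_req`, their fields, and the functions `req_fits_device`, `fallback_page_reads`, and `plan_read` are all hypothetical, and 512-byte sectors and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a member device's request limits. */
struct dev_limits {
    unsigned int max_sectors;   /* largest request, in 512-byte sectors */
    unsigned int max_segments;  /* most scatter/gather segments per request */
};

/* Hypothetical read request: total size plus its segment count. */
struct read_req {
    unsigned int size_bytes;
    unsigned int nr_segments;
};

#define SECTOR_SHIFT 9      /* assumed 512-byte sectors */
#define PAGE_BYTES 4096u    /* assumed 4 KiB pages */

/* True if the read can be passed straight down to the device unchanged. */
bool req_fits_device(const struct read_req *r, const struct dev_limits *lim)
{
    if ((r->size_bytes >> SECTOR_SHIFT) > lim->max_sectors)
        return false;   /* too many sectors for one request */
    if (r->nr_segments > lim->max_segments)
        return false;   /* too many segments for one request */
    return true;
}

/* Fallback: number of one-page reads needed to cover the request. */
unsigned int fallback_page_reads(const struct read_req *r)
{
    assert(r->size_bytes > 0);
    return (r->size_bytes + PAGE_BYTES - 1) / PAGE_BYTES;  /* round up */
}

/* Returns 1 for a direct read, otherwise the number of one-page reads. */
unsigned int plan_read(const struct read_req *r, const struct dev_limits *lim)
{
    if (req_fits_device(r, lim))
        return 1;
    return fallback_page_reads(r);
}
```

For example, with `max_sectors = 128` and `max_segments = 32`, a 64 KiB read in 16 segments (128 sectors) is sent straight through as one request, while a 256 KiB read (512 sectors) falls back to 64 one-page reads.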
