lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 05 Nov 2007 09:36:35 +0100
From:	BERTRAND Joël <joel.bertrand@...tella.fr>
To:	Neil Brown <neilb@...e.de>
CC:	Justin Piszcz <jpiszcz@...idpixels.com>,
	linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org
Subject: Re: 2.6.23.1: mdadm/raid5 hung/d-state

Neil Brown wrote:
> On Sunday November 4, jpiszcz@...idpixels.com wrote:
>> # ps auxww | grep D
>> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>> root       273  0.0  0.0      0     0 ?        D    Oct21  14:40 [pdflush]
>> root       274  0.0  0.0      0     0 ?        D    Oct21  13:00 [pdflush]
>>
>> After several days/weeks, this is the second time this has happened, while 
>> doing regular file I/O (decompressing a file), everything on the device 
>> went into D-state.
> 
> At a guess (I haven't looked closely) I'd say it is the bug that was
> meant to be fixed by
> 
> commit 4ae3f847e49e3787eca91bced31f8fd328d50496
> 
> except that patch applied badly and needed to be fixed with
> the following patch (not in git yet).
> These have been sent to stable@ and should be in the queue for 2.6.23.2

	My linux-2.6.23/drivers/md/raid5.c contains your patch for a long time :

...
         spin_lock(&sh->lock);
         clear_bit(STRIPE_HANDLE, &sh->state);
         clear_bit(STRIPE_DELAYED, &sh->state);

         s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
         s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
         s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
         /* Now to look around and see what can be done */

         /* clean-up completed biofill operations */
         if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
                 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
                 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
                 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
         }

         rcu_read_lock();
         for (i=disks; i--; ) {
                 mdk_rdev_t *rdev;
                 struct r5dev *dev = &sh->dev[i];
...

but it doesn't fix this bug.

	Regards,

	JKB
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ