linux-kernel - [PATCH 3.2 041/115] raid1: include bio_end_io_list in nr_queued to prevent freeze

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <lsq.1461711744.747390322@decadent.org.uk>
Date:	Wed, 27 Apr 2016 01:02:24 +0200
From:	Ben Hutchings <ben@...adent.org.uk>
To:	linux-kernel@...r.kernel.org, stable@...r.kernel.org
CC:	akpm@...ux-foundation.org, "Shaohua Li" <shli@...com>,
	"Nate Dailey" <nate.dailey@...atus.com>
Subject: [PATCH 3.2 041/115] raid1: include bio_end_io_list in nr_queued
 to prevent freeze_array hang

3.2.80-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Nate Dailey <nate.dailey@...atus.com>

commit ccfc7bf1f09d6190ef86693ddc761d5fe3fa47cb upstream.

If raid1d is handling a mix of read and write errors, handle_read_error's
call to freeze_array can get stuck.

This can happen because, though the bio_end_io_list is initially drained,
writes can be added to it via handle_write_finished as the retry_list
is processed. These writes contribute to nr_pending but are not included
in nr_queued.

If a later entry on the retry_list triggers a call to handle_read_error,
freeze array hangs waiting for nr_pending == nr_queued+extra. The writes
on the bio_end_io_list aren't included in nr_queued so the condition will
never be satisfied.

To prevent the hang, include bio_end_io_list writes in nr_queued.

There's probably a better way to handle decrementing nr_queued, but this
seemed like the safest way to avoid breaking surrounding code.

I'm happy to supply the script I used to repro this hang.

Fixes: 55ce74d4bfe1b(md/raid1: ensure device failure recorded before write request returns.)
Signed-off-by: Nate Dailey <nate.dailey@...atus.com>
Signed-off-by: Shaohua Li <shli@...com>
Signed-off-by: Ben Hutchings <ben@...adent.org.uk>
---
 drivers/md/raid1.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1976,6 +1976,7 @@ static void handle_write_finished(struct
 	if (fail) {
 		spin_lock_irq(&conf->device_lock);
 		list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
+		conf->nr_queued++;
 		spin_unlock_irq(&conf->device_lock);
 		md_wakeup_thread(conf->mddev->thread);
 	} else {
@@ -2090,8 +2091,10 @@ static void raid1d(struct mddev *mddev)
 		LIST_HEAD(tmp);
 		spin_lock_irqsave(&conf->device_lock, flags);
 		if (!test_bit(MD_CHANGE_PENDING, &mddev->flags)) {
-			list_add(&tmp, &conf->bio_end_io_list);
-			list_del_init(&conf->bio_end_io_list);
+			while (!list_empty(&conf->bio_end_io_list)) {
+				list_move(conf->bio_end_io_list.prev, &tmp);
+				conf->nr_queued--;
+			}
 		}
 		spin_unlock_irqrestore(&conf->device_lock, flags);
 		while (!list_empty(&tmp)) {