linux-kernel - Re: Multi-partition block layer behaviour

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Mon, 31 Oct 2011 13:39:24 +0800
From:	Shaohua Li <shaohua.li@...el.com>
To:	Tiju Jacob <jacobtiju@...il.com>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Multi-partition block layer behaviour

On Mon, 2011-10-31 at 12:35 +0800, Tiju Jacob wrote:
> On Thu, Oct 27, 2011 at 6:12 AM, Shaohua Li <shaohua.li@...el.com> wrote:
> > On Wed, 2011-10-26 at 18:10 +0800, Tiju Jacob wrote:
> >> >> 1. When an I/O request is made to the filesystem, process 'A' acquires
> >> >> a mutex FS lock and a mutex block driver lock.
> >> >>
> >> >> 2. Process 'B' tries to acquire the mutex FS lock, which is not
> >> >> available. Hence, it goes to sleep. Due to the new plugging mechanism,
> >> >> before going to sleep, shcedule() is invoked which disables preemption
> >> >> and the context becomes atomic. In schedule(), the newly added
> >> >> blk_flush_plug_list() is invoked which unplugs the block driver.
> >> >>
> >> >> 3) During unplug operation the block driver tries to acquire the mutex
> >> >> lock which fails, because the lock was held by process 'A'. Previous
> >> >> invocation of scheudle() in step 2 has already made the context as
> >> >> atomic, hence the error "Schedule while atomic" occured.
> >> > if blk_flush_plug_list() is called in schedule(), it will use
> >> > blk_run_queue_async
> >> > to unplug the queue. This runs in a workqueue. So how could this happen?
> >> >
> >>
> >> The call stack goes as follows:
> >>
> >> From schedule() it calls blk_schedule_flush_plug()  and
> >> blk_flush_plug_list() gets invoked.
> >>
> >> In blk_flush_plug_list() queue_unplugged() does not get invoked. Hence
> >>  blk_run_queue_async is not called.
> >> Instead __elv_add_request() is invoked with ELEVATOR_INSERT_SORT_MERGE
> >> flag and the flag gets reassigned to ELEVATOR_INSERT_BACK.
> >>
> >> In ELEVATOR_INSERT_BACK, __blk_run_queue() gets invoked and calls request_fn().
> 
> > This doesn't make sense. why the flag is changed from
> > ELEVATOR_INSERT_SORT_MERGE to ELEVATOR_INSERT_BACK?
> 
> In  __elv_add_request() "where" gets reassigned as follows:
> 
> 	} else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
> 		    (where == ELEVATOR_INSERT_SORT ||
> 		     where == ELEVATOR_INSERT_SORT_MERGE))
> 		where = ELEVATOR_INSERT_BACK;
> 
ok, thanks. So this means the elevator is switching in the test. How
about below patch:

diff --git a/block/elevator.c b/block/elevator.c
index a3b64bc..e14824a 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -683,8 +683,13 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
 		 * - Usually, back inserted requests won't be merged
 		 *   with anything.  There's no point in delaying queue
 		 *   processing.
+		 * If elevator is switching, doesn't need run the queue.
+		 * elevator switching will run it anyway. And this could
+		 * cause warning since the code might run in atomic
+		 * environment(blk_flush_plug_list() callbed in schedule())
 		 */
-		__blk_run_queue(q);
+		if (!test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags))
+			__blk_run_queue(q);
 		break;
 
 	case ELEVATOR_INSERT_SORT_MERGE:


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/