lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <54BF315F.5090709@hitachi.com>
Date:	Wed, 21 Jan 2015 13:55:59 +0900
From:	Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>
To:	Jeff Moyer <jmoyer@...hat.com>
Cc:	axboe@...nel.dk, linux-kernel@...r.kernel.org
Subject: Re: [patch] cfq-iosched: fix incorrect filing of rt async cfqq

Hi,

Thank you for the patch, Jeff!

(2015/01/13 5:21), Jeff Moyer wrote:
> Hi,
>
> If you can manage to submit an async write as the first async I/O from
> the context of a process with realtime scheduling priority, then a
> cfq_queue is allocated, but filed into the wrong async_cfqq bucket.  It
> ends up in the best effort array, but actually has realtime I/O
> scheduling priority set in cfqq->ioprio.

Actually, this bug causes a severe I/O starvation in the following scenario
(we've experienced):

1. RT demon issues an O_SYNC write to an ext2 filesystem early in the
   boot sequence (this is an example which causes the first async write
   to a device in an RT process context)

2. Do heavy buffered writes to the device (these writes will be
   treated as RT class)

3. Issue a sync I/O to the device, and it will take over tens seconds
   to be prevented by many RT class I/Os

I confirmed Jeff's patch solves the above problem.

Tested-by: Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>

>
> The reason is that cfq_get_queue assumes the default scheduling class and
> priority when there is no information present (i.e. when the async cfqq
> is created):
>
> static struct cfq_queue *
> cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
> 	      struct bio *bio, gfp_t gfp_mask)
> {
> 	const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
> 	const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
>
> cic->ioprio starts out as 0, which is "invalid".  So, class of 0
> (IOPRIO_CLASS_NONE) is passed to cfq_async_queue_prio like so:
>
> 		async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);
>
> static struct cfq_queue **
> cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio)
> {
>         switch (ioprio_class) {
>         case IOPRIO_CLASS_RT:
>                 return &cfqd->async_cfqq[0][ioprio];
>         case IOPRIO_CLASS_NONE:
>                 ioprio = IOPRIO_NORM;
>                 /* fall through */
>         case IOPRIO_CLASS_BE:
>                 return &cfqd->async_cfqq[1][ioprio];
>         case IOPRIO_CLASS_IDLE:
>                 return &cfqd->async_idle_cfqq;
>         default:
>                 BUG();
>         }
> }
>
> Here, instead of returning a class mapped from the process' scheduling
> priority, we get back the bucket associated with IOPRIO_CLASS_BE.
>
> Now, there is no queue allocated there yet, so we create it:
>
> 		cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask);
>
> That function ends up doing this:
>
> 			cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync);
> 			cfq_init_prio_data(cfqq, cic);
>
> cfq_init_cfqq marks the priority as having changed.  Then, cfq_init_prio
> data does this:
>
> 	ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
> 	switch (ioprio_class) {
> 	default:
> 		printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class);
> 	case IOPRIO_CLASS_NONE:
> 		/*
> 		 * no prio set, inherit CPU scheduling settings
> 		 */
> 		cfqq->ioprio = task_nice_ioprio(tsk);
> 		cfqq->ioprio_class = task_nice_ioclass(tsk);
> 		break;
>
> So we basically have two code paths that treat IOPRIO_CLASS_NONE
> differently, which results in an RT async cfqq filed into a best effort
> bucket.
>
> Attached is a patch which fixes the problem.  I'm not sure how to make
> it cleaner.  Suggestions would be welcome.
>
> Signed-off-by: Jeff Moyer <jmoyer@...hat.com>
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 6f2751d..b9abdca 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -3656,12 +3656,17 @@ static struct cfq_queue *
>  cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
>  	      struct bio *bio, gfp_t gfp_mask)
>  {
> -	const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
> -	const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
> +	int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
> +	int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
>  	struct cfq_queue **async_cfqq = NULL;
>  	struct cfq_queue *cfqq = NULL;
>  
>  	if (!is_sync) {
> +		if (!ioprio_valid(cic->ioprio)) {
> +			struct task_struct *tsk = current;
> +			ioprio = task_nice_ioprio(tsk);
> +			ioprio_class = task_nice_ioclass(tsk);
> +		}
>  		async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);
>  		cfqq = *async_cfqq;
>  	}
>
>


--
Hidehiro Kawai
Hitachi, Yokohama Research Laboratory

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ