lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5638E597.4090109@kernel.dk>
Date:	Tue, 3 Nov 2015 09:49:27 -0700
From:	Jens Axboe <axboe@...nel.dk>
To:	Jan Kara <jack@...e.cz>
Cc:	LKML <linux-kernel@...r.kernel.org>, Jeff Moyer <jmoyer@...hat.com>
Subject: Re: Flush requests not going through IO scheduler

On 11/03/2015 09:41 AM, Jan Kara wrote:
> On Mon 02-11-15 09:58:01, Jens Axboe wrote:
>> On 11/02/2015 05:20 AM, Jan Kara wrote:
>>> Hello,
>>>
>>> when looking into a performance issue, I've noticed one interesting thing
>>> in blktrace data:
>>>
>>>    8,0    2        0     1.745149746     0  m   N cfq320SN / dispatch_insert
>>>    8,0    2        0     1.745150258     0  m   N cfq320SN / dispatched a request
>>>    8,0    2        0     1.745150524     0  m   N cfq320SN / activate rq, drv=10
>>>    8,0    2     2893     1.745150644 30477  D  WS 495331192 + 192 [git]
>>>    8,0    1     3678     1.746851310     0  C  WS 495331192 + 192 [0]
>>>
>>> We wrote the data for transaction commit here.
>>>
>>>    8,0    1        0     1.746863220     0  m   N cfq320SN / complete rqnoidle 1
>>>    8,0    1        0     1.746863801     0  m   N cfq320SN / set_slice=27
>>>    8,0    1        0     1.746864439     0  m   N cfq320SN / arm_idle: 8 group_idle: 0
>>>
>>> Currently there is no IO queued from jbd2 thread so idle...
>>>
>>>    8,0    1     3679     1.746878424   320  A FWFS 495331384 + 8 <- (8,2) 478543928
>>>    8,0    1     3680     1.746879028   320  Q FWFS 495331384 + 8 [jbd2/sda2-8]
>>>    8,0    1     3681     1.746879673   320  G FWFS 495331384 + 8 [jbd2/sda2-8]
>>>    8,0    1     3682     1.746880227   320  I FWFS 495331384 + 8 [jbd2/sda2-8]
>>>
>>> Jbd2 thread now queues the commit block.
>>>
>>>    8,0    1        0     1.754263523     0  m   N cfq idle timer fired
>>>    8,0    1        0     1.754264733     0  m   N cfq320SN / slice expired t=0
>>>
>>> But it was not dispatched and we just idled until timer fired. Then we
>>> started dispatching for other queue and got to dispatching the commit block
>>> only much later.
>>>
>>> I've looked into the block layer code and the reason for this behavior
>>> (idling when there is in fact IO to dispatch) is the special handling of
>>> flush requests. When a flush request is submitted, we insert it with
>>> ELEVATOR_INSERT_FLUSH and blk_insert_flush() then handles it. That
>>> eventually just ends up doing something along the lines of:
>>>
>>> 	list_add_tail(&rq->queuelist, &q->queue_head);
>>>
>>> So we add request to the list of requests to dispatch but we don't notify
>>> IO scheduler in any way. Thus IO scheduler won't properly track the
>>> request, won't properly account IO time for it if I'm right etc...
>>>
>>> Ideally we should call q->elevator->type->ops.elevator_add_req_fn() to
>>> handle the request but I'm not sure it won't break some assumptions of the
>>> flush code. But at minimum shouldn't we at least try to dispatch the
>>> request?
>>
>> Certainly, the current behavior is undoubtedly broken. The least
>> intrusive fix would be to kick off scheduling when we add it to the
>> request, but the elevator should handle it. Are you going to be up
>> for hacking up a fix?
>
> I have some trouble understanding what do you mean exactly. Do you think we
> should just call __blk_run_queue() after we add the request to
> q->queue_head?

No, that won't be enough, as it won't always break out of the idle 
logic. We need to ensure that the new request is noticed, so that CFQ 
knows and can decide to kick off things.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ