Date:	Tue, 14 Jun 2011 09:30:47 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Tao Ma <tm@....ma>
Cc:	linux-kernel@...r.kernel.org, Jens Axboe <axboe@...nel.dk>
Subject: Re: CFQ: async queue blocks the whole system

On Tue, Jun 14, 2011 at 03:03:24PM +0800, Tao Ma wrote:
> Hi Vivek,
> On 06/14/2011 05:41 AM, Vivek Goyal wrote:
> > On Mon, Jun 13, 2011 at 06:08:40PM +0800, Tao Ma wrote:
> > 
> > [..]
> >>> You can also run iostat on disk and should be able to see that with
> >>> the patch you are dispatching writes more often than before.
> >> Sorry, the patch doesn't work.
> >>
> >> I used trace event to capture all the blktraces since it doesn't
> >> interfere with the tests, hope it helps.
> > 
> > Actually I was looking for CFQ traces. This seems to be generic block
> > layer trace points. Maybe you can use "blktrace -d /dev/<device>"
> > and then blkparse. It also gives the aggregate view which is helpful.
> > 
> >>
> >> Please download it from http://blog.coly.li/tmp/blktrace.tar.bz2
> > 
> > What concerns me is following.
> > 
> > 5255.521353: block_rq_issue: 8,0 W 0 () 571137153 + 8 [attr_set]
> > 5578.863871: block_rq_issue: 8,0 W 0 () 512950473 + 48 [kworker/0:1]
> > 
> > IIUC, we dispatched the second write more than 300 seconds after
> > dispatching the first write. What happened in between? We should have
> > dispatched more writes.
> > 
> > CFQ traces might give better idea in terms of whether wl_type for async
> > queues was scheduled or not at all.
> I tried several times today, but it looks like that if I enable
> blktrace, the hung_task will not show up in the message. So do you think
> the blktrace at that time is still useful? If yes, I can capture 1
> minute for you. Thanks.

Yes, a 1-minute capture will still be useful.

You can do one more thing: mount the block IO controller (blkio cgroup).
It keeps separate stats for sync and async dispatch (blkio.io_serviced and
blkio.io_service_bytes). You can write a simple script to read and print
these files every few seconds. That will also tell us whether or not CFQ is
regularly dispatching async requests for the device in question.
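
A minimal sketch of such a polling script, in Python, could look like the
following. It is only an illustration under some assumptions: the blkio
controller is assumed to be mounted at /cgroup/blkio (for example with
"mount -t cgroup -o blkio none /cgroup/blkio"; adjust the path to your
setup), the device is assumed to be 8:0 (taken from the "8,0" in the trace
lines quoted above), and the names parse_blkio_stat and poll are made up
for this example. It relies on the stat files having per-device lines of
the form "<major>:<minor> <operation> <count>".

```python
import time
from pathlib import Path

# Assumed mount point of the blkio controller; adjust to your system.
CGROUP_DIR = Path("/cgroup/blkio")
STAT_FILES = ("blkio.io_serviced", "blkio.io_service_bytes")


def parse_blkio_stat(text, device):
    """Return {"Sync": n, "Async": n, ...} for one "major:minor" device."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-device lines look like "8:0 Async 479". The trailing
        # "Total N" summary line has only two fields and is skipped.
        if len(fields) == 3 and fields[0] == device:
            counts[fields[1]] = int(fields[2])
    return counts


def poll(device="8:0", interval=5.0, cgroup_dir=CGROUP_DIR):
    """Print sync/async counters and their change since the last sample."""
    previous = {}
    while True:
        stamp = time.strftime("%H:%M:%S")
        for name in STAT_FILES:
            text = (Path(cgroup_dir) / name).read_text()
            counts = parse_blkio_stat(text, device)
            for op in ("Sync", "Async"):
                now = counts.get(op, 0)
                # First sample has no predecessor, so its delta is 0.
                delta = now - previous.get((name, op), now)
                print(f"{stamp} {name} {op} {now} (+{delta})")
                previous[(name, op)] = now
        time.sleep(interval)
```

Calling poll() while the test is running and watching the Async delta
should show whether async requests keep being dispatched: a delta that
stays at 0 for long stretches while the Sync counter keeps growing would
point at async starvation.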

So both the blktrace output and the blkio controller stats will help.

Thanks
Vivek
