Date:	Sat, 12 Jul 2008 11:00:06 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Arjan van de Ven <arjan@...radead.org>
cc:	Török Edwin <edwintorok@...il.com>,
	Ingo Molnar <mingo@...e.hu>,
	Roland McGrath <roland@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Elias Oltmanns <eo@...ensachen.de>,
	Oleg Nesterov <oleg@...sign.ru>
Subject: Re: [PATCH] x86_64: fix delayed signals



On Sat, 12 Jul 2008, Arjan van de Ven wrote:
> 
> I see really bad delays on 32 bit as well, but they go away for me if I
> do
> echo 4096 > /sys/block/sda/queue/nr_requests

Hmm. I think the default is 128, and in many cases latencies should 
actually go up with bigger request queues - especially if it means that
you can have a lot more writes in front of the read. You see the opposite 
behaviour.
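
To put rough numbers on that expectation, here is a back-of-the-envelope
sketch. It is a toy model and not kernel code: it assumes strict FIFO
dispatch, no merging, and a made-up fixed cost per request. The function
name and the 5 ms figure used below are purely illustrative.

```c
#include <assert.h>

/*
 * Toy model, not kernel code: worst-case wait for a newly submitted read
 * when `writes_ahead` requests are already queued in front of it and are
 * dispatched strictly in order at `ms_per_request` each.
 */
static long toy_read_wait_ms(long writes_ahead, long ms_per_request)
{
	assert(writes_ahead >= 0 && ms_per_request >= 0);
	return writes_ahead * ms_per_request;
}
```

Under those assumptions, 128 queued writes at 5 ms each put 640 ms in
front of the read, and 4096 queued writes put about 20 seconds in front
of it. That is why a bigger queue making latency better is the
surprising direction.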

That could easily happen if the scheduler is crazy and lets writes use up 
all of the request queue, or if the limited queue means that it cannot 
effectively merge requests. But request merging should happen trivially 
for the contiguous 'dd' case almost regardless of queue size, so I wonder 
if something else is going on.

Ahh.. I see something _very_ suspicious.

Look at block/blk-core.c: get_request(). It starts throttling and batching 
requests when it hits

	if (rl->count[rw]+1 >= queue_congestion_on_threshold(q)) {

and notice how this is independent of whether it's a read or a write (but 
it does count them separately). But on the wakeup path, it uses different 
limits for reads than for writes.
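
The shape of that throttle-side check can be sketched in isolation. This
is a simplified stand-in, not the code in blk-core.c: the struct, the
function name and the threshold value used in the note below are all
hypothetical, and the wakeup side is deliberately left out because its
exact limits are not quoted here.

```c
#include <assert.h>

enum { TOY_READ = 0, TOY_WRITE = 1 };

/* Hypothetical stand-in for the queue's request accounting. */
struct toy_queue {
	int count[2];		/* allocated requests, counted per direction */
	int congestion_on;	/* one threshold, shared by both directions */
};

/* Same shape as the quoted test: one limit, separate counters. */
static int toy_hits_congestion(const struct toy_queue *q, int rw)
{
	assert(rw == TOY_READ || rw == TOY_WRITE);
	return q->count[rw] + 1 >= q->congestion_on;
}
```

With an illustrative threshold of 113, a direction holding 112 requests
trips the check and one holding 10 does not, whichever direction it is.
Reads get no separate, more generous limit on this side.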

That batching looks pretty bogus for reads to begin with, and then 
behaving similarly on throttling but differently on wakeup sounds bogus.

The blk_alloc_request() also ends up allocating all requests from one 
mempool, so if that mempool runs out (due to writes having used them all 
up), then those writes will block reads too, even though reads should have 
much higher priority.
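
The shared-pool concern can be illustrated with a toy fixed-size pool.
This is only a sketch under stated assumptions: the names are
hypothetical, it is not the kernel mempool API, and where this model
simply fails the allocation a real allocator may instead make the caller
sleep, depending on the allocation flags.

```c
#include <assert.h>
#include <stdbool.h>

enum { POOL_READ = 0, POOL_WRITE = 1 };

/* Toy pool: one set of slots shared by both directions. */
struct toy_pool {
	int free_slots;
	int in_use[2];		/* slots currently held, per direction */
};

/* Take a slot for direction `rw`; fails once the shared pool is empty. */
static bool toy_pool_get(struct toy_pool *p, int rw)
{
	assert(rw == POOL_READ || rw == POOL_WRITE);
	if (p->free_slots == 0)
		return false;
	p->free_slots--;
	p->in_use[rw]++;
	return true;
}

/* Return a slot previously taken for direction `rw`. */
static void toy_pool_put(struct toy_pool *p, int rw)
{
	assert(rw == POOL_READ || rw == POOL_WRITE);
	assert(p->in_use[rw] > 0);
	p->in_use[rw]--;
	p->free_slots++;
}
```

Once writes have taken every slot, toy_pool_get(&p, POOL_READ) fails
until some write completes and calls toy_pool_put(). Nothing in a single
shared pool holds capacity back for reads.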

I dunno. But there _has_ been a lot of churn in the different block queues 
over the last few months. I wouldn't be surprised at all if something got 
broken in the process. And as with filesystems, almost all performance 
tests are for throughput, not "bad latency" in the presence of other 
heavy IO.

		Linus