Date:	Mon, 16 Nov 2015 16:11:59 +0100
From:	Jan Kara <jack@...e.cz>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	axboe@...nel.dk, Jeff Moyer <jmoyer@...hat.com>
Subject: CFQ timer precision

Hello,

lately I was looking into a big performance hit we take when the blkio
controller is enabled and the jbd2 thread ends up in a different cgroup
than the user process. E.g. dbench4 throughput drops from ~140 MB/s to
~20 MB/s. However artificial dbench4 may be, a drop of this magnitude
will likely be clearly visible in real-life workloads as well. With the
unified cgroup hierarchy the above split between jbd2 and user processes
is unavoidable once you enable the blkio controller, so IMO we should
accommodate that better.

I have a couple of CFQ idling improvements / fixes which I'll post later
this week once I complete a round of benchmarking. They improve the
throughput to ~40 MB/s, which helps, but clearly there's still plenty of
room for improvement. The reason for the performance drop is essentially
the idling we do to avoid starvation of CFQ queues. Now when idling in
this context, the current default idle window of 8 ms is far too large -
we start the timer after the final request is completed, so we
effectively give the process 8 ms of CPU time to submit the next IO
request, which I think is usually far too much. The problem is that
finer-grained idling is hard to do with jiffy-based timers because e.g.
SUSE distro kernels have HZ=250 and thus 1 jiffy is 4 ms. Hence my
proposal: do you think it would be OK to convert CFQ to use highres
timers and do all the accounting in microseconds? Then we could tune the
idle time to, say, 1 ms, or even autotune it based on the process' think
time, both of which I expect would get us much closer to the original
throughput (a 4 ms idle window gets us to ~70 MB/s with my patches, and
disabling idling gets us back to the original throughput as expected).
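
Just to illustrate what I mean (a rough sketch only, not an actual CFQ
patch - the structure and function names below are made up), the idle
timer would move from a jiffies-based timer to an hrtimer armed with a
microsecond value:

/*
 * Illustrative sketch only: hypothetical names, not CFQ internals.
 * Shows an idle window expressed in microseconds via an hrtimer
 * instead of being rounded up to whole jiffies.
 */
#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/types.h>

struct idle_sketch {
	struct hrtimer idle_timer;
	u64 idle_slice_us;		/* tunable, e.g. 1000 us */
};

static enum hrtimer_restart idle_sketch_expired(struct hrtimer *t)
{
	/* idle window elapsed without a new request from the process */
	return HRTIMER_NORESTART;
}

static void idle_sketch_init(struct idle_sketch *s)
{
	hrtimer_init(&s->idle_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	s->idle_timer.function = idle_sketch_expired;
	s->idle_slice_us = 1000;	/* 1 ms instead of 8 ms */
}

static void idle_sketch_arm(struct idle_sketch *s)
{
	/* relative expiry in ns, so sub-jiffy idle windows are possible */
	hrtimer_start(&s->idle_timer,
		      ns_to_ktime(s->idle_slice_us * NSEC_PER_USEC),
		      HRTIMER_MODE_REL);
}

With the accounting done in microseconds throughout, autotuning the
window from the measured think time then just becomes a matter of
picking the slice in us rather than rounding up to whole jiffies.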

Thoughts?

								Honza

-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
