Message-ID: <20171201213322.GW692@shells.gnugeneration.com>
Date:   Fri, 1 Dec 2017 13:33:22 -0800
From:   vcaputo@...garu.com
To:     linux-kernel <linux-kernel@...r.kernel.org>
Cc:     timmurray@...gle.com, tj@...nel.org
Subject: Re: [REGRESSION] (>= v4.12) IO w/dmcrypt causing audio underruns

On Wed, Nov 29, 2017 at 10:39:19AM -0800, vcaputo@...garu.com wrote:
> Hello,
> 
> Recently I noticed substantial audio dropouts when listening to MP3s in
> `cmus` while doing big and churny `git checkout` commands in my linux git
> tree.
> 
> It's not something I've done much of over the last couple months so I
> hadn't noticed until yesterday, but I don't remember this being a problem
> in recent history.
> 
> As there's quite an accumulation of similarly configured and built kernels
> in my grub menu, it was trivial to determine approximately when this began:
> 
> 4.11.0: no dropouts
> 4.12.0-rc7: dropouts
> 4.14.0-rc6: dropouts (seem more substantial as well, didn't investigate)
> 
> Watching top while this is going on in the various kernel versions, it's
> apparent that the kworker behavior changed.  Both the priority and quantity
> of running kworker threads are elevated in kernels experiencing dropouts.
> 
> Searching through the commit history for v4.11..v4.12 uncovered:
> 
> commit a1b89132dc4f61071bdeaab92ea958e0953380a1
> Author: Tim Murray <timmurray@...gle.com>
> Date:   Fri Apr 21 11:11:36 2017 +0200
> 
>     dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues
>     
>     Running dm-crypt with workqueues at the standard priority results in IO
>     competing for CPU time with standard user apps, which can lead to
>     pipeline bubbles and seriously degraded performance.  Move to using
>     WQ_HIGHPRI workqueues to protect against that.
>     
>     Signed-off-by: Tim Murray <timmurray@...gle.com>
>     Signed-off-by: Enric Balletbo i Serra <enric.balletbo@...labora.com>
>     Signed-off-by: Mike Snitzer <snitzer@...hat.com>
> 
> ---
> 
> Reverting a1b8913 from 4.14.0-rc6, my current kernel, eliminates the
> problem completely.
> 
> Looking at the diff in that commit, it looks like the commit message isn't
> even accurate; not only is the priority of the dmcrypt workqueues being
> changed, they're also being made "CPU intensive" workqueues.
> 
> This combination appears to result in both elevated scheduling priority and
> greater quantity of participant worker threads effectively starving any
> normal priority user task under periods of heavy IO on dmcrypt volumes.
> 
> I don't know what the right solution is here.  It seems to me we're lacking
> the appropriate mechanism for charging CPU resources consumed on behalf of
> user processes in kworker threads to the work-causing process.
> 
> What effectively happens is my normal `git` user process is able to
> greatly amplify what share of CPU it takes from the system by generating IO
> on what happens to be a high-priority CPU-intensive storage volume.
> 
> It looks potentially complicated to fix properly, but I suspect at its core
> this may be a fairly longstanding shortcoming of the page cache and its
> asynchronous design.  Something that has been exacerbated substantially by
> the introduction of CPU-intensive storage subsystems like dmcrypt.
> 
> If we imagine the whole stack simplified, where all the IO was being done
> synchronously in-band, and the dmcrypt kernel code simply ran in the
> IO-causing process context, it would be getting charged to the calling
> process and scheduled accordingly.  The resource accounting and scheduling
> problems all emerge with the page cache, buffered IO, and async background
> writeback in a pool of unrelated worker threads, etc.  That's how it
> appears to me anyways...
> 
> The system used is an X61s ThinkPad 1.8GHz with 840 EVO SSD, lvm on dmcrypt.
> The kernel .config is attached in case it's of interest.
> 
> Thanks,
> Vito Caputo



Ping...

Could somebody please at least ACK receiving this so I'm not left wondering
if my mails to lkml are somehow winding up flagged as spam?  Thanks!
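
For anyone trying to reproduce the kworker observation from the quoted
report without eyeballing top, the following is a minimal sketch that counts
kworker threads and how many are running or at elevated priority.  It is not
part of the original report.  The names `parse_stat`, `iter_kworkers` and
`summarize` are hypothetical, the field positions follow the
/proc/[pid]/stat layout described in proc(5), and treating "nice < 0" as a
marker for high-priority workers is an assumption.

```python
import os
from typing import Iterable, Iterator, NamedTuple


class TaskStat(NamedTuple):
    pid: int
    comm: str
    state: str      # e.g. "R" (running), "S" (sleeping), "I" (idle)
    nice: int
    cpu_ticks: int  # utime + stime, in clock ticks


def parse_stat(line: str) -> TaskStat:
    """Parse one /proc/[pid]/stat line (field layout per proc(5))."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    rest = line[rpar + 1:].split()
    # rest[0] is field 3 (state), so field n lives at rest[n - 3]:
    # utime = field 14, stime = field 15, nice = field 19.
    return TaskStat(pid, comm, rest[0], int(rest[16]),
                    int(rest[11]) + int(rest[12]))


def iter_kworkers(proc: str = "/proc") -> Iterator[TaskStat]:
    """Yield a TaskStat for every task whose name starts with 'kworker'."""
    try:
        entries = os.listdir(proc)
    except OSError:
        return  # no procfs available
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                task = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited or line was not parseable
        if task.comm.startswith("kworker"):
            yield task


def summarize(tasks: Iterable[TaskStat]) -> dict:
    """Count kworkers overall, currently running, and with negative nice."""
    tasks = list(tasks)
    return {
        "total": len(tasks),
        "running": sum(1 for t in tasks if t.state == "R"),
        # Assumption: negative nice is used as a proxy for highpri workers.
        "negative_nice": sum(1 for t in tasks if t.nice < 0),
        "cpu_ticks": sum(t.cpu_ticks for t in tasks),
    }


if __name__ == "__main__":
    print(summarize(iter_kworkers()))
```

Running it a few times during a large `git checkout` on an affected and an
unaffected kernel, and comparing the "running" and "negative_nice" counts,
would give a rough numeric version of the comparison described above.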
