Message-ID: <169481048.20160703172411@gmail.com>
Date:	Sun, 3 Jul 2016 17:24:11 +0000
From:	Vladimir Panteleev <thecybershadow@...il.com>
To:	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org
CC:	linux-kernel@...r.kernel.org
Subject: PROBLEM: CPU accounting/scheduling regression in v4.6 CPU scheduling patchset?

Hi,

Since updating my PC to Linux 4.6, I noticed the following problems:

1. CPU-bound tasks which use all CPU cores have a severe impact on
   responsiveness.  For example, the following bash command (which
   simply starts one busyloop per core) is enough to make the machine
   almost completely unresponsive:

   for N in $(seq $(nproc)) ; do while true ; do : ; done & done

2. Nearly all tasks in the process listing are shown with 0% CPU
   usage, even when they're CPU-bound. The only exceptions are the
   kernel migration and kthreadd tasks, and occasionally the init
   process.
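
A rough way to check the second symptom without relying on top is to
read a busy loop's accumulated CPU time straight from /proc. The sketch
below assumes a Linux /proc filesystem; cpu_ticks is a hypothetical
helper name used only for illustration. On an unaffected kernel the
delta should be roughly 2 seconds' worth of clock ticks; a delta near
zero would match the 0% readings described above.

```shell
#!/usr/bin/env bash
# Sketch: start one busy loop, then read its accumulated CPU time from /proc.
# cpu_ticks is a hypothetical helper, not part of any existing tool.

cpu_ticks() {
    # Print utime + stime (in clock ticks) for the PID given as $1.
    # These are fields 14 and 15 of /proc/PID/stat. Everything up to the
    # last ") " is stripped first, because the comm field may contain spaces.
    local stat rest
    stat=$(< "/proc/$1/stat") || return 1
    rest=${stat##*) }
    set -- $rest
    # After stripping, $1 is field 3 (state), so utime is ${12}, stime ${13}.
    echo $(( ${12} + ${13} ))
}

# One busy loop in a background subshell.
( while true ; do : ; done ) &
pid=$!

before=$(cpu_ticks "$pid")
sleep 2
after=$(cpu_ticks "$pid")

kill "$pid"
wait "$pid" 2>/dev/null

echo "CPU ticks accumulated in 2s: $(( after - before ))"
```

The tick length is given by "getconf CLK_TCK" (commonly 100 per
second), so the printed delta can be converted to seconds if needed.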

I have bisected the problem to commit
1cf4f629d9d246519a1e76c021806f2a51ddba4d ("cpu/hotplug: Move online
calls to hotplugged cpu"), which is part of Thomas Gleixner's CPU
hotplug refactoring patchset [1]. It introduces both problems
described above.

My system is a GIGABYTE X79S-UP5-WIFI motherboard (F5f BIOS) with an
i7-4960X CPU, running Arch Linux. I've reproduced the problems with
both the distro's kernel config [2] and a minimal config for my
system. They also reproduce on the latest rc at the time of writing,
v4.7-rc5.

Comparing dmesg output before and after 1cf4f629, I see no notable
differences.

I noticed an existing thread "S3 resume regression" [3] referencing
this commit, however it describes a different problem. I also found a
Bugzilla issue for the zero CPU usage problem [4], however it has no
replies.

[1]: https://lkml.org/lkml/2016/2/26/806
[2]: https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-git
[3]: https://lkml.org/lkml/2016/5/11/238
[4]: https://bugzilla.kernel.org/show_bug.cgi?id=120151

Stuff REPORTING-BUGS told me to include:

ver_linux output:
https://dump.v.panteleev.md/616390d43a4c6a3d085acc5eaa390c82/16%3A58%3A08-stdin.txt

/proc/cpuinfo:
https://dump.v.panteleev.md/5dfeba5d7c64028de51d50559b566088/16%3A58%3A49-stdin.txt

/proc/modules:
https://dump.v.panteleev.md/868c0f2b23651be8164975fa5d7e7aab/16%3A59%3A18-stdin.txt

/proc/ioports:
https://dump.v.panteleev.md/5e44aa12cc403dbd783b0273bd3edab4/17%3A01%3A33-stdin.txt

/proc/iomem:
https://dump.v.panteleev.md/110a8fdd0f647fd8d729c54f4f01a3d0/17%3A01%3A49-stdin.txt

"lspci -vvv" output:
https://dump.v.panteleev.md/0c2448fa8a872e34c4555d876b656013/17%3A02%3A18-stdin.txt

/proc/scsi/scsi:
https://dump.v.panteleev.md/6efa007ce74f0bf4ce10ae56690c63de/17%3A02%3A54-stdin.txt

dmesg output:
https://dump.v.panteleev.md/b8a3ba608a914a3d70667dad697dddfb/1467563818.log
