Date:	Fri, 08 Aug 2014 14:10:56 +0530
From:	Amit Shah <amit.shah@...hat.com>
To:	linux-kernel@...r.kernel.org
Cc:	mingo@...nel.org, laijs@...fujitsu.com, dipankar@...ibm.com,
	akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
	josh@...htriplett.org, niv@...ibm.com, tglx@...utronix.de,
	peterz@...radead.org, rostedt@...dmis.org, dhowells@...hat.com,
	edumazet@...gle.com, dvhart@...ux.intel.com, fweisbec@...il.com,
	oleg@...hat.com, sbw@....edu
Subject: Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB
 kthread wakeups

On Friday 11 July 2014 07:05 PM, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
>
> An 80-CPU system with a context-switch-heavy workload can require so
> many NOCB kthread wakeups that the RCU grace-period kthreads spend several
> tens of percent of a CPU just awakening things.  This clearly will not
> scale well: If you add enough CPUs, the RCU grace-period kthreads would
> get behind, increasing grace-period latency.
>
> To avoid this problem, this commit divides the NOCB kthreads into leaders
> and followers, where the grace-period kthreads awaken the leaders each of
> whom in turn awakens its followers.  By default, the number of groups of
> kthreads is the square root of the number of CPUs, but this default may
> be overridden using the rcutree.rcu_nocb_leader_stride boot parameter.
> This reduces the number of wakeups done per grace period by the RCU
> grace-period kthread by the square root of the number of CPUs, but of
> course by shifting those wakeups to the leaders.  In addition, because
> the leaders do grace periods on behalf of their respective followers,
> the number of wakeups of the followers decreases by up to a factor of two.
> Instead of being awakened once when new callbacks arrive and again
> at the end of the grace period, the followers are awakened only at
> the end of the grace period.
>
> For a numerical example, in a 4096-CPU system, the grace-period kthread
> would awaken 64 leaders, each of which would awaken its 63 followers
> at the end of the grace period.  This compares favorably with the 79
> wakeups for the grace-period kthread on an 80-CPU system.
>
> Reported-by: Rik van Riel <riel@...hat.com>
> Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
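
The arithmetic in the quoted numerical example can be checked with a short sketch. This is not the kernel's code: the function name `nocb_groups` is hypothetical, and it assumes that the stride (the number of kthreads per group) defaults to the integer square root of the CPU count and that the last group may be smaller than the others.

```python
import math


def nocb_groups(ncpus, stride=None):
    """Return (leaders, max_followers_per_leader) for ncpus NOCB kthreads.

    Assumption: each group holds `stride` kthreads, one of which is the
    leader; the default stride is the integer square root of ncpus.
    """
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    if stride is None:
        stride = math.isqrt(ncpus)
    # Clamp so that a bad override cannot give zero-sized or oversized groups.
    stride = max(1, min(stride, ncpus))
    leaders = -(-ncpus // stride)   # ceiling division
    followers = stride - 1          # every group member except the leader
    return leaders, followers


for n in (80, 4096):
    leaders, followers = nocb_groups(n)
    print(f"{n} CPUs: grace-period kthread wakes {leaders} leaders, "
          f"each leader wakes up to {followers} followers")
```

Under these assumptions the 4096-CPU case gives 64 leaders with 63 followers each, matching the figures in the commit message.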

This patch causes a KVM guest to hang partway through boot.  The .config
is attached, and the boot messages are appended.  Bisection pointed to
this commit, and reverting it on current master (after resolving a
trivial conflict) makes the boot work again.

The qemu cmdline is

./x86_64-softmmu/qemu-system-x86_64 -m 512 -smp 2 \
    -cpu host,+kvmclock,+x2apic -enable-kvm \
    -kernel ~/src/linux/arch/x86/boot/bzImage /guests/f11-auto.qcow2 \
    -append 'root=/dev/sda2 console=ttyS0 console=tty0' \
    -snapshot -serial stdio

Using qemu.git.

Rik suggested collecting qemu stack traces; here they are:

$ pgrep qemu
10587
$ cat /proc/10587/stack
[<ffffffff811fa559>] poll_schedule_timeout+0x49/0x70
[<ffffffff811fbbf2>] do_sys_poll+0x442/0x560
[<ffffffff811fc063>] SyS_ppoll+0x1b3/0x1d0
[<ffffffff816ff969>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

$ cat /proc/10587/task/105
10587/ 10589/ 10590/ 10592/


$ cat /proc/10587/task/*/stack
[<ffffffff811fa559>] poll_schedule_timeout+0x49/0x70
[<ffffffff811fbbf2>] do_sys_poll+0x442/0x560
[<ffffffff811fc063>] SyS_ppoll+0x1b3/0x1d0
[<ffffffff816ff969>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

[<ffffffffa04d3c3d>] kvm_vcpu_block+0x7d/0xd0 [kvm]
[<ffffffffa04ec87c>] kvm_arch_vcpu_ioctl_run+0x11c/0x1180 [kvm]
[<ffffffffa04d6fca>] kvm_vcpu_ioctl+0x2aa/0x5a0 [kvm]
[<ffffffff811f9ac0>] do_vfs_ioctl+0x2e0/0x4a0
[<ffffffff811f9d01>] SyS_ioctl+0x81/0xa0
[<ffffffff816ff969>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

[<ffffffffa04d3c3d>] kvm_vcpu_block+0x7d/0xd0 [kvm]
[<ffffffffa04ec87c>] kvm_arch_vcpu_ioctl_run+0x11c/0x1180 [kvm]
[<ffffffffa04d6fca>] kvm_vcpu_ioctl+0x2aa/0x5a0 [kvm]
[<ffffffff811f9ac0>] do_vfs_ioctl+0x2e0/0x4a0
[<ffffffff811f9d01>] SyS_ioctl+0x81/0xa0
[<ffffffff816ff969>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

[<ffffffff811fa559>] poll_schedule_timeout+0x49/0x70
[<ffffffff811fbbf2>] do_sys_poll+0x442/0x560
[<ffffffff811fbe14>] SyS_poll+0x74/0x110
[<ffffffff816ff969>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
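
The `cat /proc/10587/task/*/stack` output does not label which thread each stack belongs to. A minimal sketch of a helper that prints a header per thread might look like the following. The function names and the `proc_root` parameter are hypothetical (the parameter exists only so the helper can be pointed at a test directory), and reading `/proc/<pid>/task/<tid>/stack` generally requires root.

```python
import os


def read_thread_stacks(pid, proc_root="/proc"):
    """Map each thread ID of `pid` to the lines of its kernel stack.

    Reads <proc_root>/<pid>/task/<tid>/stack. A thread whose stack cannot
    be read (it exited, or the caller lacks privileges) maps to None.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    stacks = {}
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(os.path.join(task_dir, tid, "stack")) as f:
                stacks[int(tid)] = f.read().splitlines()
        except OSError:
            stacks[int(tid)] = None
    return stacks


def format_stacks(stacks):
    """Render the stacks as text with one header line per thread."""
    out = []
    for tid, frames in stacks.items():
        out.append(f"== thread {tid} ==")
        out.extend(frames if frames is not None else ["<unreadable>"])
        out.append("")
    return "\n".join(out)
```

For the qemu process here, `print(format_stacks(read_thread_stacks(10587)))` run as root would print the same frames as the `cat` command, grouped under their thread IDs.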


[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.16.0-rc1+ (amit@...bl.mre) (gcc version 4.8.3 20140624 (Red Hat 4.8.3-1) (GCC) ) #71 SMP PREEMPT Thu Aug 7 21:30:26 IST 2014
[    0.000000] Command line: root=/dev/sda2 console=ttyS0 console=tty0
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001ffdffff] usable
[    0.000000] BIOS-e820: [mem 0x000000001ffe0000-0x000000001fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.8 present.
[    0.000000] Hypervisor detected: KVM
[    0.000000] AGP: No AGP bridge found
[    0.000000] e820: last_pfn = 0x1ffe0 max_arch_pfn = 0x400000000
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[    0.000000] init_memory_mapping: [mem 0x1f800000-0x1f9fffff]
[    0.000000] init_memory_mapping: [mem 0x1c000000-0x1f7fffff]
[    0.000000] init_memory_mapping: [mem 0x00100000-0x1bffffff]
[    0.000000] init_memory_mapping: [mem 0x1fa00000-0x1ffdffff]
[    0.000000] RAMDISK: [mem 0x1fa2e000-0x1ffeffff]
[    0.000000] Allocated new RAMDISK: [mem 0x1f342000-0x1f903645]
[    0.000000] Move RAMDISK from [mem 0x1fa2e000-0x1ffef645] to [mem 0x1f342000-0x1f903645]
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000F0C50 000014 (v00 BOCHS )
[    0.000000] ACPI: ??k? 0x000000001FFE18BD 419C3D35 (v198 9?E�G� 
�#��??�\ D5C8453D ??�� 811D127E)
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:136 __early_ioremap+0xf5/0x1c4()
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.16.0-rc1+ #71
[    0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    0.000000]  0000000000000000 6f00866b7b6a4d61 ffffffff81803d30 ffffffff813e04f5
[    0.000000]  0000000000000000 ffffffff81803d68 ffffffff81038668 ffffffff81aad219
[    0.000000]  0000000000000000 00000000000419c5 0000000000000000 0000000000000000
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff813e04f5>] dump_stack+0x4e/0x7a
[    0.000000]  [<ffffffff81038668>] warn_slowpath_common+0x7f/0x98
[    0.000000]  [<ffffffff81aad219>] ? __early_ioremap+0xf5/0x1c4
[    0.000000]  [<ffffffff81038779>] warn_slowpath_null+0x1a/0x1c
[    0.000000]  [<ffffffff81aad219>] __early_ioremap+0xf5/0x1c4
[    0.000000]  [<ffffffff810872b2>] ? wake_up_klogd+0x52/0x66
[    0.000000]  [<ffffffff813dc655>] ? __pte+0x17/0x19
[    0.000000]  [<ffffffff81aad49c>] early_ioremap+0x13/0x15
[    0.000000]  [<ffffffff81a95b8b>] __acpi_map_table+0x13/0x18
[    0.000000]  [<ffffffff813da661>] acpi_os_map_iomem+0x26/0x14b
[    0.000000]  [<ffffffff813da794>] acpi_os_map_memory+0xe/0x10
[    0.000000]  [<ffffffff81abcdbe>] acpi_tb_parse_root_table+0xf6/0x1d9
[    0.000000]  [<ffffffff81abcef8>] acpi_initialize_tables+0x57/0x59
[    0.000000]  [<ffffffff81abb49c>] acpi_table_init+0x5d/0xef
[    0.000000]  [<ffffffff813452c3>] ? dmi_check_system+0x20/0x49
[    0.000000]  [<ffffffff81a95f64>] acpi_boot_table_init+0x1e/0x6c
[    0.000000]  [<ffffffff81a8e228>] setup_arch+0x883/0x95c
[    0.000000]  [<ffffffff81a8ac0b>] start_kernel+0xe5/0x439
[    0.000000]  [<ffffffff81a8a120>] ? early_idt_handlers+0x120/0x120
[    0.000000]  [<ffffffff81a8a4ba>] x86_64_start_reservations+0x2a/0x2c
[    0.000000]  [<ffffffff81a8a607>] x86_64_start_kernel+0x14b/0x16e
[    0.000000] ---[ end trace e2f2e6a01bc90242 ]---
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 0:1ffdf001, primary cpu clock
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00001000-0x0009efff]
[    0.000000]   node   0: [mem 0x00100000-0x1ffdffff]
[    0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]
[    0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[    0.000000] e820: [mem 0x20000000-0xfeffbfff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on KVM
[    0.000000] setup_percpu: NR_CPUS:4 nr_cpumask_bits:4 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] PERCPU: Embedded 475 pages/cpu @ffff88001fc00000 s1916480 r8192 d20928 u2097152
[    0.000000] KVM setup async PF for cpu 0
[    0.000000] kvm-stealtime: cpu 0, msr 1fc0cbc0
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 128873
[    0.000000] Kernel command line: root=/dev/sda2 console=ttyS0 console=tty0
[    0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[    0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340
[    0.000000] AGP: Checking aperture...
[    0.000000] AGP: No AGP bridge found
[    0.000000] Memory: 479824K/523768K available (4009K kernel code, 723K rwdata, 2172K rodata, 2868K init, 14172K bss, 43944K reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] Preemptible hierarchical RCU implementation.
[    0.000000] 	RCU debugfs-based tracing is enabled.
[    0.000000] 	RCU lockdep checking is enabled.
[    0.000000] 	Additional per-CPU info printed with stalls.
[    0.000000] 	RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=1.
[    0.000000] 	Offload RCU callbacks from all CPUs
[    0.000000] 	Offload RCU callbacks from CPUs: 0.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NO_HZ: Full dynticks CPUs: 1-3.
[    0.000000] NR_IRQS:4352 nr_irqs:256 16
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty0] enabled
[    0.000000] console [ttyS0] enabled
[    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
[    0.000000] ... MAX_LOCK_DEPTH:          48
[    0.000000] ... MAX_LOCKDEP_KEYS:        8191
[    0.000000] ... CLASSHASH_SIZE:          4096
[    0.000000] ... MAX_LOCKDEP_ENTRIES:     32768
[    0.000000] ... MAX_LOCKDEP_CHAINS:      65536
[    0.000000] ... CHAINHASH_SIZE:          32768
[    0.000000]  memory used by lock dependency info: 8671 kB
[    0.000000]  per task-struct memory footprint: 2688 bytes
[    0.000000] tsc: Detected 2790.934 MHz processor
[    0.008000] Calibrating delay loop (skipped) preset value.. 5581.86 BogoMIPS (lpj=11163736)
[    0.008000] pid_max: default: 32768 minimum: 301
[    0.009202] Mount-cache hash table entries: 1024 (order: 1, 8192 bytes)
[    0.010479] Mountpoint-cache hash table entries: 1024 (order: 1, 8192 bytes)
[    0.016838] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
[    0.016838] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
[    0.016838] tlb_flushall_shift: 6
[    0.024114] debug: unmapping init [mem 0xffffffff81b83000-0xffffffff81b85fff]
[    0.026826] ftrace: allocating 17803 entries in 70 pages
[    0.036941] smpboot: weird, boot CPU (#0) not listed by the BIOS
[    0.040023] smpboot: SMP motherboard not detected
[    0.041689] smpboot: SMP disabled
[    0.043189] Performance Events: 16-deep LBR, SandyBridge events, Intel PMU driver.
[    0.044012] perf_event_intel: PEBS disabled due to CPU errata, please upgrade microcode
[    0.048056] ... version:                2
[    0.049446] ... bit width:              48
[    0.050845] ... generic registers:      4
[    0.052007] ... value mask:             0000ffffffffffff
[    0.053708] ... max period:             000000007fffffff
[    0.055456] ... fixed-purpose events:   3
[    0.056015] ... event mask:             000000070000000f
[    0.057987] KVM setup paravirtual spinlock
[    0.081529] x86: Booted up 1 node, 1 CPUs
[    0.082986] smpboot: Total of 1 processors activated (5581.86 BogoMIPS)
[    0.112201] prandom: seed boundary self test passed
[    0.114354] prandom: 100 self tests passed
[    0.117623] NET: Registered protocol family 16
[    0.123868] cpuidle: using governor ladder
[    0.124056] cpuidle: using governor menu
[    0.126206] PCI: Using configuration type 1 for base access

<and this is where it gets stuck>

		Amit


View attachment ".config" of type "text/plain" (58703 bytes)
