lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20150930224317.GB2839@wfg-t540p.sh.intel.com>
Date:	Thu, 1 Oct 2015 06:43:17 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Kirill Tkhai <ktkhai@...n.com>
Cc:	0day robot <fengguang.wu@...el.com>, LKP <lkp@...org>,
	LKML <linux-kernel@...r.kernel.org>,
	"Huang, Ying" <ying.huang@...ux.intel.com>
Subject: [sched/fair] swapper/0/1 is trying to release lock
 (rcu_read_lockrcu_read_lock) at:

Hi Kirill,

FYI, we applied your patch on v4.3-rc3 and find the below boot error.

commit 482eaa50ff81046e1e9f95af94176953d0743ec9
Author:     Kirill Tkhai <ktkhai@...n.com>
AuthorDate: Mon Sep 28 18:36:28 2015 +0300
Commit:     0day robot <fengguang.wu@...el.com>
CommitDate: Mon Sep 28 23:38:23 2015 +0800

    sched/fair: Skip wake_affine() for core siblings
    
    On 28.09.2015 16:12, Mike Galbraith wrote:
    > On Mon, 2015-09-28 at 13:28 +0300, Kirill Tkhai wrote:
    >
    >> Looks like, NAK may be better, because it saves L1 cache, while the patch always invalidates it.
    >
    > Yeah, bounce hurts more when there's no concurrency win waiting to be
    > collected.  This mixed load wasn't a great choice, but it turned out to
    > be pretty interesting.  Something waking a gaggle of waiters on a busy
    > big socket should do very bad things.
    >
    >> Could you say, do you execute pgbench using just -cX -jY -T30 or something special? I've tried it,
    >> but the dispersion of the results much differs from time to time.
    >
    > pgbench -T $testtime -j 1 -S -c $clients
    
    Using -S the results stabilized. It looks like my db is enormous, and some problem with that. I will
    investigate.
    
    Thanks!
    
    >>> Ok, that's what I want to see, full repeat.
    >>> master = twiddle
    >>> master+ = twiddle+patch
    >>>
    >>> concurrent tbench 4 + pgbench, 2 minutes per client count (i4790+smt)
    >>>                                              master                           master+
    >>> pgbench                   1       2       3     avg         1       2       3     avg   comp
    >>> clients 1       tps = 18599   18627   18532   18586     17480   17682   17606   17589   .946
    >>> clients 2       tps = 32344   32313   32408   32355     25167   26140   23730   25012   .773
    >>> clients 4       tps = 52593   51390   51095   51692     22983   23046   22427   22818   .441
    >>> clients 8       tps = 70354   69583   70107   70014     66924   66672   69310   67635   .966
    >>>
    >>> Hrm... turn the tables, measure tbench while pgbench 4 client load runs endlessly.
    >>>
    >>>                                              master                           master+
    >>> tbench                    1       2       3     avg         1       2       3     avg   comp
    >>> pairs 1        MB/s =   430     426     436     430       481     481     494     485  1.127
    >>> pairs 2        MB/s =  1083    1085    1072    1080      1086    1090    1083    1086  1.005
    >>> pairs 4        MB/s =  1725    1697    1729    1717      2023    2002    2006    2010  1.170
    >>> pairs 8        MB/s =  2740    2631    2700    2690      3016    2977    3071    3021  1.123
    >>>
    >>> tbench without competition
    >>>                master        master+   comp
    >>> pairs 1        MB/s =   694     692    .997
    >>> pairs 2        MB/s =  1268    1259    .992
    >>> pairs 4        MB/s =  2210    2165    .979
    >>> pairs 8        MB/s =  3586    3526    .983  (yawn, all within routine variance)
    >>
    >> Hm, it seems tbench with competition is better only because of a busy system makes tbench
    >> processes be woken on the same cpu.
    >
    > Yeah.  When box is really full, select_idle_sibling() (obviously) turns
    > into a waste of cycles, but even as you approach that, especially when
    > filling the box with identical copies of nearly fully synchronous high
    > frequency localhost packet blasters, stacking is a win.
    >
    > What bent my head up a bit was the combined effect of making wake_wide()
    > really keep pgbench from collapsing then adding the affine wakeup grant
    > for tbench.  It's not at all clear to me why 2,4 would be so demolished.
    
    Mike, one more moment. wake_wide() and current logic confuses me a bit.
    It makes us to decide if we want affine wakeup or not, but select_idle_sibling()
    if a function is not for choosing this_cpu's llc domain only. We use it
    for searching in prev_cpu llc domain too, and it seems we are not interested
    in current flips in this case. Imagine a situation, when we share a mutex
    with a task on another NUMA node. When the task is realising the mutex
    it is waking us, but we definitelly won't use affine logic in this case.
    We wake the wakee anywhere and loose hot cache. I changed the logic, and
    tried pgbench 1:8. The results (I threw away 3 first iterations, because
    they much differ with iter >= 4. Looks like, the reason is in uncached disk IO).
    
    Before:
    
      trans. |  tps (i)     | tps (e)
    --------------------------------------
    12098226 | 60491.067392 | 60500.886373
    12030184 | 60150.874285 | 60160.654295
    11882977 | 59414.829150 | 59424.830637
    12020125 | 60100.579023 | 60111.600176
    12161917 | 60809.547906 | 60827.321639
    12154660 | 60773.249254 | 60783.085165
    
    After:
    
     trans.  | tps (i)      | tps (e)
    --------------------------------------
    12770407 | 63849.883578 | 63860.310019
    12635366 | 63176.399769 | 63187.152569
    12676890 | 63384.396440 | 63400.930755
    12639949 | 63199.526330 | 63210.460753
    12670626 | 63353.079951 | 63363.274143
    12647001 | 63209.613698 | 63219.812331
    
    I'm going to test other cases, but could you tell me (if you remember) are there reasons
    we skip prev_cpu, like I described above? Some types of workloads etc.

+----------------------------------------------------------------------------------------+----------+------------+------------+
|                                                                                        | v4.3-rc3 | 482eaa50ff | 2c8c0a4d26 |
+----------------------------------------------------------------------------------------+----------+------------+------------+
| boot_successes                                                                         | 150      | 41         | 0          |
| boot_failures                                                                          | 0        | 51         | 24         |
| is_trying_to_release_lock(rcu_read_lockrcu_read_lock)at                                | 0        | 51         | 4          |
| BUG:sleeping_function_called_from_invalid_context_at_mm/slab.c                         | 0        | 51         | 4          |
| INFO:lockdep_is_turned_off                                                             | 0        | 51         | 4          |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/rcu/tree.c                 | 0        | 26         | 2          |
| BUG:sleeping_function_called_from_invalid_context_at_arch/x86/mm/fault.c               | 0        | 2          |            |
| backtrace:spawn_ksoftirqd                                                              | 0        | 51         | 4          |
| backtrace:kernel_init_freeable                                                         | 0        | 51         | 4          |
| backtrace:getname_kernel                                                               | 0        | 51         | 4          |
| backtrace:run_init_process                                                             | 0        | 51         | 4          |
| BUG:sleeping_function_called_from_invalid_context_at_arch/x86/include/asm/uaccess.h    | 0        | 7          |            |
| BUG:sleeping_function_called_from_invalid_context_at_mm/vmstat.c                       | 0        | 5          | 3          |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c            | 0        | 17         |            |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/sched/completion.c         | 0        | 2          |            |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/workqueue.c                | 0        | 1          |            |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/cpu.c                      | 0        | 5          | 1          |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/softirq.c                  | 0        | 1          |            |
| BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c                   | 0        | 1          |            |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/events/core.c              | 0        | 1          |            |
| BUG:sleeping_function_called_from_invalid_context_at_arch/x86/include/asm/uaccess_32.h | 0        | 1          |            |
| BUG:sleeping_function_called_from_invalid_context_at_fs/inode.c                        | 0        | 1          |            |
| BUG:sleeping_function_called_from_invalid_context_at_arch/x86/mm/highmem_32.c          | 0        | 1          |            |
| backtrace:flush_to_ldisc                                                               | 0        | 1          |            |
| BUG:sleeping_function_called_from_invalid_context_at_arch/x86/lib/usercopy_32.c        | 0        | 1          |            |
| backtrace:force_qs_rnp                                                                 | 0        | 0          | 2          |
| backtrace:cache_reap                                                                   | 0        | 0          | 4          |
| backtrace:vmstat_update                                                                | 0        | 0          | 3          |
| backtrace:vmstat_shepherd                                                              | 0        | 0          | 1          |
| BUG:kernel_boot_hang                                                                   | 0        | 0          | 20         |
+----------------------------------------------------------------------------------------+----------+------------+------------+

[    0.888548] -------------------------------------
[    0.888548] -------------------------------------
[    0.890637] swapper/0/1 is trying to release lock (
[    0.890637] swapper/0/1 is trying to release lock (rcu_read_lockrcu_read_lock) at:
) at:
[    0.893535] [<4109805b>] select_task_rq_fair+0x42b/0xa30
[    0.893535] [<4109805b>] select_task_rq_fair+0x42b/0xa30
[    0.895846] but there are no more locks to release!
[    0.895846] but there are no more locks to release!
[    0.897963] 
[    0.897963] other info that might help us debug this:
[    0.897963] 
[    0.897963] other info that might help us debug this:
[    0.901067] 3 locks held by swapper/0/1:
[    0.901067] 3 locks held by swapper/0/1:
[    0.902899]  #0: 
[    0.902899]  #0:  ( (cpu_hotplug.lockcpu_hotplug.lock){......}){......}, at: , at: [<41063c7a>] get_online_cpus+0x2a/0x80
[<41063c7a>] get_online_cpus+0x2a/0x80
[    0.906893]  #1: 
[    0.906893]  #1:  ( (smpboot_threads_locksmpboot_threads_lock){......}){......}, at: , at: [<4108695d>] smpboot_register_percpu_thread_cpumask+0x5d/0x1d0
[<4108695d>] smpboot_register_percpu_thread_cpumask+0x5d/0x1d0
[    0.911703]  #2: 
[    0.911703]  #2:  ( (&p->pi_lock&p->pi_lock){......}){......}, at: , at: [<4108e603>] try_to_wake_up+0x23/0x320
[<4108e603>] try_to_wake_up+0x23/0x320
[    0.915055] 
[    0.915055] stack backtrace:
[    0.915055] 
[    0.915055] stack backtrace:
[    0.917115] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-00001-g482eaa5 #1
[    0.917115] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-00001-g482eaa5 #1
[    0.924223] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[    0.924223] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[    0.928408]  41b28f7c
[    0.928408]  41b28f7c 50625d0c 50625d0c 4137de12 4137de12 50622040 50622040 50625d2c 50625d2c 410b69ca 410b69ca 41a21090 41a21090 4109805b 4109805b

[    0.933450]  4109805b
[    0.933450]  4109805b ffffffff ffffffff af9dda70 af9dda70 50622574 50622574 50625d78 50625d78 410b8aae 410b8aae 00000000 00000000 00000000 00000000

[    0.941080]  00000003
[    0.941080]  00000003 00000000 00000000 00000000 00000000 00000002 00000002 00000046 00000046 50622040 50622040 00000003 00000003 506225ac 506225ac

[    0.944747] Call Trace:
[    0.944747] Call Trace:
[    0.945907]  [<4137de12>] dump_stack+0x4b/0x79
[    0.945907]  [<4137de12>] dump_stack+0x4b/0x79
[    0.948570]  [<410b69ca>] print_unlock_imbalance_bug+0xda/0xe0
[    0.948570]  [<410b69ca>] print_unlock_imbalance_bug+0xda/0xe0
[    0.950975]  [<4109805b>] ? select_task_rq_fair+0x42b/0xa30
[    0.950975]  [<4109805b>] ? select_task_rq_fair+0x42b/0xa30
[    0.953365]  [<4109805b>] ? select_task_rq_fair+0x42b/0xa30
[    0.953365]  [<4109805b>] ? select_task_rq_fair+0x42b/0xa30
[    0.955919]  [<410b8aae>] lock_release+0x27e/0x430
[    0.955919]  [<410b8aae>] lock_release+0x27e/0x430
[    0.958209]  [<4109805b>] ? select_task_rq_fair+0x42b/0xa30
[    0.958209]  [<4109805b>] ? select_task_rq_fair+0x42b/0xa30
[    0.960926]  [<4109806f>] select_task_rq_fair+0x43f/0xa30
[    0.960926]  [<4109806f>] select_task_rq_fair+0x43f/0xa30
[    0.963567]  [<4108e603>] ? try_to_wake_up+0x23/0x320
[    0.963567]  [<4108e603>] ? try_to_wake_up+0x23/0x320
[    0.965903]  [<410b73c6>] ? __lock_is_held+0x46/0x70
[    0.965903]  [<410b73c6>] ? __lock_is_held+0x46/0x70
[    0.968065]  [<4108e70e>] try_to_wake_up+0x12e/0x320
[    0.968065]  [<4108e70e>] try_to_wake_up+0x12e/0x320
[    0.970129]  [<4108e91f>] wake_up_process+0x1f/0x50
[    0.970129]  [<4108e91f>] wake_up_process+0x1f/0x50
[    0.972188]  [<410827ef>] kthread_create_on_node+0xaf/0x140
[    0.972188]  [<410827ef>] kthread_create_on_node+0xaf/0x140
[    0.974608]  [<410830be>] kthread_create_on_cpu+0x2e/0x60
[    0.974608]  [<410830be>] kthread_create_on_cpu+0x2e/0x60
[    0.977100]  [<410862a0>] ? sort_range+0x30/0x30
[    0.977100]  [<410862a0>] ? sort_range+0x30/0x30
[    0.979251]  [<410865f7>] __smpboot_create_thread+0x67/0xe0
[    0.979251]  [<410865f7>] __smpboot_create_thread+0x67/0xe0
[    0.981886]  [<410869ab>] smpboot_register_percpu_thread_cpumask+0xab/0x1d0
[    0.981886]  [<410869ab>] smpboot_register_percpu_thread_cpumask+0xab/0x1d0
[    0.984986]  [<41be5089>] ? trace_event_define_fields_softirq+0x1c/0x1c
[    0.984986]  [<41be5089>] ? trace_event_define_fields_softirq+0x1c/0x1c
[    0.987866]  [<41be50a6>] spawn_ksoftirqd+0x1d/0x27
[    0.987866]  [<41be50a6>] spawn_ksoftirqd+0x1d/0x27
[    0.990119]  [<4100044e>] do_one_initcall+0x7e/0x1d0
[    0.990119]  [<4100044e>] do_one_initcall+0x7e/0x1d0
[    0.992251]  [<41be5089>] ? trace_event_define_fields_softirq+0x1c/0x1c
[    0.992251]  [<41be5089>] ? trace_event_define_fields_softirq+0x1c/0x1c
[    0.995097]  [<410c3a17>] ? vprintk_default+0x37/0x40
[    0.995097]  [<410c3a17>] ? vprintk_default+0x37/0x40
[    0.997301]  [<4114bf41>] ? printk+0x17/0x19
[    0.997301]  [<4114bf41>] ? printk+0x17/0x19
[    0.999302]  [<41015eba>] ? print_cpu_info+0x8a/0x120
[    0.999302]  [<41015eba>] ? print_cpu_info+0x8a/0x120
[    1.001692]  [<41bdd77b>] ? native_smp_prepare_cpus+0x28c/0x2c8
[    1.001692]  [<41bdd77b>] ? native_smp_prepare_cpus+0x28c/0x2c8
[    1.004237]  [<41bcfc07>] kernel_init_freeable+0xf4/0x20c
[    1.004237]  [<41bcfc07>] kernel_init_freeable+0xf4/0x20c
[    1.006939]  [<417f8730>] kernel_init+0x10/0xe0
[    1.006939]  [<417f8730>] kernel_init+0x10/0xe0
[    1.009210]  [<4109029f>] ? schedule_tail+0x2f/0xa0
[    1.009210]  [<4109029f>] ? schedule_tail+0x2f/0xa0
[    1.011599]  [<41801831>] ret_from_kernel_thread+0x21/0x30
[    1.011599]  [<41801831>] ret_from_kernel_thread+0x21/0x30
[    1.014045]  [<417f8720>] ? rest_init+0xb0/0xb0
[    1.014045]  [<417f8720>] ? rest_init+0xb0/0xb0
[    1.060738] CPU 1 irqstacks, hard=4fc0a000 soft=4fc0c000
[    1.060738] CPU 1 irqstacks, hard=4fc0a000 soft=4fc0c000
[    1.063030] x86: Booting SMP configuration:

git bisect start 2c8c0a4d26e74a8987dc59def3bed02cd3956692 9ffecb10283508260936b96022d4ee43a7798b4c --
git bisect  bad 612cca0860f13d07ff14213738fb3c89c24ced8c  # 11:25      0-      1  Merge 'linux-review/Kirill-Tkhai/Re-PATCH-sched-fair-Skip-wake_affine-for-core-siblings' into devel-catchup-201509291039
git bisect good e14fdfbf8c68b5504f7d96a7d7a5660ccc081edf  # 11:57     47+     47  Merge 'linux-review/Richard-Fitzgerald/regulator-arizona-add-support-for-WM8998-and-WM1814' into devel-catchup-201509291039
git bisect good 0434b9a84ff1b373a1cc167ee6ff340d31155f57  # 12:20     50+     50  Merge 'linux-review/Paul-Gortmaker/net-core-make-sock_diag-c-explicitly-non-modular' into devel-catchup-201509291039
git bisect good 2411c8f8387f9fb1c964de2c8b9a3ebddc8b490b  # 12:42     48+     48  Merge 'linux-review/Lukasz-Skalski/Documentation-kdbus-use-valid-endpoint-names' into devel-catchup-201509291039
git bisect good df959084968cff10e79c79716f5f34ccf640d1a6  # 13:05     47+     47  Merge 'linux-review/Luis-de-Bethencourt/staging-comedi-ni_tiocmd-remove-unused-code' into devel-catchup-201509291039
git bisect good ba9a39fc21229187dda68b90cbac55c17974a2da  # 13:37     49+     49  Merge 'linux-review/Liviu-Dudau/RESEND-PATCH-v3-net-next-sky2-use-random-address-if-EEPROM-is-bad' into devel-catchup-201509291039
git bisect good ed3f6ff4efb03739f5f102f3053c3657e52f682b  # 14:04     47+     47  Merge 'linux-review/Kishon-Vijay-Abraham-I/DRA72-DRA74-Add-2-lane-support' into devel-catchup-201509291039
git bisect  bad 482eaa50ff81046e1e9f95af94176953d0743ec9  # 14:04      0-     51  sched/fair: Skip wake_affine() for core siblings
# first bad commit: [482eaa50ff81046e1e9f95af94176953d0743ec9] sched/fair: Skip wake_affine() for core siblings
git bisect good 9ffecb10283508260936b96022d4ee43a7798b4c  # 14:08    150+      0  Linux 4.3-rc3
# extra tests on HEAD of linux-devel/devel-catchup-201509291039
git bisect  bad 2c8c0a4d26e74a8987dc59def3bed02cd3956692  # 14:08      0-     24  0day head guard for 'devel-catchup-201509291039'
# extra tests on tree/branch linux-devel/revert-482eaa50ff81046e1e9f95af94176953d0743ec9-482eaa50ff81046e1e9f95af94176953d0743ec9
git bisect good 3a51398efa3a8362652971306175b65774480e80  # 14:25    151+      0  Revert "sched/fair: Skip wake_affine() for core siblings"
# extra tests on tree/branch linus/master
git bisect good 097f70b3c4d84ffccca15195bdfde3a37c0a7c0f  # 14:48    151+      0  Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
# extra tests on tree/branch linux-next/master
git bisect good 0293645856ec527639b5902f021fa5aeba93e305  # 14:55    151+      0  Add linux-next specific files for 20150929


This script may reproduce the error.

----------------------------------------------------------------------------
#!/bin/bash

kernel=$1
initrd=quantal-core-i386.cgz

wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd

kvm=(
	qemu-system-x86_64
	-enable-kvm
	-cpu kvm64
	-kernel $kernel
	-initrd $initrd
	-m 300
	-smp 2
	-device e1000,netdev=net0
	-netdev user,id=net0
	-boot order=nc
	-no-reboot
	-watchdog i6300esb
	-rtc base=localtime
	-serial stdio
	-display none
	-monitor null 
)

append=(
	hung_task_panic=1
	earlyprintk=ttyS0,115200
	systemd.log_level=err
	debug
	apic=debug
	sysrq_always_enabled
	rcupdate.rcu_cpu_stall_timeout=100
	panic=-1
	softlockup_panic=1
	nmi_watchdog=panic
	oops=panic
	load_ramdisk=2
	prompt_ramdisk=0
	console=ttyS0,115200
	console=tty0
	vga=normal
	root=/dev/ram0
	rw
	drbd.minor_count=8
)

"${kvm[@]}" --append "${append[*]}"
----------------------------------------------------------------------------

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/lkp                          Intel Corporation

View attachment "dmesg-quantal-vp-27:20150929000241:i386-randconfig-s0-201539+CONFIG_DEBUG_INFO_REDUCED:4.3.0-rc3-00001-g482eaa5:1" of type "text/plain" (84189 bytes)

View attachment ".config" of type "text/plain" (96028 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ