lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <57D8EE82.3090502@kyup.com>
Date:   Wed, 14 Sep 2016 09:30:26 +0300
From:   Nikolay Borisov <kernel@...p.com>
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        linux-kernel@...r.kernel.org
Subject: Re: BUG_ON in rcu_sync_func triggered



On 09/13/2016 06:20 PM, Oleg Nesterov wrote:
> On 09/13, Nikolay Borisov wrote:
>>
>> On 09/13/2016 05:35 PM, Nikolay Borisov wrote:
>>>
>>> On 09/13/2016 04:43 PM, Oleg Nesterov wrote:
>>>> On 09/13, Oleg Nesterov wrote:
>>>>>
>>>>> OK... perhaps the unbalanced up_write... I'll try to look at freeze/thaw code,
>>>>
>>>> Heh, yes, it looks racy or I am totally confused.
>>>>
>>>>> could test the debugging patch below meanwhile?
>>>>
>>>> Yes please. I'll send you another patch (hopefully fix) later, but it
>>>> would be nice if you can test this patch to get more info.
>>>
>>> I've already started testing with this patch on 4.4.20 this time
> 
> I think it would be better to stay with the same kernel version to
> debug the problem...
> 
>> Actually forget that, here is a warning that this triggered:
>>
>> [  844.290454] WARNING: CPU: 2 PID: 1900 at kernel/rcu/sync.c:160 rcu_sync_func+0xc8/0x150()
>> ...
>> [  844.754708] XXX: ffff88047527da78 gp=2 cnt=0 cb=1
> 
> Hmm. Thanks. Please show us all the warnings you get.

What I showed you yesterday was the only warning that I got. On the same kernel another run produced the following warnings: 

[  556.869624] ------------[ cut here ]------------
[  556.875128] WARNING: CPU: 6 PID: 20210 at kernel/rcu/sync.c:111 rcu_sync_enter+0x148/0x1a0()
[  556.885142] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
[  556.960028] CPU: 6 PID: 20210 Comm: fsfreeze Not tainted 4.4.20-clouder1 #9
[  556.968143] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012
[  556.975963]  0000000000000000 ffff880256b43cf8 ffffffff81307a9b 0000000000004ef2
[  556.985182]  0000000000000000 0000000000000000 000000000000006f ffff880256b43d38
[  556.994383]  ffffffff81054a85 ffff880473326b10 ffff880473326b08 ffff880473326b10
[  557.003589] Call Trace:
[  557.006656]  [<ffffffff81307a9b>] dump_stack+0x6b/0xa0
[  557.012737]  [<ffffffff81054a85>] warn_slowpath_common+0x95/0xe0
[  557.019781]  [<ffffffff81054aea>] warn_slowpath_null+0x1a/0x20
[  557.026645]  [<ffffffff810ab9a8>] rcu_sync_enter+0x148/0x1a0
[  557.033309]  [<ffffffff8109c9be>] percpu_down_write+0x1e/0xf0
[  557.040074]  [<ffffffff81315683>] ? call_rwsem_down_write_failed+0x13/0x20
[  557.048092]  [<ffffffff811a868b>] freeze_super+0xab/0x1b0
[  557.054456]  [<ffffffff811b7c0d>] do_vfs_ioctl+0x29d/0x560
[  557.060920]  [<ffffffff811aae7e>] ? SYSC_newfstat+0x2e/0x40
[  557.067480]  [<ffffffff811b7f62>] SyS_ioctl+0x92/0xa0
[  557.073465]  [<ffffffff8163c357>] entry_SYSCALL_64_fastpath+0x12/0x6a
[  557.081015] ---[ end trace fc087420ac1d8f16 ]---
[  557.086507] XXX: ffff880473326b08 gp=2 cnt=-1 cb=1
[  557.092326] rbd: rbd19: added with size 0x500000000

This is: if (WARN_ON(rsp->gp_count < 0)) xxx(rsp);



[  557.094835] ------------[ cut here ]------------
[  557.094841] WARNING: CPU: 1 PID: 0 at kernel/rcu/sync.c:159 rcu_sync_func+0xf9/0x150()
[  557.094873] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
[  557.094876] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.4.20-clouder1 #9
[  557.094877] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012
[  557.094879]  0000000000000000 ffff880277c83d78 ffffffff81307a9b 0000000000000000
[  557.094881]  0000000000000000 0000000000000000 000000000000009f ffff880277c83db8
[  557.094882]  ffffffff81054a85 0000000000000000 ffff880473326bc0 ffff880473326b98
[  557.094882] Call Trace:
[  557.094889]  <IRQ>  [<ffffffff81307a9b>] dump_stack+0x6b/0xa0
[  557.094892]  [<ffffffff81054a85>] warn_slowpath_common+0x95/0xe0
[  557.094893]  [<ffffffff81054aea>] warn_slowpath_null+0x1a/0x20
[  557.094894]  [<ffffffff810ab7b9>] rcu_sync_func+0xf9/0x150
[  557.094897]  [<ffffffff810b0620>] rcu_process_callbacks+0x290/0x740
[  557.094900]  [<ffffffff810590f3>] __do_softirq+0x113/0x330
[  557.094902]  [<ffffffff810593e5>] irq_exit+0x75/0x80
[  557.094905]  [<ffffffff8163ea16>] smp_apic_timer_interrupt+0x46/0x55
[  557.094907]  [<ffffffff8163d069>] apic_timer_interrupt+0x89/0x90
[  557.094911]  <EOI>  [<ffffffff81530cbc>] ? cpuidle_enter_state+0x14c/0x2e0
[  557.094912]  [<ffffffff81530cb1>] ? cpuidle_enter_state+0x141/0x2e0
[  557.094914]  [<ffffffff81530e67>] cpuidle_enter+0x17/0x20
[  557.094916]  [<ffffffff81096d70>] cpu_startup_entry+0x310/0x370
[  557.094919]  [<ffffffff810c2aac>] ? clockevents_config_and_register+0x2c/0x40
[  557.094923]  [<ffffffff81035359>] start_secondary+0xf9/0x100
[  557.094924] ---[ end trace fc087420ac1d8f17 ]---
[  557.094925] XXX: ffff880473326b98 gp=2 cnt=-1 cb=1

This is: 
if (WARN_ON(rsp->gp_count < 0)) xxx(rsp);

I'd say the warnings are pretty consistent between 4.4.14 and 4.4.20. 
Another run on the same kernel (no reboots between tries) produced the following: 

[  700.981151] ------------[ cut here ]------------
[  700.986643] WARNING: CPU: 3 PID: 2428 at kernel/rcu/sync.c:111 rcu_sync_enter+0x148/0x1a0()
[  700.996541] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
[  701.071212] CPU: 3 PID: 2428 Comm: fsfreeze Tainted: G        W       4.4.20-clouder1 #9
[  701.080816] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012
[  701.088530]  0000000000000000 ffff88006b5efcf8 ffffffff81307a9b 000000000000097c
[  701.097711]  0000000000000000 0000000000000000 000000000000006f ffff88006b5efd38
[  701.106870]  ffffffff81054a85 ffff880079ab9310 ffff880079ab9308 ffff880079ab9310
[  701.116039] Call Trace:
[  701.119084]  [<ffffffff81307a9b>] dump_stack+0x6b/0xa0
[  701.125149]  [<ffffffff81054a85>] warn_slowpath_common+0x95/0xe0
[  701.132184]  [<ffffffff81054aea>] warn_slowpath_null+0x1a/0x20
[  701.139027]  [<ffffffff810ab9a8>] rcu_sync_enter+0x148/0x1a0
[  701.145674]  [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
[  701.152221]  [<ffffffff8109c9be>] percpu_down_write+0x1e/0xf0
[  701.158962]  [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
[  701.165511]  [<ffffffff811a868b>] freeze_super+0xab/0x1b0
[  701.171867]  [<ffffffff811b7c0d>] do_vfs_ioctl+0x29d/0x560
[  701.178319]  [<ffffffff811aae7e>] ? SYSC_newfstat+0x2e/0x40
[  701.184959]  [<ffffffff811b7f62>] SyS_ioctl+0x92/0xa0
[  701.190926]  [<ffffffff8163c357>] entry_SYSCALL_64_fastpath+0x12/0x6a
[  701.198446] ---[ end trace fc087420ac1d8f18 ]---
[  701.203915] XXX: ffff880079ab9308 gp=0 cnt=-1 cb=0

[  701.209624] ------------[ cut here ]------------
[  701.215193] WARNING: CPU: 3 PID: 2428 at kernel/rcu/sync.c:118 rcu_sync_enter+0x7e/0x1a0()
[  701.224996] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
[  701.299721] CPU: 3 PID: 2428 Comm: fsfreeze Tainted: G        W       4.4.20-clouder1 #9
[  701.309325] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012
[  701.317038]  0000000000000000 ffff88006b5efcf8 ffffffff81307a9b 000000000000097c
[  701.326203]  0000000000000000 0000000000000000 0000000000000076 ffff88006b5efd38
[  701.335382]  ffffffff81054a85 0000000000000007 ffff880079ab9308 ffff880079ab9310
[  701.344556] Call Trace:
[  701.347610]  [<ffffffff81307a9b>] dump_stack+0x6b/0xa0
[  701.353674]  [<ffffffff81054a85>] warn_slowpath_common+0x95/0xe0
[  701.360705]  [<ffffffff81054aea>] warn_slowpath_null+0x1a/0x20
[  701.367543]  [<ffffffff810ab8de>] rcu_sync_enter+0x7e/0x1a0
[  701.374090]  [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
[  701.380637]  [<ffffffff8109c9be>] percpu_down_write+0x1e/0xf0
[  701.387379]  [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
[  701.393928]  [<ffffffff811a868b>] freeze_super+0xab/0x1b0
[  701.400281]  [<ffffffff811b7c0d>] do_vfs_ioctl+0x29d/0x560
[  701.406737]  [<ffffffff811aae7e>] ? SYSC_newfstat+0x2e/0x40
[  701.413283]  [<ffffffff811b7f62>] SyS_ioctl+0x92/0xa0
[  701.419249]  [<ffffffff8163c357>] entry_SYSCALL_64_fastpath+0x12/0x6a
[  701.426773] ---[ end trace fc087420ac1d8f19 ]---
[  701.432273] XXX: ffff880079ab9308 gp=1 cnt=0 cb=0

Regards, 
Nikolay 


> 
> Oleg.
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ