linux-kernel - Re: [bug] __nf_ct_refresh_acct(): WARNING: at lib/list_debug.c:30 __list

Message-ID: <4A3A0D45.8090806@trash.net>
Date:	Thu, 18 Jun 2009 11:47:49 +0200
From:	Patrick McHardy <kaber@...sh.net>
To:	Eric Dumazet <dada1@...mosbay.com>
CC:	Ingo Molnar <mingo@...e.hu>, David Miller <davem@...emloft.net>,
	Thomas Gleixner <tglx@...utronix.de>,
	torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	Pablo Neira Ayuso <pablo@...filter.org>
Subject: Re: [bug] __nf_ct_refresh_acct(): WARNING: at lib/list_debug.c:30
 __list_add+0x7d/0xad()

Eric Dumazet wrote:
> Ingo Molnar wrote:
>> * Ingo Molnar <mingo@...e.hu> wrote:
>>
>>>> IPS_CONFIRMED_BIT is set under nf_conntrack_lock (in 
>>>> __nf_conntrack_confirm()), we probably want to add a 
>>>> synchronisation under ct->lock as well, or 
>>>> __nf_ct_refresh_acct() could set ct->timeout.expires to 
>>>> extra_jiffies, while a different cpu could confirm the 
>>>> conntrack.
>>>>
>>>> Following patch as RFC
>>> A quick test suggests that it seems to work here - thanks Eric!
>> a test-box still triggered this crash overnight:
>>
>> [  252.433471] ------------[ cut here ]------------
>> [  252.436031] WARNING: at lib/list_debug.c:30 __list_add+0x95/0xa0()
>> [  252.436031] Hardware name: System Product Name
>> [  252.436031] list_add corruption. prev->next should be next (ffff88003fa1d460), but was ffffffff82e560a0. (prev=ffff880003b458c0).
>> [  252.436031] Pid: 7348, comm: ssh Tainted: G        W  2.6.30-tip #54604
>> [  252.436031] Call Trace:
>> [  252.436031]  [<ffffffff8149eda5>] ? __list_add+0x95/0xa0
>> [  252.436031]  [<ffffffff8105c79b>] warn_slowpath_common+0x7b/0xd0
>> [  252.436031]  [<ffffffff8105c851>] warn_slowpath_fmt+0x41/0x50
>> [  252.436031]  [<ffffffff8149eda5>] __list_add+0x95/0xa0
>> [  252.436031]  [<ffffffff8106937e>] internal_add_timer+0x9e/0xf0
>> [  252.436031]  [<ffffffff8106a5ef>] mod_timer+0x10f/0x160
>> [  252.436031]  [<ffffffff8106a658>] add_timer+0x18/0x20
>> [  252.436031]  [<ffffffff81f6e42a>] __nf_conntrack_confirm+0x1da/0x3c0
>> [  252.436031]  [<ffffffff81fdb8dd>] ipv4_confirm+0xfd/0x160
>> [  252.436031]  [<ffffffff81f6a130>] nf_iterate+0x70/0xd0
>> [  252.436031]  [<ffffffff81f99180>] ? ip_finish_output+0x0/0x380
>> [  252.436031]  [<ffffffff81f6a4c4>] nf_hook_slow+0xe4/0x160
>> [  252.436031]  [<ffffffff81f99180>] ? ip_finish_output+0x0/0x380
>> [  252.436031]  [<ffffffff81f995f5>] ip_output+0xf5/0x110
>> [  252.436031]  [<ffffffff81f96b05>] ip_local_out+0x25/0x40
>> [  252.436031]  [<ffffffff81f97434>] ip_queue_xmit+0x224/0x420
>> [  252.436031]  [<ffffffff81111118>] ? __kmalloc_node_track_caller+0xd8/0x1f0
>> [  252.436031]  [<ffffffff8108df19>] ? trace_hardirqs_on_caller+0x29/0x1f0
>> [  252.436031]  [<ffffffff81fae0dd>] tcp_transmit_skb+0x50d/0x7e0
>> [  252.436031]  [<ffffffff81faf547>] tcp_connect+0x3c7/0x500
>> [  252.436031]  [<ffffffff81fb4693>] tcp_v4_connect+0x433/0x520
>> [  252.436031]  [<ffffffff81fc446f>] inet_stream_connect+0x22f/0x2d0
>> [  252.436031]  [<ffffffff81118719>] ? fget_light+0x19/0x110
>> [  252.436031]  [<ffffffff81f294b8>] sys_connect+0xb8/0xd0
>> [  252.436031]  [<ffffffff8100ccf9>] ? retint_swapgs+0x13/0x1b
>> [  252.436031]  [<ffffffff8108df19>] ? trace_hardirqs_on_caller+0x29/0x1f0
>> [  252.436031]  [<ffffffff8217a49f>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>> [  252.436031]  [<ffffffff8100c252>] system_call_fastpath+0x16/0x1b
>> [  252.436031] ---[ end trace a7919e7f17c0a73d ]---
>>
>> With your patch (repeated below) applied. Is Patrick's alternative 
>> patch supposed to fix something that yours does not?
> 
> Hmm, not really. Patrick's patch should fix the same problem, but without the
> extra locking that mine adds.
> 
> This new stack trace is somewhat different, as the corruption is detected in the
> add_timer() call in __nf_conntrack_confirm().
> 
> So there is another problem. CCed Pablo Neira Ayuso who added some stuff
> in netfilter and timeout logic recently.

That timeout logic shouldn't be relevant in this case; it's only
activated when netlink event delivery is used, a userspace process
is actually listening, and it has the socket marked for reliable
delivery.

I think it's still the same problem; the corruption is just detected
at a different point.
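
For illustration only (this is not code from the thread or from the kernel
tree): a minimal user-space sketch of the kind of consistency check that
produces the "prev->next should be next" warning in the trace above. The
names checked_list_add(), checked_list_add_tail() and demo_double_link() are
hypothetical, and the scenario (one node linked into two lists without being
removed from the first) is only an assumed way to reach such a state, not a
diagnosis of this bug. As a simplification, the sketch refuses to link when
the check fails.

```c
#include <assert.h>
#include <stdio.h>

/* Minimal doubly linked circular list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Hypothetical checked insert: link 'entry' between 'prev' and 'next' only
 * if the two neighbours still point at each other.
 * Returns 1 on success, 0 if an inconsistency was found (nothing is linked).
 */
static int checked_list_add(struct list_head *entry,
                            struct list_head *prev,
                            struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr, "list_add corruption. next->prev should be prev (%p), "
                "but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return 0;
    }
    if (prev->next != next) {
        fprintf(stderr, "list_add corruption. prev->next should be next (%p), "
                "but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return 0;
    }
    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
    return 1;
}

static int checked_list_add_tail(struct list_head *entry,
                                 struct list_head *head)
{
    return checked_list_add(entry, head->prev, head);
}

/*
 * Assumed scenario: 'shared' is added to list_a, then added to list_b
 * without being removed from list_a. The next tail insert on list_a finds
 * that its last element's 'next' pointer leads into list_b.
 * Returns the result of that final insert (0 means the check fired).
 */
static int demo_double_link(void)
{
    struct list_head list_a, list_b, shared, other;
    int ok;

    list_init(&list_a);
    list_init(&list_b);

    ok = checked_list_add_tail(&shared, &list_a);
    assert(ok);
    ok = checked_list_add_tail(&shared, &list_b); /* second add, no removal */
    assert(ok);
    (void)ok;

    return checked_list_add_tail(&other, &list_a);
}
```

Calling demo_double_link() prints a "prev->next should be next" message to
stderr and returns 0: the check only fires on the later insert, which is
consistent with the point that the place where corruption is noticed need
not be the place where it was caused.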
