Message-ID: <52B268E5.4090008@huawei.com>
Date: Thu, 19 Dec 2013 11:32:53 +0800
From: Ding Tianhong <dingtianhong@...wei.com>
To: Hannes Frederic Sowa <hannes@...essinduktion.org>,
Eric Dumazet <eric.dumazet@...il.com>,
David Miller <davem@...emloft.net>, <yoshfuji@...ux-ipv6.org>,
<joe@...ches.com>, <vfalico@...hat.com>, <netdev@...r.kernel.org>
Subject: Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
On 2013/12/18 23:46, Hannes Frederic Sowa wrote:
> On Wed, Dec 18, 2013 at 11:12:51PM +0800, Ding Tianhong wrote:
>> On 2013/12/18 22:27, Hannes Frederic Sowa wrote:
>>> On Wed, Dec 18, 2013 at 07:57:40PM +0800, Ding Tianhong wrote:
>>>> Yes, I cannot reproduce the bug again.
>>>
>>> Hmm, it actually seems hard to hit even if the race happens. Even if slab
>>> poisoning is active it would only hit if ->solicit would be called again,
>>> because that is the only pointer dereference directly used in the old memory.
>>>
>>> neigh_alloc allocates memory with kzalloc, which zeroes that memory, so
>>> the race would not only have to race with kfree; the memory would also
>>> need to be reallocated in the meantime.
>>>
>>> I would suggest adding some poisoning manually in neigh_release before kfree
>>> and checking for it in all periodically called functions. Maybe we can see it
>>> again?
>>>
>> Great, thanks for your help. I think making neigh_release not kfree the neighbour
>> until the timer has finished is a clear way to fix this. Maybe you have another idea;
>> I would be glad to hear your opinion.
>
> But I don't suggest this as a fix, just as a help for debugging this issue.
>
> Maybe you could also store the _RET_IP_ in the to be freed struct neighbour
> (just before kfree) and thus have it available in case the machine panics (or
> simply print it with printk).
>
> Maybe it would make sense to use kmem_cache_create and kmem_cache_alloc for
> struct neighs so we can better utilize the slub debugging features.
>
> Greetings,
>
> Hannes
>
>
Good idea, I will try it, but I still have not been able to make it happen again.
I can describe the steps that led to the problem:
(1) A: xxx.xxx.xxx.83, B: xxx.xxx.xxx.84
(2) Take A down and let B replace A: ifconfig B xxx.xxx.xxx.83
(3) Use "/sbin/arping -I %s -U -b -c 1 -w 4 %s" to tell the vlan that B is xxx.xxx.xxx.83.
(4) Then it happened.
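
For reference, the debugging aid suggested in the quoted mail (poison the
neighbour at the point where it would be freed, record who released it, and
check for the poison in the periodic paths) could look roughly like the
userspace C sketch below. This is not kernel code and not a proposed patch:
struct fake_neigh, NEIGH_POISON and the fake_* helpers are made-up names, the
caller address is passed in by hand where the kernel would use _RET_IP_, and
the memory is deliberately not freed so that the later check is well defined.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical marker value, chosen only for this sketch. */
#define NEIGH_POISON 0x6b6bdeadUL

/* Stand-in for struct neighbour with two extra debugging fields. */
struct fake_neigh {
	unsigned long poison;   /* 0 while live, NEIGH_POISON once released */
	unsigned long freed_by; /* address of the releasing caller */
	int dead;
};

/*
 * Stand-in for the point in the release path just before kfree().
 * Kernel code would store _RET_IP_ here; the sketch takes the value
 * as a parameter and does not free the object.
 */
static void fake_neigh_release(struct fake_neigh *n, unsigned long caller)
{
	assert(n != NULL);
	n->freed_by = caller;
	n->poison = NEIGH_POISON;
}

/* Check that every periodically called function would perform first. */
static int fake_neigh_is_poisoned(const struct fake_neigh *n)
{
	return n->poison == NEIGH_POISON;
}

/*
 * Stand-in for a timer handler: returns 0 if the entry is live and
 * -1 if the handler ran on an entry that was already released.
 */
static int fake_timer_handler(struct fake_neigh *n)
{
	if (fake_neigh_is_poisoned(n)) {
		fprintf(stderr, "timer ran on released neigh %p, released from %#lx\n",
			(void *)n, n->freed_by);
		return -1;
	}
	return 0;
}
```

Calling fake_timer_handler() on a zero-initialised entry returns 0; after
fake_neigh_release() on the same entry it returns -1 and prints the recorded
caller, which is the information we would want to have at panic time.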
Regards
Ding