[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6f2e6b62-42dc-f2b3-3758-29f5338f1185@gmail.com>
Date: Thu, 20 Sep 2018 15:29:30 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>, stranche@...eaurora.org
Cc: netdev@...r.kernel.org, steffen.klassert@...unet.com
Subject: Re: [PATCH net] af_key: free SKBs under RCU protection
On 09/20/2018 03:10 PM, Eric Dumazet wrote:
>
>
> On 09/20/2018 12:25 PM, stranche@...eaurora.org wrote:
>>>
>>> I do not believe the changelog or the patch makes sense.
>>>
>>> Having skb still referencing a socket prevents this socket being released.
>>>
>>> If you think about it, what would prevent the freeing happening
>>> _before_ the rcu_read_lock() in pfkey_broadcast() ?
>>>
>>> Maybe the correct fix is that pfkey_broadcast_one() should ensure the
>>> socket is still valid.
>>>
>>> I would suggest something like :
>>>
>>> diff --git a/net/key/af_key.c b/net/key/af_key.c
>>> index
>>> 9d61266526e767770d9a1ce184ac8cdd59de309a..5ce309d020dda5e46e4426c4a639bfb551e2260d
>>> 100644
>>> --- a/net/key/af_key.c
>>> +++ b/net/key/af_key.c
>>> @@ -201,7 +201,9 @@ static int pfkey_broadcast_one(struct sk_buff
>>> *skb, struct sk_buff **skb2,
>>> {
>>> int err = -ENOBUFS;
>>>
>>> - sock_hold(sk);
>>> + if (!refcount_inc_not_zero(&sk->sk_refcnt))
>>> + return -ENOENT;
>>> +
>>> if (*skb2 == NULL) {
>>> if (refcount_read(&skb->users) != 1) {
>>> *skb2 = skb_clone(skb, allocation);
>>
>> Hi Eric,
>>
>> I'm not sure that the socket getting freed before the rcu_read_lock() would
>> be an issue, since then it would no longer be in the net_pkey->table that
>> we loop through (since we call pfkey_remove() from pfkey_relase()). Because of
>> that, all the sockets processed in pfkey_broadcast_one() have valid refcounts,
>> so checking for zero there doesn't prevent the crash that I'm seeing.
>>
>> However, after going over the call flow again, I see that the actual problem
>> occurs because of pfkey_broadcast_one(). Specifically, because of this check:
>>
>> if (*skb2 == NULL) {
>> if (refcount_read(&skb->users) != 1) {
>> *skb2 = skb_clone(skb, allocation);
>> } else {
>> *skb2 = skb;
>> refcount_inc(&skb->users);
>> }
>> }
>>
>> Since we always pass a freshly cloned SKB to this function, skb->users is
>> always 1, and skb2 just becomes skb. We then set skb2 (and thus skb) to
>> belong to the socket.
>>
>> If the socket we queue skb2 to frees this SKB (thereby decrementing its
>> refcount to 1) and the socket is freed before pfkey_broadcast() can
>> execute the kfree_skb(skb) on line 284, we will then attempt to run
>> sock_rfree() on an SKB with a dangling reference to this socket.
>>
>> Perhaps a cleaner solution here is to always clone the SKB in
>> pfkey_broadcast_one(). That will ensure that the two kfree_skb() calls
>> in pfkey_broadcast() will never be passed an SKB with sock_rfree() as
>> its destructor, and we can avoid this race condition.
>
> As long as one skb has sock_rfree has its destructor, the socket attached to
> this skb can not be released. There is no race here.
>
> Note that skb_clone() does not propagate the destructor.
>
> The issue here is that in the rcu lookup, we can find a socket that has been
> dismantled, with a 0 refcount.
>
> We must not use sock_hold() in this case, since we are not sure the socket refcount is not already 0.
>
> pfkey_broadcast() and pfkey_broadcast_one() violate basic RCU rules.
>
> When in a RCU lookup, one want to increment an object refcount, it needs
> to be extra-careful, as I did in my proposal.
>
> Note that the race could be automatically detected with CONFIG_REFCOUNT_FULL=y
Bug was added in commit 7f6b9dbd5afb ("af_key: locking change")
Powered by blists - more mailing lists