netdev - Re: [RFC net v7] net: skb_expand_head() adjust skb->truesize incorrectly

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5ed3cdb8-0a72-9dfb-ecdd-d59411f63653@virtuozzo.com>
Date:   Tue, 21 Sep 2021 09:36:26 +0300
From:   Vasily Averin <vvs@...tuozzo.com>
To:     Jakub Kicinski <kuba@...nel.org>,
        Eric Dumazet <eric.dumazet@...il.com>
Cc:     netdev@...r.kernel.org,
        Christoph Paasch <christoph.paasch@...il.com>,
        Hao Sun <sunhao.th@...il.com>, kernel@...nvz.org
Subject: Re: [RFC net v7] net: skb_expand_head() adjust skb->truesize
 incorrectly

On 9/21/21 3:39 AM, Jakub Kicinski wrote:
> On Tue, 21 Sep 2021 00:41:15 +0300 Vasily Averin wrote:
>>> Thanks for taking a look. I would prefer not to bake any ideas about
>>> the skb's function into generic functions. Enumerating every destructor
>>> callback in generic code is impossible (technically so, since the code
>>> may reside in modules).
>>>
>>> Let me think about it. Perhaps we can extend sock callbacks with
>>> skb_sock_inherit, and skb_adjust_trusize? That'd transfer the onus of
>>> handling the adjustments done on splitting to the protocols. I'll see
>>> if that's feasible unless someone can immediately call this path
>>> ghastly.  
>>
>> This is similar to Alexey Kuznetsov's suggestion for me, 
>> see https://lkml.org/lkml/2021/8/27/460
> 
> Interesting, I wasn't thinking of keeping the ops pointer in every skb.
> 
>> However I think we can do it later,
>> right now we need to fix somehow broken skb_expand_head(),
>> please take look at v8.
> 
> I think v8 still has the issue that Eric was explaining over and over.

I've missed sock_edemux check, however I do not see any other issues.
Could you please explain what problem you talking about?

Eric said:
"it is not valid to call skb_set_owner_w(skb, sk) on all kind of sockets",
because socket might have been closed already.

Before the call we have old skb with sk reference, so sk is not closed yet
and have nonzero sk->sk_wmem_alloc.

During the call, skb_set_owner_w calls skb_orphan that calls old skb destructor.
Yes, it can decrement last sk reference and release the socket, 
and I think this is exactly the problem that Eric was pointing out: 
now sk access is unsafe.

However it can be prevented in at least 2 ways:
a) clone old skb and call skb_set_owner_w(nskb, sk) before skb_consume(oskb).
   In this case, skb_orphan does not call old destructor, because at this point
   nskb->sk = NULL and nskb->destructor = NULL, and sk reference is kept by oskb.  
   This is widely used in current code (ppp_xmit, ipip6_tunnel_xmit, 
   ip_vs_prepare_tunneled_skb and so on).
   This is used in v8 too.
b) Alternatively, extra refs on sk->sk_wmem_alloc and sk->sk_refcnt can be
   carefully taken before skb_set_owner_w() call. These references will not allow
   to release sk during old destructor's execution. 
   This was used in v6, and I think this should works correctly too. 

Could you please explain where I am wrong?
Do you talking about some other issue perhaps?

Thank you,
	Vasily Averin