lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5ed3cdb8-0a72-9dfb-ecdd-d59411f63653@virtuozzo.com>
Date:   Tue, 21 Sep 2021 09:36:26 +0300
From:   Vasily Averin <vvs@...tuozzo.com>
To:     Jakub Kicinski <kuba@...nel.org>,
        Eric Dumazet <eric.dumazet@...il.com>
Cc:     netdev@...r.kernel.org,
        Christoph Paasch <christoph.paasch@...il.com>,
        Hao Sun <sunhao.th@...il.com>, kernel@...nvz.org
Subject: Re: [RFC net v7] net: skb_expand_head() adjust skb->truesize
 incorrectly

On 9/21/21 3:39 AM, Jakub Kicinski wrote:
> On Tue, 21 Sep 2021 00:41:15 +0300 Vasily Averin wrote:
>>> Thanks for taking a look. I would prefer not to bake any ideas about
>>> the skb's function into generic functions. Enumerating every destructor
>>> callback in generic code is impossible (technically so, since the code
>>> may reside in modules).
>>>
>>> Let me think about it. Perhaps we can extend sock callbacks with
>>> skb_sock_inherit, and skb_adjust_trusize? That'd transfer the onus of
>>> handling the adjustments done on splitting to the protocols. I'll see
>>> if that's feasible unless someone can immediately call this path
>>> ghastly.  
>>
>> This is similar to Alexey Kuznetsov's suggestion for me, 
>> see https://lkml.org/lkml/2021/8/27/460
> 
> Interesting, I wasn't thinking of keeping the ops pointer in every skb.
> 
>> However I think we can do it later,
>> right now we need to fix somehow broken skb_expand_head(),
>> please take look at v8.
> 
> I think v8 still has the issue that Eric was explaining over and over.

I've missed sock_edemux check, however I do not see any other issues.
Could you please explain what problem you talking about?

Eric said:
"it is not valid to call skb_set_owner_w(skb, sk) on all kind of sockets",
because socket might have been closed already.

Before the call we have old skb with sk reference, so sk is not closed yet
and have nonzero sk->sk_wmem_alloc.

During the call, skb_set_owner_w calls skb_orphan that calls old skb destructor.
Yes, it can decrement last sk reference and release the socket, 
and I think this is exactly the problem that Eric was pointing out: 
now sk access is unsafe.

However it can be prevented in at least 2 ways:
a) clone old skb and call skb_set_owner_w(nskb, sk) before skb_consume(oskb).
   In this case, skb_orphan does not call old destructor, because at this point
   nskb->sk = NULL and nskb->destructor = NULL, and sk reference is kept by oskb.  
   This is widely used in current code (ppp_xmit, ipip6_tunnel_xmit, 
   ip_vs_prepare_tunneled_skb and so on).
   This is used in v8 too.
b) Alternatively, extra refs on sk->sk_wmem_alloc and sk->sk_refcnt can be
   carefully taken before skb_set_owner_w() call. These references will not allow
   to release sk during old destructor's execution. 
   This was used in v6, and I think this should works correctly too. 

Could you please explain where I am wrong?
Do you talking about some other issue perhaps?

Thank you,
	Vasily Averin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ