netdev - Re: [PATCH net-next v4] skb_expand_head() adjust skb->truesize incorrectly

Message-ID: <2984f16b-7f20-e72d-1661-b942fdc4ff9b@virtuozzo.com>
Date:   Thu, 2 Sep 2021 10:33:09 +0300
From:   Vasily Averin <vvs@...tuozzo.com>
To:     Eric Dumazet <eric.dumazet@...il.com>,
        Christoph Paasch <christoph.paasch@...il.com>,
        "David S. Miller" <davem@...emloft.net>
Cc:     Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        David Ahern <dsahern@...nel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>, linux-kernel@...r.kernel.org,
        kernel@...nvz.org, Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Julian Wiedmann <jwi@...ux.ibm.com>
Subject: Re: [PATCH net-next v4] skb_expand_head() adjust skb->truesize
 incorrectly

On 9/2/21 10:13 AM, Vasily Averin wrote:
> On 9/2/21 7:48 AM, Eric Dumazet wrote:
>> On 9/1/21 9:32 PM, Eric Dumazet wrote:
>>> I think you missed netem case, in particular
>>> skb_orphan_partial() which I already pointed out.
>>>
>>> You can setup a stack of virtual devices (tunnels),
>>> with a qdisc on them, before ip6_xmit() is finally called...
>>>
>>> Socket might have been closed already.
>>>
>>> To test your patch, you could force a skb_orphan_partial() at the beginning
>>> of skb_expand_head() (extending code coverage)
>>
>> To clarify :
>>
>> It is ok to 'downgrade' an skb->destructor having a ref on sk->sk_wmem_alloc to
>> something owning a ref on sk->sk_refcnt.
>>
>> But the opposite operation (ref on sk->sk_refcnt -->  ref on sk->sk_wmem_alloc) is not safe.
> 
> Could you please explain in more detail, since I still have a completely opposite point of view?
> 
> Every sk referenced by an skb has sk_wmem_alloc > 0.
> It is set to 1 in sk_alloc() and decremented right before the last __sk_free(),
> inside sk_free(), sock_wfree() and __sock_wfree().
> 
> So it is safe to adjust skb->sk->sk_wmem_alloc,
> because a live skb keeps a reference to a live sk, and the latter keeps sk_wmem_alloc > 0.
> 
> So any destructor that uses sk->sk_refcnt will already have sk_wmem_alloc > 0,
> because the last sock_put() calls sk_free().
> 
> However, now I'm not sure about the reverse direction.
> skb_set_owner_w() checks !sk_fullsock(sk) and calls sock_hold(sk).
> If sk->sk_refcnt can be 0 here (i.e. after the old destructor has run inside skb_orphan()),
> it can trigger the problem you pointed out:
> "refcount_add() will trigger a warning (panic under KASAN)".
> 
> Could you please explain where I'm wrong?

To clarify:
I agree it is unsafe to call the following on a live skb:
skb_orphan(skb)
adjust(skb->sk->sk_wmem_alloc)

for two reasons:
1) the old destructor can decrease sk_wmem_alloc to zero and free the sk;
2) the old destructor, if !sk_fullsock(sk), can call sock_put() and release the last sk->sk_refcnt reference;
  in this case sock_hold() will trigger a warning.
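
To illustrate case 2, here is a rough userspace toy model. It is not kernel
code: every toy_* name is made up for this sketch, and the counter only
imitates sk->sk_refcnt under the assumption that the old destructor holds
one reference and drops it with a sock_put()-like call.

```c
#include <stdbool.h>

/* Toy stand-in for struct sock; only the reference counter is modelled. */
struct toy_sock {
	int refcnt;	/* imitates sk->sk_refcnt */
	bool freed;	/* set once the last reference is dropped */
};

/* Imitates sock_put(): drop one reference, "free" on the last one. */
static void toy_sock_put(struct toy_sock *sk)
{
	if (--sk->refcnt == 0)
		sk->freed = true;
}

/*
 * Imitates sock_hold(). Returns false where the real refcount code
 * would complain about incrementing a counter that is already zero.
 */
static bool toy_sock_hold(struct toy_sock *sk)
{
	if (sk->refcnt == 0)
		return false;
	sk->refcnt++;
	return true;
}

/*
 * Orphan an skb whose destructor owns one sk_refcnt reference, then
 * try to re-own it. other_refs is the number of references held by
 * someone else. Returns true if the second hold was legal.
 */
bool toy_reown_after_orphan(int other_refs)
{
	struct toy_sock sk = { .refcnt = 1 + other_refs, .freed = false };

	toy_sock_put(&sk);		/* old destructor inside skb_orphan() */
	return toy_sock_hold(&sk);	/* !sk_fullsock(sk) path of skb_set_owner_w() */
}
```

With other_refs == 0 the put drops the last reference, so the following
hold is the problematic one; with at least one other reference it is fine.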

Case 1) can be handled: we can adjust(sk_wmem_alloc) before skb_orphan(),
but I do not really understand how to handle the 2nd case.

Thank you,
	Vasily Averin