lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7331d6d3f9044e386e425e89b1fc32d60b046cf3.camel@gmail.com>
Date:   Tue, 04 Apr 2023 08:51:18 -0700
From:   Alexander H Duyck <alexander.duyck@...il.com>
To:     Liang Chen <liangchen.linux@...il.com>,
        ilias.apalodimas@...aro.org, hawk@...nel.org
Cc:     davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
        pabeni@...hat.com, netdev@...r.kernel.org
Subject: Re: [PATCH] skbuff: Fix a race between coalescing and releasing SKBs

On Tue, 2023-04-04 at 15:47 +0800, Liang Chen wrote:
> Commit 1effe8ca4e34 allowed coalescing to proceed with non page pool page
> and page pool page when @from is cloned, i.e.
> 
> to->pp_recycle    --> false
> from->pp_recycle  --> true
> skb_cloned(from)  --> true
> 
> However, it actually requires skb_cloned(@from) to hold true until
> coalescing finishes in this situation. If the other cloned SKB is
> released while the merging is in process, from_shinfo->nr_frags will be
> set to 0 toward the end of the function, causing the increment of frag
> page _refcount to be unexpectedly skipped resulting in inconsistent
> reference counts. Later when SKB(@to) is released, it frees the page
> directly even though the page pool page is still in use, leading to
> use-after-free or double-free errors. So it should be prohibitted.
> 
> The double-free error message below prompted us to investigate:
> BUG: Bad page state in process swapper/1  pfn:0e0d1
> page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
> index:0x2 pfn:0xe0d1
> flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
> raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
> raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
> page dumped because: nonzero _refcount
> 
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E      6.2.0+
> Call Trace:
>  <IRQ>
> dump_stack_lvl+0x32/0x50
> bad_page+0x69/0xf0
> free_pcp_prepare+0x260/0x2f0
> free_unref_page+0x20/0x1c0
> skb_release_data+0x10b/0x1a0
> napi_consume_skb+0x56/0x150
> net_rx_action+0xf0/0x350
> ? __napi_schedule+0x79/0x90
> __do_softirq+0xc8/0x2b1
> __irq_exit_rcu+0xb9/0xf0
> common_interrupt+0x82/0xa0
> </IRQ>
> <TASK>
> asm_common_interrupt+0x22/0x40
> RIP: 0010:default_idle+0xb/0x20
> 
> Signed-off-by: Liang Chen <liangchen.linux@...il.com>
> ---
>  net/core/skbuff.c | 17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 050a875d09c5..9be23ece5f03 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5598,17 +5598,14 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>  		return false;
>  
>  	/* In general, avoid mixing slab allocated and page_pool allocated
> -	 * pages within the same SKB. However when @to is not pp_recycle and
> -	 * @from is cloned, we can transition frag pages from page_pool to
> -	 * reference counted.
> -	 *
> -	 * On the other hand, don't allow coalescing two pp_recycle SKBs if
> -	 * @from is cloned, in case the SKB is using page_pool fragment
> -	 * references (PP_FLAG_PAGE_FRAG). Since we only take full page
> -	 * references for cloned SKBs at the moment that would result in
> -	 * inconsistent reference counts.
> +	 * pages within the same SKB. However don't allow coalescing two
> +	 * pp_recycle SKBs if @from is cloned, in case the SKB is using
> +	 * page_pool fragment references (PP_FLAG_PAGE_FRAG). Since we only
> +	 * take full page references for cloned SKBs at the moment that would
> +	 * result in inconsistent reference counts.
>  	 */
> -	if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
> +	if ((to->pp_recycle != from->pp_recycle)
> +		|| (from->pp_recycle && skb_cloned(from)))
>  		return false;
>  
>  	if (len <= skb_tailroom(to)) {

I'm not quite sure I agree with the fix. Couldn't we just modify the
check further down that does:

        if (!skb_cloned(from))
                from_shinfo->nr_frags = 0;

And instead just make that:
	if (!skb->cloned || (!skb_cloned(from) && !from->pp_recycle))
                from_shinfo->nr_frags = 0;

With that we would retain the existing behavior and in the case of
cloned from frames we would take the references and let the original
from skb freed to take care of pulling the pages from the page pool.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ