lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 25 Mar 2022 09:59:33 +0800 From: Yunsheng Lin <linyunsheng@...wei.com> To: Jean-Philippe Brucker <jean-philippe@...aro.org>, <ilias.apalodimas@...aro.org>, <hawk@...nel.org> CC: <davem@...emloft.net>, <kuba@...nel.org>, <pabeni@...hat.com>, <netdev@...r.kernel.org>, Alexander Duyck <alexanderduyck@...com> Subject: Re: [PATCH net] skbuff: disable coalescing for page_pool recycling +cc Alexander Duyck On 2022/3/25 1:29, Jean-Philippe Brucker wrote: > Fix a use-after-free when using page_pool with page fragments. We > encountered this problem during normal RX in the hns3 driver: > > (1) Initially we have three descriptors in the RX queue. The first one > allocates PAGE1 through page_pool, and the other two allocate one > half of PAGE2 each. Page references look like this: > > RX_BD1 _______ PAGE1 > RX_BD2 _______ PAGE2 > RX_BD3 _________/ > > (2) Handle RX on the first descriptor. Allocate SKB1, eventually added > to the receive queue by tcp_queue_rcv(). > > (3) Handle RX on the second descriptor. Allocate SKB2 and pass it to > netif_receive_skb(): > > netif_receive_skb(SKB2) > ip_rcv(SKB2) > SKB3 = skb_clone(SKB2) > > SKB2 and SKB3 share a reference to PAGE2 through > skb_shinfo()->dataref. The other ref to PAGE2 is still held by > RX_BD3: > > SKB2 ---+- PAGE2 > SKB3 __/ / > RX_BD3 _________/ > > (3b) Now while handling TCP, coalesce SKB3 with SKB1: > > tcp_v4_rcv(SKB3) > tcp_try_coalesce(to=SKB1, from=SKB3) // succeeds > kfree_skb_partial(SKB3) > skb_release_data(SKB3) // drops one dataref > > SKB1 _____ PAGE1 > \____ > SKB2 _____ PAGE2 > / > RX_BD3 _________/ > > The problem is here: both SKB1 and SKB2 point to PAGE2 but SKB1 does > not actually hold a reference to PAGE2. it seems the SKB1 *does* hold a reference to PAGE2 by calling __skb_frag_ref(), which increments the page->_refcount instead of incrementing pp_frag_count, as skb_cloned(SKB3) is true and __skb_frag_ref() does not handle page pool case: https://elixir.bootlin.com/linux/v5.17-rc1/source/net/core/skbuff.c#L5308 Without coalescing, when > releasing both SKB2 and SKB3, a single reference to PAGE2 would be > dropped. Now when releasing SKB1 and SKB2, two references to PAGE2 > will be dropped, resulting in underflow. > > (3c) Drop SKB2: > > af_packet_rcv(SKB2) > consume_skb(SKB2) > skb_release_data(SKB2) // drops second dataref > page_pool_return_skb_page(PAGE2) // drops one pp_frag_count > > SKB1 _____ PAGE1 > \____ > PAGE2 > / > RX_BD3 _________/ > > (4) Userspace calls recvmsg() > Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we > release the SKB3 page as well: > > tcp_eat_recv_skb(SKB1) > skb_release_data(SKB1) > page_pool_return_skb_page(PAGE1) > page_pool_return_skb_page(PAGE2) // drops second pp_frag_count > > (5) PAGE2 is freed, but the third RX descriptor was still using it! > In our case this causes IOMMU faults, but it would silently corrupt > memory if the IOMMU was disabled. > > A proper implementation would probably take another reference from the > page_pool at step (3b), but that seems too complicated for a fix. Keep > it simple for now, prevent coalescing for page_pool users. > > Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool") > Signed-off-by: Jean-Philippe Brucker <jean-philippe@...aro.org> > --- > > The Fixes tag says frag page recycling but I'm not sure, does it not > also affect full page recycling? Coalescing is one case, are there > other places where we move page fragments between skbuffs? I'm still > too unfamiliar with this code to figure it out. > > Previously discussed here: > https://lore.kernel.org/netdev/YfFbDivUPbpWjh%2Fm@myrica/ > --- > net/core/skbuff.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 10bde7c6db44..431f7ce88049 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -5276,11 +5276,11 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, > if (skb_cloned(to)) > return false; > > - /* The page pool signature of struct page will eventually figure out > - * which pages can be recycled or not but for now let's prohibit slab > - * allocated and page_pool allocated SKBs from being coalesced. > + /* Prohibit adding page_pool allocated SKBs, because that would require > + * transferring the page fragment reference. For now let's also prohibit > + * slab allocated and page_pool allocated SKBs from being coalesced. > */ > - if (to->pp_recycle != from->pp_recycle) > + if (to->pp_recycle || from->pp_recycle) > return false; > > if (len <= skb_tailroom(to)) { >
Powered by blists - more mailing lists