netdev - Re: [net-next v1 PATCH 1/2] xdp: revert forced mem allocator removal for page_pool

lists.openwall.net: Open Source and information security mailing list archives
Message-ID: <20191110085939.23013f83@carbon>
Date:   Sun, 10 Nov 2019 08:59:39 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     "Jonathan Lemon" <jonathan.lemon@...il.com>
Cc:     "Toke Høiland-Jørgensen" <toke@...hat.com>,
        netdev@...r.kernel.org,
        "Ilias Apalodimas" <ilias.apalodimas@...aro.org>,
        "Saeed Mahameed" <saeedm@...lanox.com>,
        "Matteo Croce" <mcroce@...hat.com>,
        "Lorenzo Bianconi" <lorenzo@...nel.org>,
        "Tariq Toukan" <tariqt@...lanox.com>, brouer@...hat.com
Subject: Re: [net-next v1 PATCH 1/2] xdp: revert forced mem allocator
 removal for page_pool

On Sat, 09 Nov 2019 09:34:50 -0800
"Jonathan Lemon" <jonathan.lemon@...il.com> wrote:

> On 9 Nov 2019, at 8:11, Jesper Dangaard Brouer wrote:
> 
> > On Fri, 08 Nov 2019 11:16:43 -0800
> > "Jonathan Lemon" <jonathan.lemon@...il.com> wrote:
> >  
> >>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> >>> index 5bc65587f1c4..226f2eb30418 100644
> >>> --- a/net/core/page_pool.c
> >>> +++ b/net/core/page_pool.c
> >>> @@ -346,7 +346,7 @@ static void __warn_in_flight(struct page_pool *pool)
> >>>
> >>>  	distance = _distance(hold_cnt, release_cnt);
> >>>
> >>> -	/* Drivers should fix this, but only problematic when DMA is used */
> >>> +	/* BUG but warn as kernel should crash later */
> >>>  	WARN(1, "Still in-flight pages:%d hold:%u released:%u",
> >>>  	     distance, hold_cnt, release_cnt);  
> >
> > Because this is kept as a WARN, I set pool->ring.queue = NULL later.  
> 
> ... which is also an API violation, reaching into the ring internals.
> I strongly dislike this.

I understand your dislike of reaching into the ptr_ring "internals".
But my plan was to add this here and then, in a follow-up patch, move
this pool->ring.queue = NULL assignment into the ptr_ring code itself.
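Background on the in-flight check in the first hunk: hold_cnt and
release_cnt are free-running u32 counters, so the number of pages still
in flight is their signed difference, which stays correct across
wraparound. A minimal userspace sketch, assuming the page_pool
_distance() macro behaves like (s32)((a) - (b)); counter_distance() and
safe_to_destroy() are hypothetical names, and the signed conversion is
spelled out portably instead of relying on compiler behaviour:

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed distance between two free-running u32 counters.  The unsigned
 * subtraction wraps modulo 2^32, so the result stays correct after
 * either counter wraps, as long as the true distance fits in int32_t. */
static int32_t counter_distance(uint32_t hold_cnt, uint32_t release_cnt)
{
	uint32_t d = hold_cnt - release_cnt;

	/* Portable two's-complement reinterpretation of d. */
	if (d <= (uint32_t)INT32_MAX)
		return (int32_t)d;
	return -(int32_t)(UINT32_MAX - d) - 1;
}

/* Hypothetical check: the pool may be freed only when nothing is in flight. */
static bool safe_to_destroy(uint32_t hold_cnt, uint32_t release_cnt)
{
	return counter_distance(hold_cnt, release_cnt) == 0;
}
```

For example, counter_distance(2, UINT32_MAX - 1) is 4: four pages are
still out even though hold_cnt has already wrapped.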

 
> >>>  }
> >>> @@ -360,12 +360,16 @@ void __page_pool_free(struct page_pool *pool)
> >>>  	WARN(pool->alloc.count, "API usage violation");
> >>>  	WARN(!ptr_ring_empty(&pool->ring), "ptr_ring is not empty");
> >>>
> >>> -	/* Can happen due to forced shutdown */
> >>>  	if (!__page_pool_safe_to_destroy(pool))
> >>>  		__warn_in_flight(pool);  
> >>
> >> If it's not safe to destroy, we shouldn't be getting here.  
> >
> > Don't make such assumptions. The API is going to be used by driver
> > developers, and they are always a little too creative...  
> 
> If the driver hits this case, the driver has a bug, and it isn't
> safe to continue in any fashion.  The developer needs to fix their
> driver in that case.  (see stmmac code)

The stmmac driver is NOT broken; it simply uses page_pool as its
driver-level page cache.  That is exactly what page_pool was designed
for: a generic page cache for drivers to use.  They use it to simplify
their driver.  They don't use the advanced features, which require
hooking into the memory model registration (xdp_rxq_info_reg_mem_model()).

> 
> > The page_pool is a separate facility, it is not tied to the
> > xdp_rxq_info memory model.  Some drivers use page_pool directly e.g.
> > drivers/net/ethernet/stmicro/stmmac.  It can easily trigger this case
> > when someone extends that driver.  
> 
> Yes, and I pointed out that the mem_info should likely be completely
> detached from xdp.c since it really has nothing to do with XDP.
> The stmmac driver is actually broken at the moment, as it tries to
> free the pool immediately without a timeout.
> 
> What should be happening is that drivers just call page_pool_destroy(),
> which kicks off the shutdown process if this was the last user ref,
> and delays destruction if packets are in flight.

Sorry, but I'm getting frustrated with you. I've already explained to you
(off-list) that the memory model reg/unreg system was created to
support multiple memory models (even per RX queue).  We already have
AF_XDP zero-copy, and I want to keep that flexibility so that more
models can be added in the future.

 
> >>>  	ptr_ring_cleanup(&pool->ring, NULL);
> >>>
> >>> +	/* Make sure kernel will crash on use-after-free */
> >>> +	pool->ring.queue = NULL;
> >>> +	pool->alloc.cache[PP_ALLOC_CACHE_SIZE - 1] = NULL;
> >>> +	pool->alloc.count = PP_ALLOC_CACHE_SIZE;  
> >>
> >> The pool is going to be freed.  This is useless code; if we're
> >> really concerned about use-after-free, the correct place for catching
> >> this is with the memory-allocator tools, not scattering things like
> >> this ad-hoc over the codebase.  
> >
> > No, I need this code here, because we kept the above WARN() and didn't
> > change it into a BUG().  It is obviously not a full solution for
> > use-after-free detection.  The memory subsystem has debugging tools
> > for this kind of stuff (e.g. KASAN for use-after-free; kmemleak only
> > finds leaks), but nobody runs them in production.  I need this here
> > to catch some obvious runtime cases.  
> 
> The WARN() indicates something went off the rails already.  I really
> don't like half-assed solutions like the above; it may or may not work
> properly.  If it doesn't work properly, then what's the point?

So you are suggesting we use BUG_ON() instead and crash the kernel
immediately?  You do know Linus hates it when we do that, right?
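To make the WARN() vs BUG_ON() trade-off and the poisoning hunk
concrete: WARN() reports and lets execution continue (the kernel's
WARN() also evaluates to its condition), so the patch follows it with
assignments that make a later stale fast-path access hit NULL. This is a
hypothetical userspace model; the struct mirrors only the three fields
the patch touches, and MODEL_CACHE_SIZE is an arbitrary small stand-in
for PP_ALLOC_CACHE_SIZE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_CACHE_SIZE 4 /* hypothetical; not the kernel's value */

/* Hypothetical stand-in for WARN(): report, evaluate to the condition,
 * and let execution continue (BUG_ON() would stop here instead). */
#define MODEL_WARN(cond, ...) \
	((cond) ? (fprintf(stderr, __VA_ARGS__), 1) : 0)

struct model_pp {
	void **ring_queue;             /* stand-in for pool->ring.queue */
	void *cache[MODEL_CACHE_SIZE]; /* stand-in for pool->alloc.cache */
	unsigned int count;            /* stand-in for pool->alloc.count */
};

/* Fast-path alloc: pop from the cache if non-empty. */
static void *model_alloc_fast(struct model_pp *p)
{
	if (p->count)
		return p->cache[--p->count];
	return NULL; /* the ring fallback is not modeled */
}

/* Fast-path recycle: refuse when full, so the caller goes to the ring. */
static bool model_recycle_fast(struct model_pp *p, void *page)
{
	if (p->count == MODEL_CACHE_SIZE)
		return false;
	p->cache[p->count++] = page;
	return true;
}

/* Warn about in-flight pages, keep going, then poison the pointers the
 * way the patch does.  Returns 1 if the warning fired. */
static int model_pp_free(struct model_pp *p, int inflight)
{
	int warned = MODEL_WARN(inflight != 0,
				"Still in-flight pages:%d\n", inflight);

	p->ring_queue = NULL;
	p->cache[MODEL_CACHE_SIZE - 1] = NULL;
	p->count = MODEL_CACHE_SIZE;
	return warned;
}
```

After the poisoning, a stale recycle is refused by the "full" cache and
would fall through to the ring, whose queue pointer is now NULL, while a
stale alloc pops the NULL slot. Only the first stale alloc is caught
this way: the next one would pop cache[MODEL_CACHE_SIZE - 2], which in
the kernel may still hold a stale page pointer.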

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer