Message-ID: <20230503184702.65dceb90@kernel.org>
Date: Wed, 3 May 2023 18:47:02 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc: Toke Høiland-Jørgensen <toke@...hat.com>,
brouer@...hat.com, Ilias Apalodimas <ilias.apalodimas@...aro.org>,
netdev@...r.kernel.org, Eric Dumazet <eric.dumazet@...il.com>,
linux-mm@...ck.org, Mel Gorman <mgorman@...hsingularity.net>,
lorenzo@...nel.org, linyunsheng@...wei.com, bpf@...r.kernel.org, "David S.
Miller" <davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>, Andrew
Morton <akpm@...ux-foundation.org>, willy@...radead.org
Subject: Re: [PATCH RFC net-next/mm V3 1/2] page_pool: Remove workqueue in
new shutdown scheme
On Wed, 3 May 2023 17:49:34 +0200 Jesper Dangaard Brouer wrote:
> On 03/05/2023 13.18, Toke Høiland-Jørgensen wrote:
> > Jakub Kicinski <kuba@...nel.org> writes:
> >> We can remove the warning without removing the entire delayed freeing
> >> scheme. I definitely like the SHUTDOWN flag and patch 2 but I'm a bit
> >> less clear on why the complexity of datapath freeing is justified.
> >> Can you explain?
> >
> > You mean just let the workqueue keep rescheduling itself every minute
> > for the (potentially) hours that skbs will stick around? Seems a bit
> > wasteful, doesn't it? :)
>
> I agree that this workqueue that keeps rescheduling is wasteful.
> It actually reschedules every second, which is even more wasteful.
> NIC drivers will have many HW RX-queues, with separate PP instances,
> each of which can start a workqueue that reschedules every second.
There are a lot of work items flying around on a working system.
I don't think the rare (and very cheap) PP check work is going
to move the needle. You should see how many work items DIM schedules :(
I'd think that potentially having the extra memory pinned is much more
of an issue than a periodic check, and that does not go away by
changing the checking mechanism.
> Eric has convinced me that SKBs can "stick around" for longer than the
> assumptions in PP. The old PP assumptions came from the XDP-return path.
> It is time to clean up.
I see. Regardless - should we have some count/statistic for the number
of "zombie" page pools sitting around in the system? Could be useful
for debug.
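Even a single global counter would do; purely hypothetical sketch, none of
these helpers exist today:

/* Hypothetical: pools that were destroyed but still hold inflight pages */
static atomic_long_t pp_zombie_cnt = ATOMIC_LONG_INIT(0);

static void page_pool_zombie_enter(struct page_pool *pool)
{
	atomic_long_inc(&pp_zombie_cnt);	/* destroy() done, pages inflight */
}

static void page_pool_zombie_exit(struct page_pool *pool)
{
	atomic_long_dec(&pp_zombie_cnt);	/* last inflight page returned */
}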
> > We did see an issue where creating and tearing down lots of page pools
> > in a short period of time caused significant slowdowns due to the
> > workqueue mechanism. Lots being "thousands per second". This is possible
> > using the live packet mode of bpf_prog_run() for XDP, which will set up
> > and destroy a page pool for each syscall...
>
> Yes, the XDP live packet mode of bpf_prog_run is IMHO abusing the
> page_pool API. We should fix that somehow, at least for the case where
> live packet mode only injects a single packet but still creates a PP
> instance. The PP in live packet mode IMHO only makes sense when
> repeatedly sending packets that get recycled and are pre-initialized by PP.
+1, FWIW, I was surprised that we have an init_callback which sits in
the fast path purely for testing.
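IIUC the hook ends up on the alloc path, roughly like this (sketch; may
not match the tree exactly):

static void page_pool_set_pp_info(struct page_pool *pool, struct page *page)
{
	page->pp = pool;
	page->pp_magic |= PP_SIGNATURE;
	/* Test-only hook, yet checked for every page the pool hands out */
	if (pool->p.init_callback)
		pool->p.init_callback(page, pool->p.init_arg);
}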
> This use of PP does exemplify why it is problematic to keep the workqueue.
>
> I have considered (and could be convinced) delaying the free via
> call_rcu, but that also creates an unfortunate backlog of work in the
> case of the live packet mode of bpf_prog_run.
Maybe let the pp used by BPF testing be reusable? No real driver will
create thousands of PPs a second; that's not sane.
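Something along these lines on the test_run side would avoid the churn
(purely hypothetical sketch, names made up):

/* Hypothetical: cache one page_pool for bpf_prog_run live packet mode
 * instead of creating and destroying a pool on every syscall.
 */
static DEFINE_MUTEX(xdp_test_pp_lock);
static struct page_pool *xdp_test_pp;

static struct page_pool *xdp_test_run_get_pp(const struct page_pool_params *params)
{
	struct page_pool *pp;

	mutex_lock(&xdp_test_pp_lock);
	if (!xdp_test_pp) {
		pp = page_pool_create(params);
		if (IS_ERR(pp)) {
			mutex_unlock(&xdp_test_pp_lock);
			return pp;
		}
		xdp_test_pp = pp;
	}
	mutex_unlock(&xdp_test_pp_lock);
	return xdp_test_pp;
}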
Anyway, you'll choose what you'll choose. I just wanted to cast my vote
for the work queue rather than the tricky lockless release code.