netdev - Re: [net-next v3 00/10] page_pool: Add page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CALALjgwMnisAeNzwtGmNGzqAuWZGGtBpT2NWX1TcYZaPFvAixw@mail.gmail.com>
Date:   Thu, 3 Feb 2022 11:31:04 -0800
From:   Joe Damato <jdamato@...tly.com>
To:     Tariq Toukan <ttoukan.linux@...il.com>
Cc:     Jesper Dangaard Brouer <jbrouer@...hat.com>,
        netdev@...r.kernel.org, kuba@...nel.org,
        ilias.apalodimas@...aro.org, davem@...emloft.net, hawk@...nel.org,
        Saeed Mahameed <saeed@...nel.org>, brouer@...hat.com
Subject: Re: [net-next v3 00/10] page_pool: Add page_pool stat counters

On Thu, Feb 3, 2022 at 11:21 AM Tariq Toukan <ttoukan.linux@...il.com> wrote:
>
>
>
> On 2/2/2022 7:30 PM, Joe Damato wrote:
> > On Wed, Feb 2, 2022 at 6:31 AM Jesper Dangaard Brouer
> > <jbrouer@...hat.com> wrote:
> >>
> >>
> >> Adding Cc. Tariq and Saeed, as they wanted page_pool stats in the past.
> >>
> >> On 02/02/2022 02.12, Joe Damato wrote:
> >>> Greetings:
> >>>
> >>> Sending a v3 as I noted some issues with the procfs code in patch 10 I
> >>> submit in v2 (thanks, kernel test robot) and fixing the placement of the
> >>> refill stat increment in patch 8.
> >>
> >> Could you explain why a single global stats (/proc/net/page_pool_stat)
> >> for all page_pool instances for all RX-queues makes sense?
> >>
> >> I think this argument/explanation belongs in the cover letter.
> >
> > I included an explanation in the v2 cover letter where those changes
> > occurred, but you are right: I should have also included it in the v3
> > cover letter.
> >
> > My thought process was this:
> >
> > - Stats now have to be enabled by an explicit kernel config option, so
> > the user has to know what they are doing
> > - Advanced users can move softirqs to CPUs as they wish and they could
> > isolate a particular set of RX-queues on a set of CPUs this way
> > - The result is that there is no need to expose anything to the
> > drivers and no modifications to drivers are necessary once the single
> > kernel config option is enabled and softirq affinity is configured
> >
> > I had assumed by not exposing new APIs / page pool internals and by
> > not requiring drivers to make any changes, I would have a better shot
> > of getting my patches accepted.
> >
> > It sounds like both you and Ilias strongly prefer per-pool-per-cpu
> > stats, so I can make that change in the v4.
> >
> >> What are you using this for?
> >
> > I currently graph NIC driver stats from a number of different vendors
> > to help better understand the performance of those NICs under my
> > company's production workload.
> >
> > For example, on i40e, I submit changes to the upstream driver [1] and
> > am graphing those stats to better understand memory reuse rate. We
> > have seen some issues around mm allocation contention in production
> > workloads with certain NICs and system architectures.
> >
> > My findings with mlx5 have indicated that the proprietary page reuse
> > algorithm in the driver, with our workload, does not provide much
> > memory re-use, and causes pressure against the kernel's page
> > allocator.  The page pool should help remedy this, but without stats I
> > don't have a clear way to measure the effect.
> >
> > So in short: I'd like to gather and graph stats about the page pool
> > API to determine how much impact the page pool API has on page reuse
> > for mlx5 in our workload.
> >
> Hi Joe, Jesper, Ilias, and all,
>
> We plan to totally remove the in-driver page-cache and fully rely on
> page-pool for the allocations and dma mapping. This did not happen until
> now as the page pool did not support elevated page refcount (multiple
> frags per-page) and stats.
>
> I'm happy to see that these are getting attention! Thanks for investing
> time and effort to push these tasks forward!
>
> >> And do Tariq and Saeeds agree with this single global stats approach?
> >
> > I don't know; I hope they'll chime in.
> >
>
> I agree with Jesper and Ilias. Global per-cpu pool stats are very
> limited. There is not much we can do with the super-position of several
> page-pools. IMO, these stats can be of real value only when each cpu has
> a single pool. Otherwise, the summed stats of two or more pools won't
> help much in observability, or debug.

OK thanks Tariq -- that makes sense to me.

I can propose a v4 that converts the stats to per-pool-per-cpu and
re-run the benchmarks, with the modification Jesper suggested to make
them run a bit longer.

I'm still thinking through what the best API design is for accessing
stats from the drivers, but I'll propose something and see what you
all think in the v4.

Thanks,
Joe