Message-ID: <77945d26-2961-df8f-d0b5-8ecafafefd7e@redhat.com>
Date: Wed, 8 Sep 2021 18:47:09 +0200
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>
Cc: brouer@...hat.com, moyufeng <moyufeng@...wei.com>,
Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net,
alexander.duyck@...il.com, linux@...linux.org.uk, mw@...ihalf.com,
linuxarm@...neuler.org, yisen.zhuang@...wei.com,
salil.mehta@...wei.com, thomas.petazzoni@...tlin.com,
hawk@...nel.org, ast@...nel.org, daniel@...earbox.net,
john.fastabend@...il.com, akpm@...ux-foundation.org,
peterz@...radead.org, will@...nel.org, willy@...radead.org,
vbabka@...e.cz, fenghua.yu@...el.com, guro@...com,
peterx@...hat.com, feng.tang@...el.com, jgg@...pe.ca,
mcroce@...rosoft.com, hughd@...gle.com, jonathan.lemon@...il.com,
alobakin@...me, willemb@...gle.com, wenxu@...oud.cn,
cong.wang@...edance.com, haokexin@...il.com, nogikh@...gle.com,
elver@...gle.com, yhs@...com, kpsingh@...nel.org,
andrii@...nel.org, kafai@...com, songliubraving@...com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
bpf@...r.kernel.org, chenhao288@...ilicon.com
Subject: Re: [PATCH net-next v2 4/4] net: hns3: support skb's frag page
recycling based on page pool
On 08/09/2021 17.57, Jakub Kicinski wrote:
> On Wed, 8 Sep 2021 18:26:35 +0300 Ilias Apalodimas wrote:
>>> Normally I'd say put the stats in ethtool -S and the rest in debugfs
>>> but I'm not sure about exposing pages_state_hold_cnt and
>>> pages_state_release_cnt directly. Those are short counters, and will
>>> very likely wrap. They are primarily meaningful for calculating
>>> page_pool_inflight(). Given this I think their semantics may be too
>>> confusing for an average ethtool -S user.
>>>
>>> Putting all the information in debugfs seems like a better idea.
>>
>> I can't really disagree on the aforementioned stats being confusing.
>> However at some point we'll want to add more useful page_pool stats (e.g. the
>> percentage of pages/page fragments that are hitting the recycling path).
>> Would it still be 'ok' to have info split across ethtool and debugfs?
>
> Possibly. We'll also see what Alex L comes up with for XDP stats. Maybe
> we can arrive at a netlink API for standard things (broken record).
>
> You said percentage - even though I personally don't like it - there is a
> small precedent of ethtool -S containing non-counter information (IOW
> not monotonically increasing event counters), e.g. some vendors rammed
> PCI link quality in there. So if all else fails ethtool -S should be
> fine.
I agree with Ilias that we ought to add some page_pool stats.
*BUT* ONLY if this doesn't hurt performance!

We have explained before how this is possible, e.g. by keeping consumer
vs. producer counters on separate cache-lines (internally in the
page_pool struct, and likely per-CPU for returning pages). Then the
driver's ethtool functions can request the page_pool to fill out a
driver-provided stats area, such that the collection and aggregation of
counters stay off the fast-path.
I definitely don't want to see pages_state_hold_cnt and
pages_state_release_cnt being exposed directly. These were carefully
designed to not hurt performance. An inflight counter can instead be
derived in the ethtool-driver step above and presented to userspace.
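The derivation is just a wrap-safe distance between the two wrapping u32
counters, which is what page_pool_inflight() already computes
internally. Sketch (helper name made up):

#include <linux/types.h>

/* Wrap-safe distance between the wrapping u32 hold/release counters;
 * mirrors what page_pool_inflight() computes internally.  Only this
 * derived (small, signed) value would be reported to userspace.
 */
static inline s32 pp_inflight(u32 hold_cnt, u32 release_cnt)
{
        return (s32)(hold_cnt - release_cnt);
}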
Notice that while developing page_pool, I've been using tracepoints and
bpftrace scripts to inspect the behavior and internals of page_pool,
see [1], and I've even written a page-leak detector [2].
In principle you could write a bpftrace tool that extracts stats the
same way, but I would only recommend doing this during the development
phase, because these tracepoints do add some overhead. Originally I
wanted to push people to use this approach for stats, but I've realized
that not having these stats easily available is annoying ;-)
-Jesper
[1]
https://github.com/xdp-project/xdp-project/tree/master/areas/mem/bpftrace
[2]
https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/page_pool_track_leaks02.bt