Message-ID: <11091d33ff7803257e38ee921e4ba9597acfccfc.camel@kernel.org>
Date:   Wed, 04 Aug 2021 11:27:09 -0700
From:   Saeed Mahameed <saeed@...nel.org>
To:     David Ahern <dsahern@...il.com>, Jakub Kicinski <kuba@...nel.org>,
        Tariq Toukan <ttoukan.linux@...il.com>,
        Tariq Toukan <tariqt@...dia.com>
Cc:     Alexander Lobakin <alexandr.lobakin@...el.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Lukasz Czapnik <lukasz.czapnik@...el.com>,
        Marcin Kubiak <marcin.kubiak@...el.com>,
        Michal Kubiak <michal.kubiak@...el.com>,
        Michal Swiatkowski <michal.swiatkowski@...el.com>,
        Jonathan Corbet <corbet@....net>,
        Netanel Belgazal <netanel@...zon.com>,
        Arthur Kiyanovski <akiyano@...zon.com>,
        Guy Tzalik <gtzalik@...zon.com>,
        Saeed Bishara <saeedb@...zon.com>,
        Ioana Ciornei <ioana.ciornei@....com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
        Marcin Wojtas <mw@...ihalf.com>,
        Russell King <linux@...linux.org.uk>,
        Edward Cree <ecree.xilinx@...il.com>,
        Martin Habets <habetsm.xilinx@...il.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        KP Singh <kpsingh@...nel.org>,
        Shay Agroskin <shayagr@...zon.com>,
        Sameeh Jubran <sameehj@...zon.com>,
        Alexander Duyck <alexanderduyck@...com>,
        Danielle Ratson <danieller@...dia.com>,
        Ido Schimmel <idosch@...dia.com>, Andrew Lunn <andrew@...n.ch>,
        Vladyslav Tarasiuk <vladyslavt@...dia.com>,
        Arnd Bergmann <arnd@...db.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jian Shen <shenjian15@...wei.com>,
        Petr Vorel <petr.vorel@...il.com>, Dan Murphy <dmurphy@...com>,
        Yangbo Lu <yangbo.lu@....com>,
        Michal Kubecek <mkubecek@...e.cz>,
        Zheng Yongjun <zhengyongjun3@...wei.com>,
        Heiner Kallweit <hkallweit1@...il.com>,
        YueHaibing <yuehaibing@...wei.com>,
        Johannes Berg <johannes@...solutions.net>,
        netdev@...r.kernel.org, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, bpf@...r.kernel.org
Subject: Re: [PATCH net-next 03/21] ethtool, stats: introduce standard XDP
 statistics

On Wed, 2021-08-04 at 11:28 -0600, David Ahern wrote:
> On 8/4/21 10:44 AM, Jakub Kicinski wrote:
> > On Wed, 4 Aug 2021 10:17:56 -0600 David Ahern wrote:
> > > On 8/4/21 6:36 AM, Jakub Kicinski wrote:
> > > > > XDP is always going to be eBPF-based! Why not just report such
> > > > > stats to a special BPF_MAP? The BPF stack can collect the stats
> > > > > from the driver and report them to this special MAP upon user
> > > > > request.
> > > > Do you mean replacing the ethtool-netlink / rtnetlink etc. with a
> > > > new BPF_MAP? I don't think adding another category of uAPI through
> > > > which netdevice stats are exposed would do much good :( Plus it
> > > > doesn't address the "yet another cacheline" concern.
> > > > 
> > > > To my understanding the need for stats recognizes the fact that (in
> > > > large organizations) fleet monitoring is done by different teams
> > > > than XDP development. So the XDP team may have all the stats they
> > > > need, but the team doing fleet monitoring has no idea how to get to
> > > > them.
> > > > 
> > > > To bridge the two worlds we need a way for the infra team to ask
> > > > the XDP side for well-defined stats. Maybe we should take a page
> > > > from the BPF iterators book and create a program type for bridging
> > > > the two worlds? Called by the networking core when dumping stats,
> > > > to extract all the relevant stats from the existing BPF maps and
> > > > render them into a well-known struct? Users' XDP design can still
> > > > use a single per-cpu map with all the stats if they so choose, but
> > > > there's a way to implement more optimal designs and still expose
> > > > well-defined stats.
> > > > 
> > > > Maybe that's too complex, IDK.
> > > 

The main question here is: do we want the prog to do the counting, or the
driver? And the answer will lead to more questions :) :

1) Will the prog/user need to access the driver for driver-only stats? Or
should the driver report to a special program, with all the collection and
reporting done internally in XDP/BPF? (a rough sketch of the prog-side
option is below)
2) Stats per prog/queue/cpu/interface?
3) How do we eventually report to the user: ethtool/ip -s/bpftool?

Too complex, IDK too .. :D
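
To make option 1 concrete, here is a minimal sketch of the prog-side
counting people already do today (illustrative libbpf-style BPF C only;
the map, the struct layout and all names are made up, not a proposed
uAPI):

/* Illustrative only: a per-CPU map the XDP prog bumps itself.
 * A monitoring tool would read and sum the per-CPU values.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct xdp_counters {			/* hypothetical layout */
	__u64 rx_packets;
	__u64 drops;
	__u64 redirects;
};

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, struct xdp_counters);
} xdp_stats_map SEC(".maps");

SEC("xdp")
int xdp_count(struct xdp_md *ctx)
{
	__u32 key = 0;
	struct xdp_counters *c = bpf_map_lookup_elem(&xdp_stats_map, &key);

	if (c)
		c->rx_packets++;	/* per-CPU, so no atomics needed */

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

The problem Jakub describes is exactly that the layout of struct
xdp_counters above is private to whoever wrote the prog, so a
fleet-monitoring team has no standard way to find or interpret it.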


> > > I was just explaining to someone internally how to get stats at all
> > > of the different points in the stack to track down reasons for
> > > dropped packets:
> > > 
> > > ethtool -S for h/w and driver
> > > tc -s for drops by the qdisc
> > > /proc/net/softnet_stat for drops at the backlog layer
> > > netstat -s for network and transport layer
> > > 
> > > Yet another command and API just adds to the nightmare of explaining
> > > and understanding these stats.
> > 
> > Are you referring to RTM_GETSTATS when you say "yet another command"?
> > RTM_GETSTATS exists and is used by offloads today.
> > 
> > I'd expect ip -s (-s) to be extended to run GETSTATS and display the
> > xdp stats. (Not sure why ip -s was left out of your list :))
> 
> It's on my diagram, and yes, forgot to add it here.
> 

I think ip -s is a good place for "standard" driver-based XDP stats, but
as Jakub already explained, adding such a driver mechanism is like making
a statement that drivers must implement this.
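
For context, the request side of RTM_GETSTATS from userspace looks roughly
like the sketch below (minimal, no error handling, only asking for the
existing IFLA_STATS_LINK_64 group; a standardized XDP group would
presumably be one more filter bit / nested attribute):

/* Minimal sketch: ask the kernel for RTM_GETSTATS on one interface. */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>
#include <net/if.h>

int main(int argc, char **argv)
{
	struct {
		struct nlmsghdr nlh;
		struct if_stats_msg ifsm;
	} req = {
		.nlh = {
			.nlmsg_len   = NLMSG_LENGTH(sizeof(struct if_stats_msg)),
			.nlmsg_type  = RTM_GETSTATS,
			.nlmsg_flags = NLM_F_REQUEST,
		},
		.ifsm = {
			.ifindex     = if_nametoindex(argc > 1 ? argv[1] : "lo"),
			.filter_mask = IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64),
		},
	};
	char buf[8192];
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	send(fd, &req, req.nlh.nlmsg_len, 0);
	/* reply is RTM_NEWSTATS with nested IFLA_STATS_* attributes */
	printf("got %zd bytes of stats reply\n", recv(fd, buf, sizeof(buf), 0));
	close(fd);
	return 0;
}

An extended ip -s (or whatever front end we settle on) would then just
need to learn to walk and print the new nest.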

> > 
> > > There is real value in continuing to use ethtool API for XDP stats.
> > > Not saying this reorg of the XDP stats is the right thing to do,
> > > only that the existing API has real user benefits.
> > 
> > RTM_GETSTATS is an existing API. New ethtool stats are intended to be
> > HW stats. I don't want to go back to ethtool being a dumping ground
> > for all stats because that's what the old interface encouraged.
> 
> Driver stats are important too, e.g. mlx5's cache stats and per-queue
> stats.
> 

One could claim that mlx5 cache stats should move to page_pool and
per-queue stats should move to the stack.

> > 
> > > Does anyone have data that shows bumping a properly implemented
> > > counter causes a noticeable performance degradation, and if so by
> > > how much? You mention 'yet another cacheline', but collecting stats
> > > on the stack and incrementing the driver structs at the end of the
> > > napi loop should not have a huge impact versus the value the stats
> > > provide.
> > 
> > Not sure, maybe Jesper has some numbers. Maybe Intel folks do?
> 

A properly implemented counter that doesn't introduce new cache misses
will hardly show any measurable difference; the only way to measure it is
via instructions per packet.

Usually the way we implement counters in mlx5 is: if this is the fastest
flow we expect, then we only increment the "good" counters
(packet++/drop++/redirect++). Any slower path should include counters to
indicate that slower path, and the effect of those new "slower" counters
will still be negligible, since we are already at a higher
instructions-per-packet count on that path.

The only time you measure a difference is when you introduce new counting
on a counter-free flow, e.g. page_pool ;)
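
To make that concrete, here is a rough sketch of the pattern described
above (not actual mlx5 code, all names are made up): count on the stack
while polling, touch the ring's stats cacheline once at the end of the
NAPI poll, and keep any extra counters on the slow paths only.

#include <linux/bpf.h>		/* enum xdp_action */
#include <linux/types.h>

struct dummy_rq_stats {		/* persistent, lives in the ring struct */
	u64 xdp_packets;
	u64 xdp_drops;
	u64 xdp_redirects;
};

struct dummy_rq {
	struct dummy_rq_stats stats;
	/* descriptors, page pool, etc. elided */
};

/* stand-ins for the real rx descriptor / XDP handling, stubbed out here */
static bool dummy_rq_has_work(struct dummy_rq *rq) { return false; }
static u32 dummy_rq_run_xdp(struct dummy_rq *rq) { return XDP_PASS; }

static int dummy_rq_poll(struct dummy_rq *rq, int budget)
{
	struct dummy_rq_stats s = {};	/* on-stack, hot during the loop */
	int done = 0;

	while (done < budget && dummy_rq_has_work(rq)) {
		switch (dummy_rq_run_xdp(rq)) {
		case XDP_DROP:
			s.xdp_drops++;
			break;
		case XDP_REDIRECT:
			s.xdp_redirects++;
			break;
		default:		/* XDP_PASS etc. */
			break;
		}
		s.xdp_packets++;
		done++;
	}

	/* fold into the per-ring counters once per poll */
	rq->stats.xdp_packets   += s.xdp_packets;
	rq->stats.xdp_drops     += s.xdp_drops;
	rq->stats.xdp_redirects += s.xdp_redirects;

	return done;
}

The fast path only ever bumps the on-stack counters; any additional
"reason" counters sit behind branches that already make a path slow, so
their cost stays in the noise.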

> I just ran some quick tests with my setup and measured about 1.2% worst

1.2% is a lot! What was the test? What was the change?

> case. Certainly not exhaustive. Perhaps Intel or Mellanox can provide
> numbers for their high speed nics - e.g. ConnectX-6 and a saturated
> host.
> 

Let's define what we are testing first; there are multiple places we need
to check. Tariq will be exploring transitioning the mlx5 cache to
page_pool with all the counters, so maybe that is a good place to
measure..

> > 
> > I'm just allergic to situations when there is a decision made and 
> > then months later patches are posted disregarding the decision, 
> > without analysis on why that decision was wrong. And while the
> > maintainer who made the decision is on vacation.
> > 
> 
> Stats is one of the many sensitive topics. I have been consistent in
> defending the need to use existing APIs and tooling and not relying on
> XDP program writers to add the relevant stats and then provide whatever
> tool is needed to extract and print them. Standardization for
> fundamental analysis tools.



