[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6d5cb0ef-fabc-7ca3-94b2-5b1925a6805f@iogearbox.net>
Date: Tue, 31 Oct 2023 23:20:23 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Peilin Ye <yepeilin.cs@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Peilin Ye <peilin.ye@...edance.com>, netdev@...r.kernel.org,
bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
Cong Wang <cong.wang@...edance.com>,
Jiang Wang <jiang.wang@...edance.com>,
Youlun Zhang <zhangyoulun@...edance.com>
Subject: Re: [PATCH net] veth: Fix RX stats for bpf_redirect_peer() traffic
On 10/31/23 8:53 PM, Jakub Kicinski wrote:
> On Mon, 30 Oct 2023 15:19:26 +0100 Daniel Borkmann wrote:
>>> Since I didn't want to update host-veth's TX counters. If we
>>> bpf_redirect_peer()ed a packet from NIC TC ingress to Pod-veth TC ingress,
>>> I think it means we've bypassed host-veth TX?
>>
>> Yes. So the idea is to transition to tstats replace the location where
>> we used to bump lstats with tstat's tx counter, and only the peer redirect
>> would bump the rx counter.. then upon stats traversal we fold the latter into
>> the rx stats which was populated by the opposite's tx counters. Makes sense.
>>
>> OT: does cadvisor run inside the Pod to collect the device stats? Just
>> curious how it gathers them.
>
> Somewhat related - where does netkit count stats?
Yeap, it needs it as well, I have a local branch here where I pushed all
of it - coming out soon; I was planning to add some selftests in addition
till end of this week:
https://github.com/cilium/linux/commits/pr/ndo_peer
>>>> Definitely no new stats ndo resp indirect call in fast path.
>>>
>>> Yeah, I think I'll put a comment saying that all devices that support
>>> BPF_F_PEER must use tstats (or must use lstats), then.
>>
>> sgtm.
>
> Is comment good enough? Can we try to do something more robust?
> Move the allocation of stats into the core at registration based
> on some u8 assigned in the driver? (I haven't looked at the code TBH)
Hm, not sure. One thing that comes to mind is lazy one-time allocation
like in case of netdev_core_stats_inc(), so whenever one of the helpers
like dev_sw_netstats_{rx,tx}_add() are called and dev->tstats are still
NULL, the core knows about the driver's intent, but tbh that doesn't
feel overly clean and in case of netdev_core_stats_inc() it's more in
the exception case rather than fast-path.
Other option could be to have two small helpers in the core which then
set a flag as well:
static inline int netdev_tstats_alloc(struct net_device *dev)
{
dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
if (!dev->tstats)
return -ENOMEM;
dev->priv_flags |= IFF_USES_TSTATS;
return 0;
}
static inline void netdev_tstats_free(struct net_device *dev)
{
free_percpu(dev->tstats);
}
They can then be used from .ndo_init/uninit - not sure if this would
be overall nicer.. or just leaving it at the .ndo callback comment for
the time being until really more users show up (which I doubt tbh).
Thanks,
Daniel
Powered by blists - more mailing lists