Message-ID: <94c88020-5282-c82b-8f88-a2d012444699@iogearbox.net>
Date:   Mon, 30 Oct 2023 15:19:26 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Peilin Ye <yepeilin.cs@...il.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <martin.lau@...ux.dev>,
        Song Liu <song@...nel.org>,
        Yonghong Song <yonghong.song@...ux.dev>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        Stanislav Fomichev <sdf@...gle.com>,
        Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Peilin Ye <peilin.ye@...edance.com>, netdev@...r.kernel.org,
        bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
        Cong Wang <cong.wang@...edance.com>,
        Jiang Wang <jiang.wang@...edance.com>,
        Youlun Zhang <zhangyoulun@...edance.com>
Subject: Re: [PATCH net] veth: Fix RX stats for bpf_redirect_peer() traffic

On 10/29/23 1:11 AM, Peilin Ye wrote:
> On Sat, Oct 28, 2023 at 09:06:44AM +0200, Daniel Borkmann wrote:
>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>> index 21d75108c2e9..7aca28b7d0fd 100644
>>>> --- a/net/core/filter.c
>>>> +++ b/net/core/filter.c
>>>> @@ -2492,6 +2492,7 @@ int skb_do_redirect(struct sk_buff *skb)
>>>>    			     net_eq(net, dev_net(dev))))
>>>>    			goto out_drop;
>>>>    		skb->dev = dev;
>>>> +		dev_sw_netstats_rx_add(dev, skb->len);
>>>
>>> This assumes that all devices that support BPF_F_PEER (currently only
>>> veth) use tstats (instead of lstats, or dstats) - is that okay?
>>
>> Dumb question, but why all this change and not simply just call ...
>>
>>    dev_lstats_add(dev, skb->len)
>>
>> ... on the host dev ?
> 
> Since I didn't want to update host-veth's TX counters.  If we
> bpf_redirect_peer()ed a packet from NIC TC ingress to Pod-veth TC ingress,
> I think it means we've bypassed host-veth TX?

Yes. So the idea is to transition to tstats and replace the location where
we used to bump lstats with tstats' tx counter, and only the peer redirect
would bump the rx counter. Then upon stats traversal we fold the latter into
the rx stats which were populated by the opposite's tx counters. Makes sense.
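
To make sure we mean the same thing, here is a rough user-space sketch of that
accounting. It is only a model, not the veth code: every name in it
(model_dev, model_xmit(), model_redirect_peer(), model_get_stats()) is made up
for illustration, and it ignores per-CPU counters, u64_stats syncing and XDP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of the scheme; not kernel code. */
struct model_counters {
	uint64_t rx_packets, rx_bytes;
	uint64_t tx_packets, tx_bytes;
};

struct model_dev {
	struct model_counters raw;	/* counters bumped on the fast path */
	struct model_dev *peer;		/* other end of the pair */
};

/* Link two devices as a pair. */
static void model_pair(struct model_dev *a, struct model_dev *b)
{
	a->peer = b;
	b->peer = a;
}

/* Regular transmit: only the sender's tx counter is bumped. */
static void model_xmit(struct model_dev *dev, uint64_t len)
{
	dev->raw.tx_packets++;
	dev->raw.tx_bytes += len;
}

/*
 * Peer redirect: the packet lands on dev's ingress without the peer
 * ever transmitting it, so only dev's own rx counter is bumped.
 */
static void model_redirect_peer(struct model_dev *dev, uint64_t len)
{
	dev->raw.rx_packets++;
	dev->raw.rx_bytes += len;
}

/*
 * Stats traversal: reported rx is the peer's tx (regular path) with
 * dev's own rx (peer-redirect path) folded in. Reported tx is dev's own.
 */
static struct model_counters model_get_stats(const struct model_dev *dev)
{
	struct model_counters tot;

	assert(dev != NULL);
	tot = dev->raw;
	if (dev->peer) {
		tot.rx_packets += dev->peer->raw.tx_packets;
		tot.rx_bytes += dev->peer->raw.tx_bytes;
	}
	return tot;
}
```

In this model, a redirected packet shows up in the Pod side's rx without
touching the host side's tx, which is the property you were after.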

OT: does cadvisor run inside the Pod to collect the device stats? Just
curious how it gathers them.

>>> If not, should I add another NDO e.g. ->ndo_stats_rx_add()?
>>
>> Definitely no new stats ndo resp indirect call in fast path.
> 
> Yeah, I think I'll put a comment saying that all devices that support
> BPF_F_PEER must use tstats (or must use lstats), then.

sgtm.

Thanks,
Daniel
