Message-ID: <66acf6cc551a0_2751b6294bf@willemb.c.googlers.com.notmuch>
Date: Fri, 02 Aug 2024 11:10:04 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Randy Li <ayaka@...lik.info>, 
 Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: netdev@...r.kernel.org, 
 jasowang@...hat.com, 
 davem@...emloft.net, 
 edumazet@...gle.com, 
 kuba@...nel.org, 
 pabeni@...hat.com, 
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net: tuntap: add ioctl() TUNGETQUEUEINDX to fetch queue
 index

Randy Li wrote:
> 
> On 2024/8/1 22:17, Willem de Bruijn wrote:
> > Randy Li wrote:
> >> On 2024/8/1 21:04, Willem de Bruijn wrote:
> >>> Randy Li wrote:
> >>>> On 2024/8/1 05:57, Willem de Bruijn wrote:
> >>>>> nits:
> >>>>>
> >>>>> - INDX->INDEX. It's correct in the code
> >>>>> - prefix networking patches with the target tree: PATCH net-next
> >>>> I see.
> >>>>> Randy Li wrote:
> >>>>>> On 2024/7/31 22:12, Willem de Bruijn wrote:
> >>>>>>> Randy Li wrote:
> >>>>>>>> We need the queue index in the qdisc mapping rule. There is
> >>>>>>>> no way to fetch it.
> >>>>>>> In which command exactly?
> >>>>>> That is for sch_multiq, here is an example
> >>>>>>
> >>>>>> tc qdisc add dev tun0 root handle 1: multiq
> >>>>>>
> >>>>>> tc filter add dev tun0 parent 1: protocol ip prio 1 u32 match ip dst
> >>>>>> 172.16.10.1 action skbedit queue_mapping 0
> >>>>>> tc filter add dev tun0 parent 1: protocol ip prio 1 u32 match ip dst
> >>>>>> 172.16.10.20 action skbedit queue_mapping 1
> >>>>>>
> >>>>>> tc filter add dev tun0 parent 1: protocol ip prio 1 u32 match ip dst
> >>>>>> 172.16.10.10 action skbedit queue_mapping 2
> >>>>> If using an IFF_MULTI_QUEUE tun device, packets are automatically
> >>>>> load balanced across the multiple queues, in tun_select_queue.
> >>>>>
> >>>>> If you want more explicit queue selection than by rxhash, tun
> >>>>> supports TUNSETSTEERINGEBPF.
> >>>> I know about this eBPF feature. But I am a newbie to eBPF and I
> >>>> haven't figured out how to configure eBPF dynamically.
> >>> Lack of experience with an existing interface is insufficient reason
> >>> to introduce another interface, of course.
> >> tc(8) is an old interface, but it doesn't have sufficient info
> >> here to complete its work.
> > tc is maintained.
> >
> >> I think eBPF doesn't work on all platforms? JIT doesn't sound like
> >> a good solution for embedded platforms.
> >>
> >> Another problem is that some VPS providers don't offer kernels new
> >> enough to support eBPF; it is far easier to just patch an old
> >> kernel with this.
> > We don't add duplicative features because they are easier to
> > cherry-pick to old kernels.
> I was trying to say that a tc(8) or netlink solution sounds more
> suitable for general deployment.
> >> Anyway, I will look into it, though I will still send out v2 of
> >> this patch. I will figure out whether eBPF can solve all the
> >> problems here.
> > Most importantly, why do you need a fixed mapping of IP address to
> > queue? Can you explain why relying on the standard rx_hash based
> > mapping is not sufficient for your workload?
> 
> Server
>    |
>    |------ tun subnet (e.g. 172.16.10.0/24) ------- peer A (172.16.10.1)
>    |------ peer B (172.16.10.3)
>    |------ peer C (172.16.10.20)
> 
> I am not even sure rx_hash could work here: the server acts as a
> router or gateway, and I don't know how to steer connections coming
> from the external interface based on rx_hash. Besides, the VPN
> application doesn't operate on the socket() itself.
> 
> I think the question here is really why I do the filtering in the
> kernel rather than in userspace.
> 
> It is much easier to do the dispatch work in the kernel: I only need
> to watch the established peers with the help of epoll(), and the
> kernel can drop all the unwanted packets. Besides, if I did the
> filter/dispatch work in userspace, every packet would have to be
> copied to userspace first, even when its fate can be decided by
> reading a few bytes at its start. I think we can avoid that cost.

A custom mapping function is exactly the purpose of TUNSETSTEERINGEBPF.

Please take a look at that. It's a lot more elegant than going through
userspace and then inserting individual tc skbedit filters.
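
[Editor's note: for illustration, here is a minimal, untested sketch of
such a steering program. It assumes an IFF_NO_PI tun device, so packet
data starts at the IPv4 header, and it mirrors the address-to-queue
mapping from the tc example earlier in the thread. The kernel runs the
program per packet and uses its return value, modulo the number of
queues, as the queue index.]

	/* steer.bpf.c: tun steering program, loaded as
	 * BPF_PROG_TYPE_SOCKET_FILTER (SEC("socket") under libbpf).
	 */
	#include <linux/bpf.h>
	#include <linux/if_ether.h>
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_endian.h>

	SEC("socket")
	int tun_steer(struct __sk_buff *skb)
	{
		__u32 daddr;

		if (skb->protocol != bpf_htons(ETH_P_IP))
			return 0;

		/* IPv4 destination address sits at offset 16 of the
		 * IP header (which is where packet data begins on an
		 * IFF_NO_PI tun device).
		 */
		if (bpf_skb_load_bytes(skb, 16, &daddr, sizeof(daddr)) < 0)
			return 0;

		if (daddr == bpf_htonl(0xac100a01))	/* 172.16.10.1  */
			return 0;
		if (daddr == bpf_htonl(0xac100a14))	/* 172.16.10.20 */
			return 1;
		if (daddr == bpf_htonl(0xac100a0a))	/* 172.16.10.10 */
			return 2;
		return 0;				/* default queue */
	}

	char _license[] SEC("license") = "GPL";

[Userspace then attaches the loaded program to the tun fd. prog_fd
below is whatever fd the loader (e.g. libbpf) returned; passing an fd
of -1 detaches the program.]

	int prog_fd = bpf_program__fd(prog);

	if (ioctl(tun_fd, TUNSETSTEERINGEBPF, &prog_fd) < 0)
		perror("TUNSETSTEERINGEBPF");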

