Message-ID: <66acf6cc551a0_2751b6294bf@willemb.c.googlers.com.notmuch>
Date: Fri, 02 Aug 2024 11:10:04 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Randy Li <ayaka@...lik.info>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: netdev@...r.kernel.org,
jasowang@...hat.com,
davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net: tuntap: add ioctl() TUNGETQUEUEINDX to fetch queue index
Randy Li wrote:
>
> On 2024/8/1 22:17, Willem de Bruijn wrote:
> > Randy Li wrote:
> >> On 2024/8/1 21:04, Willem de Bruijn wrote:
> >>> Randy Li wrote:
> >>>> On 2024/8/1 05:57, Willem de Bruijn wrote:
> >>>>> nits:
> >>>>>
> >>>>> - INDX->INDEX. It's correct in the code
> >>>>> - prefix networking patches with the target tree: PATCH net-next
> >>>> I see.
> >>>>> Randy Li wrote:
> >>>>>> On 2024/7/31 22:12, Willem de Bruijn wrote:
> >>>>>>> Randy Li wrote:
> >>>>>>>> We need the queue index in qdisc mapping rule. There is no way to
> >>>>>>>> fetch that.
> >>>>>>> In which command exactly?
> >>>>>> That is for sch_multiq, here is an example
> >>>>>>
> >>>>>> tc qdisc add dev tun0 root handle 1: multiq
> >>>>>>
> >>>>>> tc filter add dev tun0 parent 1: protocol ip prio 1 u32 match ip dst
> >>>>>> 172.16.10.1 action skbedit queue_mapping 0
> >>>>>> tc filter add dev tun0 parent 1: protocol ip prio 1 u32 match ip dst
> >>>>>> 172.16.10.20 action skbedit queue_mapping 1
> >>>>>>
> >>>>>> tc filter add dev tun0 parent 1: protocol ip prio 1 u32 match ip dst
> >>>>>> 172.16.10.10 action skbedit queue_mapping 2
> >>>>> If using an IFF_MULTI_QUEUE tun device, packets are automatically
> >>>>> load balanced across the multiple queues, in tun_select_queue.
> >>>>>
> >>>>> If you want more explicit queue selection than by rxhash, tun
> >>>>> supports TUNSETSTEERINGEBPF.
> >>>> I know about this eBPF option, but I am new to eBPF and have not
> >>>> figured out how to configure eBPF dynamically.
> >>> Lack of experience with an existing interface is insufficient reason
> >>> to introduce another interface, of course.
> >> tc(8) is an old interface, but it does not have sufficient info here
> >> to complete its work.
> > tc is maintained.
> >
> >> I think eBPF doesn't work on all platforms? JIT doesn't sound like a
> >> good solution for embedded platforms.
> >>
> >> Another problem is that some VPS providers don't offer a kernel new
> >> enough to support eBPF; it is far easier to just patch an old
> >> kernel with this.
> > We don't add duplicative features because they are easier to
> > cherry-pick to old kernels.
> I was trying to say that the tc(8) or netlink solution sounds more
> suitable for general deployment.
> >> Anyway, I will look into it, though I will still send out a v2 of
> >> this patch. I will figure out whether eBPF can solve all the problems
> >> here.
> > Most importantly, why do you need a fixed mapping of IP address to
> > queue? Can you explain why relying on the standard rx_hash based
> > mapping is not sufficient for your workload?
>
> Server
> |
> |------ tun subnet (e.g. 172.16.10.0/24) ------- peer A (172.16.10.1)
> |------ peer B (172.16.10.3)
> |------ peer C (172.16.10.20)
>
> I am not even sure rx_hash could work here. The server acts as
> a router or gateway, and I don't know how to filter connections from the
> external interface based on rx_hash. Besides, the VPN application doesn't
> operate on the socket() itself.
>
> I think this question is about why I do the filtering in the kernel
> rather than in userspace?
>
> It would be much easier to do the dispatch work in the kernel; I only
> need to watch the established peers with the help of epoll(). The kernel
> could drop all the unwanted packets. Besides, if I do the
> filter/dispatcher work in userspace, the packet's data would have to be
> copied to userspace first, even though its fate can be decided by reading
> a few bytes from its beginning. I think we can avoid such a cost.
A custom mapping function is exactly the purpose of TUNSETSTEERINGEBPF.
Please take a look at that. It's a lot more elegant than going through
userspace and then inserting individual tc skbedit filters.
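
For illustration only, here is a rough sketch of the kind of mapping
function under discussion: a lookup from IPv4 destination address to
queue index. It is written as plain userspace C so the logic can be
read and tested on its own. It is not an eBPF program and is not taken
from the patch. The table contents reuse the addresses from the tc
example quoted above; the names peer_table, select_queue and
FALLBACK_QUEUE are hypothetical, and the sketch assumes the buffer
starts at the IPv4 header (a tun device without an Ethernet header).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical peer table: destination address -> queue index.
 * Addresses and queue numbers mirror the tc skbedit example in the thread. */
struct peer_queue {
	uint8_t addr[4];
	uint16_t queue;
};

static const struct peer_queue peer_table[] = {
	{ {172, 16, 10, 1},  0 },
	{ {172, 16, 10, 20}, 1 },
	{ {172, 16, 10, 10}, 2 },
};

/* Assumption: unknown destinations fall back to queue 0. */
#define FALLBACK_QUEUE 0

/* Pick a queue for a packet that is assumed to start at its IPv4 header. */
static uint16_t select_queue(const uint8_t *pkt, size_t len)
{
	size_t i;

	/* Need a full minimal IPv4 header, and the version field must be 4. */
	if (len < 20 || (pkt[0] >> 4) != 4)
		return FALLBACK_QUEUE;

	/* The destination address occupies bytes 16..19 of the IPv4 header. */
	for (i = 0; i < sizeof(peer_table) / sizeof(peer_table[0]); i++) {
		if (memcmp(pkt + 16, peer_table[i].addr, 4) == 0)
			return peer_table[i].queue;
	}
	return FALLBACK_QUEUE;
}
```

A steering program attached with TUNSETSTEERINGEBPF would presumably
express the same lookup with a BPF map in place of the static table, so
that peers can be added and removed at runtime without reloading the
program.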