Date: Fri, 6 Dec 2019 10:05:20 -0800
From: Florian Fainelli <f.fainelli@...il.com>
To: Alexander Lobakin <alobakin@...nk.ru>
Cc: Andrew Lunn <andrew@...n.ch>, "David S. Miller" <davem@...emloft.net>,
 Muciri Gatimu <muciri@...nmesh.com>,
 Shashidhar Lakkavalli <shashidhar.lakkavalli@...nmesh.com>,
 John Crispin <john@...ozen.org>, Vivien Didelot <vivien.didelot@...il.com>,
 Stanislav Fomichev <sdf@...gle.com>, Daniel Borkmann <daniel@...earbox.net>,
 Song Liu <songliubraving@...com>, Alexei Starovoitov <ast@...nel.org>,
 Matteo Croce <mcroce@...hat.com>, Jakub Sitnicki <jakub@...udflare.com>,
 Eric Dumazet <edumazet@...gle.com>, Paul Blakey <paulb@...lanox.com>,
 Yoshiki Komachi <komachi.yoshiki@...il.com>,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] net: dsa: fix flow dissection on Tx path

On 12/5/19 11:37 PM, Alexander Lobakin wrote:
> Florian Fainelli wrote 06.12.2019 06:32:
>> On 12/5/2019 6:58 AM, Alexander Lobakin wrote:
>>> Andrew Lunn wrote 05.12.2019 17:01:
>>>>> Hi,
>>>>>
>>>>> > What I'm missing here is an explanation of why the flow dissector is
>>>>> > called here if the protocol is already set? It suggests there is a
>>>>> > case when the protocol is not correctly set, and we do need to look
>>>>> > into the frame?
>>>>>
>>>>> If we have a device with multiple Tx queues, but XPS is not configured
>>>>> or the system is a uniprocessor one, then the networking core code
>>>>> selects the Tx queue depending on the flow, to utilize as many Tx
>>>>> queues as possible without breaking frame ordering.
>>>>> This selection happens in net/core/dev.c:skb_tx_hash() as:
>>>>>
>>>>> reciprocal_scale(skb_get_hash(skb), qcount)
>>>>>
>>>>> where 'qcount' is the total number of Tx queues on the network device.
>>>>>
>>>>> If the skb has not been hashed prior to this line, then skb_get_hash()
>>>>> will call the flow dissector to generate a new hash. That's why flow
>>>>> dissection can occur on the Tx path.
>>>>
>>>> Hi Alexander
>>>>
>>>> So it looks like you are now skipping this hash, which in your
>>>> testing gives better results, because the protocol is already set
>>>> correctly. But are there cases when the protocol is not set correctly,
>>>> where we really do need to look into the frame?
>>>
>>> Actually no, I'm not skipping the entire hashing; I'm only skipping the
>>> tag_ops->flow_dissect() call (a helper that only alters the network
>>> offset and replaces the fake ETH_P_XDSA with the actual protocol) on
>>> the Tx path, because there it only breaks the flow dissection logic.
>>> All skbs are still processed and hashed by the generic code that runs
>>> after that call.
>>>
>>>> How about when an outer header has just been removed? The frame was
>>>> received on a GRE tunnel, the GRE header has just been removed, and
>>>> now the frame is on its way out? Is the protocol still GRE, and we
>>>> should look into the frame to determine if it is IPv4, ARP, etc.?
>>>>
>>>> Your patch looks to improve things for the cases you have tested, but
>>>> I'm wondering if there are other use cases where we really do need to
>>>> look into the frame? In which case, your fix is doing the wrong thing.
>>>> Should we be extending the tagger to handle the TX case as well as the
>>>> RX case?
>>>
>>> We really have two options: don't call tag_ops->flow_dissect() on Tx
>>> (this patch), or extend the tagger callbacks to handle the Tx path too.
>>> I used both of these for several months each and couldn't find cases
>>> where the first one was worse than the second.
>>> I mean, there _might_ be such cases in theory, and if they appear we
>>> should extend our taggers. But for now I don't see the necessity to do
>>> this, as the generic flow dissection logic works as expected after
>>> this patch and is completely broken without it.
>>> And remember that we have the reverse logic on Tx: all skbs are first
>>> queued on the slave netdevice and only then on the master/CPU port.
>>>
>>> It would be nice to see what other people think about it anyway.
>>
>> Your patch seems appropriate to me, and quite frankly I am not sure why
>> flow dissection on RX is done at the DSA master device level, where we
>> have not parsed the DSA tag yet, instead of being done at the DSA slave
>> network device level. It seems to me that if the DSA master has N RX
>> queues, we should be creating the DSA slave devices with the same
>> number of RX queues and perform RPS there against a standard Ethernet
>> frame (sans DSA tag).
>>
>> For TX the story is a little different because we can have multiqueue
>> DSA slave network devices in order to steer traffic towards particular
>> switch queues, and we could do XPS there that way.
>>
>> What do you think?
>
> Hi Florian,
>
> First of all, thank you for the "Reviewed-by"!
>
> I agree with you that all the network stack processing should be
> performed on standard frames without CPU tags and on the corresponding
> slave netdevices. So I think we really should think about extending the
> DSA core code to create slaves with at least as many Rx queues as the
> master device has. With this done we could remove the .flow_dissect()
> callback from DSA taggers entirely and simplify the traffic flow.

Indeed.

> Also, if we get back to Tx processing, the number of Tx queues on the
> slaves should ideally be equal to the number of queues on the switch
> itself. Maybe we should then apply this rule to Rx queues too, i.e.
> create slaves with the number of Rx queues that the switch has?

Yes, I would offer the same configuration knob we have today for TX
queues for RX queues as well.

> (for example, I'm currently working with switches that have 8 Rxqs and
> 8 Txqs, but their Ethernet controllers / CPU ports have only 4/4)

Yes, that is unfortunately not uncommon. We have a similar set-up with
BCM7278, which has 16 TX queues on its DSA master while we have 4 switch
ports with 8 TX queues per port; what I did is basically clamp the
number of DSA slave device TX queues to have a 1:1 mapping, and that
seems acceptable to the users :)
-- 
Florian
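
For readers following the queue-selection discussion above, here is a
condensed C sketch of the Tx-path behaviour the thread describes. It only
paraphrases what skb_tx_hash() and skb_get_hash() are said to do; it is
not a verbatim excerpt of net/core/dev.c, and the wrapper name
example_pick_tx_queue() is made up for illustration.

/*
 * Condensed sketch of Tx queue selection when XPS is not in use, as
 * described in the thread above (paraphrased, not copied from
 * net/core/dev.c; example_pick_tx_queue() is a made-up wrapper name).
 */
#include <linux/kernel.h>	/* reciprocal_scale() */
#include <linux/netdevice.h>
#include <linux/skbuff.h>

static u16 example_pick_tx_queue(struct net_device *dev, struct sk_buff *skb)
{
	u16 qcount = dev->real_num_tx_queues;

	/*
	 * skb_get_hash() reuses an existing hash if the skb already
	 * carries one; otherwise it runs the flow dissector to compute
	 * it, which is how flow dissection ends up being called on the
	 * Tx path. The hash is then scaled into the range [0, qcount).
	 */
	return (u16)reciprocal_scale(skb_get_hash(skb), qcount);
}

Per the thread, the patch under discussion only changes the flow
dissector step of this path for DSA devices (by not calling the tagger's
flow_dissect() helper on Tx); the scaling of the hash into a queue index
is unchanged.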