lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Tue, 16 Aug 2011 22:47:40 -0700 From: John Fastabend <john.r.fastabend@...el.com> To: Rasesh Mody <rmody@...cade.com> CC: Ben Hutchings <bhutchings@...arflare.com>, "davem@...emloft.net" <davem@...emloft.net>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>, Adapter Linux Open SRC Team <adapter_linux_open_src_team@...cade.COM>, Gurunatha Karaje <gkaraje@...cade.com> Subject: Re: [PATCH 04/14] bna: Add Multiple Tx Queue Support On 8/16/2011 7:14 PM, Rasesh Mody wrote: >> From: John Fastabend [mailto:john.r.fastabend@...el.com] >> Sent: Tuesday, August 16, 2011 5:18 PM >> >> On 8/16/2011 4:43 PM, Ben Hutchings wrote: >>> On Tue, 2011-08-16 at 16:32 -0700, Rasesh Mody wrote: >>>>> From: Ben Hutchings [mailto:bhutchings@...arflare.com] >>>>> Sent: Tuesday, August 16, 2011 2:49 PM >>>>> >>>>> On Tue, 2011-08-16 at 14:19 -0700, Rasesh Mody wrote: >>>>>> Change details: >>>>>> - Add macros bna_prio_allow, bna_default_nw_prio, bna_iscsi_prio, >>>>>> bna_is_iscsi_over_cee >>>>>> - Added support for multipe Tx queues with a separate iSCSI Tx >> queue >>>>> based >>>>>> on the default value of iSCSI port number. The feature is >> supported >>>>> based >>>>>> on the underlying hardware and enabled for DCB (CEE) mode only. >>>>>> - Allocate multiple TxQ resource in netdev >>>>>> - Implement bnad_tx_select_queue() which enables the correct >>>>> selection of >>>>>> TxQ Id (and tcb). This function is called either by the kernel >> to >>>>> channel >>>>>> packets to the right TxQ >>>>>> - bnad_tx_select_queue() returns priority, while only a few >> packets >>>>> during >>>>>> transition could have wrong priority, all will be associated >> with a >>>>> valid >>>>>> non-NULL tcb. >>>>>> - Implement bnad_iscsi_tcb_get() and BNAD_IS_ISCSI_PKT() for iSCSI >>>>> packet >>>>>> inspection and retrieval of tcb corresponding to the iSCSI >>>>> priority. >>>>>> - Construction of priority indirection table to be used by bnad to >>>>> direct >>>>>> packets into TxQs >>>>> [...] >>>>> >>>>> You probably should implement TX priority classes through the >>>>> ndo_setup_tc operation, not ndo_select_queue. >>>> >>>> The reason we went with ndo_select_queue is due to the need for >> mapping >>>> iSCSI packets (TCP port 3260) to a priority derived from DCB >> configuration. >>>> Here the iSCSI packets may not have any tc defined in the packet >> header. >>> >>> There is an skb priority, which is derived from the socket priority >>> option (SO_PRIORITY). If you implement ndo_setup_tc then the >> networking >>> core will take care of mapping the skb priority onto a different queue >>> (or range of queues). >>> >>> I don't know whether the socket priority option is easily configurable >>> for the existing iSCSI implementations. But looking at port numbers >>> really doesn't seem like the right way to do this. >>> >>> Ben. >>> >> >> At least open-iscsi supports DCB by listening to the RTNLGRP_DCB for >> events. >> These are generated with dcbnl_cee_notify and dcbnl_ieee_notify. To >> support >> this in your driver you need to call dcb_setapp() or dcb_ieee_setapp() >> to >> add the application data and follow this with the appropriate notify >> call. >> Then assuming you do the necessary tc setup work the stack will handle >> the >> mapping of priority to queues. > > This is how iSCSI over DCB feature is expected to works in BNA driver:- > FW running in the BNA adapter implements the DCB protocol. It learns the > iSCSI priority from the switch through iSCSI TLV exchange. BNA driver > extracts the iSCSI priority from the FW that needs to be used for iSCSI > packets. Up to here this is fine. What I was suggesting was to then use the dcb_setapp() routines to program the iSCSI TLV and generate an event so any user space applications listening to DCB events could handle this. It would also be nice if your driver notified user space of any PG or PFC changes as well. Then management agents (lldpad) could use the DCB attributes to make policy decisions. > For every outgoing packet, BNA driver does a TCP header > inspection to classify iSCSI packet and attach right 802.1q priority & > send it on the correct TX queue. > > This is expected to work with iSCSI applications that do not configure the > priority with SO_PRIORITY - here the iSCSI priority configuration actually > comes from the switch to the adapter. > Although this works I don't think it is optimal for a few reason. Your L2 driver is inspecting TCP frames which is bad layering IMHO. The iSCSI port number is hard coded into the driver so it will only work with the well known port number. Also it adds more driver specific behavior into select_queue() where I think the trend is to try to use select_queue() less not more. To address iSCSI applications that do not configure the priority we could either work on adding the DCB hooks needed in those applications. Or look at adding a hook in the qdisc layer to to do what your select_queue() hook does here. I started prototyping this awhile ago but this requires running the packet classifiers and associated actions before picking a queue assuming you want to use mq or mqprio. I hope to get back to this soon I have some more details to flush out wrt this and need to run it by some other folks to make sure its a sane idea. > The goal of this iSCSI priority is: > a) adapter applies prioritized scheduling for packets in its egress - to > guarantee minimum bandwidth as per ETS > b) packets are tagged with right priority so that switch can also identify > and guarantee BW on its egress. Correct. This is the same for other drivers that support DCB and use tc_setup as Ben suggested. See the bnx2x driver for an example that also uses a FW based LLDP engine and does this. > > Hope this explains. I think the one valid item is the unsupported applications but hopefully that can be addressed. Thanks for the details. John. > > Thanks, > Rasesh -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists