Open Source and information security mailing list archives
Date: Thu, 28 Aug 2014 03:42:54 +0200
From: Hannes Frederic Sowa <hannes@...essinduktion.org>
To: David Miller <davem@...emloft.net>
Cc: netdev@...r.kernel.org, therbert@...gle.com, jhs@...atatu.com, edumazet@...gle.com, jeffrey.t.kirsher@...el.com, rusty@...tcorp.com.au, dborkman@...hat.com, brouer@...hat.com, John Fastabend <john.r.fastabend@...el.com>
Subject: Re: [PATCH 0/2] Get rid of ndo_xmit_flush

On Mi, 2014-08-27 at 13:45 -0700, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@...essinduktion.org>
> Date: Wed, 27 Aug 2014 14:31:12 +0200
>
> > On Mo, 2014-08-25 at 16:34 -0700, David Miller wrote:
> >> Given Jesper's performance numbers, it's not the way to go.
> >>
> >> Instead, go with a signalling scheme via new boolean skb->xmit_more.
> >>
> >> This has several advantages:
> >>
> >> 1) Nearly trivial driver support, just protect the tail pointer
> >> update with the skb->xmit_more check.
> >
> > One thing one should keep in mind is that there must be an skb
> > available to trigger the flush; maybe this will hurt us one day.
> >
> > Thinking more about it, should we go with a coccinelle script and
> > replace/extend ndo_start_xmit with an additional argument?
> >
> > We can also add a new function pointer and call that instead of
> > ndo_start_xmit. I think only the callq *%rax hurts performance.
>
> I don't think we will have any problems here, the caller will always
> be the entity which analyzes the upcoming set of SKBs to submit and
> tag them properly.
>
> I really do not want to add a new OP and I even more so do not want to
> adjust the ndo_start_xmit() signature. Its effect is far reaching,
> and for absolutely no gain as far as I can see.
>
> Thanks.

Just brainstorming a bit: I wonder if we still might need a separate call for tx_flush, e.g. for af_packet if one wants to allow user-space control of batching, MSG_MORE with tx hangcheck (also in case user space has control over it), or to implement a TCP_CORK-like option in af_packet.
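To make the xmit_more scheme concrete, here is a minimal toy model of what "protect the tail pointer update with the skb->xmit_more check" means in a driver's xmit path. Real drivers operate on struct sk_buff and an MMIO doorbell register; the toy_ring structure and function names below are purely illustrative, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a NIC tx ring. "Ringing the doorbell" stands in for
 * the MMIO tail-pointer write that the xmit_more scheme batches. */
#define RING_SIZE 64

struct toy_ring {
	int tail;            /* next free descriptor slot           */
	int hw_tail;         /* tail last written to the (fake) NIC */
	int doorbell_writes; /* how many MMIO writes were performed */
};

/* Model of ndo_start_xmit under skb->xmit_more: always post the
 * descriptor, but only touch the hardware tail pointer when the
 * caller signals that no further packets are coming. */
static void toy_start_xmit(struct toy_ring *r, bool xmit_more)
{
	r->tail = (r->tail + 1) % RING_SIZE; /* post descriptor      */

	if (!xmit_more) {
		r->hw_tail = r->tail;        /* expensive MMIO write */
		r->doorbell_writes++;
	}
}
```

Submitting three packets with xmit_more set on the first two results in a single doorbell write, which is the whole point of the scheme: the tail-pointer check is the only driver change needed.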
Even in the case we someday might allow concurrent dequeueing from a qdisc, we might want to use xmit_more opportunistically (user space might also use xmit_more opportunistically if dequeueing from lockless data structures). I just want to ensure we have thought about possible reuses of this driver API change. ;)

To do so, just some proposals:

1) Export a tx_flush function nonetheless, but have the normal/fast path just depend on skb->xmit_more (so the driver makes a static call to the flushing function, no indirection here). We would keep the ndo op for flushing around, but ndo_start_xmit must still honor xmit_more by itself. af_packet sockets could then flush the queues separately if wanted. Also, if the ndo op is non-NULL we know that the driver supports batching.

2) Add a new op like ndo_start_xmit_flags with an additional flags parameter and document that NULL skbs can happen; in that case the tx queue should only get flushed. By checking for the new method we also know that the driver supports batching.

3) Provide a special skb (only a head, no data) to ndo_start_xmit which only flushes the queue, and prepare drivers not to enqueue any new data for it.

I am concerned that if user space can control batching to the NIC, we can have side effects when user space did not flush the NIC's queue before we dequeue more packets from the qdisc into the same NIC tx queue. Allowing user-space control over batching might make the conversion of drivers more painful, as a driver might have to check for "dirtiness" before it tries to enqueue the packet internally.

(To track the queueing state of the NIC we would also have to track the state globally per queue, which might become a performance problem cache-thrashing-wise; I am unsure about that, and at least it seems not to be the case with BQL. My idea was a tx-queue dirty bit somewhere, or checking the BQL state before submitting the skb. But maybe one has to extend BQL to keep track of unflushed vs. flushed skbs to make this safe.)
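Proposal 1 and the "dirtiness" concern can be sketched together in a small toy model: the driver's xmit path still honors xmit_more internally (a static call to its own flush, no indirection), while an exported flush op lets an af_packet-style caller flush on its own schedule, with an unflushed counter standing in for the per-queue dirty state discussed above. All names here are illustrative assumptions, not real kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of proposal 1: xmit honors xmit_more itself, and a
 * separately exported flush op is available to callers. The
 * "unflushed" counter models the per-queue dirty state a caller
 * would need to consult before mixing batched submitters. */
struct toy_queue {
	int posted;    /* descriptors handed to the ring        */
	int unflushed; /* posted but not yet visible to the NIC */
	int doorbells; /* flushes actually performed            */
};

/* The exported flush op: af_packet-style users could call this
 * directly; it is a no-op on a clean (fully flushed) queue. */
static void toy_flush(struct toy_queue *q)
{
	if (q->unflushed) {
		q->unflushed = 0;
		q->doorbells++;
	}
}

/* Fast path: a static call to the flush logic, no indirect call. */
static void toy_xmit(struct toy_queue *q, bool xmit_more)
{
	q->posted++;
	q->unflushed++;
	if (!xmit_more)
		toy_flush(q);
}
```

With this split, a qdisc about to dequeue into the same queue can check the dirty state (here, `unflushed != 0`) and flush first, which is the side effect the paragraph above worries about when user space leaves a batch pending.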
If queueing failed because of a missing tx_flush, we either have to requeue the skb to the qdisc or do a fast retry, but I don't think that NICs will catch up that fast, no? Or can we just blame user-space applications as broken if they use xmit_more irresponsibly?

I liked the separate ndo-ops approach because it seemed very well extendable to me. :/

I know these concerns are a bit messy, but I wanted to share them nonetheless. Maybe they are all void in the end. ;) Also, maybe these concerns can be solved by the addition of future AF_PACKET V4 ndo ops and we can just postpone this discussion.

Bye,
Hannes