Message-ID: <20160407130859.GA2818@oracle.com>
Date:	Thu, 7 Apr 2016 09:11:01 -0400
From:	Sowmini Varadhan <sowmini.varadhan@...cle.com>
To:	alexei.starovoitov@...il.com, tom@...bertland.com,
	netdev@...r.kernel.org, sowmini.varadhan@...cle.com
Subject: optimizations to sk_buff handling in rds_tcp_data_ready


I was going back to Alexei's comment here:
  http://permalink.gmane.org/gmane.linux.network/387806
In rds-stress profiling, we are indeed seeing the pskb_pull
(from rds_tcp_data_recv) show up in the perf profile.

At least for rds-tcp, we cannot reuse the skb even if
it is not shared, because what we need to do is carve out
the interesting bytes (starting at offset, trimmed to to_copy)
and queue those bytes on the PF_RDS socket,
while allowing tcp_read_sock to go back and read the next
tcp segment (which may be part of the next rds dgram).
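To make the carve-out concrete, here is a minimal userspace sketch (plain C, hypothetical names, not the actual kernel code): per segment, the bytes in [offset, offset + to_copy) are copied into a fresh buffer for the PF_RDS socket, and the original segment buffer is left untouched so the next fragment can still be read from it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for carving an RDS dgram fragment out of a
 * TCP segment: the interesting bytes start at 'offset' and run for
 * 'to_copy'. The original segment must stay intact, because the
 * next fragment may follow it in the same segment. */
static unsigned char *carve_fragment(const unsigned char *segment,
                                     size_t seg_len,
                                     size_t offset, size_t to_copy)
{
    unsigned char *frag;

    if (offset + to_copy > seg_len)
        return NULL;
    frag = malloc(to_copy);
    if (!frag)
        return NULL;
    memcpy(frag, segment + offset, to_copy);   /* copy, don't pull */
    return frag;
}
```

In the real path this copy is what pskb_pull/__pskb_pull_tail end up arranging; the point of this thread is that the *additional* memcpy of the offset bytes themselves is wasted work for this caller.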

But when pskb_expand_head is invoked in the call-stack
  pskb_pull(.., offset) -> ... -> __pskb_pull_tail(.., delta)
it will memcpy the offset bytes to the start of data. At least
for rds_tcp_data_recv, we are not interested in doing an
skb_push after the skb_pull, so we don't really care about
keeping those bytes in offset intact.
Thus what I am finding is that, when delta > 0, if we skip the
following in pskb_expand_head (for the rds-tcp recv case only!)

        /* Copy only real data... and, alas, header. This should be
         * optimized for the cases when header is void.
         */
        memcpy(data + nhead, skb->head, skb_tail_pointer(skb) - skb->head);

and also (only for this case!) this one in __pskb_pull_tail

        if (skb_copy_bits(skb, skb_headlen(skb), skb_tail_pointer(skb), delta))
                BUG();

I am able to get a 40% improvement in the measured IOPS (req/response 
transactions per second, using 8K byte requests, 256 byte responses,
16 concurrent threads), so this optimization seems worth doing.

Does my analysis above make sense? If yes, the question is how to
achieve this bypass in a neat way. Clearly there are many callers of
pskb_expand_head that expect to find the skb_headlen(skb) bytes at
the start of data, but we also don't want to duplicate code in these
functions. One thought I had is to pass a flag around saying "caller
doesn't care about retaining offset bytes", and use this flag
- in __pskb_pull_tail, to avoid the skb_copy_bits() above, and to
  pass delta to pskb_expand_head,
- in pskb_expand_head, to do the memcpy listed above only
  if delta <= 0.
Any other ideas for how to achieve this?
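For concreteness, a userspace model of what the flag-based bypass would change (names, signatures, and the drop_offset flag are all hypothetical, not the actual kernel API): when the caller has waived the offset bytes, the head copy in the reallocation can start at offset instead of 0, skipping the bytes that would otherwise be memcpy'd only to be discarded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of pskb_expand_head's head copy. When the
 * caller sets drop_offset (it promises never to skb_push back into
 * the pulled region), the bytes in [0, offset) need not be
 * preserved, so only [offset, head_len) is copied. */
static unsigned char *expand_head(const unsigned char *old_head,
                                  size_t head_len, size_t offset,
                                  size_t new_len, int drop_offset)
{
    unsigned char *new_head = calloc(1, new_len);

    if (!new_head || head_len > new_len || offset > head_len)
        return NULL;
    if (drop_offset)
        memcpy(new_head, old_head + offset, head_len - offset);
    else
        memcpy(new_head, old_head, head_len);
    return new_head;
}
```

The design question in the email is how to thread such a flag down from the rds-tcp recv path without perturbing the many existing callers that rely on the full copy.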

--Sowmini

