lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 11 Oct 2018 21:56:25 -0700
From:   John Fastabend <john.fastabend@...il.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>
Cc:     davejwatson@...com, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next 3/8] bpf, sockmap: convert to generic sk_msg
 interface

On 10/11/2018 03:57 PM, Alexei Starovoitov wrote:
> On Thu, Oct 11, 2018 at 02:45:42AM +0200, Daniel Borkmann wrote:
>> Add a generic sk_msg layer, and convert current sockmap and later
>> kTLS over to make use of it. While sk_buff handles network packet
>> representation from netdevice up to socket, sk_msg handles data
>> representation from application to socket layer.
>>
>> This means that sk_msg framework spans across ULP users in the
>> kernel, and enables features such as introspection or filtering
>> of data with the help of BPF programs that operate on this data
>> structure.
>>
>> Latter becomes in particular useful for kTLS where data encryption
>> is deferred into the kernel, and as such enabling the kernel to
>> perform L7 introspection and policy based on BPF for TLS connections
>> where the record is being encrypted after BPF has run and came to
>> a verdict. In order to get there, first step is to transform open
>> coding of scatter-gather list handling into a common core framework
>> that subsystems use.
>>
>> Joint work with John.
>>
>> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
>> Signed-off-by: John Fastabend <john.fastabend@...il.com>
>> ---
>>  include/linux/bpf.h       |   33 +-
>>  include/linux/bpf_types.h |    2 +-
>>  include/linux/filter.h    |   21 -
>>  include/linux/skmsg.h     |  371 +++++++
>>  include/net/tcp.h         |   27 +
>>  kernel/bpf/Makefile       |    5 -
>>  kernel/bpf/core.c         |    2 -
>>  kernel/bpf/sockmap.c      | 2610 ---------------------------------------------
>>  kernel/bpf/syscall.c      |    6 +-
>>  net/Kconfig               |   11 +
>>  net/core/Makefile         |    2 +
>>  net/core/filter.c         |  270 ++---
>>  net/core/skmsg.c          |  763 +++++++++++++
>>  net/core/sock_map.c       | 1002 +++++++++++++++++
>>  net/ipv4/Makefile         |    1 +
>>  net/ipv4/tcp_bpf.c        |  655 ++++++++++++
>>  net/strparser/Kconfig     |    4 +-
>>  17 files changed, 2925 insertions(+), 2860 deletions(-)
> 
>> +void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
>> +{
>> +	int trim = msg->sg.size - len;
>> +	u32 i = msg->sg.end;
>> +
>> +	if (trim <= 0) {
>> +		WARN_ON(trim < 0);
>> +		return;
>> +	}
>> +
>> +	sk_msg_iter_var_prev(i);
>> +	msg->sg.size = len;
>> +	while (msg->sg.data[i].length &&
>> +	       trim >= msg->sg.data[i].length) {
>> +		trim -= msg->sg.data[i].length;
>> +		sk_msg_free_elem(sk, msg, i, true);
>> +		sk_msg_iter_var_prev(i);
>> +		if (!trim)
>> +			goto out;
>> +	}
>> +
>> +	msg->sg.data[i].length -= trim;
>> +	sk_mem_uncharge(sk, trim);
>> +out:
>> +	/* If we trim data before curr pointer update copybreak and current
>> +	 * so that any future copy operations start at new copy location.
>> +	 * However trimed data that has not yet been used in a copy op
>> +	 * does not require an update.
>> +	 */
>> +	if (msg->sg.curr >= i) {
>> +		msg->sg.curr = i;//msg->sg.end;
> 
> is this a leftover of some debugging ?

Correct the comment needs to be removed.

> 
> I think such giant patchset is impossible to review in reasonable
> amount of time. I guess you've considered splitting it, but
> couldn't find a way to do so ?

Right, so I looked at splitting this up but it was hard to make
any sense out of how to split this up well. This patch is moving existing
code in ./kernel/bpf/sockmap.c into skmsg.c, sock_map.c, and tcp_bpf.c
Along the way some of the naming/structures are changed a bit to
fit into the new file layout. Splitting it up to just move bits and
pieces at a time didn't seem to help much, made it more difficult
to review IMO, and I also had trouble breaking it into isolated changes.

> 
> May be expand the commit log a bit more to explain not only _why_
> (which is typical requiremnt), but _how_ the patchset is doing it?
> 

Sounds like a good idea Daniel and I can work something up tomorrow.
The gist is to isolate the sock_map changes a bit so they can be
used by multiple ULPs (as noted in the commit log). But the how here
is to create a sock_map.c file to manage BPF map APIs, a skmsg.c
file full of all the generic code used with sk_msg structs (the
core structure throughout) and finally tcp_bpf to handle the TCP
specific parts. The nice fallout of all this is we then can add
_just_ the TLS bits in the subsequent patches. Also note the overall
+/- diff for the entire series is actually fairly small. I also
believe this is going to be very useful adding more socket types to
sock map, UDP for instance.

Along the way the naming is cleaned up to reflect the above changes.


> Overall sk_msg generalization looks great to me. Especially
> considering how it's used in the next patches,
> but details are scary. sockmap was plagued with bugs that
> syzbot found. Are we changing it a lot again or it's mainly
> copy-paste/split into different files plus rename?
> 

There should not be any functional changes here just copy-paste,
split into above noted files and renames. Besides all the self
tests I've been running this in Cilium as well now for some time.

> I'm willing to take a leap of faith and merge it, but would
> love if commit log could make it easier to see what is going on.
> 

Great. Lets get some extra details in the commit log!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ