linux-kernel - Re: [PATCH bpf-next v2 2/3] bpf: btf: add btf print functionality

Message-ID: <20180703152331.151d1c4b@cakuba.netronome.com>
Date:   Tue, 3 Jul 2018 15:23:31 -0700
From:   Jakub Kicinski <jakub.kicinski@...ronome.com>
To:     Okash Khawaja <osk@...com>
Cc:     Daniel Borkmann <daniel@...earbox.net>,
        Martin KaFai Lau <kafai@...com>,
        Alexei Starovoitov <ast@...nel.org>,
        Yonghong Song <yhs@...com>,
        "Quentin Monnet" <quentin.monnet@...ronome.com>,
        "David S. Miller" <davem@...emloft.net>, <netdev@...r.kernel.org>,
        <kernel-team@...com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH bpf-next v2 2/3] bpf: btf: add btf print functionality

On Tue, 3 Jul 2018 22:46:00 +0100, Okash Khawaja wrote:
> On Mon, Jul 02, 2018 at 10:06:59PM -0700, Jakub Kicinski wrote:
> > On Mon, 2 Jul 2018 11:39:15 -0700, Okash Khawaja wrote:  
> > > +#define BITS_PER_BYTE_MASK (BITS_PER_BYTE - 1)
> > > +#define BITS_PER_BYTE_MASKED(bits) ((bits) & BITS_PER_BYTE_MASK)  
> > 
> > Perhaps it's just me but BIT_OFFSET or BIT_COUNT as a name of this macro
> > would make it more obvious to parse in the code below.  
> I don't mind either. However, these macro names are also used inside
> the kernel for the same purpose. For the sake of consistency, I'd
> recommend we keep them :)

Ugh, okay :)
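
For readers following the offset arithmetic in the hunk below, here is a minimal sketch of what the macros compute. Only BITS_PER_BYTE_MASK and BITS_PER_BYTE_MASKED appear in the quoted patch; the definitions of BITS_PER_BYTE, BITS_ROUNDDOWN_BYTES and BITS_ROUNDUP_BYTES are assumptions here, and struct bit_span and locate_bits() are hypothetical names used purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE 8 /* assumed; not shown in the quoted hunk */
#define BITS_PER_BYTE_MASK (BITS_PER_BYTE - 1)
#define BITS_PER_BYTE_MASKED(bits) ((bits) & BITS_PER_BYTE_MASK)
/* Assumed definitions, not shown in the quoted hunk. */
#define BITS_ROUNDDOWN_BYTES(bits) ((bits) >> 3)
#define BITS_ROUNDUP_BYTES(bits) \
	(BITS_ROUNDDOWN_BYTES(bits) + !!BITS_PER_BYTE_MASKED(bits))

/* Hypothetical container for the three values the patch derives. */
struct bit_span {
	uint32_t byte_offset;   /* whole bytes to skip in data */
	uint8_t bit_offset;     /* remaining offset inside the first byte */
	uint32_t bytes_to_copy; /* bytes that cover bit_offset + bits */
};

static struct bit_span locate_bits(uint32_t total_bits_offset, uint32_t bits)
{
	struct bit_span s;

	assert(bits > 0);
	s.byte_offset = BITS_ROUNDDOWN_BYTES(total_bits_offset);
	s.bit_offset = BITS_PER_BYTE_MASKED(total_bits_offset);
	s.bytes_to_copy = BITS_ROUNDUP_BYTES(bits + s.bit_offset);
	return s;
}
```

For example, a 5-bit field at total bit offset 13 starts in byte 1 at bit 5 and spans 2 bytes, since 5 + 5 = 10 bits do not fit in one byte.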

> > > +	} print_num;
> > > +
> > > +	total_bits_offset = bit_offset + BTF_INT_OFFSET(int_type);
> > > +	data += BITS_ROUNDDOWN_BYTES(total_bits_offset);
> > > +	bit_offset = BITS_PER_BYTE_MASKED(total_bits_offset);
> > > +	bits_to_copy = bits + bit_offset;
> > > +	bytes_to_copy = BITS_ROUNDUP_BYTES(bits_to_copy);
> > > +
> > > +	print_num.u64_num = 0;
> > > +	memcpy(&print_num.u64_num, data, bytes_to_copy);  
> > 
> > This scheme is unlikely to work on big endian machines...  
> Can you give an example how?

On BE:

Input:         [0x01, 0x82]
Bit length:    15
Bytes to copy:  2
bit_offset:     0
upper_bits:     7

print_num.u64_num = 0;
# [0, 0, 0, 0,   0, 0, 0, 0]

memcpy(&print_num.u64_num, data, bytes_to_copy);  
# [0x01, 0x82, 0, 0,   0, 0, 0, 0]

mask = (1 << upper_bits) - 1;
# mask = 0x7f

print_num.u8_nums[bytes_to_copy - 1] &= mask;
# [0x01, 0x02, 0, 0,   0, 0, 0, 0]

printf("0x%llx", print_num.u64_num);
# 0x0102000000000000 AKA 72620543991349248
# expected:
# 0x0102             AKA 258

Am I missing something?
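
The walk-through above can be reproduced on any host with a small sketch that spells out the byte order instead of relying on the machine's. All function names here are hypothetical. It assumes bit_offset is 0, as in the example, and it only mimics the memcpy-and-mask step of the patch; it is not a proposed fix.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mimic the patch: zero 8 bytes, memcpy the data, mask the last copied byte. */
static void copy_and_mask(uint8_t buf[8], const uint8_t *data, unsigned int bits)
{
	unsigned int bytes_to_copy = (bits + 7) / 8;
	unsigned int upper_bits = bits & 7;

	assert(bits > 0 && bits <= 64);
	memset(buf, 0, 8);
	memcpy(buf, data, bytes_to_copy);
	if (upper_bits)
		buf[bytes_to_copy - 1] &= (uint8_t)((1u << upper_bits) - 1);
}

/* What a little-endian host reads back as u64_num: buf[0] is the LSB. */
static uint64_t value_on_le(const uint8_t *data, unsigned int bits)
{
	uint8_t buf[8];
	uint64_t v = 0;

	copy_and_mask(buf, data, bits);
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | buf[i];
	return v;
}

/* What a big-endian host reads back as u64_num: buf[0] is the MSB. */
static uint64_t value_on_be(const uint8_t *data, unsigned int bits)
{
	uint8_t buf[8];
	uint64_t v = 0;

	copy_and_mask(buf, data, bits);
	for (int i = 0; i < 8; i++)
		v = (v << 8) | buf[i];
	return v;
}
```

With data = [0x01, 0x82] and bits = 15, value_on_be() gives 0x0102000000000000, because the two copied bytes land in the most significant end of the 64-bit value, while value_on_le() gives 0x0201.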

> > > +	upper_bits = BITS_PER_BYTE_MASKED(bits_to_copy);
> > > +	if (upper_bits) {
> > > +		uint8_t mask = (1 << upper_bits) - 1;
> > > +
> > > +		print_num.u8_nums[bytes_to_copy - 1] &= mask;
> > > +	}
> > > +
> > > +	print_num.u64_num >>= bit_offset;
> > > +
> > > +	if (is_plain_text)
> > > +		jsonw_printf(jw, "0x%llx", print_num.u64_num);
> > > +	else
> > > +		jsonw_printf(jw, "%llu", print_num.u64_num);
> > > +}
> > > +
> > > +static int btf_dumper_int(const struct btf_type *t, uint8_t bit_offset,
> > > +			  const void *data, json_writer_t *jw,
> > > +			  bool is_plain_text)
> > > +{
> > > +	uint32_t *int_type = (uint32_t *)(t + 1);
> > > +	uint32_t bits = BTF_INT_BITS(*int_type);
> > > +	int ret = 0;
> > > +
> > > +	/* if this is bit field */
> > > +	if (bit_offset || BTF_INT_OFFSET(*int_type) ||
> > > +	    BITS_PER_BYTE_MASKED(bits)) {
> > > +		btf_dumper_int_bits(*int_type, bit_offset, data, jw,
> > > +				    is_plain_text);
> > > +		return ret;
> > > +	}
> > > +
> > > +	switch (BTF_INT_ENCODING(*int_type)) {
> > > +	case 0:
> > > +		if (BTF_INT_BITS(*int_type) == 64)
> > > +			jsonw_printf(jw, "%lu", *((uint64_t *)data));
> > > +		else if (BTF_INT_BITS(*int_type) == 32)
> > > +			jsonw_printf(jw, "%u", *((uint32_t *)data));
> > > +		else if (BTF_INT_BITS(*int_type) == 16)
> > > +			jsonw_printf(jw, "%hu", *((uint16_t *)data));
> > > +		else if (BTF_INT_BITS(*int_type) == 8)
> > > +			jsonw_printf(jw, "%hhu", *((uint8_t *)data));
> > > +		else
> > > +			btf_dumper_int_bits(*int_type, bit_offset, data, jw,
> > > +					    is_plain_text);
> > > +		break;
> > > +	case BTF_INT_SIGNED:
> > > +		if (BTF_INT_BITS(*int_type) == 64)
> > > +			jsonw_printf(jw, "%ld", *((int64_t *)data));
> > > +		else if (BTF_INT_BITS(*int_type) == 32)
> > > +			jsonw_printf(jw, "%d", *((int32_t *)data));
> > > +		else if (BTF_INT_BITS(*int_type) ==  16)  
> > 
> > Please drop the double space.  Both for 16 where it makes no sense and
> > for 8 where it's marginally useful but not really.
> >   
> > > +			jsonw_printf(jw, "%hd", *((int16_t *)data));
> > > +		else if (BTF_INT_BITS(*int_type) ==  8)
> > > +			jsonw_printf(jw, "%hhd", *((int8_t *)data));
> > > +		else
> > > +			btf_dumper_int_bits(*int_type, bit_offset, data, jw,
> > > +					    is_plain_text);
> > > +		break;
> > > +	case BTF_INT_CHAR:
> > > +		if (*((char *)data) == '\0')
> > > +			jsonw_null(jw);  
> > 
> > Mm.. I don't think 0 char is equivalent to null.  
> Yes, thanks. Will fix.
> 
> >   
> > > +		else if (isprint(*((char *)data)))
> > > +			jsonw_printf(jw, "\"%c\"", *((char *)data));  
> > 
> > This looks very suspicious.  So if I see a "6" for a char field it's
> > either a 6 ('\u0006') or a 54 ('6')...  
> It will always be 54. Maybe I missed your point. Could you explain why
> it would be other than 54?

Ah, I think I missed that %c is in quotes...

> > > +		else
> > > +			if (is_plain_text)
> > > +				jsonw_printf(jw, "%hhx", *((char *)data));

This seems to be missing a "0x" prefix?

> > > +			else
> > > +				jsonw_printf(jw, "%hhd", *((char *)data));  
> > 
> > ... I think you need to always print a string, and express it as
> > \u00%02hhx for non-printable.  
> Okay that makes sense

Yeah, IDK, char can be used as a byte as well as a string.  In eBPF
it may actually be more likely to just be used as a raw byte buffer...
Either way, I think it would be nice to keep it consistent.  At least for
the JSON output, could we always print either ints or characters?
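
To illustrate the "always characters" option, here is a rough sketch of formatting a single char as a JSON string, using \u00XX for anything non-printable as suggested earlier in the thread. format_json_char() is a hypothetical helper that writes into a caller-supplied buffer; it does not use the json_writer API from the patch, and escaping '"' and '\' is my assumption about what valid JSON output would also need.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render one char as a quoted JSON string in out. */
static char *format_json_char(char c, char *out, size_t len)
{
	unsigned char uc = (unsigned char)c;

	/* Longest output is "\u00XX" in quotes: 8 characters plus NUL. */
	assert(out != NULL && len >= 9);

	if (uc != '\0' && strchr("\"\\", uc))
		/* Quote and backslash must be escaped inside a JSON string. */
		snprintf(out, len, "\"\\%c\"", uc);
	else if (uc < 0x80 && isprint(uc))
		snprintf(out, len, "\"%c\"", uc);
	else
		/* Non-printable bytes; values >= 0x80 map to U+0080..U+00FF. */
		snprintf(out, len, "\"\\u00%02x\"", (unsigned int)uc);
	return out;
}
```

With this, a char holding 54 prints as "6" and a char holding 6 prints as "\u0006", so the two cases stay distinguishable while the JSON type is always a string.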