netdev - Re: [PATCH v6 bpf-next 6/6] selftests/bpf: add test for bpf_seq_printf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LRH.2.21.2009281500220.13299@localhost>
Date:   Mon, 28 Sep 2020 15:12:04 +0100 (IST)
From:   Alan Maguire <alan.maguire@...cle.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
cc:     Alan Maguire <alan.maguire@...cle.com>, ast@...nel.org,
        daniel@...earbox.net, andriin@...com, yhs@...com,
        linux@...musvillemoes.dk, andriy.shevchenko@...ux.intel.com,
        pmladek@...e.com, kafai@...com, songliubraving@...com,
        john.fastabend@...il.com, kpsingh@...omium.org, shuah@...nel.org,
        rdna@...com, scott.branden@...adcom.com, quentin@...valent.com,
        cneirabustos@...il.com, jakub@...udflare.com, mingo@...hat.com,
        rostedt@...dmis.org, bpf@...r.kernel.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
        acme@...nel.org
Subject: Re: [PATCH v6 bpf-next 6/6] selftests/bpf: add test for bpf_seq_printf_btf
 helper



On Thu, 24 Sep 2020, Alexei Starovoitov wrote:

> to whatever number, but printing single task_struct needs ~800 lines and
> ~18kbytes. Humans can scroll through that much spam, but can we make it less
> verbose by default somehow?
> May be not in this patch set, but in the follow up?
>

One approach that might work would be to devote 4 bits or so of
flag space to a "maximum depth" specifier; i.e. at depth 1,
only base types are displayed, no aggregate types like arrays,
structs and unions.  We've already got depth processing in the
code to figure out if possibly zeroed nested data needs to be
displayed, so it should hopefully be a simple follow-up.

One way to express it would be to use "..." to denote field(s)
were omitted. We could even use the number of "."s to denote
cases where multiple fields were omitted, giving a visual sense
of how much data was omitted.  So for example with 
BTF_F_MAX_DEPTH(1), task_struct looks like this:

(struct task_struct){
 .state = ()1,
 .stack = ( *)0x00000000029d1e6f,
 ...
 .flags = (unsigned int)4194560,
 ...
 .cpu = (unsigned int)36,
 .wakee_flips = (unsigned int)11,
 .wakee_flip_decay_ts = (long unsigned int)4294914874,
 .last_wakee = (struct task_struct *)0x000000006c7dfe6d,
 .recent_used_cpu = (int)19,
 .wake_cpu = (int)36,
 .prio = (int)120,
 .static_prio = (int)120,
 .normal_prio = (int)120,
 .sched_class = (struct sched_class *)0x00000000ad1561e6,
 ...
 .exec_start = (u64)674402577156,
 .sum_exec_runtime = (u64)5009664110,
 .vruntime = (u64)167038057,
 .prev_sum_exec_runtime = (u64)5009578167,
 .nr_migrations = (u64)54,
 .depth = (int)1,
 .parent = (struct sched_entity *)0x00000000cba60e7d,
 .cfs_rq = (struct cfs_rq *)0x0000000014f353ed,
 ...
 
...etc. What do you think?

> > +SEC("iter/task")
> > +int dump_task_fs_struct(struct bpf_iter__task *ctx)
> > +{
> > +	static const char fs_type[] = "struct fs_struct";
> > +	struct seq_file *seq = ctx->meta->seq;
> > +	struct task_struct *task = ctx->task;
> > +	struct fs_struct *fs = (void *)0;
> > +	static struct btf_ptr ptr = { };
> > +	long ret;
> > +
> > +	if (task)
> > +		fs = task->fs;
> > +
> > +	ptr.type = fs_type;
> > +	ptr.ptr = fs;
> 
> imo the following is better:
>        ptr.type_id = __builtin_btf_type_id(*fs, 1);
>        ptr.ptr = fs;
> 

I'm still seeing lookup failures using __builtin_btf_type_id(,1) -
whereas both __builtin_btf_type_id(,0) and Andrii's
suggestion of bpf_core_type_id_kernel() work. Not sure what's
going on - pahole is v1.17, clang is

clang version 12.0.0 (/mnt/src/llvm-project/clang 
7ab7b979d29e1e43701cf690f5cf1903740f50e3)

> > +
> > +	if (ctx->meta->seq_num == 0)
> > +		BPF_SEQ_PRINTF(seq, "Raw BTF fs_struct per task\n");
> > +
> > +	ret = bpf_seq_printf_btf(seq, &ptr, sizeof(ptr), 0);
> > +	switch (ret) {
> > +	case 0:
> > +		tasks++;
> > +		break;
> > +	case -ERANGE:
> > +		/* NULL task or task->fs, don't count it as an error. */
> > +		break;
> > +	default:
> > +		seq_err = ret;
> > +		break;
> > +	}
> 
> Please add handling of E2BIG to this switch. Otherwise
> printing large amount of tiny structs will overflow PAGE_SIZE and E2BIG
> will be send to user space.
> Like this:
> @@ -40,6 +40,8 @@ int dump_task_fs_struct(struct bpf_iter__task *ctx)
>         case -ERANGE:
>                 /* NULL task or task->fs, don't count it as an error. */
>                 break;
> +       case -E2BIG:
> +               return 1;
> 

Done.

> Also please change bpf_seq_read() like this:
> diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
> index 30833bbf3019..8f10e30ea0b0 100644
> --- a/kernel/bpf/bpf_iter.c
> +++ b/kernel/bpf/bpf_iter.c
> @@ -88,8 +88,8 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>         mutex_lock(&seq->lock);
> 
>         if (!seq->buf) {
> -               seq->size = PAGE_SIZE;
> -               seq->buf = kmalloc(seq->size, GFP_KERNEL);
> +               seq->size = PAGE_SIZE << 3;
> +               seq->buf = kvmalloc(seq->size, GFP_KERNEL);
> 
> So users can print task_struct by default.
> Hopefully we will figure out how to deal with spam later.
> 

Thanks for all the help and suggestions! I didn't want to
attribute the patch bumping seq size in v7 to you without your 
permission, but it's all your work so if I need to respin let me
know if you'd like me to fix that. Thanks again!

Alan