Message-ID: <ac2c7081-8e6b-76c6-e032-ed2be3727e4d@fb.com>
Date: Tue, 18 Aug 2020 17:30:37 -0700
From: Yonghong Song <yhs@...com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
CC: <bpf@...r.kernel.org>, <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, <kernel-team@...com>,
"Paul E . McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf
task/task_file iterator
On 8/18/20 5:05 PM, Alexei Starovoitov wrote:
> On Tue, Aug 18, 2020 at 03:23:09PM -0700, Yonghong Song wrote:
>>
>> We did not use cond_resched() since for some iterators, e.g., the
>> netlink iterator, the rcu read_lock critical section spans consecutive
>> seq_ops->next() calls, which makes it impossible to do cond_resched()
>> in the main while loop of bpf_seq_read().
>
> but after this patch we can, right?
We can do cond_resched() after seq->op->stop(). See more below.
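To illustrate the constraint with a rough sketch (just the pattern, not
code from any particular iterator): from bpf_seq_read()'s point of view
such an iterator behaves like

	p = seq->op->start(seq, &seq->index);	/* e.g. netlink iter takes rcu_read_lock() inside */
	while (1) {
		...
		p = seq->op->next(seq, p, &seq->index);
		/* still inside the iterator's rcu read-side critical
		 * section here, so cond_resched() in this loop is not safe
		 */
		...
	}
	seq->op->stop(seq, p);			/* rcu_read_unlock() happens inside stop() */

so the CPU can only be yielded once seq->op->stop() has run.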
>
>>
>> +/* maximum visited objects before bailing out */
>> +#define MAX_ITER_OBJECTS 1000000
>> +
>> /* bpf_seq_read, a customized and simpler version for bpf iterator.
>> * no_llseek is assumed for this file.
>> * The following are differences from seq_read():
>> @@ -79,7 +82,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>> {
>> struct seq_file *seq = file->private_data;
>> size_t n, offs, copied = 0;
>> - int err = 0;
>> + int err = 0, num_objs = 0;
>> void *p;
>>
>> mutex_lock(&seq->lock);
>> @@ -135,6 +138,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>> while (1) {
>> loff_t pos = seq->index;
>>
>> + num_objs++;
>> offs = seq->count;
>> p = seq->op->next(seq, p, &seq->index);
>> if (pos == seq->index) {
>> @@ -153,6 +157,15 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>> if (seq->count >= size)
>> break;
>>
>> + if (num_objs >= MAX_ITER_OBJECTS) {
>> + if (offs == 0) {
>> + err = -EAGAIN;
>> + seq->op->stop(seq, p);
>> + goto done;
>> + }
>> + break;
>> + }
>> +
>
> should this block be after op->show() and error processing?
> Otherwise bpf_iter_inc_seq_num() will be incorrectly incremented?
The purpose of op->next() is to compute the position of the "next"
object and store it in the seq private data. On the next read() syscall,
start() will then fetch the object based on the info in the seq private
data. This is already the case for the "if (seq->count >= size) break"
condition above, so the next op->start() can locate the correct object.
The -EAGAIN case works the same way.
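Just to spell out the user-space side (a minimal, hypothetical reader
sketch; the fd setup and buffer size are illustrative and not from this
patch set): the new -EAGAIN is simply a retryable read(), since the
position is preserved in the seq private data:

	#include <errno.h>
	#include <stdio.h>
	#include <unistd.h>

	/* Drain a bpf iterator fd, retrying when the kernel bails out
	 * after MAX_ITER_OBJECTS and returns -EAGAIN.
	 */
	static int dump_bpf_iter(int iter_fd)
	{
		char buf[4096];
		ssize_t n;

		for (;;) {
			n = read(iter_fd, buf, sizeof(buf));
			if (n > 0) {
				fwrite(buf, 1, n, stdout);	/* consume this chunk */
				continue;
			}
			if (n == 0)
				return 0;		/* iterator exhausted */
			if (errno == EAGAIN)
				continue;		/* position preserved, just retry */
			return -errno;			/* real error */
		}
	}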
>
>> err = seq->op->show(seq, p);
>> if (err > 0) {
>> bpf_iter_dec_seq_num(seq);
>
> After op->stop() we can do cond_resched() in all cases,
> since rhashtable walk does rcu_unlock in stop() callback, right?
Yes, we can. I am thinking that cond_resched() might not be needed here,
since returning to user space will already give the scheduler a chance
to run, and this patch fixed the rcu stall issue. But if my
understanding is incorrect, I am happy to add cond_resched().
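If we do add it, a rough sketch of the placement (after the final
seq->op->stop() in bpf_seq_read(); not part of this patch) would be:

	seq->op->stop(seq, p);	/* e.g. rhashtable walk does rcu_read_unlock() in here */
	cond_resched();		/* safe now: no rcu read-side critical section is held */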
> I think copy_to_user() and mutex_unlock() don't do cond_resched()
> equivalent work.
>