Message-ID: <ac2c7081-8e6b-76c6-e032-ed2be3727e4d@fb.com>
Date: Tue, 18 Aug 2020 17:30:37 -0700
From: Yonghong Song <yhs@...com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
CC: <bpf@...r.kernel.org>, <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, <kernel-team@...com>,
"Paul E . McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH bpf v2 1/3] bpf: fix a rcu_sched stall issue with bpf
task/task_file iterator
On 8/18/20 5:05 PM, Alexei Starovoitov wrote:
> On Tue, Aug 18, 2020 at 03:23:09PM -0700, Yonghong Song wrote:
>>
>> We did not use cond_resched() since for some iterators, e.g., the
>> netlink iterator, the rcu read_lock critical section spans consecutive
>> seq_ops->next() calls, which makes it impossible to do cond_resched()
>> in the main while loop of bpf_seq_read().
>
> but after this patch we can, right?
We can do cond_resched() after seq->op->stop(). See more below.
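To illustrate the constraint with a rough sketch (just the pattern, not
code from any particular iterator): from bpf_seq_read()'s point of view
such an iterator behaves like

	p = seq->op->start(seq, &seq->index);	/* e.g. netlink iter takes rcu_read_lock() inside */
	while (1) {
		...
		p = seq->op->next(seq, p, &seq->index);
		/* still inside the iterator's rcu read-side critical
		 * section here, so cond_resched() in this loop is not safe
		 */
		...
	}
	seq->op->stop(seq, p);			/* rcu_read_unlock() happens inside stop() */

so the CPU can only be yielded once seq->op->stop() has run.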
>
>>
>> +/* maximum visited objects before bailing out */
>> +#define MAX_ITER_OBJECTS 1000000
>> +
>> /* bpf_seq_read, a customized and simpler version for bpf iterator.
>> * no_llseek is assumed for this file.
>> * The following are differences from seq_read():
>> @@ -79,7 +82,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>> {
>> struct seq_file *seq = file->private_data;
>> size_t n, offs, copied = 0;
>> - int err = 0;
>> + int err = 0, num_objs = 0;
>> void *p;
>>
>> mutex_lock(&seq->lock);
>> @@ -135,6 +138,7 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>> while (1) {
>> loff_t pos = seq->index;
>>
>> + num_objs++;
>> offs = seq->count;
>> p = seq->op->next(seq, p, &seq->index);
>> if (pos == seq->index) {
>> @@ -153,6 +157,15 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>> if (seq->count >= size)
>> break;
>>
>> + if (num_objs >= MAX_ITER_OBJECTS) {
>> + if (offs == 0) {
>> + err = -EAGAIN;
>> + seq->op->stop(seq, p);
>> + goto done;
>> + }
>> + break;
>> + }
>> +
>
> should this block be after op->show() and error processing?
> Otherwise bpf_iter_inc_seq_num() will be incorrectly incremented?
The purpose of op->next() is to compute the position of the "next"
object and store it in the seq private data. On the next read() syscall,
start() will then fetch the object based on the info in the seq private
data. This is already the case for the "if (seq->count >= size) break"
condition above, so the next op->start() can locate the correct object.
The -EAGAIN case works the same way.
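Just to spell out the user-space side (a minimal, hypothetical reader
sketch; the fd setup and buffer size are illustrative and not from this
patch set): the new -EAGAIN is simply a retryable read(), since the
position is preserved in the seq private data:

	#include <errno.h>
	#include <stdio.h>
	#include <unistd.h>

	/* Drain a bpf iterator fd, retrying when the kernel bails out
	 * after MAX_ITER_OBJECTS and returns -EAGAIN.
	 */
	static int dump_bpf_iter(int iter_fd)
	{
		char buf[4096];
		ssize_t n;

		for (;;) {
			n = read(iter_fd, buf, sizeof(buf));
			if (n > 0) {
				fwrite(buf, 1, n, stdout);	/* consume this chunk */
				continue;
			}
			if (n == 0)
				return 0;		/* iterator exhausted */
			if (errno == EAGAIN)
				continue;		/* position preserved, just retry */
			return -errno;			/* real error */
		}
	}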
>
>> err = seq->op->show(seq, p);
>> if (err > 0) {
>> bpf_iter_dec_seq_num(seq);
>
> After op->stop() we can do cond_resched() in all cases,
> since rhashtable walk does rcu_unlock in stop() callback, right?
Yes, we can. I am thinking that cond_resched() might not be needed here,
since returning to user space will already give the scheduler a chance
to run, and this patch fixed the rcu stall issue. But if my
understanding is incorrect, I am happy to add cond_resched().
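If we do add it, a rough sketch of the placement (after the final
seq->op->stop() in bpf_seq_read(); not part of this patch) would be:

	seq->op->stop(seq, p);	/* e.g. rhashtable walk does rcu_read_unlock() in here */
	cond_resched();		/* safe now: no rcu read-side critical section is held */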
> I think copy_to_user() and mutex_unlock() don't do cond_resched()
> equivalent work.
>