Message-ID: <84032c64-8e5e-6ad1-63ea-57adee7a2875@huawei.com>
Date: Mon, 21 Oct 2019 21:42:03 +0800
From: Hou Tao <houtao1@...wei.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
CC: <linux-block@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
"Network Development" <netdev@...r.kernel.org>,
Jens Axboe <axboe@...nel.dk>,
"Alexei Starovoitov" <ast@...nel.org>, <hare@...e.com>,
<osandov@...com>, <ming.lei@...hat.com>, <damien.lemoal@....com>,
bvanassche <bvanassche@....org>,
Daniel Borkmann <daniel@...earbox.net>,
"Martin KaFai Lau" <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>
Subject: Re: [RFC PATCH 1/2] block: add support for redirecting IO completion
through eBPF

Hi,

On 2019/10/16 5:04, Alexei Starovoitov wrote:
> On Mon, Oct 14, 2019 at 5:21 AM Hou Tao <houtao1@...wei.com> wrote:
>>
>> For the network stack, RPS (Receive Packet Steering) is used to
>> distribute network protocol processing from the hardware-interrupted
>> CPU to specific CPUs, alleviating the soft-irq load of the interrupted CPU.
>>
>> For the block layer, soft-irq (for single-queue devices) or hard-irq
>> (for multi-queue devices) is used to handle IO completion, so
>> RPS-like steering will be useful when the soft-irq or hard-irq load
>> of a specific CPU is too high, or when a specific CPU set is required
>> to handle IO completion.
>>
>> Instead of setting the CPU set used for handling IO completion
>> through sysfs or procfs, we can attach an eBPF program to the
>> request-queue, provide some useful info (e.g., the CPU that
>> submitted the request) to the program, and let the program
>> decide the proper CPU for IO completion handling.
>>
>> Signed-off-by: Hou Tao <houtao1@...wei.com>
> ...
>>
>> + rcu_read_lock();
>> + prog = rcu_dereference_protected(q->prog, 1);
>> + if (prog)
>> + bpf_ccpu = BPF_PROG_RUN(q->prog, NULL);
>> + rcu_read_unlock();
>> +
>> cpu = get_cpu();
>> - if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> - shared = cpus_share_cache(cpu, ctx->cpu);
>> + if (bpf_ccpu < 0 || !cpu_online(bpf_ccpu)) {
>> + ccpu = ctx->cpu;
>> + if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> + shared = cpus_share_cache(cpu, ctx->cpu);
>> + } else
>> + ccpu = bpf_ccpu;
>>
>> - if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
>> + if (cpu != ccpu && !shared && cpu_online(ccpu)) {
>> rq->csd.func = __blk_mq_complete_request_remote;
>> rq->csd.info = rq;
>> rq->csd.flags = 0;
>> - smp_call_function_single_async(ctx->cpu, &rq->csd);
>> + smp_call_function_single_async(ccpu, &rq->csd);
>
> Interesting idea.
> Not sure whether such programmability makes sense from
> the block layer point of view.
>
> From bpf side having a program with NULL input context is
> a bit odd. We never had such things in the past, so this patchset
> won't work as-is.
No, it just works.
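The context is just NULL and the return value is taken as the completion
CPU (a negative value keeps the current behaviour), so the program can
still do useful steering by keeping its state in maps.  Below is a minimal
sketch of a round-robin picker, only to illustrate the point; the section
name, the map layout and everything else in it are made up by me and not
part of the patch:

/* Illustration only: a completion-CPU picker that needs no input context.
 * User space fills slot 0 with a starting cursor value and slots
 * 1..MAX_CCPUS with the CPU ids it wants completions steered to.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define MAX_CCPUS 16

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, MAX_CCPUS + 1);	/* slot 0: cursor, 1..N: CPU ids */
	__type(key, __u32);
	__type(value, __u32);
} ccpu_map SEC(".maps");

SEC("blk_ccpu")	/* hypothetical section name for the new program type */
int pick_ccpu(void *ctx)
{
	__u32 zero = 0, idx;
	__u32 *cursor, *cpu;

	cursor = bpf_map_lookup_elem(&ccpu_map, &zero);
	if (!cursor)
		return -1;		/* fall back to the submitting CPU */

	/* round-robin over the CPU ids stored in slots 1..MAX_CCPUS */
	idx = 1 + (*cursor % MAX_CCPUS);
	__sync_fetch_and_add(cursor, 1);

	cpu = bpf_map_lookup_elem(&ccpu_map, &idx);
	if (!cpu)
		return -1;
	return *cpu;
}

char _license[] SEC("license") = "GPL";
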
> Also no-input means that the program choices are quite limited.
> Other than round robin and random I cannot come up with other
> cpu selection ideas.
> I suggest to do a writable tracepoint here instead.
> Take a look at trace_nbd_send_request.
> BPF prog can write into 'request'.
> For your use case it will be able to write into the 'bpf_ccpu' local variable.
> If you keep it as a raw tracepoint and don't add the actual tracepoint
> with TP_STRUCT__entry and TP_fast_assign, then it won't be ABI
> and you can change it later or remove it altogether.
>
Your suggestion is much simpler: there will be no need to add a new
program type, and all that needs to be done is adding a raw tracepoint,
moving bpf_ccpu into struct request, and letting a BPF program modify it.
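On the bpf side the program would then look roughly like the sketch below.
The tracepoint name, the context layout and the libbpf section name are
only my guesses at this point (none of it exists yet), so please read it
as an illustration of the writable-raw-tracepoint idea rather than as
working code:

/* Rough sketch only: assumes a writable raw tracepoint (here called
 * "block_rq_complete_redirect") whose first argument points to a small
 * writable area reachable from struct request.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical layout of the writable buffer handed to the program;
 * in the real patch this would live in (or hang off) struct request.
 */
struct blk_ccpu_ctx {
	int submit_cpu;	/* filled by the kernel: CPU that submitted the rq */
	int bpf_ccpu;	/* written by the program: desired completion CPU */
};

SEC("raw_tracepoint.w/block_rq_complete_redirect")
int redirect_completion(struct bpf_raw_tracepoint_args *ctx)
{
	struct blk_ccpu_ctx *c = (void *)ctx->args[0];

	/* e.g. push completions for requests submitted on CPUs 0-3 to CPU 4 */
	if (c->submit_cpu <= 3)
		c->bpf_ccpu = 4;
	return 0;
}

char _license[] SEC("license") = "GPL";

And as you said, the kernel side can stay a bare raw tracepoint without
TP_STRUCT__entry/TP_fast_assign, so none of this becomes ABI.
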
I will give it a try, and thanks for your suggestions.
Regards,
Tao