netdev - Re: [RFC PATCH 1/2] block: add support for redirecting IO completion through eBPF

Message-ID: <113e46d4-2a90-a694-8a24-7a6a3c019e88@suse.de>
Date:   Wed, 16 Oct 2019 09:05:26 +0200
From:   Hannes Reinecke <hare@...e.de>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Hou Tao <houtao1@...wei.com>
Cc:     linux-block@...r.kernel.org, bpf <bpf@...r.kernel.org>,
        Network Development <netdev@...r.kernel.org>,
        Jens Axboe <axboe@...nel.dk>,
        Alexei Starovoitov <ast@...nel.org>, hare@...e.com,
        osandov@...com, ming.lei@...hat.com, damien.lemoal@....com,
        bvanassche <bvanassche@....org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>
Subject: Re: [RFC PATCH 1/2] block: add support for redirecting IO completion
 through eBPF

On 10/15/19 11:04 PM, Alexei Starovoitov wrote:
> On Mon, Oct 14, 2019 at 5:21 AM Hou Tao <houtao1@...wei.com> wrote:
>>
>> For the network stack, RPS (Receive Packet Steering) is used to
>> distribute network protocol processing from the hardware-interrupted CPU
>> to specific CPUs, alleviating the soft-irq load of the interrupted CPU.
>>
>> For the block layer, soft-irq (for single-queue devices) or hard-irq
>> (for multi-queue devices) is used to handle IO completion, so an
>> RPS-like mechanism will be useful when the soft-irq or hard-irq load
>> of a specific CPU is too high, or when a specific CPU set is required
>> to handle IO completion.
>>
>> Instead of setting the CPU set used for handling IO completion
>> through sysfs or procfs, we can attach an eBPF program to the
>> request-queue, provide some useful info (e.g., the CPU
>> which submits the request) to the program, and let the program
>> decide the proper CPU for IO completion handling.
>>
>> Signed-off-by: Hou Tao <houtao1@...wei.com>
> ...
>>
>> +       rcu_read_lock();
>> +       prog = rcu_dereference_protected(q->prog, 1);
>> +       if (prog)
>> +               bpf_ccpu = BPF_PROG_RUN(q->prog, NULL);
>> +       rcu_read_unlock();
>> +
>>         cpu = get_cpu();
>> -       if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> -               shared = cpus_share_cache(cpu, ctx->cpu);
>> +       if (bpf_ccpu < 0 || !cpu_online(bpf_ccpu)) {
>> +               ccpu = ctx->cpu;
>> +               if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> +                       shared = cpus_share_cache(cpu, ctx->cpu);
>> +       } else
>> +               ccpu = bpf_ccpu;
>>
>> -       if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
>> +       if (cpu != ccpu && !shared && cpu_online(ccpu)) {
>>                 rq->csd.func = __blk_mq_complete_request_remote;
>>                 rq->csd.info = rq;
>>                 rq->csd.flags = 0;
>> -               smp_call_function_single_async(ctx->cpu, &rq->csd);
>> +               smp_call_function_single_async(ccpu, &rq->csd);
> 
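The fallback logic in the hunk above can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct cpu_model`, `model_cpu_online()` and `pick_completion_cpu()` are hypothetical names invented for illustration, and the arrays merely stand in for cpu_online() and cpus_share_cache(). The function returns the CPU that would receive the IPI, or -1 when the request would be completed on the current CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for kernel state; not a kernel structure. */
struct cpu_model {
	bool online[MODEL_NR_CPUS];   /* models cpu_online() */
	int  cache_id[MODEL_NR_CPUS]; /* equal ids model cpus_share_cache() */
};

/* Unlike the quoted hunk, the model also range-checks the CPU number. */
static bool model_cpu_online(const struct cpu_model *m, int cpu)
{
	return cpu >= 0 && cpu < MODEL_NR_CPUS && m->online[cpu];
}

/*
 * Mirrors the decision in the quoted hunk:
 *  - a usable program result (bpf_ccpu) overrides ctx_cpu and skips
 *    the shared-cache shortcut;
 *  - otherwise fall back to ctx_cpu, and complete locally when the two
 *    CPUs share a cache unless same_force (QUEUE_FLAG_SAME_FORCE) is set.
 * Returns the remote CPU to complete on, or -1 for local completion.
 */
static int pick_completion_cpu(const struct cpu_model *m, int cur_cpu,
			       int ctx_cpu, int bpf_ccpu, bool same_force)
{
	bool shared = false;
	int ccpu;

	assert(cur_cpu >= 0 && cur_cpu < MODEL_NR_CPUS);
	assert(ctx_cpu >= 0 && ctx_cpu < MODEL_NR_CPUS);

	if (bpf_ccpu < 0 || !model_cpu_online(m, bpf_ccpu)) {
		ccpu = ctx_cpu;
		if (!same_force)
			shared = m->cache_id[cur_cpu] == m->cache_id[ctx_cpu];
	} else {
		ccpu = bpf_ccpu;
	}

	if (cur_cpu != ccpu && !shared && model_cpu_online(m, ccpu))
		return ccpu;
	return -1;
}
```

In this model an offline or negative program result silently falls back to the submitting context's CPU, which matches the behaviour of the quoted diff.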
> Interesting idea.
> Not sure whether such programmability makes sense from
> the block layer point of view.
> 
> From bpf side having a program with NULL input context is
> a bit odd. We never had such things in the past, so this patchset
> won't work as-is.
> Also no-input means that the program choices are quite limited.
> Other than round robin and random I cannot come up with other
> cpu selection ideas.
> I suggest using a writable tracepoint here instead.
> Take a look at trace_nbd_send_request.
> BPF prog can write into 'request'.
> For your use case it will be able to write into 'bpf_ccpu' local variable.
> If you keep it as raw tracepoint and don't add the actual tracepoint
> with TP_STRUCT__entry and TP_fast_assign then it won't be abi
> and you can change it later or remove it altogether.
> 
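To make the difference between a NULL-context program and a writable one concrete, here is a userspace sketch. Every name in it (`struct ccpu_ctx`, `ccpu_hook_fn`, `policy_next_after_submitter()`, `run_ccpu_hook()`) is hypothetical; the real mechanism would be a raw writable tracepoint with a BPF program attached, not a C function pointer. The point is only that a hook which can read the submitting CPU and write back its choice can express policies beyond round robin or random.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context handed to the hook; not a kernel structure. */
struct ccpu_ctx {
	int submit_cpu; /* input: CPU that submitted the request */
	int nr_cpus;    /* input: number of CPUs in this model */
	int ccpu;       /* output: chosen completion CPU, -1 = no choice */
};

typedef void (*ccpu_hook_fn)(struct ccpu_ctx *ctx);

/* Example policy: complete on the CPU following the submitter. */
static void policy_next_after_submitter(struct ccpu_ctx *ctx)
{
	ctx->ccpu = (ctx->submit_cpu + 1) % ctx->nr_cpus;
}

/*
 * Runs the optional hook and returns its choice. A missing hook or an
 * out-of-range choice yields -1, so the caller keeps its default.
 */
static int run_ccpu_hook(ccpu_hook_fn hook, int submit_cpu, int nr_cpus)
{
	struct ccpu_ctx ctx = {
		.submit_cpu = submit_cpu,
		.nr_cpus = nr_cpus,
		.ccpu = -1,
	};

	assert(nr_cpus > 0 && submit_cpu >= 0 && submit_cpu < nr_cpus);

	if (hook != NULL)
		hook(&ctx);
	if (ctx.ccpu < 0 || ctx.ccpu >= nr_cpus)
		return -1;
	return ctx.ccpu;
}
```

Initialising `ccpu` to -1 before the hook runs means "no program attached" and "program made no choice" look the same to the caller, which is one possible design, not something the thread prescribes.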
That basically was my idea, too.

Actually I was coming from a different angle, namely trying to figure
out how we could do generic error injection in the block layer.
eBPF would be one way of doing it, kprobes another.

But writable trace events ... I'll have to check if we can leverage that
here, too.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@...e.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer