lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 01 Sep 2021 12:42:03 +0200
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     John Fastabend <john.fastabend@...il.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Martin KaFai Lau <kafai@...com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        bpf <bpf@...r.kernel.org>, Cong Wang <cong.wang@...edance.com>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Jiri Pirko <jiri@...nulli.us>
Subject: Re: [RFC Patch net-next] net_sched: introduce eBPF based Qdisc

John Fastabend <john.fastabend@...il.com> writes:

> Cong Wang wrote:
>> On Tue, Aug 24, 2021 at 4:47 PM Martin KaFai Lau <kafai@...com> wrote:
>> > Please explain more on this.  What is currently missing
>> > to make qdisc in struct_ops possible?
>> 
>> I think you misunderstand this point. The reason why I avoid it is
>> _not_ anything is missing, quite oppositely, it is because it requires
>> a lot of work to implement a Qdisc with struct_ops approach, literally
>> all those struct Qdisc_ops (not to mention struct Qdisc_class_ops).
>> WIth current approach, programmers only need to implement two
>> eBPF programs (enqueue and dequeue).
>> 
>> Thanks.
>
> Another idea. Rather than work with qdisc objects which creates all
> these issues with how to work with existing interfaces, filters, etc.
> Why not create an sk_buff map? Then this can be used from the existing
> egress/ingress hooks independent of the actual qdisc being used.

I agree. In fact, I'm working on doing just this for XDP, and I see no
reason why the map type couldn't be reused for skbs as well. Doing it
this way has a couple of benefits:

- It leaves more flexibility to BPF: want a simple FIFO queue? just
  implement that with a single queue map. Or do you want to build a full
  hierarchical queueing structure? Just instantiate as many queue maps
  as you need to achieve this. Etc.

- The behaviour is defined entirely by BPF program behaviour, and does
  not require setting up a qdisc hierarchy in addition to writing BPF
  code.

- It should be possible to structure the hooks in a way that allows
  reusing queueing algorithm implementations between the qdisc and XDP
  layers.

> You mention skb should not be exposed to userspace? Why? Whats the
> reason for this? Anyways we can make kernel only maps if we want or
> scrub the data before passing it to userspace. We do this already in
> some cases.

Yup, that's my approach as well.

> IMO it seems cleaner and more general to allow sk_buffs
> to be stored in maps and pulled back out later for enqueue/dequeue.

FWIW there's some gnarly details here (for instance, we need to make
sure the BPF program doesn't leak packet references after they are
dequeued from the map). My idea is to use a scheme similar to what we do
for XDP_REDIRECT, where a helper sets some hidden variables and doesn't
actually remove the packet from the queue until the BPF program exits
(so the kernel can make sure things are accounted correctly).

> I think one trick might be how to trigger the dequeue event on
> transition from stopped to running net_device or other events like
> this, but that could be solved with another program attached to
> those events to kick the dequeue logic.

This is actually easy in the qdisc case, I think: there's already a
qdisc_dequeue() operation, which just needs to execute a BPF program
that picks which packet to dequeue (by pulling it off a queue map). For
XDP we do need a new hook, on driver TX completion or something like
that. Details TBD. Also, we need a way to BPF to kick an idle interface
and make it start transmitting; that way we can implement a traffic
shaper (that delays packets) by using BPF timers :)

-Toke

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ