lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <55E75C1D.8020006@mojatatu.com>
Date:	Wed, 2 Sep 2015 16:29:17 -0400
From:	Jamal Hadi Salim <jhs@...atatu.com>
To:	Cong Wang <cwang@...pensource.com>,
	Daniel Borkmann <daniel@...earbox.net>
Cc:	David Miller <davem@...emloft.net>,
	john fastabend <john.fastabend@...il.com>,
	Alexei Starovoitov <ast@...mgrid.com>,
	netdev <netdev@...r.kernel.org>,
	John Fastabend <john.r.fastabend@...el.com>
Subject: Re: [PATCH net-next 1/4] net: qdisc: add op to run filters/actions
 before enqueue

On 09/02/15 02:22, Cong Wang wrote:
> (Why not Cc'ing Jamal for net_sched pathes?)
>
> On Tue, Sep 1, 2015 at 9:34 AM, Daniel Borkmann <daniel@...earbox.net> wrote:
>> From: John Fastabend <john.r.fastabend@...el.com>
>>
>> Add a new ->preclassify() op to allow multiqueue queuing disciplines
>> to call tc_classify() or perform other work before dev_pick_tx().
>>
>> This helps, for example, with mqprio queueing discipline that has
>> offload support by most popular 10G NICs, where the txq effectively
>> picks the qdisc.
>>
>> Once traffic is being directed to a specific queue then hardware TX
>> rings may be tuned to support this traffic type. mqprio already
>> gives the ability to do this via skb->priority where the ->preclassify()
>> provides more control over packet steering, it can classify the skb
>> and set the priority, for example, from an eBPF classifier (or action).
>>
>> Also this allows traffic classifiers to be run without holding the
>> qdisc lock and gives one place to attach filters when mqprio is
>> in use. ->preclassify() could also be added to other mq qdiscs later
>> on: f.e. most classful qdiscs first check major/minor numbers of
>> skb->priority before actually consulting a more complex classifier.
>>
>> For mqprio case today, a filter has to be attached to each txq qdisc
>> to have all traffic hit the filter. Since ->preclassify() is currently
>> only used by mqprio, the __dev_queue_xmit() fast path is guarded by
>> a generic, hidden Kconfig option (NET_CLS_PRECLASSIFY) that is only
>> selected by mqprio, otherwise it defaults to off. Also, the Qdisc
>> structure size will stay the same, we move __parent, used by cbq only
>> into a write-mostly hole. If actions are enabled, __parent is written
>> on every enqueue, and only read, rewritten in reshape_fail() phase.
>> Therefore, this place in the read-mostly cacheline could be used by
>> preclassify, which is written only once.
>>
>
> I don't like this approach. Ideally, qdisc layer should be totally
> on top of tx queues, which means tx queue selection should
> happen after dequeue. I looked at this before, the change is not
> trivial at all given the fact that qdisc ties too much with tx queue
> probably due to historical reasons, especially the tx softirq part.
> But that is really a long-term solution for me.
>
> I have no big objection for this as a short-term solution, however,
> once we add these filters before enqueue, we can't remove them
> any more. We really need to think twice about it.
>
> Jamal, do you have any better idea?
>

Sorry for the top quote:
Given the rcu-fication of classifiers i believe the idea will
mostly work; expect user will go nuts sticking all kinds of
classifiers and actions that wont work (example, I dont think
connmark action would work nicely here).
Could we strive to do proper offload ala switchdev?

The comment on the patch on reshape_fail + __parent: for the
record, that is an extremely useful feature (allows an inner qdisc
to provide an opportunity for a classful parent qdisc to
reclassify and therefore reschedule).
Yes, CBQ is the only user - but maybe if it was properly documented
more schedulers could put it to good use.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ