Date:   Fri, 15 Oct 2021 01:24:49 +0200
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     Eric Dumazet <eric.dumazet@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        Neal Cardwell <ncardwell@...gle.com>,
        Ingemar Johansson S <ingemar.s.johansson@...csson.com>,
        Tom Henderson <tomh@...h.org>, Bob Briscoe <in@...briscoe.net>
Subject: Re: [PATCH net-next 2/2] fq_codel: implement L4S style
 ce_threshold_ect1 marking

Eric Dumazet <edumazet@...gle.com> writes:

> On Thu, Oct 14, 2021 at 12:54 PM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>>
>> Eric Dumazet <eric.dumazet@...il.com> writes:
>>
>> > From: Eric Dumazet <edumazet@...gle.com>
>> >
>> > Add TCA_FQ_CODEL_CE_THRESHOLD_ECT1 boolean option to select Low Latency,
>> > Low Loss, Scalable Throughput (L4S) style marking, along with ce_threshold.
>> >
>> > If enabled, only packets with ECT(1) can be transformed to CE
>> > if their sojourn time is above the ce_threshold.
>> >
>> > Note that this new option does not change rules for codel law.
>> > In particular, if TCA_FQ_CODEL_ECN is left enabled (this is
>> > the default when fq_codel qdisc is created), ECT(0) packets can
>> > still get CE if codel law (as governed by limit/target) decides so.
>>
>> The ability to have certain packets receive a shallow marking threshold
>> and others regular ECN semantics is no doubt useful. However, given that
>> it is by no means certain how the L4S experiment will pan out (and I for
>> one remain sceptical that the real-world benefits will turn out to match
>> the tech demos), I think it's premature to bake the ECT(1) semantics
>> into UAPI.
>
> Chicken and egg problem.
> We had fq_codel in linux kernel years before RFC after all :)

Sure, but fq_codel is a self-contained algorithm; it doesn't add new
meanings to bits of the IP header... :)

>> So how about tying this behaviour to a configurable skb->mark instead?
>> That way users can get the shallow marking behaviour for any subset of
>> packets they want, simply by installing a suitable filter on the
>> qdisc...
>
> This seems an idea, but do you really expect users installing a sophisticated
> filter ? Please provide more details, and cost analysis.

Not sure it's that sophisticated; pretty simple to do with tc-u32
(although it's complicated a bit by having to restore the default
hashing behaviour of fq_codel with a second filter). Something like:

# tc qdisc replace dev $DEV handle 1: fq_codel
# tc filter add dev $DEV parent 1: pref 1 protocol ipv6 u32 match u32 00100000 00100000 action skbedit mark 2 continue
# tc filter add dev $DEV parent 1: pref 2 protocol ip u32 match ip dsfield 1 1 action skbedit mark 2 continue
# tc filter add dev $DEV parent 1: handle 1 pref 3 protocol all flow hash keys src,dst,proto,proto-src,proto-dst divisor 1024

or one could write a single BPF program that combines all three to save
some cycles walking the filter chain.
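
As a rough illustration of the check such a combined program would
perform, here is a minimal userspace C sketch (not an actual BPF
program). The function name ect1_mark and the ECN_* constants are made
up for this example; the mark value 2 is the one used in the tc
commands above. Note that it compares both ECN bits, so unlike
"match ip dsfield 1 1", which tests only the low bit, it does not
match packets that are already CE-marked.

```c
#include <stddef.h>
#include <stdint.h>

#define ECN_MASK  0x03  /* low two bits of the TOS / traffic class byte */
#define ECN_ECT1  0x01  /* ECT(1) code point */
#define ECT1_MARK 2     /* same mark value as in the tc filters above */

/*
 * Hypothetical helper: inspect the start of an L3 header and return
 * ECT1_MARK if the packet carries ECT(1), 0 otherwise.
 */
static inline uint32_t ect1_mark(const uint8_t *hdr, size_t len)
{
	uint8_t dsfield;

	if (len < 2)
		return 0;

	switch (hdr[0] >> 4) {
	case 4:
		/* IPv4: the TOS byte is the second byte of the header */
		dsfield = hdr[1];
		break;
	case 6:
		/* IPv6: traffic class is the low nibble of byte 0
		 * followed by the high nibble of byte 1 */
		dsfield = (uint8_t)(((hdr[0] & 0x0f) << 4) | (hdr[1] >> 4));
		break;
	default:
		return 0;
	}

	return (dsfield & ECN_MASK) == ECN_ECT1 ? ECT1_MARK : 0;
}
```

In a real BPF classifier the same comparison would run on the parsed
packet headers and the result would be written to skb->mark before
the flow hash is computed.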

> (Having to install a filter is probably more expensive than testing a
> boolean, after the sojourn time has exceeded the threshold)

No doubt, all other things being equal. But odds are they're not: if
you're already running a BPF filter somewhere in the path, adding the
logic above to an existing filter reduces it back down to a couple of
boolean comparisons, for instance.

But even if it does add a bit of overhead, IMO the flexibility makes up
for this. We can always revisit it if L4S becomes a standards-track RFC
at some point :)

> Given that INET_ECN_set_ce(skb) only operates on ECT(1) and ECT(0),
> I guess we could use a bitmask of two bits so that users can decide
> which code points can become CE.

That would be an improvement. But if we're doing bitmasks, and since the
code is reading the whole dsfield anyway, why not extend that bitmask to
the whole dsfield?
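
To make the idea concrete, a value/mask match over the full dsfield
could look roughly like the sketch below. This is only an illustration
of the semantics, not proposed kernel code: the struct and function
names are hypothetical, and the exact shape of the option is an
assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical configuration: which dsfield values get the shallow
 * ce_threshold marking. Both fields cover the whole 8-bit dsfield
 * (DSCP in the upper six bits, ECN in the lower two). */
struct ce_threshold_match {
	uint8_t mask;   /* bits of the dsfield to compare */
	uint8_t value;  /* required value of those bits */
};

/* Return true if a packet with this dsfield should be subject to
 * ce_threshold marking under the given configuration. */
static inline bool ce_threshold_applies(const struct ce_threshold_match *m,
					uint8_t dsfield)
{
	return (dsfield & m->mask) == m->value;
}
```

With mask 0x03 and value 0x01 this selects ECT(1) packets only, which
is the behaviour of the original patch; with mask 0xfc and some DSCP
value it would select on diffserv marking instead, and mask 0 with
value 0 matches everything.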

-Toke
