netdev - Re: [RFC PATCH 1/1] dql: add dql_set_min

Message-ID: <CAA93jw5+wB=va5tqUpCiPu20N+pn8VcMxUdySSWoQE_zqH8Qtg@mail.gmail.com>
Date:   Tue, 9 Mar 2021 11:44:05 -0800
From:   Dave Taht <dave.taht@...il.com>
To:     Vincent Mailhol <mailhol.vincent@...adoo.fr>
Cc:     Marc Kleine-Budde <mkl@...gutronix.de>, linux-can@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Tom Herbert <therbert@...gle.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Randy Dunlap <rdunlap@...radead.org>
Subject: Re: [RFC PATCH 1/1] dql: add dql_set_min_limit()

I note that "proof" is very much in the developer's opinion and
limited testing base.

Actual operational experience, as in a real deployment, with other applications,
heavy context switching, or virtualization, might yield better results.

There are lots of defaults in the Linux kernel that are just swags, the
default NAPI and rx/tx ring buffer sizes being two where devs just
copy/paste stuff that either doesn't scale up or doesn't scale
down.

This does not mean I oppose your patch! However, I have two points I'd
like to make regarding bql and dql in general that I have long wanted
to see explored.

0) Being an advocate of low latency in general, I have no problem
with, and even prefer, starving the device rather than always keeping
it busy.

/me hides

1) BQL is MIAD - multiplicative increase, additive decrease. While in
practice so far this does not seem to matter much (and measuring
things down to "us" is really hard), a stabler algorithm is AIMD. BQL
often absorbs a large TSO burst - usually a minimum of 128k is
observed on gbit, whereas a stabler state (without GSO) seemed to be
around 40k on many of the chipsets I worked with, back when I was
working in this area.

(cake's gso-splitting also gets lower bql values in general, if you
have enough cpu to run cake)
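
To make the MIAD/AIMD distinction in point 1 concrete, here is a toy
sketch of the two update rules. It is not the kernel's dql code: the
single "starved" signal, the function names, and every constant (factor,
step, floor, ceiling) are assumptions chosen only for illustration.

```python
# Toy comparison of MIAD and AIMD limit updates.
# NOT the kernel's dql.c logic; all names and constants are hypothetical.

def clamp(value, floor, ceiling):
    """Keep the limit inside [floor, ceiling] and make it an integer."""
    return int(min(max(value, floor), ceiling))

def miad_step(limit, starved, factor=2.0, step=1500,
              floor=1500, ceiling=1_000_000):
    """Multiplicative increase when starved, additive decrease otherwise."""
    new_limit = limit * factor if starved else limit - step
    return clamp(new_limit, floor, ceiling)

def aimd_step(limit, starved, step=1500, factor=0.5,
              floor=1500, ceiling=1_000_000):
    """Additive increase when starved, multiplicative decrease otherwise."""
    new_limit = limit + step if starved else limit * factor
    return clamp(new_limit, floor, ceiling)

def run(step_fn, signals, start=40_000):
    """Apply step_fn once per signal and return the trace of limits."""
    trace = [start]
    for starved in signals:
        trace.append(step_fn(trace[-1], starved))
    return trace

if __name__ == "__main__":
    # One starvation event followed by a run of "slack" intervals.
    signals = [True] + [False] * 5
    print("MIAD:", run(miad_step, signals))
    print("AIMD:", run(aimd_step, signals))
```

With these made-up constants, one starvation event takes the MIAD limit
from 40000 to 80000 bytes, and it then needs 27 additive decreases of
1500 to get back to 40000 or below, while AIMD grows by one step and
halves on the first slack interval.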

2) BQL + hardware mq is increasingly an issue in my mind. Say you are
hitting 64 hw queues, each with 128k stored in there: that buffering
is additive, whereas servicing interrupts properly and keeping the
media busy might only require 128k total, spread across the active
queues and flows. I have often thought that making BQL scale better to
multiple hw queues by globally sharing the buffering state(s) would
lead to lower latency, but also that sharing that state would probably
be too high overhead.
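
A back-of-the-envelope sketch of why the additive buffering in point 2
matters for latency. It assumes "128k" means 128 KiB and that all
queues drain through a single 1 Gbit/s link; both are assumptions for
illustration (a 64-queue NIC is usually faster), and the function name
is made up.

```python
# Worst-case time to drain queued bytes onto the wire.
# Assumptions (not from the original mail): 128k = 128 KiB, 1 Gbit/s link.

def drain_time_ms(queued_bytes, link_bps):
    """Milliseconds needed to transmit queued_bytes at link_bps bits/s."""
    return queued_bytes * 8 / link_bps * 1000

PER_QUEUE_BYTES = 128 * 1024   # one queue's worth of BQL-admitted data
HW_QUEUES = 64
LINK_BPS = 1_000_000_000

if __name__ == "__main__":
    shared = drain_time_ms(PER_QUEUE_BYTES, LINK_BPS)
    additive = drain_time_ms(PER_QUEUE_BYTES * HW_QUEUES, LINK_BPS)
    print(f"128 KiB total:        {shared:.2f} ms")
    print(f"64 queues x 128 KiB:  {additive:.2f} ms")
```

Under those assumptions, 128 KiB in total is about 1 ms of queue, while
64 full queues add up to 8 MiB, or roughly 67 ms.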

Having not worked out a solution to 2), and preferring to start with
1), and not having a whole lot of support for item 0) in the world, I
just thought I'd mention it, in the hope someone might give it a go.