lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM0EoMmJaC3OAncWnUOkz6mn7BVXudnG1YKUYZomUkbVu8Zb+g@mail.gmail.com>
Date: Tue, 9 Sep 2025 23:32:40 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
Cc: netdev@...r.kernel.org, Stephen Hemminger <stephen@...workplumber.org>, 
	David Ahern <dsahern@...il.com>
Subject: Re: [PATCH iproute2-next] tc/police: Allow 64 bit burst size

On Sat, Sep 6, 2025 at 9:50 PM Jay Vosburgh <jay.vosburgh@...onical.com> wrote:
>
>
>         In summary, this patchset changes the user space handling of the
> tc police burst parameter to permit burst sizes that exceed 4 GB when the
> specified rate is high enough that the kernel API for burst can accomodate
> such.
>
>         Additionally, if the burst exceeds the upper limit of the kernel
> API, this is now flagged as an error.  The existing behavior silently
> overflows, resulting in arbitrary values passed to the kernel.
>
>         In detail, as presently implemented, the tc police burst option
> limits the size of the burst to to 4 GB, i.e., UINT_MAX for a 32 bit
> unsigned int.  This is a reasonable limit for the low rates common when
> this was developed.  However, the underlying implementation of burst is
> computed as "time at the specified rate," and for higher rates, a burst
> size exceeding 4 GB is feasible without modification to the kernel.
>
>         The burst size provided on the command line is translated into a
> duration, representing how much time is required at the specified rate to
> transmit the given burst size.
>
>         This time is calculated in units of "psched ticks," each of which
> is 64 nsec[0].  The computed number of psched ticks is sent to the kernel
> as a __u32 value.
>

Please run tdc tests. David/Stephen - can we please make this a
requirement for iproute2 tc related changes?

Jay, your patches fail at least one test because you changed the unit outputs.
Either we fix the tdc test or you make your changes backward compatible.
In the future also cc kernel tc maintainers (I only saw this because
someone pointed it to me).
Overall the changes look fine.

cheers,
jamal

>         Because burst is ultimately calculated as a time duration, the
> real upper limit for a burst is UINT_MAX psched ticks, i.e.,
>
>         UINT_MAX * psched tick duration / NSEC_PER_SEC
>         (2^32-1) *         64           / 1E9
>
>         which is roughly 274.88 seconds (274.8779...).
>
>         At low rates, e.g., 5 Mbit/sec, UINT_MAX psched ticks does not
> correspond to a burst size in excess of 4 GB, so the above is moot, e.g.,
>
>         5Mbit/sec / 8 = 625000 MBytes/sec
>         625000 * ~274.88 seconds = ~171800000 max burst size, below UINT_MAX
>
>         Thus, the burst size at 5Mbit/sec is limited by the __u32 size of
> the psched tick field in the kernel API, not the 4 GB limit of the tc
> police burst user space API.
>
>         However, at higher rates, e.g., 10 Gbit/sec, the burst size is
> currently limited by the 4 GB maximum for the burst command line parameter
> value, rather than UINT_MAX psched ticks:
>
>         10 Gbit/sec / 8 = 1250000000 MBbytes/sec
>         1250000000 * ~274.88 seconds = ~343600000000, more than UINT_MAX
>
>         Here, the maximum duration of a burst the kernel can handle
> exceeds 4 GB of burst size.
>
>         While the above maximum may be an excessively large burst value,
> at 10 Gbit/sec, a 4 GB burst size corresponds to just under 3.5 seconds in
> duration:
>
>         2^32 bytes / 10 Gbit/sec
>         2^32 bytes / 1250000000 bytes/sec
>         equals ~3.43 sec
>
>         So, at higher rates, burst sizes exceeding 4 GB are both
> reasonable and feasible, up to the UINT_MAX limit for psched ticks.
> Enabling this requires changes only to the user space processing of the
> burst size parameter in tc.
>
>         In principle, the other packet schedulers utilizing psched ticks
> for burst sizing, htb and tbf, could be similarly changed to permit larger
> burst sizes, but this patch set does not do so.
>
>         Separately, for the burst duration calculation overflow (i.e.,
> that the number of psched ticks exceeds UINT_MAX), under the current
> implementation, one example of overflow is as follows:
>
> # /sbin/tc filter add dev eth0 protocol ip prio 1 parent ffff: handle 1 fw police rate 1Mbit peakrate 10Gbit burst 34375000 mtu 64Kb conform-exceed reclassify
>
> # /sbin/tc -raw filter get dev eth0 ingress protocol ip pref 1 handle 1 fw
> filter ingress protocol ip pref 1 fw chain 0 handle 0x1  police 0x1 rate 1Mbit burst 15261b mtu 64Kb [001d1bf8] peakrate 10Gbit action reclassify overhead 0b
>         ref 1 bind 1
>
>         Note that the returned burst value is 15261b, which does not match
> the supplied value of 34375000.  With this patch set applied, this
> situation is flagged as an error.
>
>
> [0] psched ticks are defined in the kernel in include/net/pkt_sched.h:
>
> #define PSCHED_SHIFT                    6
> #define PSCHED_TICKS2NS(x)              ((s64)(x) << PSCHED_SHIFT)
> #define PSCHED_NS2TICKS(x)              ((x) >> PSCHED_SHIFT)
>
> #define PSCHED_TICKS_PER_SEC            PSCHED_NS2TICKS(NSEC_PER_SEC)
>
>         where PSCHED_TICKS_PER_SEC is 15625000.
>
>         These values are exported to user space via /proc/net/psched, the
> second field being PSCHED_TICKS2NS(1), which at present is 64 (0x40).  tc
> uses this value to compute its internal "tick_in_usec" variable containing
> the number of psched ticks per usec (15.625) used for the psched tick
> computations.
>
>         Lastly, note that PSCHED_SHIFT was previously 10, and changed to 6
> in commit a4a710c4a7490 in 2009.  I have not tested backwards
> compatibility of these changes with kernels of that era.
>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ