lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAM0EoMnrrvKBdz4WjzFmzZhkhjGiJ-+=yr8RMkT5USzTafSybw@mail.gmail.com>
Date: Wed, 10 Sep 2025 00:29:41 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Jay Vosburgh <jv@...sburgh.net>
Cc: netdev@...r.kernel.org, Stephen Hemminger <stephen@...workplumber.org>, 
	David Ahern <dsahern@...il.com>
Subject: Re: [PATCH iproute2-next] tc/police: Allow 64 bit burst size

On Wed, Sep 10, 2025 at 12:04 AM Jay Vosburgh <jv@...sburgh.net> wrote:
>
> Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> >On Sat, Sep 6, 2025 at 9:50 PM Jay Vosburgh <jay.vosburgh@...onical.com> wrote:
> >>
> >>
> >>         In summary, this patchset changes the user space handling of the
> >> tc police burst parameter to permit burst sizes that exceed 4 GB when the
> >> specified rate is high enough that the kernel API for burst can accomodate
> >> such.
> >>
> >>         Additionally, if the burst exceeds the upper limit of the kernel
> >> API, this is now flagged as an error.  The existing behavior silently
> >> overflows, resulting in arbitrary values passed to the kernel.
> >>
> >>         In detail, as presently implemented, the tc police burst option
> >> limits the size of the burst to to 4 GB, i.e., UINT_MAX for a 32 bit
> >> unsigned int.  This is a reasonable limit for the low rates common when
> >> this was developed.  However, the underlying implementation of burst is
> >> computed as "time at the specified rate," and for higher rates, a burst
> >> size exceeding 4 GB is feasible without modification to the kernel.
> >>
> >>         The burst size provided on the command line is translated into a
> >> duration, representing how much time is required at the specified rate to
> >> transmit the given burst size.
> >>
> >>         This time is calculated in units of "psched ticks," each of which
> >> is 64 nsec[0].  The computed number of psched ticks is sent to the kernel
> >> as a __u32 value.
> >>
> >
> >Please run tdc tests. David/Stephen - can we please make this a
> >requirement for iproute2 tc related changes?
>
>         I was not familiar with those tests (but see them now that I
> look in the kernel source).  I did run the tests included in the
> iproute2-next git repository.
>
> >Jay, your patches fail at least one test because you changed the unit outputs.
> >Either we fix the tdc test or you make your changes backward compatible.
>
>         I'll run the tdc tests and have a look at the failures.
>
> >In the future also cc kernel tc maintainers (I only saw this because
> >someone pointed it to me).
> >Overall the changes look fine.
>
>         Understood, but this isn't documented in iproute2.  Perhaps the
> iproute2 MAINTAINERS should have a tc section to clarify this
> expectation?
>

Ive been asking for a while now;->

cheers,
jamal
>         -J
>
> >cheers,
> >jamal
> >
> >>         Because burst is ultimately calculated as a time duration, the
> >> real upper limit for a burst is UINT_MAX psched ticks, i.e.,
> >>
> >>         UINT_MAX * psched tick duration / NSEC_PER_SEC
> >>         (2^32-1) *         64           / 1E9
> >>
> >>         which is roughly 274.88 seconds (274.8779...).
> >>
> >>         At low rates, e.g., 5 Mbit/sec, UINT_MAX psched ticks does not
> >> correspond to a burst size in excess of 4 GB, so the above is moot, e.g.,
> >>
> >>         5Mbit/sec / 8 = 625000 MBytes/sec
> >>         625000 * ~274.88 seconds = ~171800000 max burst size, below UINT_MAX
> >>
> >>         Thus, the burst size at 5Mbit/sec is limited by the __u32 size of
> >> the psched tick field in the kernel API, not the 4 GB limit of the tc
> >> police burst user space API.
> >>
> >>         However, at higher rates, e.g., 10 Gbit/sec, the burst size is
> >> currently limited by the 4 GB maximum for the burst command line parameter
> >> value, rather than UINT_MAX psched ticks:
> >>
> >>         10 Gbit/sec / 8 = 1250000000 MBbytes/sec
> >>         1250000000 * ~274.88 seconds = ~343600000000, more than UINT_MAX
> >>
> >>         Here, the maximum duration of a burst the kernel can handle
> >> exceeds 4 GB of burst size.
> >>
> >>         While the above maximum may be an excessively large burst value,
> >> at 10 Gbit/sec, a 4 GB burst size corresponds to just under 3.5 seconds in
> >> duration:
> >>
> >>         2^32 bytes / 10 Gbit/sec
> >>         2^32 bytes / 1250000000 bytes/sec
> >>         equals ~3.43 sec
> >>
> >>         So, at higher rates, burst sizes exceeding 4 GB are both
> >> reasonable and feasible, up to the UINT_MAX limit for psched ticks.
> >> Enabling this requires changes only to the user space processing of the
> >> burst size parameter in tc.
> >>
> >>         In principle, the other packet schedulers utilizing psched ticks
> >> for burst sizing, htb and tbf, could be similarly changed to permit larger
> >> burst sizes, but this patch set does not do so.
> >>
> >>         Separately, for the burst duration calculation overflow (i.e.,
> >> that the number of psched ticks exceeds UINT_MAX), under the current
> >> implementation, one example of overflow is as follows:
> >>
> >> # /sbin/tc filter add dev eth0 protocol ip prio 1 parent ffff: handle 1 fw police rate 1Mbit peakrate 10Gbit burst 34375000 mtu 64Kb conform-exceed reclassify
> >>
> >> # /sbin/tc -raw filter get dev eth0 ingress protocol ip pref 1 handle 1 fw
> >> filter ingress protocol ip pref 1 fw chain 0 handle 0x1  police 0x1 rate 1Mbit burst 15261b mtu 64Kb [001d1bf8] peakrate 10Gbit action reclassify overhead 0b
> >>         ref 1 bind 1
> >>
> >>         Note that the returned burst value is 15261b, which does not match
> >> the supplied value of 34375000.  With this patch set applied, this
> >> situation is flagged as an error.
> >>
> >>
> >> [0] psched ticks are defined in the kernel in include/net/pkt_sched.h:
> >>
> >> #define PSCHED_SHIFT                    6
> >> #define PSCHED_TICKS2NS(x)              ((s64)(x) << PSCHED_SHIFT)
> >> #define PSCHED_NS2TICKS(x)              ((x) >> PSCHED_SHIFT)
> >>
> >> #define PSCHED_TICKS_PER_SEC            PSCHED_NS2TICKS(NSEC_PER_SEC)
> >>
> >>         where PSCHED_TICKS_PER_SEC is 15625000.
> >>
> >>         These values are exported to user space via /proc/net/psched, the
> >> second field being PSCHED_TICKS2NS(1), which at present is 64 (0x40).  tc
> >> uses this value to compute its internal "tick_in_usec" variable containing
> >> the number of psched ticks per usec (15.625) used for the psched tick
> >> computations.
> >>
> >>         Lastly, note that PSCHED_SHIFT was previously 10, and changed to 6
> >> in commit a4a710c4a7490 in 2009.  I have not tested backwards
> >> compatibility of these changes with kernels of that era.
>
> ---
>         -Jay Vosburgh, jv@...sburgh.net
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ