Message-ID: <CAGXJAmzfRJWv7tsw8jq-jR0ax3noQ9jMJEAkdtF8uki6DVDMzQ@mail.gmail.com>
Date: Wed, 7 May 2025 13:46:36 -0700
From: John Ousterhout <ouster@...stanford.edu>
To: Andrew Lunn <andrew@...n.ch>
Cc: Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, edumazet@...gle.com,
horms@...nel.org, kuba@...nel.org
Subject: Re: [PATCH net-next v8 08/15] net: homa: create homa_pacer.h and homa_pacer.c

get_link_ksettings is what I was thinking of. Some of the issues you
mentioned, such as switch egress contention, are explicitly handled by
Homa, so those needn't (and shouldn't) be factored into the link
"speed". And don't pretty much all modern datacenter switches allow
all of their links to operate at full speed?
-John-
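
[Editor's illustration, not part of the original message.] The link speed
that get_link_ksettings reports is also exposed to user space through the
sysfs attribute /sys/class/net/<ifname>/speed, in Mb/s. The sketch below
reads that attribute and falls back to a configured value when the speed is
unknown, which is the case raised in the thread for VMs. It is only a
user-space sketch under stated assumptions: the names parse_link_mbps,
read_link_mbps and fallback_mbps are hypothetical and do not come from the
Homa patch series, and the value obtained is the nominal link rate, not the
bandwidth actually available after pause frames, FEC or shaping.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse the text of a sysfs "speed" attribute (Mb/s).
 * Returns fallback_mbps when the text is not a positive integer, for
 * example "-1", which the kernel prints when the speed is unknown.
 */
int parse_link_mbps(const char *text, int fallback_mbps)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(text, &end, 10);
	if (end == text || errno != 0 || v <= 0 || v > INT_MAX)
		return fallback_mbps;
	return (int)v;
}

/* Hypothetical helper: read /sys/class/net/<ifname>/speed.
 * Returns fallback_mbps if the attribute is missing or unreadable
 * (the read can fail for interfaces that are down or virtual).
 */
int read_link_mbps(const char *ifname, int fallback_mbps)
{
	char path[128];
	char buf[32];
	FILE *f;
	int mbps = fallback_mbps;

	snprintf(path, sizeof(path), "/sys/class/net/%s/speed", ifname);
	f = fopen(path, "r");
	if (!f)
		return fallback_mbps;
	if (fgets(buf, sizeof(buf), f))
		mbps = parse_link_mbps(buf, fallback_mbps);
	fclose(f);
	return mbps;
}
```

A caller might use read_link_mbps("eth0", 10000), where both the interface
name and the 10000 Mb/s fallback are placeholders to be replaced with local
values.
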
On Wed, May 7, 2025 at 1:31 PM Andrew Lunn <andrew@...n.ch> wrote:
>
> On Wed, May 07, 2025 at 11:55:23AM -0700, John Ousterhout wrote:
> > On Tue, May 6, 2025 at 7:05 AM Paolo Abeni <pabeni@...hat.com> wrote:
> > >
> > > On 5/3/25 1:37 AM, John Ousterhout wrote:
> > > > + /**
> > > > + * @link_mbps: The raw bandwidth of the network uplink, in
> > > > + * units of 1e06 bits per second. Set externally via sysctl.
> > > > + */
> > > > + int link_mbps;
> > >
> > > This will be extremely problematic. In practice nobody will set this
> > > correctly and in some cases the info is not even available (VM) or will
> > > change dynamically due to policing/shaping.
> > >
> > > I think you need to build your own estimator of the available B/W. I'm
> > > unsure/I don't think you can re-use bql info here.
> >
> > I agree about the issues, but I'd like to defer addressing them. I
> > have begun working on a new Homa-specific qdisc, which will improve
> > performance when there is concurrent TCP and Homa traffic. It
> > retrieves link speed from the net_device, which will eliminate the
> > need for the link_mbps configuration option.
>
> I would be sceptical of the link speed, if you mean to use ethtool
> get_link_ksettings(). Not all switches have sufficient core bandwidth
> to allow all their ports to operate at line rate at the same
> time. There could be pause frames being sent back to slow the link
> down. And there could be FEC reducing the actual bandwidth you can get
> over the media. You also need to consider congestion on switch egress,
> when multiple sources are sending to one sink etc.
>
> BQL gives you a better idea of what the link is actually capable of,
> over the last few seconds, to the first switch. But after that,
> further hops across the network, it does not help.
>
> Andrew