Message-ID: <1725437142.8429277-2-xuanzhuo@linux.alibaba.com>
Date: Wed, 4 Sep 2024 16:05:42 +0800
From: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
To: "Arinzon, David" <darinzon@...zon.com>
Cc: David Miller <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
"Woodhouse, David" <dwmw@...zon.co.uk>,
"Machulsky, Zorik" <zorik@...zon.com>,
"Matushevsky, Alexander" <matua@...zon.com>,
"Bshara, Saeed" <saeedb@...zon.com>,
"Wilson, Matt" <msw@...zon.com>,
"Liguori, Anthony" <aliguori@...zon.com>,
"Bshara, Nafea" <nafea@...zon.com>,
"Belgazal, Netanel" <netanel@...zon.com>,
"Saidi, Ali" <alisaidi@...zon.com>,
"Herrenschmidt, Benjamin" <benh@...zon.com>,
"Kiyanovski, Arthur" <akiyano@...zon.com>,
"Dagan, Noam" <ndagan@...zon.com>,
"Agroskin, Shay" <shayagr@...zon.com>,
"Itzko, Shahar" <itzko@...zon.com>,
"Abboud, Osama" <osamaabb@...zon.com>,
"Ostrovsky, Evgeny" <evostrov@...zon.com>,
"Tabachnik, Ofir" <ofirt@...zon.com>,
"Beider, Ron" <rbeider@...zon.com>,
"Chauskin, Igor" <igorch@...zon.com>,
"Bernstein, Amit" <amitbern@...zon.com>,
Cornelia Huck <cohuck@...hat.com>,
Parav Pandit <parav@...dia.com>,
Jakub Kicinski <kuba@...nel.org>,
"Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: RE: [PATCH v1 net-next 2/2] net: ena: Extend customer metrics reporting support
On Tue, 3 Sep 2024 04:29:18 +0000, "Arinzon, David" <darinzon@...zon.com> wrote:
> > > > I've looked into the definition of the metrics under question
> > > >
> > > > Based on AWS documentation
> > > > (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html)
> > > >
> > > > bw_in_allowance_exceeded: The number of packets queued or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance.
> > > > bw_out_allowance_exceeded: The number of packets queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance.
> > > >
> > > > Based on the netlink spec
> > > > (https://docs.kernel.org/next/networking/netlink_spec/netdev.html)
> > > >
> > > > rx-hw-drop-ratelimits (uint)
> > > > doc: Number of the packets dropped by the device due to the received packets bitrate exceeding the device rate limit.
> > > > tx-hw-drop-ratelimits (uint)
> > > > doc: Number of the packets dropped by the device due to the transmit packets bitrate exceeding the device rate limit.
> > > >
> > > > The AWS metrics count packets that are dropped or queued (delayed, but sent/received with a delay); a change in these metrics indicates to customers that they should check their applications and workloads because of the risk of exceeding limits.
> > > > There's no distinction between dropped and queued in these metrics; therefore, they do not match the ratelimits stats in the netlink spec.
> > > > If these metrics are separated into dropped and queued in the future, we'll be able to add support for hw-drop-ratelimits.
> > >
> > > Xuan, Michael, the virtio spec calls out drops due to b/w limit being
> > > exceeded, but AWS people say their NICs also count packets buffered
> > > but not dropped towards a similar metric.
> > >
> > > I presume the virtio spec is supposed to cover the same use cases.
> > On the tx side, packets may not be queued at all; they may not even be
> > DMAed once the rate has been exceeded.
> > This is a HW NIC implementation detail and a choice with trade-offs.
> >
> > Similarly, on rx, one may implement drop, queue, or both (queue up to some
> > limit, and drop beyond it).
> >
> > > Have the stats been approved?
> > Yes, it was approved last year; I also reviewed it. It has been part of the spec
> > for nearly 10 months, at [1].
> > The GH PR is merged, but GH is not updated yet.
> >
> > [1] https://github.com/oasis-tcs/virtio-spec/commit/42f389989823039724f95bbbd243291ab0064f82
> >
> > > Is it reasonable to extend the definition of the "exceeded" stats in
> > > the virtio spec to cover what AWS specifies?
> > Virtio may add new "exceeded" stats in the future.
> > But I do not understand how the AWS ENA NIC is related to a virtio PCI HW NIC.
> >
> > Should virtio implement it? Maybe, yes; it looks useful to me.
> > Whether it should be in the virtio spec now, I am not sure; that depends on the virtio
> > community and on actual HW/SW supporting it.
> >
> > > Looks like PR is still open:
> > > https://github.com/oasis-tcs/virtio-spec/issues/180
> > The spec already has it at [1] for drops. The GH PR is not up to date.
>
> Thank you for the reply, Parav.
> I've raised the query, with a summary of this discussion, in the above-mentioned GitHub ticket.
>
I saw your reply on github.
So what is the question?
The stats as currently defined are rx/tx_hw_drop_ratelimits, so I think they should
only count the number of dropped packets.
Yes, I also think stats for queued packets are useful. But those may be
new stats in the next version of the virtio spec, or come with a new virtio feature.
For users, I think these are important; in my view, a NIC
should provide all of these stats.
Thanks.
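The gap between a combined "dropped or queued" counter and a drop-only counter can be made concrete with a small model. This is a minimal sketch under stated assumptions: a hypothetical rx rate limiter that queues over-limit packets up to a fixed capacity and drops beyond it (one of the options Parav lists). `struct rl_model`, `rl_on_packet()`, `allowance_exceeded()`, `hw_drop_ratelimits()` and the queue-capacity policy are illustrative names only; they are not taken from the ENA driver, the virtio spec, or the netdev netlink code, and queue draining is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rate-limiter model (illustrative only, not ENA/virtio code). */
struct rl_model {
    uint32_t queue_cap;  /* max over-limit packets held instead of dropped */
    uint32_t queue_len;  /* packets currently held (draining not modeled) */
    uint64_t queued;     /* over-limit packets delayed, later delivered */
    uint64_t dropped;    /* over-limit packets discarded */
};

/* Account for one packet; `over_limit` means the bitrate limit is exceeded now. */
static void rl_on_packet(struct rl_model *m, bool over_limit)
{
    if (!over_limit)
        return;                       /* within limit: passes, counts nothing */
    if (m->queue_len < m->queue_cap) {
        m->queue_len++;               /* room left: hold the packet */
        m->queued++;
    } else {
        m->dropped++;                 /* queue full: discard the packet */
    }
}

/* Combined metric in the style of bw_*_allowance_exceeded: dropped or queued. */
static uint64_t allowance_exceeded(const struct rl_model *m)
{
    return m->queued + m->dropped;
}

/* Drop-only metric matching the rx-hw-drop-ratelimits doc string. */
static uint64_t hw_drop_ratelimits(const struct rl_model *m)
{
    return m->dropped;
}
```

With a queue capacity of 2 and five over-limit packets, the model queues 2 and drops 3, so `allowance_exceeded()` returns 5 while `hw_drop_ratelimits()` returns 3. Reporting the combined value under the drop-only name would over-count drops, which is the mismatch discussed in this thread.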