Message-ID: <6236150118de4e499304ba9d0a426663@amazon.com>
Date: Fri, 16 Aug 2024 17:32:56 +0000
From: "Arinzon, David" <darinzon@...zon.com>
To: Jakub Kicinski <kuba@...nel.org>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
"Michael S. Tsirkin" <mst@...hat.com>
CC: David Miller <davem@...emloft.net>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, Eric Dumazet <edumazet@...gle.com>, Paolo Abeni
<pabeni@...hat.com>, "Woodhouse, David" <dwmw@...zon.co.uk>, "Machulsky,
Zorik" <zorik@...zon.com>, "Matushevsky, Alexander" <matua@...zon.com>,
"Bshara, Saeed" <saeedb@...zon.com>, "Wilson, Matt" <msw@...zon.com>,
"Liguori, Anthony" <aliguori@...zon.com>, "Bshara, Nafea" <nafea@...zon.com>,
"Belgazal, Netanel" <netanel@...zon.com>, "Saidi, Ali" <alisaidi@...zon.com>,
"Herrenschmidt, Benjamin" <benh@...zon.com>, "Kiyanovski, Arthur"
<akiyano@...zon.com>, "Dagan, Noam" <ndagan@...zon.com>, "Agroskin, Shay"
<shayagr@...zon.com>, "Itzko, Shahar" <itzko@...zon.com>, "Abboud, Osama"
<osamaabb@...zon.com>, "Ostrovsky, Evgeny" <evostrov@...zon.com>, "Tabachnik,
Ofir" <ofirt@...zon.com>, "Beider, Ron" <rbeider@...zon.com>, "Chauskin,
Igor" <igorch@...zon.com>, "Bernstein, Amit" <amitbern@...zon.com>, "Parav
Pandit" <parav@...dia.com>, Cornelia Huck <cohuck@...hat.com>
Subject: RE: [PATCH v1 net-next 2/2] net: ena: Extend customer metrics reporting support
> > I've looked into the definition of the metrics under question
> >
> > Based on AWS documentation
> > (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html)
> >
> > bw_in_allowance_exceeded: The number of packets queued or dropped
> > because the inbound aggregate bandwidth exceeded the maximum for the
> > instance.
> > bw_out_allowance_exceeded: The number of packets queued or dropped
> > because the outbound aggregate bandwidth exceeded the maximum for the
> > instance.
> >
> > Based on the netlink spec
> > (https://docs.kernel.org/next/networking/netlink_spec/netdev.html)
> >
> > rx-hw-drop-ratelimits (uint)
> > doc: Number of the packets dropped by the device due to the received
> > packets bitrate exceeding the device rate limit.
> > tx-hw-drop-ratelimits (uint)
> > doc: Number of the packets dropped by the device due to the transmit
> > packets bitrate exceeding the device rate limit.
> >
> > The AWS metrics are counting for packets dropped or queued (delayed,
> > but are sent/received with a delay), a change in these metrics is an indication
> > to customers to check their applications and workloads due to risk of
> > exceeding limits.
> > There's no distinction between dropped and queued in these metrics,
> > therefore, they do not match the ratelimits in the netlink spec.
> > In case there will be a separation of these metrics in the future to dropped
> > and queued, we'll be able to add the support for hw-drop-ratelimits.
>
> Xuan, Michael, the virtio spec calls out drops due to b/w limit being
> exceeded, but AWS people say their NICs also count packets buffered but
> not dropped towards a similar metric.
>
> I presume the virtio spec is supposed to cover the same use cases.
> Have the stats been approved? Is it reasonable to extend the definition of
> the "exceeded" stats in the virtio spec to cover what AWS specifies?
> Looks like PR is still open:
> https://github.com/oasis-tcs/virtio-spec/issues/180
How do we move forward with this patchset?
Regarding the counter itself: even though we don't support this at the moment, I would recommend keeping queued and dropped packets
as separate counters (for example, by adding tx/rx-hw-queued-ratelimits, or something similar, if that makes sense).
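To illustrate the point, here is a minimal sketch of the mapping, not driver code. It assumes a hypothetical device that may or may not report queued and dropped packets separately. The `AllowanceCounters` type, its field names, and the `*-hw-queued-ratelimits` attribute are illustrative only; of the names used, only `rx/tx-hw-drop-ratelimits` exists in the netlink spec quoted above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AllowanceCounters:
    """Hypothetical per-direction device counters (names are illustrative)."""
    exceeded: int                  # combined: packets queued OR dropped
    queued: Optional[int] = None   # known only if the device splits the metric
    dropped: Optional[int] = None  # known only if the device splits the metric


def to_qstats(direction: str, c: AllowanceCounters) -> Dict[str, int]:
    """Map device counters to stat names, reporting only what is exact."""
    stats: Dict[str, int] = {}
    if c.dropped is not None:
        # Matches the netlink definition: packets dropped due to the rate limit.
        stats[f"{direction}-hw-drop-ratelimits"] = c.dropped
    if c.queued is not None:
        # Hypothetical attribute suggested in this thread; not in the spec today.
        stats[f"{direction}-hw-queued-ratelimits"] = c.queued
    # A combined-only counter is deliberately not reported under either name,
    # since it cannot be decomposed into drops and delays after the fact.
    return stats
```

With only the combined counter available, nothing is reported under the drop-only name; once the device splits the metric, both values can be exposed without ambiguity.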
Thanks
David