Message-ID: <8aea0fda1e48485291312a4451aa5d7c@amazon.com>
Date: Wed, 14 Aug 2024 15:31:49 +0000
From: "Arinzon, David" <darinzon@...zon.com>
To: Jakub Kicinski <kuba@...nel.org>
CC: David Miller <davem@...emloft.net>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, Eric Dumazet <edumazet@...gle.com>, Paolo Abeni
<pabeni@...hat.com>, "Woodhouse, David" <dwmw@...zon.co.uk>, "Machulsky,
Zorik" <zorik@...zon.com>, "Matushevsky, Alexander" <matua@...zon.com>,
"Bshara, Saeed" <saeedb@...zon.com>, "Wilson, Matt" <msw@...zon.com>,
"Liguori, Anthony" <aliguori@...zon.com>, "Bshara, Nafea" <nafea@...zon.com>,
"Belgazal, Netanel" <netanel@...zon.com>, "Saidi, Ali" <alisaidi@...zon.com>,
"Herrenschmidt, Benjamin" <benh@...zon.com>, "Kiyanovski, Arthur"
<akiyano@...zon.com>, "Dagan, Noam" <ndagan@...zon.com>, "Agroskin, Shay"
<shayagr@...zon.com>, "Itzko, Shahar" <itzko@...zon.com>, "Abboud, Osama"
<osamaabb@...zon.com>, "Ostrovsky, Evgeny" <evostrov@...zon.com>, "Tabachnik,
Ofir" <ofirt@...zon.com>, "Beider, Ron" <rbeider@...zon.com>, "Chauskin,
Igor" <igorch@...zon.com>, "Bernstein, Amit" <amitbern@...zon.com>
Subject: RE: [PATCH v1 net-next 2/2] net: ena: Extend customer metrics reporting
support
> > I will note that this patch modifies the infrastructure/logic in which
> > these stats are retrieved to allow expandability and flexibility of
> > the interface between the driver and the device (written in the commit
> > message). The top five (0 - 4) are already part of the upstream code
> > and the last one (5) is added in this patch.
>
> That's not clear at all from the one sentence in the commit message.
> Please don't assume that the reviewers are familiar with your driver.
>
> > The statistics discussed here and are exposed by ENA are not on a
> > queue level but on an interface level, therefore, I am not sure that
> > the ones pointed out by you would be a good fit for us.
>
> The API in question is queue-capable, but it also supports reporting the stats
> for the overall device, without per-queue breakdown (via the
> "get_base_stats" callback).
>
> > But in any case, would it be possible from your point of view to
> > progress in two paths, one would be this patchset with the addition of
> > the new metric and another would be to explore whether there are such
> > stats on an interface level that can be exposed?
>
> Adding a callback and filling in two stats is not a large ask.
> Just do it, please.
Hi Jakub,
I've looked into the definitions of the metrics in question.
Based on the AWS documentation (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html):
bw_in_allowance_exceeded: The number of packets queued or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance.
bw_out_allowance_exceeded: The number of packets queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance.
Based on the netlink spec (https://docs.kernel.org/next/networking/netlink_spec/netdev.html):
rx-hw-drop-ratelimits (uint)
doc: Number of the packets dropped by the device due to the received packets bitrate exceeding the device rate limit.
tx-hw-drop-ratelimits (uint)
doc: Number of the packets dropped by the device due to the transmit packets bitrate exceeding the device rate limit.
The AWS metrics count packets that were either dropped or queued (that is, delayed but still sent or received). A change in these metrics is an indication for customers to check their applications and workloads, because they risk exceeding the limits.
These metrics do not distinguish between dropped and queued packets; therefore, they do not match the hw-drop-ratelimits stats in the netlink spec.
If these metrics are split into separate dropped and queued counters in the future, we'll be able to add support for hw-drop-ratelimits.
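To illustrate the mismatch, here is a toy model in Python. It is only a sketch: ShaperModel, over_limit() and the example numbers are hypothetical and are not part of the ENA driver or the device interface. It assumes a rate limiter that internally queues some over-limit packets and drops others, and compares a combined "allowance exceeded" style counter with a drop-only counter as described in the netlink spec.

```python
from dataclasses import dataclass


@dataclass
class ShaperModel:
    """Hypothetical rate limiter model; not ENA device or driver code."""

    queued: int = 0   # over-limit packets that were delayed, then delivered
    dropped: int = 0  # over-limit packets that were discarded

    def over_limit(self, packets: int, dropped: int) -> None:
        """Record `packets` over-limit packets, `dropped` of which were lost."""
        if not 0 <= dropped <= packets:
            raise ValueError("dropped must be between 0 and packets")
        self.dropped += dropped
        self.queued += packets - dropped

    @property
    def allowance_exceeded(self) -> int:
        # Combined counter in the style of bw_*_allowance_exceeded:
        # queued and dropped packets are summed.
        return self.queued + self.dropped

    @property
    def hw_drop_ratelimits(self) -> int:
        # Counter as described in the netlink spec: dropped packets only.
        return self.dropped


model = ShaperModel()
model.over_limit(packets=10, dropped=3)
print(model.allowance_exceeded, model.hw_drop_ratelimits)
```

In this made-up case the combined counter reads 10 while a drop-only counter would read 3. Since only the combined value is available, reporting it as hw-drop-ratelimits would overstate the drops by an unknown amount.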
Thanks,
David