Date:   Fri, 13 Jan 2023 16:43:19 +0100
From:   Daniele Palmas <dnlplm@...il.com>
To:     Alexander H Duyck <alexander.duyck@...il.com>
Cc:     David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Subash Abhinov Kasiviswanathan <quic_subashab@...cinc.com>,
        Sean Tranchetti <quic_stranche@...cinc.com>,
        Jonathan Corbet <corbet@....net>,
        Alexander Lobakin <alexandr.lobakin@...el.com>,
        Gal Pressman <gal@...dia.com>, Dave Taht <dave.taht@...il.com>,
        Bjørn Mork <bjorn@...k.no>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        netdev@...r.kernel.org
Subject: Re: [PATCH net-next v4 0/3] add tx packets aggregation to ethtool and rmnet

Hello Alexander,

On Fri, 13 Jan 2023 at 00:00, Alexander H Duyck
<alexander.duyck@...il.com> wrote:
>
> On Wed, 2023-01-11 at 14:05 +0100, Daniele Palmas wrote:
> > Hello maintainers and all,
> >
> > this patchset implements tx QMAP packet aggregation in rmnet and generic
> > ethtool support for it.
> >
> > Some low-category, ThreadX-based modems are not capable of reaching the maximum
> > allowed throughput in both tx and rx during a bidirectional test if tx packet
> > aggregation is not enabled.
>
> One question I would have about this is whether you are making use of Byte
> Queue Limits and netdev_xmit_more at all? I know for high speed devices
> most of us added support for xmit_more because PCIe bandwidth was
> limiting when we were writing the Tx ring indexes/doorbells with every
> packet. To overcome that we added netdev_xmit_more which told us when
> the Qdisc had more packets to give us. This allowed us to better
> utilize the PCIe bus by bursting packets through without adding
> additional latency.
>
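
For reference, the xmit_more pattern described above looks roughly like
this in a driver's ndo_start_xmit; a minimal sketch only, with
hypothetical foo_* helpers and private struct, not code taken from any
existing driver:

static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct foo_priv *priv = netdev_priv(dev);
        struct netdev_queue *txq;

        txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

        /* Post the skb to the tx ring, but do not notify the hardware
         * yet (foo_queue_skb() stands in for the driver-specific part).
         */
        foo_queue_skb(priv, skb);

        /* Write the doorbell only when the qdisc has no more packets
         * for us, or the queue has been stopped, so that several
         * packets are flushed with a single doorbell write.
         */
        if (!netdev_xmit_more() || netif_xmit_stopped(txq))
                foo_ring_doorbell(priv);

        return NETDEV_TX_OK;
}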

No, I was not aware of BQL: this development has basically been
modelled on what other mobile broadband drivers do (e.g.
cdc_mbim/cdc_ncm, the Qualcomm downstream rmnet implementation), which
do not use BQL.

If I understand the documentation properly,

rmnet0/queues/tx-0/byte_queue_limits/limit_max

would be the candidate for replacing ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES.
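
For what it's worth, that BQL knob only has an effect if the driver
reports bytes queued and completed; a minimal sketch of the accounting
involved (dev is the net_device, pkts_done/bytes_done are assumed to
come from the driver's completion handling):

        /* In ndo_start_xmit, after the skb has been queued: */
        netdev_tx_sent_queue(netdev_get_tx_queue(dev, 0), skb->len);

        /* In the tx completion path, once packets have left the device: */
        netdev_tx_completed_queue(netdev_get_tx_queue(dev, 0),
                                  pkts_done, bytes_done);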

But I can't find anything for ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES
and ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS, which should work in
combination with the bytes limit: at least the first one is mandatory,
since the modem can't receive more than a certain number of frames (a
value that depends on the modem model and is collected through
userspace tools).

ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES also works as a way to determine
whether tx aggregation has been enabled by the userspace tool managing
the QMI requests; otherwise no aggregation should be performed.
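
To make the intended semantics concrete, here is a rough driver-side
sketch of how these parameters could be consumed; struct and field
names are as proposed in this series, while foo_* and the priv fields
are purely illustrative:

static int foo_set_coalesce(struct net_device *dev,
                            struct ethtool_coalesce *coal,
                            struct kernel_ethtool_coalesce *kcoal,
                            struct netlink_ext_ack *extack)
{
        struct foo_priv *priv = netdev_priv(dev);

        /* max_frames == 0: userspace has not enabled tx aggregation */
        if (!kcoal->tx_aggr_max_frames) {
                priv->aggr_enabled = false;
                return 0;
        }

        priv->aggr_enabled    = true;
        priv->aggr_max_bytes  = kcoal->tx_aggr_max_bytes;
        priv->aggr_max_frames = kcoal->tx_aggr_max_frames;
        priv->aggr_time_ns    = (u64)kcoal->tx_aggr_time_usecs * NSEC_PER_USEC;

        return 0;
}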

> > I verified this problem with rmnet + qmi_wwan by using an MDM9207 Cat. 4 based modem
> > (50Mbps/150Mbps max throughput). What is actually happening is pictured at
> > https://drive.google.com/file/d/1gSbozrtd9h0X63i6vdkNpN68d-9sg8f9/view
> >
> > Testing with iperf TCP, when the rx and tx flows are tested individually there's no issue
> > in tx and only minor issues in rx (not able to reach max throughput). When there are concurrent
> > tx and rx flows, tx throughput has a huge drop; rx has a minor one, but it is still present.
> >
> > The same scenario with tx aggregation enabled is pictured at
> > https://drive.google.com/file/d/1jcVIKNZD7K3lHtwKE5W02mpaloudYYih/view
> > showing a regular graph.
> >
> > This issue does not happen with high-cat modems (e.g. SDX20), or at least it
> > does not happen at the throughputs I'm able to test currently: maybe the same
> > could happen when moving close to the maximum rates supported by those modems.
> > Anyway, having tx aggregation enabled should not hurt.
> >
> > The first attempt to solve this issue was in qmi_wwan qmap implementation,
> > see the discussion at https://lore.kernel.org/netdev/20221019132503.6783-1-dnlplm@gmail.com/
> >
> > However, it turned out that rmnet was a better candidate for the implementation.
> >
> > Moreover, Greg and Jakub also suggested using ethtool for the configuration:
> > I'm not sure if I got their advice right, but this patchset also adds generic
> > ethtool support for tx aggregation.
>
> I have concerns about this essentially moving queueing disciplines down
> into the device. The idea of doing Tx aggregation seems like something
> that should be done with the qdisc rather than the device driver.
> Otherwise we are looking at having multiple implementations of this
> aggregation floating around. It seems like it would make more sense to
> have this batching happen at the qdisc layer, and then the qdisc layer
> would pass down a batch of frames w/ xmit_more set to indicate it is
> flushing the batch.

Honestly, I'm not expert enough to give a reliable opinion about this.

I feel like these settings are more related to the hardware, since they
also require a configuration on the hardware itself, done by the user,
so ethtool seems to me a good choice, but I may be biased since I did
this development :-)

Thanks,
Daniele
