Message-ID: <20250306002825.rva7wjsymmms7kbd@skbuf>
Date: Thu, 6 Mar 2025 02:28:25 +0200
From: Vladimir Oltean <vladimir.oltean@....com>
To: Faizal Rahim <faizal.abdul.rahim@...ux.intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@...el.com>,
	Przemek Kitszel <przemyslaw.kitszel@...el.com>,
	Andrew Lunn <andrew+netdev@...n.ch>,
	"David S . Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Maxime Coquelin <mcoquelin.stm32@...il.com>,
	Alexandre Torgue <alexandre.torgue@...s.st.com>,
	Simon Horman <horms@...nel.org>,
	Russell King <linux@...linux.org.uk>,
	Alexei Starovoitov <ast@...nel.org>,
	Daniel Borkmann <daniel@...earbox.net>,
	Jesper Dangaard Brouer <hawk@...nel.org>,
	John Fastabend <john.fastabend@...il.com>,
	Furong Xu <0x1207@...il.com>,
	Russell King <rmk+kernel@...linux.org.uk>,
	Serge Semin <fancer.lancer@...il.com>,
	Xiaolei Wang <xiaolei.wang@...driver.com>,
	Suraj Jaiswal <quic_jsuraj@...cinc.com>,
	Kory Maincent <kory.maincent@...tlin.com>,
	Gal Pressman <gal@...dia.com>,
	Jesper Nilsson <jesper.nilsson@...s.com>,
	Choong Yong Liang <yong.liang.choong@...ux.intel.com>,
	Chwee-Lin Choong <chwee.lin.choong@...el.com>,
	Kunihiko Hayashi <hayashi.kunihiko@...ionext.com>,
	Vinicius Costa Gomes <vinicius.gomes@...el.com>,
	intel-wired-lan@...ts.osuosl.org, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	linux-stm32@...md-mailman.stormreply.com,
	linux-arm-kernel@...ts.infradead.org, bpf@...r.kernel.org
Subject: Re: [PATCH iwl-next v8 07/11] igc: add support for frame preemption
 verification

On Wed, Mar 05, 2025 at 08:00:22AM -0500, Faizal Rahim wrote:
> b) configure_pmac() -> not used
>    - this callback dynamically controls pmac_enabled at runtime. For
>      example, mmsv calls configure_pmac() and disables pmac_enabled when
>      the link partner goes down, even if the user previously enabled it.
>      The intention is to save power but it is not feasible in igc
>      because it causes an endless adapter reset loop:
> 
>    1) Board A and Board B complete the verification handshake. Tx mode
>       register for both boards are in TSN mode.
>    2) Board B link goes down.
> 
>    On Board A:
>    3) mmsv calls configure_pmac() with pmac_enabled = false.
>    4) configure_pmac() in igc updates a new field based on pmac_enabled.
>       Driver uses this field in igc_tsn_new_flags() to indicate that the
>       user enabled/disabled FPE.
>    5) configure_pmac() in igc calls igc_tsn_offload_apply() to check
>       whether an adapter reset is needed. Calls existing logic in
>       igc_tsn_will_tx_mode_change() and igc_tsn_new_flags().
>    6) Since pmac_enabled is now disabled and no other TSN feature is
>       active, igc_tsn_will_tx_mode_change() evaluates to true because Tx
>       mode will switch from TSN to Legacy.
>    7) Driver resets the adapter.
>    8) Registers are set, and Tx mode switches to Legacy.
>    9) When link partner is up, steps 3–8 repeat, but this time with
>       pmac_enabled = true, reactivating TSN.
>       igc_tsn_will_tx_mode_change() evaluates to true again, since Tx
>       mode will switch from Legacy to TSN.
>   10) Driver resets the adapter.
>   11) Rest adapter completes, registers are set, and Tx mode switches to

s/Rest adapter/Adapter reset/

>       TSN.
> 
>   On Board B:
>   12) Adapter reset on Board A at step 10 causes it to detect its link
>       partner as down.
>   13) Repeats steps 3–8.
>   14) Once reset adapter on Board A is completed at step 11, it detects
>       its link partner as up.
>   15) Repeats steps 9–11.
> 
>    - this cycle repeats indefinitely. To avoid this issue, igc only uses
>      mmsv.pmac_enabled to track whether FPE is enabled or disabled.
> 
> Co-developed-by: Vinicius Costa Gomes <vinicius.gomes@...el.com>
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@...el.com>
> Co-developed-by: Choong Yong Liang <yong.liang.choong@...ux.intel.com>
> Signed-off-by: Choong Yong Liang <yong.liang.choong@...ux.intel.com>
> Co-developed-by: Chwee-Lin Choong <chwee.lin.choong@...el.com>
> Signed-off-by: Chwee-Lin Choong <chwee.lin.choong@...el.com>
> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@...ux.intel.com>
> ---
> +static inline bool igc_fpe_is_pmac_enabled(struct igc_adapter *adapter)
> +{
> +	return static_branch_unlikely(&igc_fpe_enabled) &&
> +	       adapter->fpe.mmsv.pmac_enabled;
> +}
> +
> +static inline bool igc_fpe_is_verify_or_response(union igc_adv_rx_desc *rx_desc,
> +						 unsigned int size, void *pktbuf)
> +{
> +	u32 status_error = le32_to_cpu(rx_desc->wb.upper.status_error);
> +	static const u8 zero_payload[SMD_FRAME_SIZE] = {0};
> +	int smd;
> +
> +	smd = FIELD_GET(IGC_RXDADV_STAT_SMD_TYPE_MASK, status_error);
> +
> +	return (smd == IGC_RXD_STAT_SMD_TYPE_V || smd == IGC_RXD_STAT_SMD_TYPE_R) &&
> +		size == SMD_FRAME_SIZE &&
> +		!memcmp(pktbuf, zero_payload, SMD_FRAME_SIZE); /* Buffer is all zeros */

Using this definition...

> +}
> +
> +static inline void igc_fpe_lp_event_status(union igc_adv_rx_desc *rx_desc,
> +					   struct ethtool_mmsv *mmsv)
> +{
> +	u32 status_error = le32_to_cpu(rx_desc->wb.upper.status_error);
> +	int smd;
> +
> +	smd = FIELD_GET(IGC_RXDADV_STAT_SMD_TYPE_MASK, status_error);
> +
> +	if (smd == IGC_RXD_STAT_SMD_TYPE_V)
> +		ethtool_mmsv_event_handle(mmsv, ETHTOOL_MMSV_LP_SENT_VERIFY_MPACKET);
> +	else if (smd == IGC_RXD_STAT_SMD_TYPE_R)
> +		ethtool_mmsv_event_handle(mmsv, ETHTOOL_MMSV_LP_SENT_RESPONSE_MPACKET);
> +}
> @@ -2617,6 +2617,15 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>  			size -= IGC_TS_HDR_LEN;
>  		}
>  
> +		if (igc_fpe_is_pmac_enabled(adapter) &&
> +		    igc_fpe_is_verify_or_response(rx_desc, size, pktbuf)) {

... invalid SMD-R and SMD-V frames (wrong size or non-zero payload) will
skip this code block altogether and be passed up the network stack, where
they are visible at least in tcpdump, correct? Essentially, if the link
partner crafted an ICMP request packet with an SMD-V or SMD-R, your
station would respond to it, which is incorrect.

Strangely, the behavior in this case seems under-specified in the
standard, and I don't see any counter that should be incremented.

> +			igc_fpe_lp_event_status(rx_desc, &adapter->fpe.mmsv);
> +			/* Advance the ring next-to-clean */
> +			igc_is_non_eop(rx_ring, rx_desc);
> +			cleaned_count++;
> +			continue;
> +		}

To fix this, don't you want to merge the unnaturally split
igc_fpe_is_verify_or_response() and igc_fpe_lp_event_status() into a
single function, which returns true whenever the mPacket should be
consumed by the driver, but decides on its own whether to emit an mmsv
event? Merging the two would also avoid reading
rx_desc->wb.upper.status_error twice.

Something like this:

static inline bool igc_fpe_handle_mpacket(struct igc_adapter *adapter,
					  union igc_adv_rx_desc *rx_desc,
					  unsigned int size, void *pktbuf)
{
	u32 status_error = le32_to_cpu(rx_desc->wb.upper.status_error);
	int smd;

	smd = FIELD_GET(IGC_RXDADV_STAT_SMD_TYPE_MASK, status_error);
	if (smd != IGC_RXD_STAT_SMD_TYPE_V && smd != IGC_RXD_STAT_SMD_TYPE_R)
		return false;

	if (size == SMD_FRAME_SIZE && mem_is_zero(pktbuf, SMD_FRAME_SIZE)) {
		struct ethtool_mmsv *mmsv = &adapter->fpe.mmsv;
		enum ethtool_mmsv_event event;

		if (smd == IGC_RXD_STAT_SMD_TYPE_V)
			event = ETHTOOL_MMSV_LP_SENT_VERIFY_MPACKET;
		else
			event = ETHTOOL_MMSV_LP_SENT_RESPONSE_MPACKET;

		ethtool_mmsv_event_handle(mmsv, event);
	}

	return true;
}

		if (igc_fpe_is_pmac_enabled(adapter) &&
		    igc_fpe_handle_mpacket(adapter, rx_desc, size, pktbuf)) {
			/* Advance the ring next-to-clean */
			igc_is_non_eop(rx_ring, rx_desc);
			cleaned_count++;
			continue;
		}
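
For illustration only, the intended outcomes of the merged helper can be
modelled in plain userspace C. This is a rough sketch, not driver code:
every demo_* / DEMO_* name is made up, the enum values are not the igc
register encodings, and the frame size of 60 is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the igc register encodings. */
enum demo_smd { DEMO_SMD_NONE, DEMO_SMD_V, DEMO_SMD_R };
enum demo_event { DEMO_EV_NONE, DEMO_EV_LP_VERIFY, DEMO_EV_LP_RESPONSE };

/* Assumed value for illustration only. */
#define DEMO_FRAME_SIZE 60

/*
 * Returns true when the frame must be consumed by the driver.
 * *event is set to the event to raise, or DEMO_EV_NONE.
 */
static bool demo_handle_mpacket(enum demo_smd smd, size_t size,
				const unsigned char *pktbuf,
				enum demo_event *event)
{
	size_t i;

	assert(pktbuf != NULL && event != NULL);
	*event = DEMO_EV_NONE;

	/* Not an SMD-V/SMD-R frame: leave it to the normal RX path. */
	if (smd != DEMO_SMD_V && smd != DEMO_SMD_R)
		return false;

	/* Malformed mPacket: consume it silently, raise no event. */
	if (size != DEMO_FRAME_SIZE)
		return true;
	for (i = 0; i < size; i++)
		if (pktbuf[i])
			return true;

	*event = (smd == DEMO_SMD_V) ? DEMO_EV_LP_VERIFY : DEMO_EV_LP_RESPONSE;
	return true;
}
```

The point of the model is the middle case: a crafted frame carrying an
SMD-V or SMD-R is still consumed, so it never reaches the network stack,
but it does not feed the verification state machine either.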

[ also note the use of mem_is_zero() instead of memcmp() with a buffer
  pre-filled with zeroes. It should be more efficient, for the simple
  reason that it's accessing a single memory buffer and not two. Though
  I'm surprised how widespread the memcmp() pattern is throughout the
  kernel. ]
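
To make the difference between the two patterns concrete, here is a
minimal userspace sketch. It is an approximation under stated
assumptions: is_zero_scan() is only a stand-in for the kernel's
mem_is_zero(), both function names are made up, and the frame size of 60
is assumed rather than taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed value for illustration only. */
#define DEMO_FRAME_SIZE 60

/* Pattern from the patch: compare against a second, pre-zeroed buffer. */
static bool is_zero_memcmp(const void *buf)
{
	static const unsigned char zero_payload[DEMO_FRAME_SIZE] = {0};

	assert(buf != NULL);
	return memcmp(buf, zero_payload, DEMO_FRAME_SIZE) == 0;
}

/* Single-buffer scan: only the packet buffer itself is read. */
static bool is_zero_scan(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++)
		if (p[i])
			return false;
	return true;
}
```

Both helpers give the same answer for a DEMO_FRAME_SIZE buffer; the scan
variant simply avoids the static reference buffer and the second stream
of loads.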
