lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20221026121330.2042989-1-vladimir.oltean@nxp.com>
Date:   Wed, 26 Oct 2022 15:13:30 +0300
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     netdev@...r.kernel.org
Cc:     "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        linux-kernel@...r.kernel.org
Subject: [PATCH v2 net] net: enetc: survive memory pressure without crashing

Under memory pressure, enetc_refill_rx_ring() may fail, and when called
during the enetc_open() -> enetc_setup_rxbdr() procedure, this is not
checked for.

An extreme case of memory pressure will result in exactly zero buffers
being allocated for the RX ring, and in such a case it is expected that
hardware drops all RX packets due to lack of buffers.

There are 2 problems. One is that the hardware drop doesn't happen, the
other is that even if this is fixed, the driver has undefined behavior
and may even crash. Explanation for the latter follows below.

The enetc NAPI poll procedure is shared between RX and TX conf, and
enetc_poll() calls enetc_clean_rx_ring() even if the reason why NAPI was
scheduled is TX.

The enetc_clean_rx_ring() function (and its XDP derivative) is not
prepared to handle such a condition. It has this loop exit condition:

		rxbd = enetc_rxbd(rx_ring, i);
		bd_status = le32_to_cpu(rxbd->r.lstatus);
		if (!bd_status)
			break;

otherwise said, the NAPI poll procedure does not look at the Producer
Index of the RX ring, instead it just walks circularly through the
descriptors until it finds one which is not Ready.

The driver undefined behavior is caused by the fact that the
enetc_rxbd(rx_ring, i) RX descriptor is only initialized by
enetc_refill_rx_ring() if page allocation has succeeded.

If memory allocation ever failed, enetc_clean_rx_ring() looks at
rxbd->r.lstatus as an exit condition, but "rxbd" itself is uninitialized
memory. If it contains junk, then junk buffers will be processed.

To fix this problem, memset the DMA coherent area used for RX buffer
descriptors in enetc_dma_alloc_bdr(). This makes all BDs be "not ready"
by default, which makes enetc_clean_rx_ring() exit early from the BD
processing loop when there is no valid buffer available.

The other problem (hardware does not drop packet in lack of buffers)
is due to an initial misconfiguration of the RX ring consumer index,
misconfiguration which is usually masked away by the proper
configuration done by enetc_refill_rx_ring() - when page allocation does
not fail.

The hardware guide recommends BD rings to be configured as follows:

| Configure the receive ring producer index register RBaPIR with a value
| of 0. The producer index is initially configured by software but owned
| by hardware after the ring has been enabled. Hardware increments the
| index when a frame is received which may consume one or more BDs.
| Hardware is not allowed to increment the producer index to match the
| consumer index since it is used to indicate an empty condition. The ring
| can hold at most RBLENR[LENGTH]-1 received BDs.
|
| Configure the receive ring consumer index register RBaCIR. The
| consumer index is owned by software and updated during operation of the
| of the BD ring by software, to indicate that any receive data occupied
| in the BD has been processed and it has been prepared for new data.
| - If consumer index and producer index are initialized to the same
|   value, it indicates that all BDs in the ring have been prepared and
|   hardware owns all of the entries.
| - If consumer index is initialized to producer index plus N, it would
|   indicate N BDs have been prepared. Note that hardware cannot start if
|   only a single buffer is prepared due to the restrictions described in
|   (2).
| - Software may write consumer index to match producer index anytime
|   while the ring is operational to indicate all received BDs prior have
|   been processed and new BDs prepared for hardware.

The reset-default value of the consumer index is 0, and this makes the
ENETC think that all buffers have been initialized (when in reality none
were).

To operate using no buffer, we must initialize the CI to PI + 1.

Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
---
v1->v2:
- also add an initial value for the RX ring consumer index, to be used
  when page allocation fails
- update the commit message
- deliberately not pick up Claudiu's review tag due to the above changes

v1 at:
https://patchwork.kernel.org/project/netdevbpf/patch/20221024172049.4187400-1-vladimir.oltean@nxp.com/

 drivers/net/ethernet/freescale/enetc/enetc.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 54bc92fc6bf0..a787114c9ede 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1738,18 +1738,21 @@ void enetc_get_si_caps(struct enetc_si *si)
 
 static int enetc_dma_alloc_bdr(struct enetc_bdr *r, size_t bd_size)
 {
-	r->bd_base = dma_alloc_coherent(r->dev, r->bd_count * bd_size,
-					&r->bd_dma_base, GFP_KERNEL);
+	size_t size = r->bd_count * bd_size;
+
+	r->bd_base = dma_alloc_coherent(r->dev, size, &r->bd_dma_base,
+					GFP_KERNEL);
 	if (!r->bd_base)
 		return -ENOMEM;
 
 	/* h/w requires 128B alignment */
 	if (!IS_ALIGNED(r->bd_dma_base, 128)) {
-		dma_free_coherent(r->dev, r->bd_count * bd_size, r->bd_base,
-				  r->bd_dma_base);
+		dma_free_coherent(r->dev, size, r->bd_base, r->bd_dma_base);
 		return -EINVAL;
 	}
 
+	memset(r->bd_base, 0, size);
+
 	return 0;
 }
 
@@ -2090,7 +2093,12 @@ static void enetc_setup_rxbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
 	else
 		enetc_rxbdr_wr(hw, idx, ENETC_RBBSR, ENETC_RXB_DMA_SIZE);
 
+	/* Also prepare the consumer index in case page allocation never
+	 * succeeds. In that case, hardware will never advance producer index
+	 * to match consumer index, and will drop all frames.
+	 */
 	enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0);
+	enetc_rxbdr_wr(hw, idx, ENETC_RBCIR, 1);
 
 	/* enable Rx ints by setting pkt thr to 1 */
 	enetc_rxbdr_wr(hw, idx, ENETC_RBICR0, ENETC_RBICR0_ICEN | 0x1);
-- 
2.34.1

Powered by blists - more mailing lists