[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241220140516563WDQ_X40bt0ZOch3Qte1YO@zte.com.cn>
Date: Fri, 20 Dec 2024 14:05:16 +0800 (CST)
From: <jiang.kun2@....com.cn>
To: <andrew@...n.ch>, <olteanv@...il.com>, <davem@...emloft.net>,
<edumazet@...gle.com>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Cc: <hehe.peilin@....com.cn>, <xu.xin16@....com.cn>, <fan.yu9@....com.cn>,
<qiu.yutan@....com.cn>, <wang.yaxin@....com.cn>,
<tu.qiang35@....com.cn>, <yang.yang29@....com.cn>,
<ye.xingchen@....com.cn>, <zhang.yunkai@....com.cn>
Subject: [PATCH linux next] net:dsa:fix the dsa_ptr null pointer dereference
From: Peilin He<he.peilin@....com.cn>
Issue
=====
Repeatedly accessing the DSA Ethernet controller via the ethtool command,
followed by a system reboot, may trigger a DSA null pointer dereference,
causing a kernel panic and preventing the system from rebooting properly.
This can lead to data loss or denial-of-service, resulting in serious
consequences.
The original problem occurred in the Linux kernel version 5.4.19.
The following is the panic log:
[ 172.523467] Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000020
[ 172.532455] Mem abort info:
[ 172.535313] printk: console [ttyS0]: printing thread stopped
[ 172.536352] ESR = 0x0000000096000006
[ 172.544926] EC = 0x25: DABT (current EL), IL = 32 bits
[ 172.550321] SET = 0, FnV = 0
[ 172.553427] EA = 0, S1PTW = 0
[ 172.556646] FSC = 0x06: level 2 translation fault
[ 172.561604] Data abort info:
[ 172.564563] ISV = 0, ISS = 0x00000006
[ 172.568466] CM = 0, WnR = 0
[ 172.571502] user pgtable: 4k pages, 48-bit VAs, pgdp=00000020a4b34000
[ 172.578058] [0000000000000020] pgd=08000020a4ce6003, p4d=08000020a4ce6003,
pud=08000020a4b4d003, pmd=0000000000000000
[ 172.588785] Internal error: Oops: 96000006 [#1] PREEMPT_RT SMP
[ 172.594641] Modules linked in: r8168(O) bcmdhd(O) ossmod(O) tipc(O)
[ 172.600933] CPU: 1 PID: 548 Comm: lldpd Tainted: G O
[ 172.610795] Hardware name: LS1028A RDB Board (DT)
[ 172.615508] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 172.622492] pc : dsa_master_get_sset_count+0x24/0xa4
[ 172.627475] lr : ethtool_get_drvinfo+0x8c/0x210
[ 172.632020] sp : ffff80000c233a90
[ 172.635338] x29: ffff80000c233a90 x28: ffff67ad21e45a00 x27: 0000000000000000
[ 172.642498] x26: 0000000000000000 x25: 0000ffffd1102110 x24: 0000000000000000
[ 172.649657] x23: 00020100001149a9 x22: 0000ffffd1102110 x21: 0000000000000000
[ 172.656816] x20: 0000000000000000 x19: ffff67ad00bbe000 x18: 0000000000000000
[ 172.663974] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffd1102110
[ 172.671132] x14: ffffffffffffffff x13: 30322e344230342e x12: 33302e37564c4547
[ 172.678290] x11: 0000000000000020 x10: 0101010101010101 x9 : ffffd837fcebe6fc
[ 172.685448] x8 : 0101010101010101 x7 : 6374656e655f6c73 x6 : 74656e655f6c7366
[ 172.692606] x5 : ffff80000c233b01 x4 : ffffd837fdae0251 x3 : 0000000000000063
[ 172.699764] x2 : ffffd837fd076da0 x1 : 0000000000000000 x0 : ffff67ad00bbe000
[ 172.706923] Call trace:
[ 172.709371] dsa_master_get_sset_count+0x24/0xa4
[ 172.714000] ethtool_get_drvinfo+0x8c/0x210
[ 172.718193] dev_ethtool+0x780/0x2120
[ 172.721863] dev_ioctl+0x1b0/0x580
[ 172.725273] sock_do_ioctl+0xc0/0x100
[ 172.728944] sock_ioctl+0x130/0x3c0
[ 172.732440] __arm64_sys_ioctl+0xb4/0x100
[ 172.736460] invoke_syscall+0x50/0x120
[ 172.740219] el0_svc_common.constprop.0+0x4c/0xf4
[ 172.744936] do_el0_svc+0x2c/0xa0
[ 172.748257] el0_svc+0x20/0x60
[ 172.751318] el0t_64_sync_handler+0xe8/0x114
[ 172.755599] el0t_64_sync+0x180/0x184
[ 172.759271] Code: a90153f3 2a0103f4 a9025bf5 f9418015 (f94012b6)
[ 172.765383] ---[ end trace 0000000000000002 ]---
Root Cause
==========
Analysis of linux-next-6.13.0-rc3 reveals that the
dsa_conduit_get_sset_count() function accesses members of
a structure pointed to by cpu_dp without checking
if cpu_dp is a null pointer. This can lead to a kernel panic
if cpu_dp is NULL.
static int dsa_conduit_get_sset_count(struct net_device *dev,
int sset)
{
struct dsa_port *cpu_dp = dev->dsa_ptr;
const struct ethtool_ops *ops = cpu_dp->orig_ethtool_ops;
struct dsa_switch *ds = cpu_dp->ds;
...
}
dev->dsa_ptr is set to NULL in both the dsa_switch_shutdown and
dsa_conduit_teardown functions. When the DSA module unloads,
dsa_conduit_ethtool_teardown(dev) restores the original copy of the DSA
device's ethtool_ops using "dev->ethtool_ops = cpu_dp->orig_ethtool_ops;"
before setting dev->dsa_ptr to NULL. This ensures that ethtool_ops
remains accessible after DSA unloading. However, dsa_switch_shutdown does
not restore the original copy of the DSA device's ethtool_ops, potentially
leading to a null pointer dereference of dsa_ptr and subsequently a system
panic.
Solution
========
In the kernel's dsa_switch_shutdown function, before dp->conduit->dsa_ptr
is set to NULL, the dsa_conduit_ethtool_shutdown function is called to
restore the DSA master's ethtool_ops pointer to its original value.
This prevents the kernel from entering the DSA ethtool_ops flow even if
the user executes ethtool, thus avoiding the null pointer dereference issue
with dsa_ptr.
Signed-off-by: Peilin He<he.peilin@....com.cn>
Co-developed-by: xu xin <xu.xin16@....com.cn>
Signed-off-by: xu xin <xu.xin16@....com.cn>
Signed-off-by: Kun Jiang <jiang.kun2@....com.cn>
Cc: Fan Yu <fan.yu9@....com.cn>
Cc: Yutan Qiu <qiu.yutan@....com.cn>
Cc: Yaxin Wang <wang.yaxin@....com.cn>
Cc: tuqiang <tu.qiang35@....com.cn>
Cc: Yang Yang <yang.yang29@....com.cn>
Cc: ye xingchen <ye.xingchen@....com.cn>
Cc: Yunkai Zhang <zhang.yunkai@....com.cn>
---
net/dsa/dsa.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 5a7c0e565a89..5eee0c436848 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -1561,6 +1561,17 @@ void dsa_unregister_switch(struct dsa_switch *ds)
}
EXPORT_SYMBOL_GPL(dsa_unregister_switch);
+static void dsa_conduit_ethtool_shutdown(struct net_device *dev)
+{
+ struct dsa_port *cpu_dp = dev->dsa_ptr;
+
+ if (netif_is_lag_master(dev))
+ return;
+
+ dev->ethtool_ops = cpu_dp->orig_ethtool_ops;
+ cpu_dp->orig_ethtool_ops = NULL;
+}
+
/* If the DSA conduit chooses to unregister its net_device on .shutdown, DSA is
* blocking that operation from completion, due to the dev_hold taken inside
* netdev_upper_dev_link. Unlink the DSA user interfaces from being uppers of
@@ -1595,8 +1606,10 @@ void dsa_switch_shutdown(struct dsa_switch *ds)
/* Disconnect from further netdevice notifiers on the conduit,
* since netdev_uses_dsa() will now return false.
*/
- dsa_switch_for_each_cpu_port(dp, ds)
+ dsa_switch_for_each_cpu_port(dp, ds) {
+ dsa_conduit_ethtool_shutdown(dp->conduit);
dp->conduit->dsa_ptr = NULL;
+ }
rtnl_unlock();
out:
--
2.25.1
Powered by blists - more mailing lists