Message-Id: <20230726171930.1632710-1-khorenko@virtuozzo.com>
Date: Wed, 26 Jul 2023 20:19:29 +0300
From: Konstantin Khorenko <khorenko@...tuozzo.com>
To: netdev@...r.kernel.org
Cc: Jakub Kicinski <kuba@...nel.org>,
Manish Chopra <manishc@...vell.com>,
Ariel Elior <aelior@...vell.com>,
David Miller <davem@...emloft.net>,
Sudarsana Kalluru <skalluru@...vell.com>,
Paolo Abeni <pabeni@...hat.com>,
Konstantin Khorenko <khorenko@...tuozzo.com>
Subject: [PATCH 0/1] qed: Yet another scheduling while atomic fix
Running an old RHEL7-based kernel, we have hit several instances of the
following "scheduling while atomic" BUG:
BUG: scheduling while atomic: swapper/24/0/0x00000100
[<ffffffffb41c6199>] schedule+0x29/0x70
[<ffffffffb41c5512>] schedule_hrtimeout_range_clock+0xb2/0x150
[<ffffffffb41c55c3>] schedule_hrtimeout_range+0x13/0x20
[<ffffffffb41c3bcf>] usleep_range+0x4f/0x70
[<ffffffffc08d3e58>] qed_ptt_acquire+0x38/0x100 [qed]
[<ffffffffc08eac48>] _qed_get_vport_stats+0x458/0x580 [qed]
[<ffffffffc08ead8c>] qed_get_vport_stats+0x1c/0xd0 [qed]
[<ffffffffc08dffd3>] qed_get_protocol_stats+0x93/0x100 [qed]
qed_mcp_send_protocol_stats
case MFW_DRV_MSG_GET_LAN_STATS:
case MFW_DRV_MSG_GET_FCOE_STATS:
case MFW_DRV_MSG_GET_ISCSI_STATS:
case MFW_DRV_MSG_GET_RDMA_STATS:
[<ffffffffc08e36d8>] qed_mcp_handle_events+0x2d8/0x890 [qed]
qed_int_assertion
qed_int_attentions
[<ffffffffc08d9490>] qed_int_sp_dpc+0xa50/0xdc0 [qed]
[<ffffffffb3aa7623>] tasklet_action+0x83/0x140
[<ffffffffb41d9125>] __do_softirq+0x125/0x2bb
[<ffffffffb41d560c>] call_softirq+0x1c/0x30
[<ffffffffb3a30645>] do_softirq+0x65/0xa0
[<ffffffffb3aa78d5>] irq_exit+0x105/0x110
[<ffffffffb41d8996>] do_IRQ+0x56/0xf0
The situation is clear: the tasklet function ended up calling schedule(),
but the fix is not so trivial.
Checking the mainline code, it seems the same call trace is still
possible on the latest kernel as well, so here is the fix.
There was a similar case recently for the QEDE driver (reading stats
through sysfs) which resulted in commit:
42510dffd0e2 ("qed/qede: Fix scheduling while atomic")
I tried to implement the same logic as a fix for my case, but failed:
unfortunately, for this particular QED driver case it is not clear to me
which statistics should be collected in delayed work for each particular
device, and collecting ALL possible stats for all devices, ignoring the
device type, seems incorrect.
Taking into account that I do not have access to the hardware at all,
the delayed-work approach is nearly impossible for me.
Thus I have taken the idea from patch v3 - just have the caller provide
the context:
https://www.spinics.net/lists/netdev/msg901089.html
At least this solution is technically clear, and hopefully I did not
make any stupid mistakes here.
The patch is COMPILE TESTED ONLY.
I would appreciate it if somebody could test the patch. :)
Konstantin Khorenko (1):
qed: Fix scheduling in a tasklet while getting stats
drivers/net/ethernet/qlogic/qed/qed_dev_api.h | 2 ++
drivers/net/ethernet/qlogic/qed/qed_fcoe.c | 19 ++++++++++----
drivers/net/ethernet/qlogic/qed/qed_fcoe.h | 6 +++--
drivers/net/ethernet/qlogic/qed/qed_hw.c | 26 ++++++++++++++++---
drivers/net/ethernet/qlogic/qed/qed_iscsi.c | 19 ++++++++++----
drivers/net/ethernet/qlogic/qed/qed_iscsi.h | 6 +++--
drivers/net/ethernet/qlogic/qed/qed_l2.c | 19 ++++++++++----
drivers/net/ethernet/qlogic/qed/qed_l2.h | 3 +++
drivers/net/ethernet/qlogic/qed/qed_main.c | 6 ++---
9 files changed, 80 insertions(+), 26 deletions(-)
--
2.31.1