Open Source and information security mailing list archives
Message-ID: <BY3PR18MB4612A5906D64C3DBACFAAECDAB469@BY3PR18MB4612.namprd18.prod.outlook.com>
Date: Thu, 25 May 2023 15:27:03 +0000
From: Manish Chopra <manishc@...vell.com>
To: Jiri Pirko <jiri@...nulli.us>
CC: "kuba@...nel.org" <kuba@...nel.org>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Ariel Elior <aelior@...vell.com>, Alok Prasad <palok@...vell.com>,
	Sudarsana Reddy Kalluru <skalluru@...vell.com>, David Miller <davem@...emloft.net>
Subject: RE: [EXT] Re: [PATCH v5 net] qede: Fix scheduling while atomic

Hi Jiri,

> -----Original Message-----
> From: Jiri Pirko <jiri@...nulli.us>
> Sent: Wednesday, May 24, 2023 5:01 PM
> To: Manish Chopra <manishc@...vell.com>
> Cc: kuba@...nel.org; netdev@...r.kernel.org; Ariel Elior <aelior@...vell.com>;
> Alok Prasad <palok@...vell.com>; Sudarsana Reddy Kalluru <skalluru@...vell.com>;
> David Miller <davem@...emloft.net>
> Subject: [EXT] Re: [PATCH v5 net] qede: Fix scheduling while atomic
>
> External Email
>
> ----------------------------------------------------------------------
> Tue, May 23, 2023 at 04:42:35PM CEST, manishc@...vell.com wrote:
> >The bonding module collects the statistics while holding a spinlock;
> >beneath that, the qede->qed driver statistics flow gets scheduled out
> >due to the usleep_range() used in the PTT acquire logic, which results
> >in the bug and traces below -
> >
> >[ 3673.988874] Hardware name: HPE ProLiant DL365 Gen10 Plus/ProLiant DL365 Gen10 Plus, BIOS A42 10/29/2021
> >[ 3673.988878] Call Trace:
> >[ 3673.988891] dump_stack_lvl+0x34/0x44
> >[ 3673.988908] __schedule_bug.cold+0x47/0x53
> >[ 3673.988918] __schedule+0x3fb/0x560
> >[ 3673.988929] schedule+0x43/0xb0
> >[ 3673.988932] schedule_hrtimeout_range_clock+0xbf/0x1b0
> >[ 3673.988937] ? __hrtimer_init+0xc0/0xc0
> >[ 3673.988950] usleep_range+0x5e/0x80
> >[ 3673.988955] qed_ptt_acquire+0x2b/0xd0 [qed]
> >[ 3673.988981] _qed_get_vport_stats+0x141/0x240 [qed]
> >[ 3673.989001] qed_get_vport_stats+0x18/0x80 [qed]
> >[ 3673.989016] qede_fill_by_demand_stats+0x37/0x400 [qede]
> >[ 3673.989028] qede_get_stats64+0x19/0xe0 [qede]
> >[ 3673.989034] dev_get_stats+0x5c/0xc0
> >[ 3673.989045] netstat_show.constprop.0+0x52/0xb0
> >[ 3673.989055] dev_attr_show+0x19/0x40
> >[ 3673.989065] sysfs_kf_seq_show+0x9b/0xf0
> >[ 3673.989076] seq_read_iter+0x120/0x4b0
> >[ 3673.989087] new_sync_read+0x118/0x1a0
> >[ 3673.989095] vfs_read+0xf3/0x180
> >[ 3673.989099] ksys_read+0x5f/0xe0
> >[ 3673.989102] do_syscall_64+0x3b/0x90
> >[ 3673.989109] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> You mention "bonding module" at the beginning of this description. Where
> exactly is that shown in the trace?
>
> I guess that the "spinlock" you talk about is "dev_base_lock", isn't it?

The bonding functions were somehow not part of the traces, but this is the flow from the bonding module, which calls dev_get_stats() under spin_lock_nested(&bond->stats_lock, nest_level); that is what results in this issue.
> >
> >[ 3673.989115] RIP: 0033:0x7f8467d0b082
> >[ 3673.989119] Code: c0 e9 b2 fe ff ff 50 48 8d 3d ca 05 08 00 e8 35 e7 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
> >[ 3673.989121] RSP: 002b:00007ffffb21fd08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> >[ 3673.989127] RAX: ffffffffffffffda RBX: 000000000100eca0 RCX: 00007f8467d0b082
> >[ 3673.989128] RDX: 00000000000003ff RSI: 00007ffffb21fdc0 RDI: 0000000000000003
> >[ 3673.989130] RBP: 00007f8467b96028 R08: 0000000000000010 R09: 00007ffffb21ec00
> >[ 3673.989132] R10: 00007ffffb27b170 R11: 0000000000000246 R12: 00000000000000f0
> >[ 3673.989134] R13: 0000000000000003 R14: 00007f8467b92000 R15: 0000000000045a05
> >[ 3673.989139] CPU: 30 PID: 285188 Comm: read_all Kdump: loaded Tainted: G W OE
> >
> >Fix this by collecting the statistics asynchronously from a periodic
> >delayed work scheduled at the default stats coalescing interval, and
> >return the recent copy of statistics from .ndo_get_stats64(). Also add
> >the ability to configure/retrieve the stats coalescing interval using
> >the commands below -
> >
> >ethtool -C ethx stats-block-usecs <val>
> >ethtool -c ethx
> >
> >Fixes: 133fac0eedc3 ("qede: Add basic ethtool support")
> >Cc: Sudarsana Kalluru <skalluru@...vell.com>
> >Cc: David Miller <davem@...emloft.net>
> >Signed-off-by: Manish Chopra <manishc@...vell.com>
> >---
> >v1->v2:
> > - Fixed checkpatch and kdoc warnings.
> >v2->v3:
> > - Moving the changelog after tags.
> >v3->v4:
> > - Changes to collect stats periodically using delayed work
> >   and add ability to configure/retrieve stats coalescing
> >   interval using ethtool
> > - Modified commit description to reflect the changes
> >v4->v5:
> > - Renamed the variables (s/ticks/usecs and s/interval/ticks)
> > - Relaxed the stats usecs coalescing configuration to allow
> >   user to set any range of values and also while getting return
> >   the exact value configured
> > - Usage of usecs_to_jiffies() wherever applicable
> > - Cosmetic change for logs/comments
> >---
> > drivers/net/ethernet/qlogic/qede/qede.h       |  4 +++
> > .../net/ethernet/qlogic/qede/qede_ethtool.c   | 26 ++++++++++++--
> > drivers/net/ethernet/qlogic/qede/qede_main.c  | 35 ++++++++++++++++++-
> > 3 files changed, 62 insertions(+), 3 deletions(-)
> >
> >diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
> >index f90dcfe9ee68..8a63f99d499c 100644
> >--- a/drivers/net/ethernet/qlogic/qede/qede.h
> >+++ b/drivers/net/ethernet/qlogic/qede/qede.h
> >@@ -271,6 +271,10 @@ struct qede_dev {
> > #define QEDE_ERR_WARN			3
> >
> > 	struct qede_dump_info		dump_info;
> >+	struct delayed_work		periodic_task;
> >+	unsigned long			stats_coal_ticks;
> >+	u32				stats_coal_usecs;
> >+	spinlock_t			stats_lock; /* lock for vport stats access */
> > };
> >
> > enum QEDE_STATE {
> >diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
> >index 8284c4c1528f..a6498eb7cbd7 100644
> >--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
> >+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
> >@@ -426,6 +426,8 @@ static void qede_get_ethtool_stats(struct net_device *dev,
> > 		}
> > 	}
> >
> >+	spin_lock(&edev->stats_lock);
> >+
> > 	for (i = 0; i < QEDE_NUM_STATS; i++) {
> > 		if (qede_is_irrelevant_stat(edev, i))
> > 			continue;
> >@@ -435,6 +437,8 @@ static void qede_get_ethtool_stats(struct net_device *dev,
> > 		buf++;
> > 	}
> >
> >+	spin_unlock(&edev->stats_lock);
> >+
> > 	__qede_unlock(edev);
> > }
> >
> >@@ -817,6 +821,7 @@ static int qede_get_coalesce(struct net_device *dev,
> >
> > 	coal->rx_coalesce_usecs = rx_coal;
> > 	coal->tx_coalesce_usecs = tx_coal;
> >+	coal->stats_block_coalesce_usecs = edev->stats_coal_usecs;
> >
> > 	return rc;
> > }
> >@@ -830,6 +835,21 @@ int qede_set_coalesce(struct net_device *dev, struct ethtool_coalesce *coal,
> > 	int i, rc = 0;
> > 	u16 rxc, txc;
> >
> >+	if (edev->stats_coal_usecs != coal->stats_block_coalesce_usecs) {
> >+		bool stats_coal_enabled;
> >+
> >+		stats_coal_enabled = edev->stats_coal_usecs ? true : false;
> >+
> >+		edev->stats_coal_usecs = coal->stats_block_coalesce_usecs;
> >+		edev->stats_coal_ticks = usecs_to_jiffies(coal->stats_block_coalesce_usecs);
> >+
> >+		if (!stats_coal_enabled)
> >+			schedule_delayed_work(&edev->periodic_task, 0);
>
> What is the point of schedule here? Don't you want to rather schedule if
> (stats_coal_enabled == true) ??

This was meant to schedule the periodic task ONLY if it was previously disabled. But actually it does no harm to always schedule it once whenever the user sets a non-zero usecs value.
> >+
> >+		DP_INFO(edev, "Configured stats coal ticks=%lu jiffies\n",
> >+			edev->stats_coal_ticks);
> >+	}
> >+
> > 	if (!netif_running(dev)) {
> > 		DP_INFO(edev, "Interface is down\n");
> > 		return -EINVAL;
> >@@ -2236,7 +2256,8 @@ static int qede_get_per_coalesce(struct net_device *dev,
> > }
> >
> > static const struct ethtool_ops qede_ethtool_ops = {
> >-	.supported_coalesce_params = ETHTOOL_COALESCE_USECS,
> >+	.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
> >+				     ETHTOOL_COALESCE_STATS_BLOCK_USECS,
> > 	.get_link_ksettings = qede_get_link_ksettings,
> > 	.set_link_ksettings = qede_set_link_ksettings,
> > 	.get_drvinfo = qede_get_drvinfo,
> >@@ -2287,7 +2308,8 @@ static const struct ethtool_ops qede_ethtool_ops = {
> > };
> >
> > static const struct ethtool_ops qede_vf_ethtool_ops = {
> >-	.supported_coalesce_params = ETHTOOL_COALESCE_USECS,
> >+	.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
> >+				     ETHTOOL_COALESCE_STATS_BLOCK_USECS,
> > 	.get_link_ksettings = qede_get_link_ksettings,
> > 	.get_drvinfo = qede_get_drvinfo,
> > 	.get_msglevel = qede_get_msglevel,
> >diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
> >index 06c6a5813606..61cc10968988 100644
> >--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
> >+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
> >@@ -308,6 +308,8 @@ void qede_fill_by_demand_stats(struct qede_dev *edev)
> >
> > 	edev->ops->get_vport_stats(edev->cdev, &stats);
> >
> >+	spin_lock(&edev->stats_lock);
> >+
> > 	p_common->no_buff_discards = stats.common.no_buff_discards;
> > 	p_common->packet_too_big_discard = stats.common.packet_too_big_discard;
> > 	p_common->ttl0_discard = stats.common.ttl0_discard;
> >@@ -405,6 +407,8 @@ void qede_fill_by_demand_stats(struct qede_dev *edev)
> > 		p_ah->tx_1519_to_max_byte_packets =
> > 			stats.ah.tx_1519_to_max_byte_packets;
> > 	}
> >+
> >+	spin_unlock(&edev->stats_lock);
> > }
> >
> > static void qede_get_stats64(struct net_device *dev,
> >@@ -413,9 +417,10 @@ static void qede_get_stats64(struct net_device *dev,
> > 	struct qede_dev *edev = netdev_priv(dev);
> > 	struct qede_stats_common *p_common;
> >
> >-	qede_fill_by_demand_stats(edev);
> > 	p_common = &edev->stats.common;
> >
> >+	spin_lock(&edev->stats_lock);
> >+
> > 	stats->rx_packets = p_common->rx_ucast_pkts + p_common->rx_mcast_pkts +
> > 			    p_common->rx_bcast_pkts;
> > 	stats->tx_packets = p_common->tx_ucast_pkts + p_common->tx_mcast_pkts +
> >@@ -435,6 +440,8 @@ static void qede_get_stats64(struct net_device *dev,
> > 	stats->collisions = edev->stats.bb.tx_total_collisions;
> > 	stats->rx_crc_errors = p_common->rx_crc_errors;
> > 	stats->rx_frame_errors = p_common->rx_align_errors;
> >+
> >+	spin_unlock(&edev->stats_lock);
> > }
> >
> > #ifdef CONFIG_QED_SRIOV
> >@@ -1000,6 +1007,21 @@ static void qede_unlock(struct qede_dev *edev)
> > 	rtnl_unlock();
> > }
> >
> >+static void qede_periodic_task(struct work_struct *work)
> >+{
> >+	struct qede_dev *edev = container_of(work, struct qede_dev,
> >+					     periodic_task.work);
> >+
> >+	if (test_bit(QEDE_SP_DISABLE, &edev->sp_flags))
> >+		return;
> >+
> >+	if (edev->stats_coal_usecs) {
>
> Why don't you cancel the work when you don't want this to happen?

Ok.

> >+		qede_fill_by_demand_stats(edev);
> >+		schedule_delayed_work(&edev->periodic_task,
> >+				      edev->stats_coal_ticks);
> >+	}
> >+}
> >+
> > static void qede_sp_task(struct work_struct *work)
> > {
> > 	struct qede_dev *edev = container_of(work, struct qede_dev,
> >@@ -1208,7 +1230,9 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
> > 	 * from there, although it's unlikely].
> > 	 */
> > 	INIT_DELAYED_WORK(&edev->sp_task, qede_sp_task);
> >+	INIT_DELAYED_WORK(&edev->periodic_task, qede_periodic_task);
> > 	mutex_init(&edev->qede_lock);
> >+	spin_lock_init(&edev->stats_lock);
> >
> > 	rc = register_netdev(edev->ndev);
> > 	if (rc) {
> >@@ -1233,6 +1257,11 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
> > 	edev->rx_copybreak = QEDE_RX_HDR_SIZE;
> >
> > 	qede_log_probe(edev);
> >+
> >+	edev->stats_coal_usecs = USEC_PER_SEC;
> >+	edev->stats_coal_ticks = usecs_to_jiffies(USEC_PER_SEC);
> >+	schedule_delayed_work(&edev->periodic_task, 0);
> >+
> > 	return 0;
> >
> > err4:
> >@@ -1301,6 +1330,7 @@ static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode)
> > 	unregister_netdev(ndev);
> >
> > 	cancel_delayed_work_sync(&edev->sp_task);
> >+	cancel_delayed_work_sync(&edev->periodic_task);
> >
> > 	edev->ops->common->set_power_state(cdev, PCI_D0);
> >
> >@@ -2571,6 +2601,9 @@ static void qede_recovery_handler(struct qede_dev *edev)
> >
> > 	DP_NOTICE(edev, "Starting a recovery process\n");
> >
> >+	/* disable periodic stats */
> >+	edev->stats_coal_usecs = 0;
>
> You disable but never enable again. Why?
>

It gets enabled back when the recovery flow completes, in the part of the patch below:

+	edev->stats_coal_usecs = USEC_PER_SEC;
+	edev->stats_coal_ticks = usecs_to_jiffies(USEC_PER_SEC);
+	schedule_delayed_work(&edev->periodic_task, 0);

Although on reset I am currently resetting the values to the default load values; maybe I will clean this up here to retain the ethtool-set values instead.

> Also, why don't you do:
> cancel_delayed_work_sync(&edev->periodic_task)
> here instead?
>

Sure.

> >+
> > 	/* No need to acquire first the qede_lock since is done by qede_sp_task
> > 	 * before calling this function.
> > 	 */
> >--
> >2.27.0
> >
> >