[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20240308110921.26074-D-hca@linux.ibm.com>
Date: Fri, 8 Mar 2024 12:09:21 +0100
From: Heiko Carstens <hca@...ux.ibm.com>
To: Alexandra Winter <wintera@...ux.ibm.com>
Cc: David Miller <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>,
netdev@...r.kernel.org, linux-s390@...r.kernel.org,
Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>,
Thorsten Winkler <twinkler@...ux.ibm.com>
Subject: Re: [PATCH net v2] s390/qeth: handle deferred cc1
On Thu, Mar 07, 2024 at 10:38:27AM +0100, Alexandra Winter wrote:
> The IO subsystem expects a driver to retry a ccw_device_start, when the
> subsequent interrupt response block (irb) contains a deferred
> condition code 1.
>
> Symptoms before this commit:
> On the read channel we always trigger the next read anyhow, so no
> different behaviour here.
> On the write channel we may experience timeout errors, because the
> expected reply will never be received without the retry.
> Other callers of qeth_send_control_data() may wrongly assume that the ccw
> was successful, which may cause problems later.
>
> Note that since
> commit 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers")
> and
> commit 5ef1dc40ffa6 ("s390/cio: fix invalid -EBUSY on ccw_device_start")
> deferred CC1s are more likely to occur. See the commit message of the
> latter for more background information.
>
> Fixes: 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers")
> Reference-ID: LTC205042
> Signed-off-by: Alexandra Winter <wintera@...ux.ibm.com>
> ---
> v1->v2: correct patch format
>
> drivers/s390/net/qeth_core_main.c | 36 +++++++++++++++++++++++++++++--
> 1 file changed, 34 insertions(+), 2 deletions(-)
This patch seems to introduce a reference count problem, which results
in use-after-free reports like the one below. So, please ignore the
patch - this needs to be sorted out first.
[ 553.017253] ------------[ cut here ]------------
[ 553.017255] refcount_t: addition on 0; use-after-free.
[ 553.017269] WARNING: CPU: 12 PID: 115746 at lib/refcount.c:25 refcount_warn_saturate+0x10e/0x130
[ 553.017275] Modules linked in: kvm algif_hash af_alg nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink dm_service_time mlx5_ib ib_uverbs ib_core uvdevice
+s390_trng eadm_sch vfio_ccw mdev vfio_iommu_type1 vfio sch_fq_codel loop configfs lcs ctcm fsm zfcp scsi_transport_fc mlx5_core ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkey
+zcrypt rng_core dm_multipath autofs4
[ 553.017331] Unloaded tainted modules: test_klp_state3(K):1 test_klp_state2(K):4 test_klp_state(K):3 test_klp_callbacks_demo2(K):2 test_klp_callbacks_demo(K):12 test_klp_atomic_replace(K):2 test_klp_livepatch(K):6 [last unloaded: kvm]
[ 553.017405] CPU: 12 PID: 115746 Comm: qeth_recover Tainted: G W K 6.8.0-20240305.rc7.git1.570c73d6a3df.300.fc39.s390x #1
[ 553.017408] Hardware name: IBM 8561 T01 701 (LPAR)
[ 553.017409] Krnl PSW : 0404c00180000000 00000001a53cad02 (refcount_warn_saturate+0x112/0x130)
[ 553.017413] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 553.017415] Krnl GPRS: c000000000000027 0000000000000023 000000000000002a 0000000100000000
[ 553.017417] 00000004b38e0900 0000000000000000 0000000095dfc2c0 0000000000001000
[ 553.017419] 00000001a563eb58 000000009938d818 0000000000001007 000000009938d800
[ 553.017421] 0000000095dfac00 0000000000000000 00000001a53cacfe 000003800f24bc10
[ 553.017426] Krnl Code: 00000001a53cacf2: c020004f67d8 larl %r2,00000001a5db7ca2
00000001a53cacf8: c0e5ffc262b4 brasl %r14,00000001a4c17260
#00000001a53cacfe: af000000 mc 0,0
>00000001a53cad02: a7f4ff9a brc 15,00000001a53cac36
00000001a53cad06: 92014008 mvi 8(%r4),1
00000001a53cad0a: c020004f67f6 larl %r2,00000001a5db7cf6
00000001a53cad10: c0e5ffc262a8 brasl %r14,00000001a4c17260
00000001a53cad16: af000000 mc 0,0
[ 553.017439] Call Trace:
[ 553.017440] [<00000001a53cad02>] refcount_warn_saturate+0x112/0x130
[ 553.017443] ([<00000001a53cacfe>] refcount_warn_saturate+0x10e/0x130)
[ 553.017445] [<00000001a563db88>] __qeth_issue_next_read+0x270/0x288
[ 553.017448] [<00000001a563dc38>] qeth_mpc_initialize.constprop.0+0x98/0x738
[ 553.017449] [<00000001a564007e>] qeth_hardsetup_card+0x36e/0xb38
[ 553.017451] [<00000001a56408d8>] qeth_set_online+0x90/0x3a0
[ 553.017453] [<00000001a5640d04>] qeth_do_reset+0x11c/0x1f8
[ 553.017455] [<00000001a4c47f30>] kthread+0x120/0x128
[ 553.017458] [<00000001a4bc3014>] __ret_from_fork+0x3c/0x58
[ 553.017460] [<00000001a58d717a>] ret_from_fork+0xa/0x30
[ 553.017463] Last Breaking-Event-Address:
[ 553.017464] [<00000001a4c172de>] __warn_printk+0x7e/0xf0
[ 553.017467] ---[ end trace 0000000000000000 ]---
[ 553.017474] qeth 0.0.bd00: The qeth device driver failed to recover an error on the device
[ 553.018615] qeth 0.0.bd00: The qeth device driver failed to recover an error on the device
Powered by blists - more mailing lists