[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4a893f88b6892502a5f7a61bcfc806a271a730a9.camel@web.de>
Date: Sun, 18 May 2025 14:12:04 +0200
From: Bert Karwatzki <spasswolf@....de>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: Johannes Berg <johannes@...solutions.net>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-next@...r.kernel.org" <linux-next@...r.kernel.org>,
"llvm@...ts.linux.dev" <llvm@...ts.linux.dev>, Thomas Gleixner
<tglx@...utronix.de>, linux-wireless@...r.kernel.org, spasswolf@....de
Subject: Re: lockup and kernel panic in linux-next-202505{09,12} when
compiled with clang
Am Sonntag, dem 18.05.2025 um 09:30 +0800 schrieb Jason Xing:
> Hi Bert,
>
> Thanks for your report and analysis!
>
> On Sun, May 18, 2025 at 3:49 AM Bert Karwatzki <spasswolf@....de> wrote:
> >
> > Am Samstag, dem 17.05.2025 um 13:34 +0200 schrieb Bert Karwatzki:
> > > Am Freitag, dem 16.05.2025 um 20:19 +0200 schrieb Bert Karwatzki:
> > > > I've added a debugging statement:
> > > >
> > > > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> > > > index 3bd5ee0995fe..853493eca4f5 100644
> > > > --- a/net/mac80211/tx.c
> > > > +++ b/net/mac80211/tx.c
> > > > @@ -4586,7 +4586,11 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb,
>
> What is the caller of it? It's the function that you customized?
The only caller of ieee80211_8023_xmit_clang_debug_helper() is
ieee80211_8023_xmit(). I did this because I thought clang might
have been producing incorrect code at the time, but it turned
out clang did nothing wrong.
>
> > > > struct ieee80211_local *local,
> > > > struct ieee80211_tx_info *info)
> > > > {
> > > > - if (unlikely(skb->sk && sock_flag(skb->sk, SOCK_WIFI_STATUS))) {
> > > > + if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ||
> > > > + sock_flag(skb->sk, SOCK_WIFI_STATUS)))) {
> > > > + if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS))
> > > > + printk(KERN_INFO "%s: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = %u sock_flag(skb->sk,
> > > > SOCK_WIFI_STATUS) = %u\n",
> > > > + __func__, (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS), sock_flag(skb->sk,
> > > > SOCK_WIFI_STATUS));
> > > > info->status_data = ieee80211_store_ack_skb(local, skb,
> > > > &info->flags, NULL);
> > > > if (info->status_data)
> > > >
> > > > This gives the following logoutput (and a lockup), indicating that sock_flag(skb->sk, SOCK_WIFI_STATUS) and
> > > > (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) are actually NOT equivalent (when compiled with clang and
> > > > PREEMPT_RT=y)
>
> Moving skc_flags out of the union can solve the issue, right? Simple
> modification looks like this:
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 3e15d7105ad2..5810c7b80507 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -189,13 +189,13 @@ struct sock_common {
>
> atomic64_t skc_cookie;
>
> + unsigned long skc_flags;
> /* following fields are padding to force
> * offset(struct sock, sk_refcnt) == 128 on 64bit arches
> * assuming IPV6 is enabled. We use this padding differently
> * for different kind of 'sockets'
> */
> union {
> - unsigned long skc_flags;
> struct sock *skc_listener; /* request_sock */
> struct inet_timewait_death_row *skc_tw_dr; /*
> inet_timewait_sock */
> };
>
> Can you give it a try?
I thought this would work, but applying this patch on both on next-20250513 and
next-20250516 gives the usual kernel panic (captured via netconsole) or the lockup
(which I'm not repeating here ~1000 lines).
[ 199.627464][ T580] Oops: general protection fault, probably for non-canonical address 0xff510aa8ab572985: 0000 [#1] SMP NOPTI
[ 199.627475][ T580] CPU: 8 UID: 0 PID: 580 Comm: napi/phy0-0 Not tainted 6.15.0-rc6-next-20250513-llvm-00005-gdd968010bbfa #993 PREEMPT_{RT,(full)}
[ 199.627481][ T580] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
[ 199.627484][ T580] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0
[ 199.627494][ T580] Code: c8 c1 e8 10 66 87 47 02 66 85 c0 74 40 0f b7 c0 89 c6 83 e6 03 c1 e6 04 83 e0 fc 49 c7 c0 f8 ff ff ff 49 8b 84 40 a0 fa 98 ab <48>
89 94 06 c0 21 06 ac 83 7a 08 00 75 08 f3 90 83 7a 08 00 74 f8
[ 199.627497][ T580] RSP: 0018:ffffd0c301e77998 EFLAGS: 00010006
[ 199.627501][ T580] RAX: ff510aa8ff5107b5 RBX: 0000000000000286 RCX: 0000000000240000
[ 199.627503][ T580] RDX: ffff8a716e8231c0 RSI: 0000000000000010 RDI: ffff8a64c7ed35f8
[ 199.627505][ T580] RBP: ffff8a62c8751200 R08: fffffffffffffff8 R09: 0000000000000001
[ 199.627507][ T580] R10: 0000000000000001 R11: ffffffffab1f0820 R12: ffff8a64c7ed35e0
[ 199.627509][ T580] R13: ffff8a62cbaf2480 R14: ffff8a64c7ed35f8 R15: ffff8a64c7ed35f8
[ 199.627511][ T580] FS: 0000000000000000(0000) GS:ffff8a71c27c1000(0000) knlGS:0000000000000000
[ 199.627513][ T580] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 199.627515][ T580] CR2: 00007fbadcaec0b0 CR3: 00000007fc23a000 CR4: 0000000000750ef0
[ 199.627518][ T580] PKRU: 55555554
[ 199.627519][ T580] Call Trace:
[ 199.627522][ T580] <TASK>
[ 199.627525][ T580] _raw_spin_lock_irqsave+0x57/0x60
[ 199.627531][ T580] rt_spin_lock+0x73/0xa0
[ 199.627536][ T580] sock_queue_err_skb+0xdc/0x140
[ 199.627542][ T580] skb_complete_wifi_ack+0xa9/0x120
[ 199.627551][ T580] ieee80211_report_used_skb+0x541/0x6e0 [mac80211]
[ 199.627598][ T580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 199.627604][ T580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 199.627608][ T580] ieee80211_tx_status_ext+0x3b3/0x870 [mac80211]
[ 199.627636][ T580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 199.627638][ T580] ? rt_spin_lock+0x3d/0xa0
[ 199.627646][ T580] ? mt76_tx_status_unlock+0x38/0x230 [mt76]
[ 199.627657][ T580] mt76_tx_status_unlock+0x1e0/0x230 [mt76]
[ 199.627668][ T580] __mt76_tx_complete_skb+0x13b/0x2e0 [mt76]
[ 199.627676][ T580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 199.627679][ T580] ? rt_spin_unlock+0x12/0x40
[ 199.627682][ T580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 199.627688][ T580] mt76_connac2_txwi_free+0x127/0x150 [mt76_connac_lib]
[ 199.627698][ T580] mt7921_mac_tx_free+0x112/0x260 [mt7921_common]
[ 199.627708][ T580] mt7921_rx_check+0x33/0xe0 [mt7921_common]
[ 199.627715][ T580] mt76_dma_rx_poll+0x322/0x660 [mt76]
[ 199.627725][ T580] ? mt792x_poll_rx+0x2a/0x120 [mt792x_lib]
[ 199.627732][ T580] mt792x_poll_rx+0x71/0x120 [mt792x_lib]
[ 199.627739][ T580] __napi_poll+0x2a/0x170
[ 199.627743][ T580] ? napi_threaded_poll_loop+0x32/0x1b0
[ 199.627746][ T580] napi_threaded_poll_loop+0xe4/0x1b0
[ 199.627749][ T580] ? napi_threaded_poll_loop+0x32/0x1b0
[ 199.627751][ T580] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 199.627757][ T580] napi_threaded_poll+0x57/0x80
[ 199.627760][ T580] ? __pfx_napi_threaded_poll+0x10/0x10
[ 199.627763][ T580] kthread+0x25c/0x280
[ 199.627769][ T580] ? __pfx_kthread+0x10/0x10
[ 199.627773][ T580] ret_from_fork+0xc4/0x1b0
[ 199.627777][ T580] ? __pfx_kthread+0x10/0x10
[ 199.627781][ T580] ret_from_fork_asm+0x1a/0x30
[ 199.627788][ T580] </TASK>
[ 199.627789][ T580] Modules linked in: sd_mod scsi_mod scsi_common netconsole ccm snd_seq_dummy snd_hrtimer snd_seq_midi snd_rawmidi snd_seq_midi_event
snd_seq snd_seq_device rfcomm bnep nls_ascii nls_cp437 vfat fat snd_ctl_led snd_hda_codec_realtek snd_hda_scodec_component snd_hda_codec_generic
snd_hda_codec_hdmi btusb btbcm btintel btrtl snd_hda_intel btmtk snd_intel_dspcfg snd_hda_codec snd_soc_dmic snd_acp3x_rn snd_acp3x_pdm_dma snd_hwdep bluetooth
snd_hda_core snd_soc_core uvcvideo videobuf2_vmalloc videobuf2_memops snd_pcm_oss uvc videobuf2_v4l2 snd_mixer_oss videodev snd_pcm snd_rn_pci_acp3x
snd_acp_config videobuf2_common snd_timer snd_soc_acpi msi_wmi ecdh_generic ecc sparse_keymap mc wmi_bmof edac_mce_amd snd k10temp snd_pci_acp3x ccp soundcore
battery ac button joydev hid_sensor_accel_3d hid_sensor_magn_3d hid_sensor_prox hid_sensor_als hid_sensor_gyro_3d hid_sensor_trigger hid_sensor_iio_common
industrialio_triggered_buffer kfifo_buf amd_pmc evdev industrialio mt7921e mt
May 18 13:22:44 7921_common mt792x_lib mt76_connac_lib mt76
[ 199.627877][ T580] mac80211 libarc4 cfg80211 rfkill msr fuse nvme_fabrics efi_pstore configfs efivarfs autofs4 ext4 mbcache jbd2 amdgpu usbhid
drm_panel_backlight_quirks cec drm_buddy drm_suballoc_helper drm_exec i2c_algo_bit drm_display_helper gpu_sched drm_ttm_helper hid_sensor_hub ttm xhci_pci
hid_multitouch mfd_core hid_generic xhci_hcd i2c_hid_acpi drm_client_lib usbcore psmouse amd_sfh i2c_hid drm_kms_helper nvme hid serio_raw nvme_core amdxcp
r8169 i2c_piix4 crc16 usb_common i2c_smbus i2c_designware_platform i2c_designware_core
[ 199.627931][ T580] ---[ end trace 0000000000000000 ]---
[ 199.781799][ T580] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0
[ 199.781799][ T580] Code: c8 c1 e8 10 66 87 47 02 66 85 c0 74 40 0f b7 c0 89 c6 83 e6 03 c1 e6 04 83 e0 fc 49 c7 c0 f8 ff ff ff 49 8b 84 40 a0 fa 98 ab <48>
89 94 06 c0 21 06 ac 83 7a 08 00 75 08 f3 90 83 7a 08 00 74 f8
[ 199.781799][ T580] RSP: 0018:ffffd0c301e77998 EFLAGS: 00010006
[ 199.781799][ T580] RAX: ff510aa8ff5107b5 RBX: 0000000000000286 RCX: 0000000000240000
[ 199.781799][ T580] RDX: ffff8a716e8231c0 RSI: 0000000000000010 RDI: ffff8a64c7ed35f8
[ 199.781799][ T580] RBP: ffff8a62c8751200 R08: fffffffffffffff8 R09: 0000000000000001
[ 199.781799][ T580] R10: 0000000000000001 R11: ffffffffab1f0820 R12: ffff8a64c7ed35e0
[ 199.781799][ T580] R13: ffff8a62cbaf2480 R14: ffff8a64c7ed35f8 R15: ffff8a64c7ed35f8
[ 199.781799][ T580] FS: 0000000000000000(0000) GS:ffff8a71c27c1000(0000) knlGS:0000000000000000
[ 199.781799][ T580] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 199.781799][ T580] CR2: 00007fbadcaec0b0 CR3: 00000007fc23a000 CR4: 0000000000750ef0
[ 199.781799][ T580] PKRU: 55555554
[ 199.781799][ T580] Kernel panic - not syncing: Fatal exception in interrupt
[ 199.788541][ T580] Kernel Offset: 0x29800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 199.788541][ T580] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
I even tried this version of your patch, to keep the offset of skc_refcnt at 128,
but it doesn't work, either.
commit fca84c5cde713be480544a64ed6680afc3319670
Author: Bert Karwatzki <spasswolf@....de>
Date: Sun May 18 13:32:36 2025 +0200
include: net: sock: move skc_flags out of the union
Signed-off-by: Bert Karwatzki <spasswolf@....de>
diff --git a/include/net/sock.h b/include/net/sock.h
index 3e15d7105ad2..e73929a4da6e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -195,7 +195,6 @@ struct sock_common {
* for different kind of 'sockets'
*/
union {
- unsigned long skc_flags;
struct sock *skc_listener; /* request_sock */
struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
};
@@ -221,6 +220,9 @@ struct sock_common {
};
refcount_t skc_refcnt;
+
+ /* place skc_flags here to keep offset(struct sock, sk_refcnt) == 128 */
+ unsigned long skc_flags;
/* private: */
int skc_dontcopy_end[0];
union {
> > >
> > > I've added more debugging output:
> > >
> > > diff --git a/include/net/sock.h b/include/net/sock.h
> > > index e223102337c7..e13560b5b7a8 100644
> > > --- a/include/net/sock.h
> > > +++ b/include/net/sock.h
> > > @@ -2735,8 +2735,10 @@ static inline void _sock_tx_timestamp(struct sock *sk,
> > > *tskey = atomic_inc_return(&sk->sk_tskey) - 1;
> > > }
> > > }
> > > - if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS)))
> > > + if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS))) {
> > > + printk(KERN_INFO "%s: setting SKBTX_WIFI_STATUS for sk = %px\n", __func__, sk);
> > > *tx_flags |= SKBTX_WIFI_STATUS;
> > > + }
> > > }
> > >
> > > static inline void sock_tx_timestamp(struct sock *sk,
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index e02a78538e3e..f6589ad5ba36 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -1548,6 +1548,7 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
> > > break;
> > >
> > > case SO_WIFI_STATUS:
> > > + printk(KERN_INFO "%s: setting SOCK_WIFI_STATUS to %u for sk = %px\n", __func__, valbool, sk);
> > > sock_valbool_flag(sk, SOCK_WIFI_STATUS, valbool);
> > > break;
> > >
> > > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> > > index 853493eca4f5..eee2f80949c6 100644
> > > --- a/net/mac80211/tx.c
> > > +++ b/net/mac80211/tx.c
> > > @@ -4588,9 +4588,12 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb,
> > > {
> > > if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ||
> > > sock_flag(skb->sk, SOCK_WIFI_STATUS)))) {
> > > - if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS))
> > > + if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) {
> > > printk(KERN_INFO "%s: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = %u sock_flag(skb->sk, SOCK_WIFI_STATUS) = %u\n",
> > > __func__, (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS), sock_flag(skb->sk, SOCK_WIFI_STATUS));
> > > + printk(KERN_INFO "%s: skb->sk = %px skb->sk->sk_flags = 0x%lx\n", __func__, skb->sk, skb->sk->sk_flags);
> > > + return; // This should make this case non-fatal.
> > > + }
> > > info->status_data = ieee80211_store_ack_skb(local, skb,
> > > &info->flags, NULL);
> > > if (info->status_data)
> > >
> > >
> > >
> > > This gives after ~15min uptime
> > >
> > > [ 189.337797] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
> > > [ 189.337803] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1b798c4e00 skb->sk->sk_flags = 0xffffffffb4efe640
> > > [ 191.325256] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
> > > [ 191.325259] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1b798c5a00 skb->sk->sk_flags = 0xffffffffb4efe640
> > > [ 257.591831] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
> > > [ 257.591844] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1baf3bca00 skb->sk->sk_flags = 0xffffffffb4efe640
> > > [ 301.786963] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
> > > [ 301.786967] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1c1bc40100 skb->sk->sk_flags = 0xffffffffb4efe640
> > > [ 302.780881] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
> > > [ 302.780884] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1a44cf6000 skb->sk->sk_flags = 0xffffffffb4efe640
> > > [ 482.792298] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
> > > [ 482.792304] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4de00 skb->sk->sk_flags = 0xffffffffb4efe640
> > > [ 482.806144] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
> > > [ 482.806148] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4c500 skb->sk->sk_flags = 0xffffffffb4efe640
> > > [ 482.817280] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
> > > [ 482.817284] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4df00 skb->sk->sk_flags = 0xffffffffb4efe640
> > > [ 552.327291] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
> > > [ 552.327295] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4de00 skb->sk->sk_flags = 0xffffffffb4efe640
> > > [ 916.971599] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
> > > [ 916.971607] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1a62834000 skb->sk->sk_flags = 0xffffffffb4efe640
> > >
> > > The printk()s in sk_set_sockopt() and _sock_tx_timestamp() are not called at all so the flag
> > > SOCK_WIFI_STATUS is actually nevers set! What is printed when printing skb->sk->sk_flags looks
> > > suspiciously like a pointer, and as sk_flags is actually a member of a union in struct sock_common
> > > it seems clang is using sk_flags for one of the other union members here
> > >
> > > struct sock_common {
> > > [...]
> > > union {
> > > unsigned long skc_flags;
> > > struct sock *skc_listener; /* request_sock */
> > > struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
> > > };
> > > [...]
> > > }
> > >
> > > Bert Karwatzki
> >
> > I added even more debugging output and found out why commit 76a853f86c97 (" wifi: free
> > SKBTX_WIFI_STATUS skb tx_flags flag") does not work.
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index e13560b5b7a8..6e1291d2e5a1 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -2738,6 +2738,8 @@ static inline void _sock_tx_timestamp(struct sock *sk,
> > if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS))) {
> > printk(KERN_INFO "%s: setting SKBTX_WIFI_STATUS for sk = %px\n", __func__, sk);
> > *tx_flags |= SKBTX_WIFI_STATUS;
> > + } else {
> > + printk(KERN_INFO "%s: NOT setting SKBTX_WIFI_STATUS for sk = %px\n", __func__, sk);
> > }
> > }
> >
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index 20915895bdaa..4913b09c0617 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -912,6 +912,7 @@ reqsk_alloc_noprof(const struct request_sock_ops *ops, struct sock *sk_listener,
> > return NULL;
> > }
> > req->rsk_listener = sk_listener;
> > + printk(KERN_INFO "%s: sk_listener = %px\n", __func__, sk_listener);
> > }
> > req->rsk_ops = ops;
> > req_to_sk(req)->sk_prot = sk_listener->sk_prot;
> > @@ -986,6 +987,7 @@ static struct request_sock *inet_reqsk_clone(struct request_sock *req,
> > nreq_sk->sk_incoming_cpu = req_sk->sk_incoming_cpu;
> >
> > nreq->rsk_listener = sk;
> > + printk(KERN_INFO "%s: rsk_listener =%px\n", __func__, sk);
> >
> > /* We need not acquire fastopenq->lock
> > * because the child socket is locked in inet_csk_listen_stop().
> > diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> > index 67efe9501581..1a3108ec7503 100644
> > --- a/net/ipv4/inet_timewait_sock.c
> > +++ b/net/ipv4/inet_timewait_sock.c
> > @@ -190,6 +190,7 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk,
> > const struct inet_sock *inet = inet_sk(sk);
> >
> > tw->tw_dr = dr;
> > + printk(KERN_INFO "%s: sk = %px tw_dr = %px\n", __func__, sk, dr);
> > /* Give us an identity. */
> > tw->tw_daddr = inet->inet_daddr;
> > tw->tw_rcv_saddr = inet->inet_rcv_saddr;
> > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> > index eee2f80949c6..227b86427e06 100644
> > --- a/net/mac80211/tx.c
> > +++ b/net/mac80211/tx.c
> > @@ -4586,6 +4586,8 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb,
> > struct ieee80211_local *local,
> > struct ieee80211_tx_info *info)
> > {
> > + if (skb->sk)
> > + printk(KERN_INFO "%s: skb->sk = %px skb->sk->sk_flags = 0x%lx\n", __func__, skb->sk, skb->sk->sk_flags);
> > if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ||
> > sock_flag(skb->sk, SOCK_WIFI_STATUS)))) {
> > if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) {
> >
> >
> > This monitor the value of skb->sk->sk_flags not only in the error case but in all cases, and also monitors
> > the places where the other members of the sk_flags union are set. The error occurs when at the start
> > of ieee80211_8023_xmit_clang_debug_helper() sk_flags is not actually the skc_flags member of the union
> > but insted is skc_tw_dr which is only interpreted is flags.
> > So why does it work with gcc but fail with clang? sock_flag(skb->sk, SOCK_WIFI_STATUS) test bit 19 of
> > skb->sk->sk_flags
>
> Could you say more about this? I don't follow it. Why does the gcc
> test just miss the crash issue? Is there anything (like call trace)
> different between them?
>
I think it is just pointer lottery, the pointer in the gcc version has bit
19 not set while the pointer in the clang version has bit 19 set. Why this is
always the case, I don't know, there is KASLR active after all.
By the way, the pointer value that is incorrectly used as sk_flags set
in inet_twsk_alloc() (called by tcp_time_wait()):
struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk,
struct inet_timewait_death_row *dr,
const int state)
{
struct inet_timewait_sock *tw;
if (refcount_read(&dr->tw_refcount) - 1 >=
READ_ONCE(dr->sysctl_max_tw_buckets))
return NULL;
tw = kmem_cache_alloc(sk->sk_prot_creator->twsk_prot->twsk_slab,
GFP_ATOMIC);
if (tw) {
const struct inet_sock *inet = inet_sk(sk);
tw->tw_dr = dr; // This is incorrectly use as sk_flags!xXX
> My worry is that all the callers calling sock_flag might have such
> potential risk...
>
> Thanks,
> Jason
I'd worry that, too. How can callers of sock_flag() know which part of the union is
active? At least for debugging purposes one could add a bool to struct sock_common
which is false by default and gets set to true when the pointer members of the union are
set, e.g. in inet_twsk_alloc():
struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk,
struct inet_timewait_death_row *dr,
const int state)
{
struct inet_timewait_sock *tw;
if (refcount_read(&dr->tw_refcount) - 1 >=
READ_ONCE(dr->sysctl_max_tw_buckets))
return NULL;
tw = kmem_cache_alloc(sk->sk_prot_creator->twsk_prot->twsk_slab,
GFP_ATOMIC);
if (tw) {
const struct inet_sock *inet = inet_sk(sk);
tw->tw_dr = dr;
tw->is_pointer = true;
>
Bert Karwatzki
Powered by blists - more mailing lists