netdev - RE: [PATCH] net: txgbe: fix GPIO interrupt blocking

Message-ID: <038101da6638$bf8cd310$3ea67930$@trustnetic.com>
Date: Fri, 23 Feb 2024 17:14:47 +0800
From: Jiawen Wu <jiawenwu@...stnetic.com>
To: "'Andrew Lunn'" <andrew@...n.ch>
Cc: <davem@...emloft.net>,
	<edumazet@...gle.com>,
	<kuba@...nel.org>,
	<pabeni@...hat.com>,
	<maciej.fijalkowski@...el.com>,
	<netdev@...r.kernel.org>,
	<mengyuanlou@...-swift.com>
Subject: RE: [PATCH] net: txgbe: fix GPIO interrupt blocking

On Thu, Feb 22, 2024 11:08 PM, Andrew Lunn wrote:
> > There are flags passed in sfp.c:
> >
> > err = devm_request_threaded_irq(sfp->dev, sfp->gpio_irq[i],
> > 				NULL, sfp_irq,
> > 				IRQF_ONESHOT |
> > 				IRQF_TRIGGER_RISING |
> > 				IRQF_TRIGGER_FALLING,
> > 				sfp_irq_name, sfp);
> 
> Does your hardware support edges for GPIOs? And by that, I mean the
> whole chain of interrupt controllers? So your GPIO controller notices
> an edge in the GPIO. It then passes a notification to the interrupt
> controller within the GPIO controller. It then sets a bit to indicate
> an interrupt has happened. At that point you have a level
> interrupt. That bit causes a level interrupt to the interrupt
> controller above in the chain. And it needs to be level all the way up.

My hardware requires the GPIOs to be configured as edge-sensitive.
But I think I got something wrong. There were two problems to be solved
in this patch:

1) The GPIO interrupt status register is masked before the MAC IRQ is enabled.

This is because of a hardware deficiency. I need to manually clear the
interrupt status before using the GPIO interrupts; otherwise, they will never
be reported again. So, as a workaround, the GPIO EOI is set in
txgbe_up_complete() to clear the interrupts.

2) GPIO EOI is not set to clear the interrupt status after the interrupt is
handled. It should be done in chip->irq_ack, but this op is not called.

This is because I used handle_nested_irq() in txgbe_gpio_irq_handler() to
handle the IRQ of each specific GPIO line. Since the IRQ is requested as a
threaded IRQ and only action->thread_fn is set up in sfp.c, the high-level
irq-events handler (handle_level_irq() or handle_edge_irq(), as set in the GPIO
irq chip) is not called. Both the level and the edge handler would call
chip->irq_ack, but neither of them runs.
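
To make the difference between the two paths concrete, here is a toy
user-space model. It is not kernel code: struct toy_irq and all toy_*()
functions are hypothetical names made up for this illustration, and the flow
handler is simplified to run the thread function directly instead of waking an
IRQ thread. It only shows that the nested path runs the thread function while
leaving the ack callback (and therefore the latched status) untouched.

```c
#include <assert.h>

/* Toy user-space model, NOT kernel code. All names here are hypothetical. */
struct toy_irq {
	int status;      /* latched interrupt status bit, 1 = pending */
	int ack_calls;   /* number of times the ack callback ran */
	int thread_runs; /* number of times the thread function ran */
};

/* Stand-in for chip->irq_ack: setting EOI clears the latched status. */
void toy_irq_ack(struct toy_irq *irq)
{
	assert(irq);
	irq->ack_calls++;
	irq->status = 0;
}

/* Stand-in for the consumer's action->thread_fn (sfp_irq in sfp.c). */
void toy_thread_fn(struct toy_irq *irq)
{
	assert(irq);
	irq->thread_runs++;
}

/* Models the nested path: only the thread function runs, no ack. */
void toy_handle_nested(struct toy_irq *irq)
{
	toy_thread_fn(irq);
}

/*
 * Models a flow handler: ack the chip first, then run the action.
 * Simplification: the real flow handlers wake the IRQ thread rather
 * than calling the thread function inline.
 */
void toy_handle_flow(struct toy_irq *irq)
{
	toy_irq_ack(irq);
	toy_thread_fn(irq);
}
```

Starting from status = 1, toy_handle_nested() leaves status set and ack_calls
at 0, while toy_handle_flow() clears status and bumps ack_calls to 1. The
first case corresponds to problem 2) above.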

So I should use generic_handle_domain_irq() instead of handle_nested_irq()
to handle the GPIO IRQ. But there is a call trace when I do that:

[   86.784113] ------------[ cut here ]------------
[   86.784114] irq 154 handler irq_default_primary_handler+0x0/0x10 enabled interrupts
[   86.784122] WARNING: CPU: 0 PID: 3383 at kernel/irq/handle.c:161 __handle_irq_event_percpu+0x150/0x1a0
[   86.784125] Modules linked in: i2c_designware_platform sfp i2c_designware_core txgbe libwx fuse vfat fat nouveau
snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel intel_rapl_msr snd_intel_dspcfg intel_rapl_common
snd_hda_codec snd_hda_core edac_mce_amd snd_hwdep eeepc_wmi crc32_pclmul asus_wmi snd_pcm ledtrig_audio platform_profile
ghash_clmulni_intel sparse_keymap snd_seq_dummy sha512_ssse3 rfkill wmi_bmof drm_gpuvm snd_seq_oss mxm_wmi drm_exec snd_seq_midi
binfmt_misc snd_seq_midi_event gpu_sched snd_rawmidi aesni_intel i2c_algo_bit crypto_simd bridge cryptd snd_seq drm_display_helper
snd_seq_device drm_ttm_helper snd_timer ttm stp drm_kms_helper snd llc acpi_cpufreq k10temp ccp video soundcore wmi squashfs loop
sch_fq_codel drm parport_pc ppdev lp parport ramoops reed_solomon ip_tables ext4 mbcache jbd2 mdio_i2c nvme ahci nvme_core libahci
t10_pi i2c_piix4 libata pcs_xpcs crc32c_intel crc64_rocksoft i2c_core crc64 crc_t10dif r8169 crct10dif_generic crct10dif_pclmul
phylink
[   86.784200]  crct10dif_common realtek [last unloaded: i2c_designware_core]
[   86.784204] CPU: 0 PID: 3383 Comm: irq/126-eth%d Not tainted 6.8.0-rc1+ #147
[   86.784206] Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 4403 04/28/2022
[   86.784207] RIP: 0010:__handle_irq_event_percpu+0x150/0x1a0
[   86.784210] Code: 44 00 00 e9 09 ff ff ff 80 3d 17 15 7e 01 00 75 1b 48 8b 13 44 89 ee 48 c7 c7 f8 e7 82 a8 c6 05 01 15 7e 01 01
e8 00 d8 f6 ff <0f> 0b fa 0f 1f 44 00 00 e9 fc fe ff ff f0 48 0f ba 6b 40 01 0f 82
[   86.784211] RSP: 0018:ffffa04300d6bd68 EFLAGS: 00010282
[   86.784213] RAX: 0000000000000000 RBX: ffff8defda7bc000 RCX: 0000000000000000
[   86.784214] RDX: 0000000000000002 RSI: ffffffffa88644c3 RDI: 00000000ffffffff
[   86.784215] RBP: 0000000000000002 R08: 0000000000000000 R09: ffffa04300d6bc00
[   86.784216] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[   86.784217] R13: 000000000000009a R14: ffff8def8ff50a00 R15: ffff8defda7bc300
[   86.784219] FS:  0000000000000000(0000) GS:ffff8df68ea00000(0000) knlGS:0000000000000000
[   86.784220] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   86.784221] CR2: 000000c00081f000 CR3: 00000001821b6000 CR4: 0000000000750ef0
[   86.784222] PKRU: 55555554
[   86.784223] Call Trace:
[   86.784225]  <TASK>
[   86.784227]  ? __warn+0x80/0x130
[   86.784231]  ? __handle_irq_event_percpu+0x150/0x1a0
[   86.784233]  ? report_bug+0x1f4/0x200
[   86.784236]  ? srso_alias_return_thunk+0x5/0xfbef5
[   86.784240]  ? handle_bug+0x42/0x70
[   86.784243]  ? exc_invalid_op+0x14/0x70
[   86.784245]  ? asm_exc_invalid_op+0x16/0x20
[   86.784249]  ? __handle_irq_event_percpu+0x150/0x1a0
[   86.784251]  ? __handle_irq_event_percpu+0x150/0x1a0
[   86.784253]  ? __pfx_irq_thread_fn+0x10/0x10
[   86.784255]  handle_irq_event_percpu+0x10/0x50
[   86.784257]  handle_irq_event+0x34/0x60
[   86.784260]  handle_level_irq+0xa5/0x120
[   86.784263]  handle_irq_desc+0x3a/0x50
[   86.784266]  txgbe_gpio_irq_handler+0x82/0x140 [txgbe]
[   86.784271]  ? __pfx_irq_thread_fn+0x10/0x10
[   86.784273]  handle_nested_irq+0xaf/0x100
[   86.784275]  txgbe_misc_irq_handle+0x60/0x80 [txgbe]
[   86.784279]  irq_thread_fn+0x20/0x60
[   86.784282]  irq_thread+0xe2/0x190
[   86.784284]  ? srso_alias_return_thunk+0x5/0xfbef5
[   86.784286]  ? __pfx_irq_thread_dtor+0x10/0x10
[   86.784288]  ? __pfx_irq_thread+0x10/0x10
[   86.784290]  kthread+0xf0/0x120
[   86.784294]  ? __pfx_kthread+0x10/0x10
[   86.784296]  ret_from_fork+0x30/0x50
[   86.784299]  ? __pfx_kthread+0x10/0x10
[   86.784301]  ret_from_fork_asm+0x1b/0x30
[   86.784306]  </TASK>
[   86.784307] ---[ end trace 0000000000000000 ]---

This is due to the default primary handler in the irq action chain.
I'm not quite sure what I did wrong here. Am I missing any flags?