[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <YQJ5c5+wWlmubxaG@shinobu>
Date: Thu, 29 Jul 2021 18:48:35 +0900
From: William Breathitt Gray <vilhelm.gray@...il.com>
To: jic23@...nel.org
Cc: linux-stm32@...md-mailman.stormreply.com, kernel@...gutronix.de,
a.fatoum@...gutronix.de, kamel.bouhara@...tlin.com,
gwendal@...omium.org, alexandre.belloni@...tlin.com,
david@...hnology.com, linux-iio@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
syednwaris@...il.com, patrick.havelange@...ensium.com,
fabrice.gasnier@...com, mcoquelin.stm32@...il.com,
alexandre.torgue@...com, o.rempel@...gutronix.de,
jarkko.nikula@...ux.intel.com
Subject: Re: [PATCH v13 00/17] Introduce the Counter character device
interface
On Wed, Jul 28, 2021 at 05:51:45PM +0900, William Breathitt Gray wrote:
> On Tue, Jul 13, 2021 at 06:53:04PM +0900, William Breathitt Gray wrote:
> > Suppose I open the chrdev from a userspace application and keep it open,
> > but then remove the respective driver and Counter subsystem module from
> > my system. The devm_counter_release() and counter_exit() functions will
> > be called as expected; the counter_chrdev_release() function will not be
> > called yet, but that is expected because the chrdev is still open by
> > userspace. If I try to break out of my userspace application, I expect
> > counter_chrdev_release() to finally be called, but this does not happen.
> > Instead, my userspace application stalls and I see the following error
> > in my dmesg:
> >
> > [ 172.859570] BUG: unable to handle page fault for address: ffffffffc09ae298
> > [ 172.859594] #PF: supervisor read access in kernel mode
> > [ 172.859598] #PF: error_code(0x0000) - not-present page
> > [ 172.859603] PGD 23615067 P4D 23615067 PUD 23617067 PMD 1029ad067 PTE 0
> > [ 172.859623] Oops: 0000 [#1] SMP NOPTI
> > [ 172.859629] CPU: 2 PID: 2485 Comm: counter_example Not tainted 5.13.0+ #1
> > [ 172.859640] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014
> > [ 172.859645] RIP: 0010:filp_close+0x29/0x70
> > [ 172.859662] Code: 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 48 8b 47 38 48 85 c0 0f 84 b7 40 86 00 48 8b 47 28 49 89 fc 49 89 f5 45 31 f6 <48> 8b 40 78 48 85 c0 74 08 ff d0 0f 1f 00 41 89 c6 41 f6 44 24 45
> > [ 172.859669] RSP: 0018:ffffad31c0ee7cb0 EFLAGS: 00010246
> > [ 172.859675] RAX: ffffffffc09ae220 RBX: 0000000000000001 RCX: 0000000000000001
> > [ 172.859680] RDX: ffff9a43829708e0 RSI: ffff9a4382970840 RDI: ffff9a43821f4f00
> > [ 172.859684] RBP: ffffad31c0ee7cc8 R08: 0000000000000001 R09: 0000000000000001
> > [ 172.859687] R10: ffffffffffff4d00 R11: ffff9a43933c6e10 R12: ffff9a43821f4f00
> > [ 172.859691] R13: ffff9a4382970840 R14: 0000000000000000 R15: 0000000000000003
> > [ 172.859694] FS: 0000000000000000(0000) GS:ffff9a44b7d00000(0000) knlGS:0000000000000000
> > [ 172.859699] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 172.859704] CR2: ffffffffc09ae298 CR3: 0000000023610001 CR4: 0000000000370ee0
> > [ 172.859713] Call Trace:
> > [ 172.859730] put_files_struct+0x73/0xd0
> > [ 172.859738] exit_files+0x49/0x50
> > [ 172.859743] do_exit+0x33b/0xa20
> > [ 172.859751] do_group_exit+0x3b/0xb0
> > [ 172.859758] get_signal+0x16f/0x8b0
> > [ 172.859766] ? _copy_to_user+0x20/0x30
> > [ 172.859774] ? put_timespec64+0x3d/0x60
> > [ 172.859784] arch_do_signal_or_restart+0xf3/0x850
> > [ 172.859794] ? hrtimer_nanosleep+0x9f/0x120
> > [ 172.859802] ? __hrtimer_init+0xd0/0xd0
> > [ 172.859808] exit_to_user_mode_prepare+0x122/0x1b0
> > [ 172.859816] syscall_exit_to_user_mode+0x27/0x50
> > [ 172.859825] do_syscall_64+0x48/0xc0
> > [ 172.859831] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 172.859839] RIP: 0033:0x7f07f8b9951a
> > [ 172.859850] Code: Unable to access opcode bytes at RIP 0x7f07f8b994f0.
> > [ 172.859853] RSP: 002b:00007ffc0d12c230 EFLAGS: 00000246 ORIG_RAX: 00000000000000e6
> > [ 172.859860] RAX: fffffffffffffdfc RBX: ffffffffffffff01 RCX: 00007f07f8b9951a
> > [ 172.859863] RDX: 00007ffc0d12c2b0 RSI: 0000000000000000 RDI: 0000000000000000
> > [ 172.859867] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffc0d12c1c6
> > [ 172.859871] R10: 00007ffc0d12c2b0 R11: 0000000000000246 R12: 00007ffc0d12c2b0
> > [ 172.859874] R13: 00007ffc0d12c2b0 R14: 0000000000000000 R15: 0000000000000000
> > [ 172.859886] Modules linked in: intel_rapl_msr intel_rapl_common kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel nls_iso8859_1 aesni_intel crypto_simd cryptd rapl drm_ttm_helper ttm uvcvideo drm_kms_helper videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common input_leds syscopyarea videodev sysfillrect sysimgblt fb_sys_fops cec rc_core joydev mc drm serio_raw mac_hid qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp parport virtio_rng ip_tables x_tables autofs4 hid_generic usbhid hid virtio_net psmouse net_failover i2c_piix4 virtio_blk failover pata_acpi floppy [last unloaded: counter]
> > [ 172.859995] CR2: ffffffffc09ae298
> > [ 172.860009] ---[ end trace e7d3d7da1a73b8f4 ]---
> > [ 172.860013] RIP: 0010:filp_close+0x29/0x70
> > [ 172.860021] Code: 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 48 8b 47 38 48 85 c0 0f 84 b7 40 86 00 48 8b 47 28 49 89 fc 49 89 f5 45 31 f6 <48> 8b 40 78 48 85 c0 74 08 ff d0 0f 1f 00 41 89 c6 41 f6 44 24 45
> > [ 172.860027] RSP: 0018:ffffad31c0ee7cb0 EFLAGS: 00010246
> > [ 172.860031] RAX: ffffffffc09ae220 RBX: 0000000000000001 RCX: 0000000000000001
> > [ 172.860034] RDX: ffff9a43829708e0 RSI: ffff9a4382970840 RDI: ffff9a43821f4f00
> > [ 172.860038] RBP: ffffad31c0ee7cc8 R08: 0000000000000001 R09: 0000000000000001
> > [ 172.860041] R10: ffffffffffff4d00 R11: ffff9a43933c6e10 R12: ffff9a43821f4f00
> > [ 172.860044] R13: ffff9a4382970840 R14: 0000000000000000 R15: 0000000000000003
> > [ 172.860047] FS: 0000000000000000(0000) GS:ffff9a44b7d00000(0000) knlGS:0000000000000000
> > [ 172.860052] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 172.860056] CR2: ffffffffc09ae298 CR3: 0000000023610001 CR4: 0000000000370ee0
> > [ 172.860073] Fixing recursive fault but reboot is needed!
> >
> > It looks like faults in filp_close() before counter_chrdev_release() is
> > called. Is this issue manifesting because counter_exit() was called
> > earlier while the chrdev was still open?
>
> After giving this some more thought I realized that once the counter
> module is unloaded, the counter_fops callbacks are lost and thus the
> fops for the chrdev are no longer be valid. This means that
> counter_chrdev_release will need to be called before the counter module
> is unloaded. How can I guarantee this if a userspace application may
> still have the chrdev indefinitely open? In counter_chrdev_remove(),
> should I walk through the open chrdevs and release each one?
>
> William Breathitt Gray
Sorry for all the noise, I had simply forgotten to define the owner
member of counter_fops structure to THIS_MODULE. I'll make that fix and
also add in some conditional checks to protect the callbacks so they
return -ENODEV if the counter has been unregistered. With that I expect
to have a v14 submission ready some time this coming week.
William Breathitt Gray
Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)
Powered by blists - more mailing lists