Message-ID: <CALXmgOG9-uyQaELJQwkAajrtMpJezLbPYpKnwnFOiA6rjZzC1w@mail.gmail.com>
Date: Fri, 5 Jul 2024 12:04:44 -0400
From: Divyaank Tiwari <dtiwari@...stonybrook.edu>
To: Dave Kleikamp <dave.kleikamp@...cle.com>
Cc: jfs-discussion@...ts.sourceforge.net, linux-kernel@...r.kernel.org,
brauner@...nel.org, jack@...e.cz, axboe@...nel.dk, jinpu.wang@...os.com,
dchinner@...hat.com, lizhi.xu@...driver.com, johannes.thumshirn@....com,
Yifei Liu <yifeliu@...stonybrook.edu>, Erez Zadok <ezk@...stonybrook.edu>,
Scott Smolka <sas@...stonybrook.edu>, Geoff Kuenning <geoff@...hmc.edu>, shaggy@...nel.org
Subject: Re: fs/jfs: Reproduction of kernel oops bug in JFS

Thank you for your prompt response, Dave.

We understand that you may not have the necessary bandwidth to
investigate this immediately, and we appreciate your acknowledgment of
the issue. To assist in debugging, our group can provide you with VMs
pre-installed with our Replayer for testing, among other resources.

While our previous patch, which swapped the order of kfree() and
mutex_unlock(), may not be a definitive fix, as you indicated, we hope
it at least offers some insight into the nature of the bug.
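
For quick reference, the change in that patch was roughly of this form
(a sketch of the error path in lmLogOpen(), fs/jfs/jfs_logmgr.c,
reproduced from memory rather than quoted exactly; the lkml link in
the quoted message below has the actual hunk):

     free:          /* free log descriptor */
    -       mutex_unlock(&jfs_log_mutex);
    -       kfree(log);
    +       kfree(log);
    +       mutex_unlock(&jfs_log_mutex);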

Thank you once again for your response and your continued support.

Best regards,
File Systems and Storage Lab (FSL)
Stony Brook University
On Mon, Jul 1, 2024 at 4:40 PM Dave Kleikamp <dave.kleikamp@...cle.com> wrote:
>
> I appreciate the time and effort you've put into this so far.
>
> Though I'm the only maintainer of jfs, I still don't have a lot of time
> right now to put into investigating this. I will keep track of the issue
> and try to find the time to look into it, but I can't promise how soon
> that will be.
>
> Again, thanks! I do appreciate the effort.
>
> Shaggy
>
> On 7/1/24 2:54PM, Divyaank Tiwari wrote:
> > On April 10, 2024, we submitted a patch to the JFS mailing list for
> > the lmLogOpen() function in jfs_logmgr.c. The patch moved a NULL
> > pointer dereference of a log pointer in the txEnd() function in
> > jfs_txnmgr.c from before a mutex_unlock to after it, effectively
> > postponing a kernel oops. We confirmed this with our file system
> > model-checking tool, Metis (https://github.com/sbu-fsl/Metis). (Our
> > earlier patch: https://lkml.org/lkml/2024/4/10/1396.)
> >
> > In response to that patch, Dave Kleikamp informed us that while the
> > above patch looked completely safe, it was unclear whether it fixed
> > anything, as releasing the lock shouldn't leave any references to the
> > sbi->log data structure. (We noted that all other instances of kfree
> > of that log pointer happened before the mutex unlock; hence we felt
> > our patch was at least partially “correct.”) He clarified that calling
> > kfree() is unrelated to the value of sbi->log (which would be the
> > cause of the NULL dereference issue). He also noted that the patch
> > might only alter the timing by holding the lock longer, but since
> > nothing should reference the log anymore, holding the lock while
> > calling kfree() may not prevent a use-after-free issue. Regarding
> > Dave's previous inquiry
> > about our experiments, we confirmed that we are not using an external
> > journal for JFS, nor does any file system share a journal.
> >
> > Although we have not yet been able to pinpoint the exact source of
> > the aforementioned bug, we have developed a program that can
> > reproduce it with high probability on the latest Linux kernel
> > release (v6.9.4). This program, which we call the Replayer, is open
> > source, and we have documented its usage extensively. (Repository:
> > https://github.com/sbu-fsl/Metis-Replayer/tree/main.)
> >
> > To briefly summarize our Replayer: from an execution of Metis that
> > caused a kernel oops in JFS, we first extracted two logs—one
> > containing a sequence of all operations up to that point and the other
> > listing the relative paths of the initial files and directories in the
> > mounted file system. We wrote a simple script, setup_jfs.sh, to create
> > a JFS file system on a 16 MiB ramdisk. (A ramdisk has a much faster
> > I/O rate than a regular device.) Our Replayer (replay.c), implemented
> > in about 700 lines of C code, pre-populates the files and directories
> > and then executes each operation from the sequence log
> > (jfs_op_sequence.log) line by line.
> >
> > We suspect the bug is a race. Specifically, the Replayer unmounts
> > and remounts JFS between EACH file system operation in the sequence
> > file that Metis generated while model checking the file system. We
> > are unable to reproduce the bug without this unmount+mount pair
> > between operations. Hence, we suspect a race between (1) some file
> > system operation adding a record to the transaction log and (2) the
> > file system being unmounted and the transaction log being shut down.
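> >
> > To make that concrete, the core of the Replayer is roughly the
> > following (an illustrative sketch only, not the actual replay.c;
> > the device path, mount point, log format, and the two operations
> > shown here are simplified assumptions):
> >
> >     #include <stdio.h>
> >     #include <unistd.h>
> >     #include <fcntl.h>
> >     #include <sys/mount.h>
> >
> >     #define DEV "/dev/ram0"   /* 16 MiB ramdisk from setup_jfs.sh */
> >     #define MNT "/mnt/jfs"    /* mount point (an assumption here) */
> >
> >     /* Unmount and immediately remount JFS around every operation;
> >      * without this pair the oops does not reproduce for us. */
> >     static void remount(void)
> >     {
> >             if (umount(MNT) != 0)
> >                     perror("umount");
> >             if (mount(DEV, MNT, "jfs", 0, NULL) != 0)
> >                     perror("mount");
> >     }
> >
> >     int main(void)
> >     {
> >             FILE *f = fopen("jfs_op_sequence.log", "r");
> >             char line[4096], path[4096];
> >
> >             if (!f)
> >                     return 1;
> >             while (fgets(line, sizeof(line), f)) {
> >                     remount();
> >                     /* Simplified dispatch; the real replayer
> >                      * covers the full Metis operation set. */
> >                     if (sscanf(line, "create %4095s", path) == 1)
> >                             close(open(path, O_CREAT | O_WRONLY, 0644));
> >                     else if (sscanf(line, "unlink %4095s", path) == 1)
> >                             unlink(path);
> >                     /* ... other operations ... */
> >             }
> >             fclose(f);
> >             return 0;
> >     }
> >
> > In practice we run the entire sequence repeatedly in an outer loop;
> > see the iteration counts below.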
> >
> > Due to the bug's nondeterministic nature, we have found that running
> > all operations (823,178 in total) from the log file in a loop for
> > 500 iterations yields a high probability of reproducing the bug
> > within a day. In our experiments, we encountered the bug after about
> > 60-300 iterations; correspondingly, the time to trigger it ranged
> > from about 9 to 75 hours on our VM. For reference, our repository
> > includes a dmesg trace (dmesg_jfs_kernel_oops_trace.txt, reproduced
> > below) captured when the kernel crash was triggered during one of
> > our experiments with the Replayer.
> >
> > Jun 22 03:56:08 hostname kernel: BUG: kernel NULL pointer dereference,
> > address: 00000000000000a4
> > Jun 22 03:56:08 hostname kernel: #PF: supervisor write access in kernel mode
> > Jun 22 03:56:08 hostname kernel: #PF: error_code(0x0002) - not-present page
> > Jun 22 03:56:08 hostname kernel: PGD 0 P4D 0
> > Jun 22 03:56:08 hostname kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
> > Jun 22 03:56:08 hostname kernel: CPU: 3 PID: 582 Comm: jfsCommit Not
> > tainted 6.9.4 #1
> > Jun 22 03:56:08 hostname kernel: Hardware name: VMware, Inc. VMware
> > Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
> > 11/12/2020
> > Jun 22 03:56:08 hostname kernel: RIP: 0010:txEnd+0x8d/0x1e0 [jfs]
> > Jun 22 03:56:08 hostname kernel: Code: 03 01 0f 85 88 00 00 00 83 3d
> > ae af 00 00 03 7f 6a 66 83 7b 28 00 0f 85 3b 01 00 00 8b 05 bb 16 01
> > 00 66 89 43 28 41 0f b7 c5 <41> 83 ac 24 a4 00 00 00 01 89 05 a4 16 01
> > 00 75 14 f0 41 80 64 24
> > Jun 22 03:56:08 hostname kernel: RSP: 0018:ffffaf9300adbe50 EFLAGS: 00010246
> > Jun 22 03:56:08 hostname kernel: RAX: 0000000000000008 RBX:
> > ffffaf9301213500 RCX: 0000000000000000
> > Jun 22 03:56:08 hostname kernel: RDX: 0000000000000000 RSI:
> > 0000000000000246 RDI: ffffaf9301213530
> > Jun 22 03:56:08 hostname kernel: RBP: ffffaf9300adbe68 R08:
> > 0000000000000000 R09: ffff94c3ffba3640
> > Jun 22 03:56:08 hostname kernel: R10: 0000000000000001 R11:
> > 0000000000000000 R12: 0000000000000000
> > Jun 22 03:56:08 hostname kernel: R13: 0000000000000008 R14:
> > ffff9485d28cbcf0 R15: ffffaf9301213500
> > Jun 22 03:56:08 hostname kernel: FS: 0000000000000000(0000)
> > GS:ffff94c3ffb80000(0000) knlGS:0000000000000000
> > Jun 22 03:56:08 hostname kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
> > 0000000080050033
> > Jun 22 03:56:08 hostname kernel: CR2: 00000000000000a4 CR3:
> > 00000001d7f1c004 CR4: 0000000000770ef0
> > Jun 22 03:56:08 hostname kernel: PKRU: 55555554
> > Jun 22 03:56:08 hostname kernel: Call Trace:
> > Jun 22 03:56:08 hostname kernel: <TASK>
> > Jun 22 03:56:08 hostname kernel: ? show_regs+0x6d/0x80
> > Jun 22 03:56:08 hostname kernel: ? __die+0x29/0x70
> > Jun 22 03:56:08 hostname kernel: ? page_fault_oops+0x151/0x520
> > Jun 22 03:56:08 hostname kernel: ? do_user_addr_fault+0x325/0x6b0
> > Jun 22 03:56:08 hostname kernel: ? exc_page_fault+0x7c/0x190
> > Jun 22 03:56:08 hostname kernel: ? asm_exc_page_fault+0x2b/0x30
> > Jun 22 03:56:08 hostname kernel: ? txEnd+0x8d/0x1e0 [jfs]
> > Jun 22 03:56:08 hostname kernel: jfs_lazycommit+0x292/0x380 [jfs]
> > Jun 22 03:56:08 hostname kernel: ? __pfx_default_wake_function+0x10/0x10
> > Jun 22 03:56:08 hostname kernel: ? __pfx_jfs_lazycommit+0x10/0x10 [jfs]
> > Jun 22 03:56:08 hostname kernel: kthread+0xf5/0x130
> > Jun 22 03:56:08 hostname kernel: ? __pfx_kthread+0x10/0x10
> > Jun 22 03:56:08 hostname kernel: ret_from_fork+0x3d/0x60
> > Jun 22 03:56:08 hostname kernel: ? __pfx_kthread+0x10/0x10
> > Jun 22 03:56:08 hostname kernel: ret_from_fork_asm+0x1a/0x30
> > Jun 22 03:56:08 hostname kernel: </TASK>
> > Jun 22 03:56:08 hostname kernel: Modules linked in: tls brd
> > vsock_loopback vmw_vsock_virtio_transport_common intel_rapl_msr
> > intel_rapl_common vmw_vsock_vmci_transport vsock sunrpc
> > intel_uncore_frequency_common binfmt_misc nfit rapl vmw_balloon joydev
> > input_leds serio_raw vmw_vmci mac_hid sch_fq_codel dm_multipath
> > scsi_dh_rdac scsi_dh_emc jfs scsi_dh_alua nls_ucs2_utils msr
> > efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10
> > raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
> > raid6_pq libcrc32c raid1 raid0 vmwgfx drm_ttm_helper ttm
> > drm_kms_helper crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> > sha512_ssse3 sha256_ssse3 drm sha1_ssse3 ahci mptspi psmouse mptscsih
> > mptbase vmxnet3 i2c_piix4 libahci scsi_transport_spi pata_acpi
> > aesni_intel crypto_simd cryptd [last unloaded: brd]
> > Jun 22 03:56:08 hostname kernel: CR2: 00000000000000a4
> > Jun 22 03:56:08 hostname kernel: ---[ end trace 0000000000000000 ]---
> > Jun 22 03:56:08 hostname kernel: RIP: 0010:txEnd+0x8d/0x1e0 [jfs]
> > Jun 22 03:56:08 hostname kernel: Code: 03 01 0f 85 88 00 00 00 83 3d
> > ae af 00 00 03 7f 6a 66 83 7b 28 00 0f 85 3b 01 00 00 8b 05 bb 16 01
> > 00 66 89 43 28 41 0f b7 c5 <41> 83 ac 24 a4 00 00 00 01 89 05 a4 16 01
> > 00 75 14 f0 41 80 64 24
> > Jun 22 03:56:08 hostname kernel: RSP: 0018:ffffaf9300adbe50 EFLAGS: 00010246
> > Jun 22 03:56:08 hostname kernel: RAX: 0000000000000008 RBX:
> > ffffaf9301213500 RCX: 0000000000000000
> > Jun 22 03:56:08 hostname kernel: RDX: 0000000000000000 RSI:
> > 0000000000000246 RDI: ffffaf9301213530
> > Jun 22 03:56:08 hostname kernel: RBP: ffffaf9300adbe68 R08:
> > 0000000000000000 R09: ffff94c3ffba3640
> > Jun 22 03:56:08 hostname kernel: R10: 0000000000000001 R11:
> > 0000000000000000 R12: 0000000000000000
> > Jun 22 03:56:08 hostname kernel: R13: 0000000000000008 R14:
> > ffff9485d28cbcf0 R15: ffffaf9301213500
> > Jun 22 03:56:08 hostname kernel: FS: 0000000000000000(0000)
> > GS:ffff94c3ffb80000(0000) knlGS:0000000000000000
> > Jun 22 03:56:08 hostname kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
> > 0000000080050033
> > Jun 22 03:56:08 hostname kernel: CR2: 00000000000000a4 CR3:
> > 00000001d7f1c004 CR4: 0000000000770ef0
> > Jun 22 03:56:08 hostname kernel: PKRU: 55555554
> > Jun 22 03:56:08 hostname kernel: note: jfsCommit[582] exited with irqs disabled
> > Jun 22 03:56:08 hostname kernel: note: jfsCommit[582] exited with
> > preempt_count 1
> > Jun 22 03:56:32 hostname kernel: watchdog: BUG: soft lockup - CPU#6
> > stuck for 26s! [replay:257577]
> >
> > Further details about setting up the environment and conducting the
> > experiment are available in our repository’s README. We hope that our
> > Replayer will assist the JFS maintainers in debugging and resolving
> > this kernel oops bug.
> >
> > Note: All of our experiments were performed on Linux kernels v6.6.1 and v6.9.4.
> >
> > Signed-off-by: Divyaank Tiwari <dtiwari@...stonybrook.edu>
> > Signed-off-by: Yifei Liu <yifeliu@...stonybrook.edu>
> > Signed-off-by: Erez Zadok <ezk@...stonybrook.edu>
> > Signed-off-by: Scott Smolka <sas@...stonybrook.edu>
> > Signed-off-by: Geoff Kuenning <geoff@...hmc.edu>