linux-kernel - Re: Possible kernel fs block code regression in 6.2.3 umounting usb drives

Message-ID: <ZAuQOHnfa7xGvzKI@sol.localdomain>
Date:   Fri, 10 Mar 2023 12:16:56 -0800
From:   Eric Biggers <ebiggers@...nel.org>
To:     Mike Cloaked <mike.cloaked@...il.com>
Cc:     linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        stable@...r.kernel.org
Subject: Re: Possible kernel fs block code regression in 6.2.3 umounting usb
 drives

On Fri, Mar 10, 2023 at 12:14:10PM -0800, Eric Biggers wrote:
> On Fri, Mar 10, 2023 at 07:33:37PM +0000, Mike Cloaked wrote:
> > With kernel 6.2.3 if I simply plug in a usb external drive, mount it
> > and umount it, then the journal has a kernel Oops and I have submitted
> > a bug report, that includes the journal output, at
> > https://bugzilla.kernel.org/show_bug.cgi?id=217174
> > 
> > As soon as the usb drive is unmounted, the kernel Oops occurs, and the
> > machine hangs on shutdown and needs a hard reboot.
> > 
> > I have reproduced the same issue on three different machines, and in
> > each case downgrading back to kernel 6.2.2 resolves the issue and it
> > no longer occurs.
> > 
> > This would seem to be a regression in kernel 6.2.3
> > 
> > Mike C
> 
> Thanks for reporting this!  If this is reliably reproducible and is known to be
> a regression between v6.2.2 and v6.2.3, any chance you could bisect it to find
> out the exact commit that introduced the bug?
> 
> For reference I'm also copying the stack trace from bugzilla below:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000028
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 0 P4D 0
>  Oops: 0000 [#1] PREEMPT SMP PTI
>  CPU: 9 PID: 1118 Comm: lvcreate Tainted: G                T  6.2.3-1>
>  Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z370 Ex>
>  RIP: 0010:blk_throtl_update_limit_valid+0x1f/0x110

BTW, the block/ commits between v6.2.2 and v6.2.3 were:

	blk-mq: avoid sleep in blk_mq_alloc_request_hctx
	blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx
	blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait
	blk-mq: Fix potential io hung for shared sbitmap per tagset
	blk-mq: correct stale comment of .get_budget
	block: sync mixed merged request's failfast with 1st bio's
	block: Fix io statistics for cgroup in throttle path
	block: bio-integrity: Copy flags when bio_integrity_payload is cloned
	block: use proper return value from bio_failfast()
	blk-iocost: fix divide by 0 error in calc_lcoefs()
	blk-cgroup: dropping parent refcount after pd_free_fn() is done
	blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()
	block: don't allow multiple bios for IOCB_NOWAIT issue
	block: clear bio->bi_bdev when putting a bio back in the cache
	block: be a bit more careful in checking for NULL bdev while polling
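
A list like the one above can be regenerated from a git checkout.  The following
is only a minimal sketch: it assumes a clone of the linux-stable tree that has
the v6.2.2 and v6.2.3 tags and git on PATH.  The helper names are hypothetical
and not part of any existing tool, and the list in this mail may have been
produced differently.

```python
import subprocess

def block_log_cmd(good="v6.2.2", bad="v6.2.3", path="block/"):
    """Build the git command that lists commits in good..bad touching path."""
    # "--format=%s" prints only the subject line of each commit.
    return ["git", "log", "--format=%s", f"{good}..{bad}", "--", path]

def list_block_commits(repo=".", **kwargs):
    """Run the command inside repo (assumed to be a kernel checkout)
    and return one subject line per commit, newest first."""
    result = subprocess.run(block_log_cmd(**kwargs), cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout.splitlines()
```

Calling list_block_commits("/path/to/linux") in such a checkout should return
the subject lines of the block/ commits between the two tags.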

Without having any in-depth knowledge here, I think "blk-cgroup: synchronize
pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()" looks the
most suspicious.  I see that AUTOSEL selected it from a 3-patch series without
backporting patch 2; maybe that could be it?  Anyway, just a hunch.
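
If it helps with bisecting, here is a rough sketch of the git commands
involved, wrapped in Python.  It assumes a linux-stable checkout with both tags
present.  Limiting the bisect to block/ is itself an assumption that the
culprit is one of the block/ commits; drop the path to bisect the full range.
The helper names are hypothetical.

```python
def bisect_start_cmd(bad="v6.2.3", good="v6.2.2", paths=("block/",)):
    """Build 'git bisect start <bad> <good> -- <paths>'.
    The path limit makes git consider only commits touching those paths."""
    cmd = ["git", "bisect", "start", bad, good]
    if paths:
        cmd += ["--", *paths]
    return cmd

def bisect_mark_cmd(oops_seen):
    """After building and booting the kernel git checked out, then mounting
    and unmounting the USB drive, mark the result: 'bad' if the oops
    showed up, 'good' otherwise."""
    return ["git", "bisect", "bad" if oops_seen else "good"]
```

Each command list can be passed to subprocess.run(..., cwd=<checkout>), or
simply typed into a shell; git prints the first bad commit once the range is
exhausted.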

- Eric