linux-kernel - Re: [PATCH 2/2] exec: increase BINPRM_BUF


Message-ID: <20190219162643.GA15202@roeck-us.net>
Date:   Tue, 19 Feb 2019 08:26:43 -0800
From:   Guenter Roeck <linux@...ck-us.net>
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Ben Woodard <woodard@...hat.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Kees Cook <keescook@...omium.org>,
        Michal Hocko <mhocko@...e.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] exec: increase BINPRM_BUF_SIZE to 256

On Tue, Feb 19, 2019 at 01:37:57PM +0100, Oleg Nesterov wrote:
> On 02/18, Guenter Roeck wrote:
> >
> > Unfortunately, this patch causes one of my qemu emulations to crash.
> > The crash is not always seen, but at least with every other boot attempt.
> 
> Hmm. I can't imagine how this change can cause the null-ptr-deref in
> blk_mq_run_hw_queue().
> 
Me neither.

> > Reverting the patch fixes the problem. Crash log and bisect results
> > are attached below.
> 
> Do you mean that you applied the "revert" patch on top of linux-next ?
> 
I reverted the patch on top of linux-next (next-20190218, more specifically).
The problem was gone. I then reverted the revert and the problem was back.

> Or did you rely on git-bisect ?
> 
Sorry, I don't understand the question. git bisect, unless I am missing
something, doesn't revert any patches.

> > [   10.681671] BUG: Kernel NULL pointer dereference at 0x00000040
> > [   10.681826] Faulting instruction address: 0xc0431480
> > [   10.682072] Oops: Kernel access of bad area, sig: 11 [#1]
> > [   10.682251] BE PAGE_SIZE=4K PREEMPT Xilinx Virtex440
> > [   10.682387] Modules linked in:
> > [   10.682528] CPU: 0 PID: 1 Comm: swapper Tainted: G        W         5.0.0-rc6-next-20190218+ #2
> > [   10.682733] NIP:  c0431480 LR: c043147c CTR: c0422ad8
> > [   10.682863] REGS: cf82fbe0 TRAP: 0300   Tainted: G        W          (5.0.0-rc6-next-20190218+)
> > [   10.683065] MSR:  00029000 <CE,EE,ME>  CR: 22000222  XER: 00000000
> > [   10.683236] DEAR: 00000040 ESR: 00000000 
> > [   10.683236] GPR00: c043147c cf82fc90 cf82ccc0 00000000 00000000 00000000 00000002 00000000 
> > [   10.683236] GPR08: 00000000 00000000 c04310bc 00000000 22000222 00000000 c0002c54 00000000 
> > [   10.683236] GPR16: 00000000 00000001 c09aa39c c09021b0 c09021dc 00000007 c0a68c08 00000000 
> > [   10.683236] GPR24: 00000001 ced6d400 ced6dcf0 c0815d9c 00000000 00000000 00000000 cedf0800 
> > [   10.684331] NIP [c0431480] blk_mq_run_hw_queue+0x28/0x114
> > [   10.684473] LR [c043147c] blk_mq_run_hw_queue+0x24/0x114
> > [   10.684602] Call Trace:
> > [   10.684671] [cf82fc90] [c043147c] blk_mq_run_hw_queue+0x24/0x114 (unreliable)
> > [   10.684854] [cf82fcc0] [c04315bc] blk_mq_run_hw_queues+0x50/0x7c
> > [   10.685002] [cf82fce0] [c0422b24] blk_set_queue_dying+0x30/0x68
> > [   10.685154] [cf82fcf0] [c0423ec0] blk_cleanup_queue+0x34/0x14c
> > [   10.685306] [cf82fd10] [c054d73c] ace_probe+0x3dc/0x508
> > [   10.685445] [cf82fd50] [c052d740] platform_drv_probe+0x4c/0xb8
> > [   10.685592] [cf82fd70] [c052abb0] really_probe+0x20c/0x32c
> > [   10.685728] [cf82fda0] [c052ae58] driver_probe_device+0x68/0x464
> > [   10.685877] [cf82fdc0] [c052b500] device_driver_attach+0xb4/0xe4
> > [   10.686024] [cf82fde0] [c052b5dc] __driver_attach+0xac/0xfc
> > [   10.686161] [cf82fe00] [c0528428] bus_for_each_dev+0x80/0xc0
> > [   10.686314] [cf82fe30] [c0529b3c] bus_add_driver+0x144/0x234
> > [   10.686457] [cf82fe50] [c052c46c] driver_register+0x88/0x15c
> > [   10.686610] [cf82fe60] [c09de288] ace_init+0x4c/0xac
> > [   10.686742] [cf82fe80] [c0002730] do_one_initcall+0xac/0x330
> > [   10.686888] [cf82fee0] [c09aafd0] kernel_init_freeable+0x34c/0x478
> > [   10.687043] [cf82ff30] [c0002c6c] kernel_init+0x18/0x114
> 
> looks unrelated...
> 

Indeed...

The underlying problem is in the error handling code of ace_setup(),
which calls put_disk() followed by blk_cleanup_queue(). put_disk()
calls disk_release(), which calls blk_put_queue(), which in turn
results in a call to blk_mq_hw_sysfs_release().

Added debug code, with your patch reverted, shows:

 ######### blk_mq_hw_sysfs_release hctx=cee4a800
 ...
 ######### blk_mq_run_hw_queue hctx=cee4a800

blk_mq_hw_sysfs_release() calls kfree(hctx), so accessing it later is most
definitely not a good idea.
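
To illustrate the ordering hazard described above, here is a minimal
userspace sketch in C. It is a toy model, not kernel code: every toy_*
name is hypothetical, the single outstanding reference is an assumption,
and a flag stands in for the kfree() so that the model never touches
freed memory itself. It also does not claim to be the fix for the
driver; it only shows why dropping the last reference before the queue
cleanup is a problem.

```c
#include <stdbool.h>

/* Toy stand-ins; none of these are the kernel's real types or functions. */
struct toy_hctx {
    bool released;          /* set where the real code would kfree(hctx) */
};

struct toy_queue {
    int refcount;
    struct toy_hctx hctx;
};

void toy_queue_init(struct toy_queue *q)
{
    q->refcount = 1;        /* assumption: one reference is outstanding */
    q->hctx.released = false;
}

/* Models blk_put_queue(): the last put runs the release callback. */
void toy_put_queue(struct toy_queue *q)
{
    if (--q->refcount == 0)
        q->hctx.released = true;
}

/* Models put_disk() -> disk_release() -> blk_put_queue(). */
void toy_put_disk(struct toy_queue *q)
{
    toy_put_queue(q);
}

/*
 * Models the part of blk_cleanup_queue() that ends up running the
 * hardware queues. Returns false if it would have touched an hctx
 * that was already released.
 */
bool toy_cleanup_queue(const struct toy_queue *q)
{
    return !q->hctx.released;
}

/* Order from the report: put the disk first, then clean up the queue. */
bool toy_error_path_as_reported(void)
{
    struct toy_queue q;

    toy_queue_init(&q);
    toy_put_disk(&q);
    return toy_cleanup_queue(&q);
}

/* Same calls, but the queue is cleaned up while hctx is still valid. */
bool toy_error_path_reordered(void)
{
    struct toy_queue q;
    bool ok;

    toy_queue_init(&q);
    ok = toy_cleanup_queue(&q);
    toy_put_disk(&q);
    return ok;
}
```

In this model toy_error_path_as_reported() returns false, because the
cleanup step sees an hctx that has already been released, while
toy_error_path_reordered() returns true.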

No idea why this only causes problems with your patch applied.

I'll send a patch to fix the underlying problem.

Thanks,
Guenter