Date: Sat, 8 Jun 2024 14:34:47 +0800
From: Li Nan <linan666@...weicloud.com>
To: Ming Lei <ming.lei@...hat.com>, Li Nan <linan666@...weicloud.com>
Cc: Changhui Zhong <czhong@...hat.com>, axboe@...nel.dk,
 ZiyangZhang@...ux.alibaba.com, linux-block@...r.kernel.org,
 linux-kernel@...r.kernel.org, yukuai3@...wei.com, yi.zhang@...wei.com,
 houtao1@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH] ublk_drv: fix NULL pointer dereference in
 ublk_ctrl_start_recovery()



On 2024/6/6 17:52, Ming Lei wrote:
> On Thu, Jun 06, 2024 at 04:05:33PM +0800, Li Nan wrote:
>>
>>
>> On 2024/6/6 12:48, Changhui Zhong wrote:
>>
>> [...]
>>
>>>>
>>>> Hi Changhui,
>>>>
>>>> The hang is actually expected because recovery fails.
>>>>
>>>> Please pull the latest ublksrv and check if the issue can still be
>>>> reproduced:
>>>>
>>>> https://github.com/ublk-org/ublksrv
>>>>
>>>> BTW, one ublksrv segfault and two test cleanup issues are fixed.
>>>>
>>>> Thanks,
>>>> Ming
>>>>
>>>
>>> Hi Ming and Nan,
>>>
>>> After applying the new patch and pulling the latest ublksrv,
>>> I ran the test for 4 hours and did not observe any task hang.
>>> The test results look good!
>>>
>>> Thanks,
>>> Changhui
>>>
>>>
>>> .
>>
>> Thanks for your test!
>>
>> However, I got a NULL pointer dereference bug with ublksrv. It is not
> 
> BTW, your patch isn't related to generic/004, which doesn't touch the
> recovery code path.
> 
>> introduced by this patch. It seems io was issued after deleting disk. And
>> it can be reproduced by:
>>
>>    while true; do make test T=generic/004; done
> 
> We didn't see that when running this test on Linus' tree, and Changhui
> usually runs the generic tests for hours.
> 
>>
>> [ 1524.286485] running generic/004
>> [ 1529.110875] blk_print_req_error: 109 callbacks suppressed
> ...
>> [ 1541.171010] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> [ 1541.171734] #PF: supervisor write access in kernel mode
>> [ 1541.172271] #PF: error_code(0x0002) - not-present page
>> [ 1541.172798] PGD 0 P4D 0
>> [ 1541.173065] Oops: Oops: 0002 [#1] PREEMPT SMP
>> [ 1541.173515] CPU: 0 PID: 43707 Comm: ublk Not tainted
>> 6.9.0-next-20240523-00004-g9bc7e95c7323 #454
>> [ 1541.174417] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> 1.16.1-2.fc37 04/01/2014
>> [ 1541.175311] RIP: 0010:io_fallback_tw+0x252/0x300
> 
> This one looks like an io_uring issue.
> 
> Could you tell me which line of source code 'io_fallback_tw+0x252' points to?
> 
> gdb> l *(io_fallback_tw+0x252)
> 
(gdb) list * io_fallback_tw+0x252
0xffffffff81d79dc2 is in io_fallback_tw (./arch/x86/include/asm/atomic64_64.h:25).
20              __WRITE_ONCE(v->counter, i);
21      }
22
23      static __always_inline void arch_atomic64_add(s64 i, atomic64_t *v)
24      {
25              asm volatile(LOCK_PREFIX "addq %1,%0"
26                           : "=m" (v->counter)
27                           : "er" (i), "m" (v->counter) : "memory");
28      }

The corresponding code in io_fallback_tw() is:
   percpu_ref_get(&last_ctx->refs);

I have the vmcore for this issue. If you need any other information, please
let me know.
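
For anyone else trying to reproduce this, the endless loop quoted earlier
can be bounded so that it stops at the first failing run. This is only a
minimal sketch: repeat_until_fail is a hypothetical helper name, not part
of ublksrv, and it assumes the wrapped command returns a non-zero exit
status on failure. A kernel oops may not make the test command fail, so
dmesg still has to be checked separately.

```shell
#!/bin/sh
# Hypothetical helper (not part of ublksrv): run a command up to MAX times
# and stop at the first failure.
# Usage: repeat_until_fail MAX cmd [args...]
repeat_until_fail() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        # Run the command exactly as given; stop on a non-zero exit status.
        if ! "$@"; then
            echo "iteration $i failed" >&2
            return 1
        fi
    done
    echo "all $max iterations passed" >&2
    return 0
}
```

With this helper sourced, the reproducer would be run as, for example:

   repeat_until_fail 1000 make test T=generic/004

The iteration count of 1000 is arbitrary and only there to keep the loop
from running forever.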

-- 
Thanks,
Nan

