Date:   Thu, 19 Aug 2021 11:03:42 +0200
From:   Sven Schnelle <svens@...ux.ibm.com>
To:     Christoph Hellwig <hch@....de>
Cc:     Hillf Danton <hdanton@...a.com>,
        syzbot <syzbot+aa0801b6b32dca9dda82@...kaller.appspotmail.com>,
        axboe@...nel.dk, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] general protection fault in wb_timer_fn

Christoph Hellwig <hch@....de> writes:

> On Mon, Aug 16, 2021 at 05:10:41PM +0800, Hillf Danton wrote:
>> Remove and free all qos callbacks added, with cb->timer deleted in
>> blk_stat_remove_callback().
>> 
>> only for thoughts.
>> 
>> +++ x/block/blk-sysfs.c
>> @@ -800,9 +800,7 @@ static void blk_release_queue(struct kob
>>  
>>  	might_sleep();
>>  
>> -	if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags))
>> -		blk_stat_remove_callback(q, q->poll_cb);
>> -	blk_stat_free_callback(q->poll_cb);
>> +	rq_qos_exit(q);
>
> rq_qos_exit is already called in blk_cleanup_queue, and the blk-mq
> polling doesn't even use the qos framework.  So I'm not sure what this
> is supposed to help.

I'm seeing a similar crash in our CI:

[  464.072042] nbd0: detected capacity change from 0 to 2097152
[  464.092297]  nbd0: p1
[  464.244242] EXT4-fs (nbd0p1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[  468.266306] block nbd0: NBD_DISCONNECT
[  468.266318] block nbd0: Disconnected due to user request.
[  468.266320] block nbd0: shutting down sockets
[  468.291814] Unable to handle kernel pointer dereference in virtual kernel address space
[  468.291817] Failing address: 000002aa264a7000 TEID: 000002aa264a7803
[  468.291819] Fault in home space mode while using kernel ASCE.
[  468.291822] AS:0000000159c84007 R3:0000000000000024 
[  468.291843] Oops: 003b ilc:3 [#1] SMP 
[  468.291846] Modules linked in: nbd(E-) xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E) ipt_REJECT(E) xt_tcpudp(E) nft_compat(E) nf_nat_tftp(E) nft_objref(E) nf_conntrack_tftp(E) nft_counter(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) dm_service_time(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nf_tables(E) nfnetlink(E) sunrpc(E) zfcp(E) scsi_transport_fc(E) dm_multipath(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) mlx5_ib(E) ib_uverbs(E) ib_core(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) zcrypt_cex4(E) eadm_sch(E) sch_fq_codel(E) configfs(E) ip_tables(E) x_tables(E) ghash_s390(E) prng(E) aes_s390(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) mlx5_core(E) nvme(E) nvme_core(E) pkey(E) zcrypt(E) rng_core(E) autofs4(E)
[  468.291891] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G            E     5.14.0-20210819.rc6.git0.f26c3abc432a.300.fc34.s390x+next #1
[  468.291894] Hardware name: IBM 8561 T01 703 (LPAR)
[  468.291895] Krnl PSW : 0704c00180000000 0000000158cfe3b6 (wb_timer_fn+0x56/0x538)
[  468.291902]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[  468.291905] Krnl GPRS: 0000000000000200 000002aa264a7018 0000000189fc3400 0000000000000000
[  468.291907]            fffffffffffc0000 0000000000000000 00000002f767c000 0000000158cc9420
[  468.291909]            0000000000000000 0000000189fc3410 00000001e19622a0 0000000138e9a700
[  468.291911]            0000000080378000 00000002f767c002 0000038000d43ca0 0000038000d43c40
[  468.291937] Krnl Code: 0000000158cfe3a4: e380b0280004        lg      %r8,40(%r11)
                          0000000158cfe3aa: e31010900004        lg      %r1,144(%r1)
                         #0000000158cfe3b0: e31012000004        lg      %r1,512(%r1)
                         >0000000158cfe3b6: e36010980004        lg      %r6,152(%r1)
                          0000000158cfe3bc: ec88005e007c        cgij    %r8,0,8,0000000158cfe478
                          0000000158cfe3c2: e310b0300002        ltg     %r1,48(%r11)
                          0000000158cfe3c8: a7840058            brc     8,0000000158cfe478
                          0000000158cfe3cc: c0e5ffce8822        brasl   %r14,00000001586cf410
[  468.291951] Call Trace:
[  468.291953]  [<0000000158cfe3b6>] wb_timer_fn+0x56/0x538 
[  468.291956]  [<00000001586ca980>] call_timer_fn+0x38/0x178 
[  468.291960]  [<00000001586cad58>] __run_timers.part.0+0x298/0x358 
[  468.291962]  [<00000001586cae62>] run_timer_softirq+0x4a/0x88 
[  468.291964]  [<0000000159149236>] __do_softirq+0x146/0x3c8 
[  468.291967]  [<000000015862cbaa>] irq_exit+0xf2/0x120 
[  468.291970]  [<000000015913a334>] do_ext_irq+0xd4/0x160 
[  468.291972]  [<000000015914769c>] ext_int_handler+0xdc/0x110 
[  468.291974]  [<0000000159147826>] psw_idle_exit+0x0/0xa 
[  468.291976] ([<00000001585dbfe8>] arch_cpu_idle+0x40/0xd0)
[  468.291978]  [<000000015914718a>] default_idle_call+0x42/0x108 
[  468.291980]  [<000000015866ab6a>] do_idle+0xd2/0x160 
[  468.291983]  [<000000015866adb6>] cpu_startup_entry+0x36/0x40 
[  468.291985]  [<00000001585ef74e>] smp_start_secondary+0x86/0x90 
[  468.291987] Last Breaking-Event-Address:
[  468.291989]  [<0000038000d43d30>] 0x38000d43d30
[  468.291992] Kernel panic - not syncing: Fatal exception in interrupt

The crash is likely triggered by nbd. wb_timer_fn+0x56 resolves to
block/blk-wbt.c:237, the same line as in the syzbot-reported crash. That
line was touched only recently, so I wonder whether that's related?
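
For anyone comparing this trace with the syzbot one, here is a minimal sketch
(not part of any kernel tooling) that pulls the symbol+offset/size frames out
of a call trace like the one above and computes each function's start address.
The names parse_frames and FRAME_RE are made up for illustration. Mapping an
offset to a source line still needs a vmlinux with debug info and a tool such
as scripts/faddr2line from the kernel tree.

```python
import re

# Matches frames such as "[<0000000158cfe3b6>] wb_timer_fn+0x56/0x538".
# The leading "[<" keeps dmesg timestamps like "[  468.291953]" from matching.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return one dict per call-trace frame found in log_text."""
    frames = []
    for m in FRAME_RE.finditer(log_text):
        addr = int(m.group("addr"), 16)
        off = int(m.group("off"), 16)
        frames.append({
            "addr": addr,
            "symbol": m.group("sym"),
            "offset": off,
            "size": int(m.group("size"), 16),
            # Start of the function = frame address minus offset into it.
            "func_start": addr - off,
        })
    return frames


# First three frames copied from the trace above.
SAMPLE = """\
[  468.291953]  [<0000000158cfe3b6>] wb_timer_fn+0x56/0x538
[  468.291956]  [<00000001586ca980>] call_timer_fn+0x38/0x178
[  468.291960]  [<00000001586cad58>] __run_timers.part.0+0x298/0x358
"""

frames = parse_frames(SAMPLE)
for f in frames:
    print(f"{f['symbol']}+{f['offset']:#x}/{f['size']:#x} "
          f"starts at {f['func_start']:#x}")
```

The symbol+offset pair from the first frame is what one would hand to
faddr2line to check the blk-wbt.c line number against another build.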
