linux-kernel - Re: [RFC PATCH v2 2/2] blk-mq: Lockout tagset iter when freeing rqs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <df44b73d-6c42-87ee-3c25-b95a44712e05@acm.org>
Date:   Fri, 18 Dec 2020 14:43:41 -0800
From:   Bart Van Assche <bvanassche@....org>
To:     John Garry <john.garry@...wei.com>, axboe@...nel.dk,
        ming.lei@...hat.com
Cc:     linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        hch@....de, hare@...e.de, ppvk@...eaurora.org,
        kashyap.desai@...adcom.com, linuxarm@...wei.com
Subject: Re: [RFC PATCH v2 2/2] blk-mq: Lockout tagset iter when freeing rqs

On 12/17/20 3:07 AM, John Garry wrote:
> References to old IO sched requests are currently cleared from the
> tagset when freeing those requests; switching elevator or changing
> request queue depth is such a scenario in which this occurs.
> 
> However, this does not stop the potentially racy behaviour of freeing
> and clearing a request reference between a tagset iterator getting a
> reference to a request and actually dereferencing that request.
> 
> Such a use-after-free can be triggered, as follows:
> 
> ==================================================================
> BUG: KASAN: use-after-free in bt_iter+0xa0/0x120
> Read of size 8 at addr ffff00108d589300 by task fio/3052
> 
> CPU: 32 PID: 3052 Comm: fio Tainted: GW
> 5.10.0-rc4-64839-g2dcf1ee5054f #693
> Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
> D05 IT21 Nemo 2.0 RC0 04/18/2018
> Call trace:
> dump_backtrace+0x0/0x2d0
> show_stack+0x18/0x68
> dump_stack+0x100/0x16c
> print_address_description.constprop.12+0x6c/0x4e8
> kasan_report+0x130/0x200
> __asan_load8+0x9c/0xd8
> bt_iter+0xa0/0x120
> blk_mq_queue_tag_busy_iter+0x2d8/0x540
> blk_mq_in_flight+0x80/0xb8
> part_stat_show+0xd8/0x238
> dev_attr_show+0x44/0x90
> sysfs_kf_seq_show+0x128/0x1c8
> kernfs_seq_show+0xa0/0xb8
> seq_read_iter+0x1ec/0x6a0
> seq_read+0x1d0/0x250
> kernfs_fop_read+0x70/0x330
> vfs_read+0xe4/0x250
> ksys_read+0xc8/0x178
> __arm64_sys_read+0x44/0x58
> el0_svc_common.constprop.2+0xc4/0x1e8
> do_el0_svc+0x90/0xa0
> el0_sync_handler+0x128/0x178
> el0_sync+0x158/0x180
> 
> This is found experimentally by running fio on 2x SCSI disks - 1x disk
> holds the root partition. Userspace is constantly triggering the tagset
> iter from reading the root (gen)disk partition info. And so if the IO
> sched is constantly changed on the other disk, eventually the UAF occurs,
> as described above.

Hi John,

Something is not clear to me. The above call stack includes
blk_mq_queue_tag_busy_iter(). That function starts with
percpu_ref_tryget(&q->q_usage_counter) and ends with calling
percpu_ref_put(&q->q_usage_counter). So it will only iterate over a tag set
if q->q_usage_counter is live. However, both blk_mq_update_nr_requests()
and elevator_switch() start with freezing the request queue.
blk_mq_freeze_queue() starts with killing q->q_usage_counter and waits
until that counter has dropped to zero. In other words,
blk_mq_queue_tag_busy_iter() should not iterate over a tag set while a tag
set is being freed or reallocated. Does this mean that we do not yet have
a full explanation about why the above call stack can be triggered?

Thanks,

Bart.