Message-ID: <20200918183859.GA4030639@dhcp-10-100-145-180.wdl.wdc.com>
Date:   Fri, 18 Sep 2020 11:38:59 -0700
From:   Keith Busch <kbusch@...nel.org>
To:     Tong Zhang <ztong0001@...il.com>
Cc:     Jens Axboe <axboe@...com>, Christoph Hellwig <hch@....de>,
        Sagi Grimberg <sagi@...mberg.me>,
        linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nvme: fix NULL pointer dereference

On Thu, Sep 17, 2020 at 11:32:12PM -0400, Tong Zhang wrote:
> Please correct me if I am wrong.
> After a bit more digging I found out that it is indeed command_id got
> corrupted is causing this problem. Although the tag and command_id
> range is checked like you said, the elements in rqs cannot be
> guaranteed to be not NULL. thus although the range check is passed,
> blk_mq_tag_to_rq() can still return NULL. 

I think you're describing a sequence problem in initialization. We
shouldn't have interrupts wired up to uninitialized tagsets.

A more appropriate sequence would call request_irq() only after the
tagset is ready. It makes handling a failed irq setup a bit awkward for
io queues, though.
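
As a rough illustration of that ordering, here is a user-space sketch,
not driver code. Every name in it (fake_queue, fake_alloc_tagset,
fake_request_irq, fake_queue_init) is a hypothetical stand-in and not a
kernel API. It assumes the tag table is fully populated before the
interrupt is enabled, and that a failed irq setup has to free the table
again, which is the awkward part mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
struct fake_request {
	int tag;
};

struct fake_queue {
	struct fake_request **rqs;	/* tag -> request table */
	size_t depth;
	bool irq_enabled;
};

/* Release whatever part of the tag table was allocated. */
static void fake_free_tagset(struct fake_queue *q)
{
	if (!q->rqs)
		return;
	for (size_t i = 0; i < q->depth; i++)
		free(q->rqs[i]);	/* free(NULL) is a no-op */
	free(q->rqs);
	q->rqs = NULL;
	q->depth = 0;
}

/* Populate every slot of the tag table. Returns 0 or -1. */
static int fake_alloc_tagset(struct fake_queue *q, size_t depth)
{
	q->rqs = calloc(depth, sizeof(*q->rqs));
	if (!q->rqs)
		return -1;
	q->depth = depth;
	for (size_t i = 0; i < depth; i++) {
		q->rqs[i] = malloc(sizeof(**q->rqs));
		if (!q->rqs[i])
			return -1;	/* caller unwinds */
		q->rqs[i]->tag = (int)i;
	}
	return 0;
}

/* Stand-in for irq registration; 'fail' simulates a setup error. */
static int fake_request_irq(struct fake_queue *q, bool fail)
{
	if (fail)
		return -1;
	q->irq_enabled = true;
	return 0;
}

/* Tag table first, interrupt second; unwind the table on irq failure. */
static int fake_queue_init(struct fake_queue *q, size_t depth, bool irq_fails)
{
	q->rqs = NULL;
	q->depth = 0;
	q->irq_enabled = false;

	if (fake_alloc_tagset(q, depth) != 0 ||
	    fake_request_irq(q, irq_fails) != 0) {
		fake_free_tagset(q);
		return -1;
	}
	return 0;
}
```

With this ordering a handler can never observe an empty slot, at the
cost of an extra unwind path when the irq cannot be obtained.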

> It is clear that the current sanitization is not enough and there's
> more implication about this -- when all rqs got populated, a corrupted
> command_id may silently corrupt other data not belonging to the
> current command.

The block layer doesn't do anything with requests that haven't been
started, so if your controller completes non-existent commands, then
nothing particular will happen with the rqs.

If the request had been started and the controller provides a corrupted
completion, then the fault lies entirely with the controller and you
should raise that issue with your vendor. There's no way the driver can
distinguish a genuine completion from a corrupted one.
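
To make the cases above concrete, here is a small user-space model. It
is only a sketch: fake_tags, fake_tag_to_rq and fake_handle_completion
are hypothetical names, and the 'started' flag is an assumed
simplification of request state, not how the block layer tracks it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified request; not the kernel's struct request. */
struct fake_request {
	bool started;
	bool completed;
};

struct fake_tags {
	struct fake_request **rqs;	/* slots may be NULL */
	size_t depth;
};

/* Range check first; an in-range slot can still be unpopulated. */
static struct fake_request *fake_tag_to_rq(const struct fake_tags *tags,
					   uint16_t command_id)
{
	if (command_id >= tags->depth)
		return NULL;
	return tags->rqs[command_id];
}

/* Returns true only if a started request was marked complete. */
static bool fake_handle_completion(const struct fake_tags *tags,
				   uint16_t command_id)
{
	struct fake_request *rq = fake_tag_to_rq(tags, command_id);

	if (!rq)
		return false;	/* out of range or empty slot */
	if (!rq->started)
		return false;	/* never issued: nothing to do */
	rq->completed = true;
	return true;
}
```

A corrupted command_id that happens to name another started request
passes every check here, which is why that case cannot be caught on the
host side.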
