Message-ID: <20170518013454.GA13864@ming.t460p>
Date:   Thu, 18 May 2017 09:34:59 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Jon Derrick <jonathan.derrick@...el.com>
Cc:     "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        linux-ext4@...r.kernel.org, linux-nvme@...ts.infradead.org,
        Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>,
        sagi@...mberg.me, Keith Busch <keith.busch@...el.com>
Subject: Re: BUG: hot removal during writes on ext4 formatted nvme device

On Mon, May 22, 2017 at 06:38:12PM -0600, Jon Derrick wrote:
> Hello,
> 
> I've encountered a BUG that I've experienced during hot removal on an
> ext4-formatted nvme device undergoing writes. I have been able to verify
> that 4.5, 4.6, 4.10.12, 4.11, and 4.12-rc1 show similar issues (the v4.6
> trace below shows issues with block that have already been fixed). I'm
> using VMD hardware for my hotplug controller so 4.5 is as far back as I
> can go (maybe someone else can verify on non-VMD hardware?).
> 
> To reproduce:
> 1) mkfs.ext4 <nvme>
> 2) mount <nvme> <mnt>
> 3) dd if=/dev/zero of=<mnt>/file bs=1M count=10000
> 4) Hot remove the drive while above is writing
> 
> From what I can tell, the ext4 sb is trying to be committed in the error
> path. There is supposed to be a check if the device is still alive via
> block_device_ejected(), but my guess is that there is a race between the
> removal/deletion in genhd and this check. I would appreciate any help
> resolving this.
>

Recently I ran fio directly on an NVMe partition with hot-remove too, and
found that d3cfb2a0ac0b8487d28 ("block: block new I/O just after queue is
set as dying") helps with this kind of issue.

Also, the following patch fixes one issue in the remove path:

	http://marc.info/?l=linux-block&m=149498450028434&w=2

So could you test v4.12-rc1 (which has d3cfb2a0 merged) with the above patch
applied?

With these patches in, the block layer and NVMe should make sure that, once
hot-remove is triggered, all I/O is completed with -EIO before del_gendisk()
returns. Beyond that, the failure handling in the fs might need further
investigation.
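
As a rough illustration of the intended behavior described above, here is a
toy userspace model in C. It is only a sketch: struct toy_queue, toy_submit()
and toy_remove() are made-up names for this example, not kernel APIs, and the
real code paths (queue dying flag, del_gendisk()) are far more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_queue {
	bool dying;     /* set once hot-remove has been triggered */
	int in_flight;  /* requests accepted but not yet completed */
};

/* New I/O is refused with -EIO as soon as the queue is marked dying. */
int toy_submit(struct toy_queue *q)
{
	assert(q != NULL);
	if (q->dying)
		return -EIO;
	q->in_flight++;
	return 0;
}

/*
 * Model of the remove path: mark the queue dying first, so nothing new
 * can get in, then fail everything still in flight (conceptually with
 * -EIO) before returning. Returns how many requests were failed.
 */
int toy_remove(struct toy_queue *q)
{
	int failed;

	assert(q != NULL);
	q->dying = true;
	failed = q->in_flight;
	q->in_flight = 0;
	return failed;
}
```

In this model, two toy_submit() calls followed by toy_remove() leave no
request in flight, and any later toy_submit() gets -EIO. What the fs does
with that error is the part that is not modeled here.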


Thanks,
Ming
