Message-Id: <180713DB-114C-454B-A91E-063AB0116251@dilger.ca>
Date:	Tue, 22 Feb 2011 09:13:49 -0700
From:	Andreas Dilger <adilger@...ger.ca>
To:	"djwong@...ibm.com" <djwong@...ibm.com>
Cc:	Jens Axboe <axboe@...nel.dk>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	Mingming Cao <mcao@...ibm.com>,
	linux-scsi <linux-scsi@...r.kernel.org>
Subject: Re: [RFC] block integrity: Fix write after checksum calculation problem

On 2011-02-21, at 19:00, "Darrick J. Wong" <djwong@...ibm.com> wrote:
> Last summer there was a long thread entitled "Wrong DIF guard tag on ext2
> write" (http://marc.info/?l=linux-scsi&m=127530531808556&w=2) that started a
> discussion about how to deal with the situation where one program tells the
> kernel to write a block to disk, the kernel computes the checksum of that data,
> and then a second program begins writing to that same block before the disk HBA
> can DMA the memory block, thereby causing the disk to complain about being sent
> invalid checksums.
> 
> I was able to write a trivial program to trigger the write problem, so I'm
> pretty sure that this has not been fixed upstream.  (FYI, using O_DIRECT
> still seems fine.)

Can you please attach your reproducer? IIRC it needed mmap() to hit this problem?  Did you measure CPU usage during your testing?
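
The reproducer is not included in this message. As a rough user-space illustration of the race described in the quoted text (not Darrick's program, and not kernel code), the sketch below computes a guard tag over a sector, lets a second writer modify the buffer, and then compares the tag with the data the device would actually receive. All function names are hypothetical, and the CRC parameters (16-bit, polynomial 0x8BB7, zero initial value, no reflection) are assumed to match the T10 DIF guard tag.

```python
SECTOR = 512  # assumed sector size for the illustration


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16, polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def racy_write(page: bytearray, late_writer):
    """Checksum the page, let a second writer touch it, then 'DMA' it."""
    guard = crc16_t10dif(bytes(page[:SECTOR]))  # checksum computed first
    late_writer(page)                           # second program dirties the page
    dma_data = bytes(page[:SECTOR])             # device reads memory afterwards
    return guard, dma_data


def scribble(page: bytearray) -> None:
    """Stand-in for the second program: flip every bit of the first byte."""
    page[0] ^= 0xFF


page = bytearray(b"A" * SECTOR)
guard, sent = racy_write(page, scribble)
mismatch = guard != crc16_t10dif(sent)
print("guard tag mismatch:", mismatch)
```

A CRC-16 detects any error confined to a single byte, so the mismatch here is guaranteed rather than probabilistic.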

> Below is a simple if naive solution: (ab)use the bounce buffering code to copy
> the memory page just prior to calculating the checksum, and send the copy and
> the checksum to the disk controller.  With this patch applied, the invalid
> guard tag messages go away.  An optimization would be to perform the copy only
> when memory contents change, but I wanted to ask peoples' opinions before
> continuing.  I don't imagine bounce buffering is particularly speedy, though I
> haven't noticed any corruption errors or weirdness yet.

I don't like adding a data copy in the IO path at all. We are just looking to enable T10 DIF for Lustre and this would definitely hurt performance significantly, even though it isn't needed there at all (since the server side has proper locking of the pages to prevent multiple writers to the same page).
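
To make the trade-off concrete, here is a minimal user-space sketch of the copy-before-checksum idea from the quoted proposal. It is not the patch itself: the function names are hypothetical, and zlib.crc32 stands in for the real guard tag calculation. The checksum and the data sent to the device both come from the same snapshot, so a racing writer cannot make them disagree, at the cost of one full copy per page written.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size for the illustration


def write_with_bounce(page: bytearray, late_writer):
    """Snapshot the page, checksum the snapshot, and send the snapshot."""
    bounce = bytes(page)        # the extra data copy in the I/O path
    guard = zlib.crc32(bounce)  # stand-in for the guard tag
    late_writer(page)           # racing writer only touches the original page
    return guard, bounce        # the device sees the copy, never the live page


def scribble(page: bytearray) -> None:
    """Stand-in for a second writer modifying the page mid-flight."""
    page[0] ^= 0xFF


page = bytearray(b"B" * PAGE_SIZE)
guard, sent = write_with_bounce(page, scribble)
consistent = guard == zlib.crc32(sent)
page_changed = page[0] != ord("B")
print("checksum matches data sent:", consistent)
print("live page modified meanwhile:", page_changed)
```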

> Anyway, I'm mostly wondering: what do people think of this as a starting point
> to fixing the DIF checksum problem?

I'd definitely prefer that the filesystem be in charge of deciding whether this is needed or not. If the use of the data copy can be constrained to only the minimum required cases (e.g. if the fs checks for a rewrite of a page that is marked as Writeback and either copies or blocks until writeback is complete, as a mount option) that would be better. At that point we can compare on different hardware whether copying or blocking should be the default.
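
A minimal user-space sketch of the two policies, assuming a hypothetical Page class with a writeback flag (this is not kernel code, and none of these names exist in any filesystem). Under "block", a writer waits until writeback completes. Under "copy", the writer detaches the page from the in-flight buffer before modifying it, so the copy happens only when a rewrite actually collides with writeback.

```python
import threading


class Page:
    """Hypothetical page with a writeback flag; not a kernel structure."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.writeback = False
        self.cond = threading.Condition()

    def begin_writeback(self) -> bytearray:
        with self.cond:
            self.writeback = True
            return self.data  # buffer handed to the device

    def end_writeback(self) -> None:
        with self.cond:
            self.writeback = False
            self.cond.notify_all()

    def write(self, offset: int, payload: bytes, policy: str) -> None:
        with self.cond:
            if self.writeback:
                if policy == "block":
                    # Wait until the device is done with the buffer.
                    self.cond.wait_for(lambda: not self.writeback)
                elif policy == "copy":
                    # Detach from the in-flight buffer; the device keeps the old one.
                    self.data = bytearray(self.data)
                else:
                    raise ValueError("unknown policy: " + policy)
            self.data[offset:offset + len(payload)] = payload


# "copy" policy: the in-flight buffer stays stable, the page gets the new data.
p = Page(b"old!")
inflight = p.begin_writeback()
p.write(0, b"new!", "copy")
copy_ok = bytes(inflight) == b"old!" and bytes(p.data) == b"new!"
p.end_writeback()

# "block" policy: the writer stalls until writeback ends.
q = Page(b"old!")
inflight_q = q.begin_writeback()
t = threading.Thread(target=q.write, args=(0, b"new!", "block"))
t.start()
t.join(timeout=0.2)  # the writer should still be waiting here
blocked = t.is_alive() and bytes(inflight_q) == b"old!"
q.end_writeback()
t.join()
block_ok = blocked and bytes(q.data) == b"new!"
print("copy policy ok:", copy_ok, "block policy ok:", block_ok)
```

Which policy is cheaper depends on how often rewrites collide with writeback and on how long writeback takes, which is why a comparison on real hardware is needed.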

Cheers, Andreas