Message-ID: <20120528213839.GB18537@dhcp-172-17-108-109.mtv.corp.google.com>
Date: Tue, 29 May 2012 06:38:39 +0900
From: Tejun Heo <tj@...nel.org>
To: Mikulas Patocka <mpatocka@...hat.com>
Cc: Alasdair G Kergon <agk@...hat.com>,
Kent Overstreet <koverstreet@...gle.com>,
Mike Snitzer <snitzer@...hat.com>,
linux-kernel@...r.kernel.org, linux-bcache@...r.kernel.org,
dm-devel@...hat.com, linux-fsdevel@...r.kernel.org,
axboe@...nel.dk, yehuda@...newdream.net, vgoyal@...hat.com,
bharrosh@...asas.com, sage@...dream.net, drbd-dev@...ts.linbit.com,
Dave Chinner <dchinner@...hat.com>, tytso@...gle.com
Subject: Re: [PATCH v3 14/16] Gut bio_add_page()
Hello,
On Mon, May 28, 2012 at 05:27:33PM -0400, Mikulas Patocka wrote:
> > They're split and made in-flight together.
>
> I was talking about an old ATA disk (without command queueing), so the
> requests are not sent together. USB 2 may be a similar case: it has a
> limited transfer size and no command queueing either.
I meant in the block layer. For consecutive commands, queueing
doesn't really matter.
> > The disk will most likely seek to the sector, read all of them into
> > its buffer at once, and then serve the two consecutive commands
> > back-to-back without much inter-command delay.
>
> Without command queueing, the disk will serve the first request, then
> receive the second request, and then serve the second request (hopefully
> the data will already have been prefetched after the first request).
>
> The point is that while the disk is processing the second request, the CPU
> can already process data from the first request.
Those are transfer latencies - multiple orders of magnitude shorter
than IO latencies. It would be surprising if they were actually
noticeable in any kind of disk-bound workload.
> > Isn't it more like you shouldn't be sending the read requested by the
> > user and the readahead in the same bio?
>
> If the user calls read with 512 bytes, you would send a bio for just one
> sector. That's too small, and you'd get worse performance because of the
> higher per-command overhead. You need to send larger bios.
All modern FSes are page granular, so the granularity would be
per-page. Also, RAHEAD is treated differently in terms of
error handling. Do filesystems implement their own readahead,
independent of the common logic in the VFS layer?
> AHCI can interrupt after a partial transfer (so, for example, you can
> send a command to read 1M but signal an interrupt after the first 4k has
> been transferred), but no one has really written code that could use
> this feature. It is questionable whether this would improve performance,
> because it would double the interrupt load.
The feature is pointless for disks anyway. Think about the scales of
latencies of different phases of command processing. The difference
is multiple orders of magnitude.
> > If exposing segmenting limit upwards is a must (I'm kinda skeptical),
> > let's have proper hints (or dynamic hinting interface) instead.
>
> With this patchset, you don't have to expose all the limits. You can
> expose just the few most useful limits to avoid bio splits in the cases
> described above.
Yeah, if that actually helps, sure. From what I read, dm is already
(ab)using merge_bvec_fn() like that anyway.
Thanks.
--
tejun