lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20111121145247.0e37dc36.akpm@linux-foundation.org>
Date:	Mon, 21 Nov 2011 14:52:47 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Linux Memory Management List <linux-mm@...ck.org>,
	<linux-fsdevel@...r.kernel.org>, Li Shaohua <shaohua.li@...el.com>,
	Clemens Ladisch <clemens@...isch.de>,
	Jens Axboe <jens.axboe@...cle.com>,
	Rik van Riel <riel@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Andi Kleen <andi@...stfloor.org>
Subject: Re: [PATCH 1/8] block: limit default readahead size for small
 devices

On Mon, 21 Nov 2011 17:18:20 +0800
Wu Fengguang <fengguang.wu@...el.com> wrote:

> Linus reports a _really_ small & slow (505kB, 15kB/s) USB device,
> on which blkid runs unpleasantly slow. He manages to optimize the blkid
> reads down to 1kB+16kB, but still kernel read-ahead turns it into 48kB.
> 
>      lseek 0,    read 1024   => readahead 4 pages (start of file)

I'm disturbed that the code did a 4 page (16kbyte?) readahead after an
lseek.  Given the high probability that the next read will occur after
a second lseek, that's a mistake.

Was an lseek to offset 0 special-cased?

>      lseek 1536, read 16384  => readahead 8 pages (page contiguous)
> 
> The readahead heuristics involved here are reasonable ones in general.
> So it's good to fix blkid with fadvise(RANDOM), as Linus already did.
> 
> For the kernel part, Linus suggests:
>   So maybe we could be less aggressive about read-ahead when the size of
>   the device is small? Turning a 16kB read into a 64kB one is a big deal,
>   when it's about 15% of the whole device!
> 
> This looks reasonable: smaller device tend to be slower (USB sticks as
> well as micro/mobile/old hard disks).

Spose so.  Obviously there are other characteristics which should be
considered when choosing a readaahead size, but one of them can be disk
size and that's what this change does.

In a better world, userspace would run a
work-out-what-readahead-size-to-use script each time a distro is
installed and when new storage devices are added/detected.  Userspace
would then remember that readahead size for subsequent bootups.

In the real world, we shovel guaranteed-to-be-wrong guesswork into the
kernel and everyone just uses the results.  Sigh.

> --- linux-next.orig/block/genhd.c	2011-10-31 00:13:51.000000000 +0800
> +++ linux-next/block/genhd.c	2011-11-18 11:27:08.000000000 +0800
> @@ -623,6 +623,26 @@ void add_disk(struct gendisk *disk)
>  	WARN_ON(retval);
>  
>  	disk_add_events(disk);
> +
> +	/*
> +	 * Limit default readahead size for small devices.
> +	 *        disk size    readahead size
> +	 *               1M                8k
> +	 *               4M               16k
> +	 *              16M               32k
> +	 *              64M               64k
> +	 *             256M              128k
> +	 *               1G              256k
> +	 *               4G              512k
> +	 *              16G             1024k
> +	 *              64G             2048k
> +	 *             256G             4096k
> +	 */
> +	if (get_capacity(disk)) {
> +		unsigned long size = get_capacity(disk) >> 9;

get_capacity() returns sector_t.  This expression will overflow with a
2T disk.  I'm not sure if we successfully support 2T disks on 32-bit
machines, but changes like this will guarantee that we don't :)

> +		size = 1UL << (ilog2(size) / 2);

I think there's a rounddown_pow_of_two() hiding in that expression?

> +		bdi->ra_pages = min(bdi->ra_pages, size);

I don't have a clue why that min() is in there.  It needs a comment,
please.

> +	}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ