Message-ID: <17948.8871.997123.363344@notabene.brown>
Date:	Wed, 11 Apr 2007 09:49:59 +1000
From:	Neil Brown <neilb@...e.de>
To:	syrius.ml@...log.org
Cc:	linux-kernel@...r.kernel.org, Alasdair G Kergon <agk@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [dm-devel] bio too big device md1 (16 > 8)


This is difficult.

Summary of problem is:
  Filesystem on LVM on md/raid1
  Add an md/linear to the md/raid1 and fs dies with 'bio too big'.

Where to begin....

The fs level builds bios to send down to the device, and it queries
the device to find out how big the bio can be.
There are two ways it can query the device:
  1/ it can look at the max_sectors field in the queue structure.
  2/ it can call the merge_bvec_fn function in the queue structure.

It actually does both.  max_sectors sets an overall maximum, and
merge_bvec_fn - if defined - can ACK or NAK each page being added.
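
Very roughly, the per-page check looks like this - a simplified sketch of
what bio_add_page() does, not the exact code; the function name here is
made up and I'm ignoring the bi_max_vecs and segment-count limits:

    /* sketch: add one page to a bio, honouring the queue limits */
    static int add_page_to_bio(struct request_queue *q, struct bio *bio,
                               struct page *page, unsigned int len,
                               unsigned int offset)
    {
            struct bio_vec *bvec;

            /* 1/ the overall limit: max_sectors (len is bytes, >>9 gives sectors) */
            if (((bio->bi_size + len) >> 9) > q->max_sectors)
                    return 0;       /* caller must submit this bio and start a new one */

            bvec = &bio->bi_io_vec[bio->bi_vcnt];
            bvec->bv_page = page;
            bvec->bv_len = len;
            bvec->bv_offset = offset;

            /* 2/ the per-page veto: ask the device, if it cares */
            if (q->merge_bvec_fn && q->merge_bvec_fn(q, bio, bvec) < len) {
                    bvec->bv_page = NULL;   /* device NAKed this page, back it out */
                    return 0;
            }

            bio->bi_vcnt++;
            bio->bi_size += len;
            return len;
    }

If the device has no merge_bvec_fn, only step 1/ applies.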

It is currently very awkward for a stacked device to answer a
merge_bvec_fn query by calling the merge_bvec_fn of the underlying
devices.  So they don't.
I'm not sure what dm does, but md simply sets max_sectors to 1 page if
there is a merge_bvec_fn in an underlying device.
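
For md it is just a couple of lines when a component device is bound to
the array - paraphrased from memory of drivers/md, so treat the names as
approximate:

    /* sketch: clamp the array's queue if a component has a merge_bvec_fn */
    static void md_clamp_for_merge_bvec(struct request_queue *md_queue,
                                        struct block_device *component)
    {
            struct request_queue *child = component->bd_disk->queue;

            /* we cannot forward the per-page query to the component, so
             * restrict every bio to one page - a single-page bio is always
             * acceptable, whatever the component's merge_bvec_fn would say */
            if (child->merge_bvec_fn &&
                md_queue->max_sectors > (PAGE_SIZE >> 9))
                    blk_queue_max_sectors(md_queue, PAGE_SIZE >> 9);
    }

That is safe because a merge_bvec_fn is never allowed to reject a bio that
contains only a single page.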

So in this particular case: because md/linear defines a merge_bvec_fn, as
soon as the md/linear array is added to the md/raid1 array, the
max_sectors of the md/raid1 is set to 1 page.

This works ok when a filesystem is directly on an md/raid1, as the
filesystem checks max_sectors every time before building a bio.

However in your case, dm (aka LVM) is in there too.

dm doesn't know that md/raid1 has just changed max_sectors and there
is no convenient way for it to find out.  So when the filesystem tries
to get the max_sectors for the dm device, it gets the value that dm set
up when it was created, which was somewhat larger than one page.
When such a request gets down to the raid1 layer, it causes the
'bio too big' error.
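
The "bio too big" message itself comes from the size check in
generic_make_request(); from memory it is roughly this (the field actually
tested may be max_hw_sectors rather than max_sectors):

    /* in generic_make_request(), roughly: */
    if (bio_sectors(bio) > q->max_sectors) {
            char b[BDEVNAME_SIZE];

            printk("bio too big device %s (%u > %u)\n",
                   bdevname(bio->bi_bdev, b),
                   bio_sectors(bio), q->max_sectors);
            bio_endio(bio, bio->bi_size, -EIO);     /* fail the request */
            return;
    }

In your case the 16 is the bio dm built (16 sectors == 8K, within the limit
dm advertised to the filesystem) and the 8 is the one-page (8-sector) limit
raid1 imposed after the md/linear device was added.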

You can probably make your array work by adding the md/linear array to
the md/raid1 *before* enabling the LVM device on top of it.  However
that isn't really a good long-term solution.

We really need better ways for stacked devices to communicate.

Alasdair Kergon (dm developer) has some patches to make it practical
to recursively call merge_bvec_fn so they might be part of the
solution, but more discussion is needed before that becomes a reality.

Sorry I cannot be more helpful at this stage.

NeilBrown

