Message-ID: <170fa0d21003110801p3ce6c5e9xe77b6f9f833fe19d@mail.gmail.com>
Date:	Thu, 11 Mar 2010 11:01:41 -0500
From:	Mike Snitzer <snitzer@...hat.com>
To:	Nikanth Karthikesan <knikanth@...e.de>
Cc:	Theodore Tso <tytso@....edu>,
	Damian Lukowski <damian@....rwth-aachen.de>,
	"linux-ide@...r.kernel.org" <linux-ide@...r.kernel.org>,
	Jeff Garzik <jeff@...zik.org>, Matthew Wilcox <matthew@....cx>,
	"Martin K. Petersen" <martin.petersen@...cle.com>,
	James Bottomley <James.Bottomley@...e.de>,
	Tejun Heo <tj@...nel.org>, lkml <linux-kernel@...r.kernel.org>,
	Daniel Taylor <Daniel.Taylor@....com>,
	Mark Lord <kernel@...savvy.com>,
	"H. Peter Anvin" <hpa@...or.com>, hirofumi@...l.parknet.co.jp,
	Andrew Morton <akpm@...ux-foundation.org>,
	Alan Cox <alan@...rguk.ukuu.org.uk>, irtiger@...il.com,
	aschnell@...e.de, jdelvare@...e.de
Subject: Re: ATA 4 KiB sector issues.

On Thu, Mar 11, 2010 at 10:00 AM, Nikanth Karthikesan <knikanth@...e.de> wrote:
> On Thursday 11 March 2010 19:58:11 Theodore Tso wrote:
>> On Mar 11, 2010, at 8:57 AM, Nikanth Karthikesan wrote:
>> > I guess what he meant was to keep filesystem blocks aligned even if
>> > the partition is not. Say the partition is misaligned by 512 bytes;
>> > let the filesystem waste 4k-512 bytes and keep its blocks aligned. But
>> > it might be a case of over-engineering, possibly requiring a disk
>> > format change.
>>
>> Ah, yes, I agree with you; that's probably what he meant.
>>
>> Sure, that's theoretically possible, but it would mean changing every
>> single filesystem, and it would require a file system format change,
>> or at least a file system format extension.
>>
>> It would seem to be way easier to simply fix the partitioning tools to
>> do the right thing, though.
>>
>
> Yes. Maybe just a simple but transparent device-mapper-like mapping on
> top of the misaligned partition to do the alignment. Then the filesystem
> code need not change much.
>
> But Linux already has device-mapper, and Linux will not be affected by
> misaligned partitions when we use LVM.

Well, device-mapper and LVM needed to be updated to make them "just
work" but yes that work has been done.
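
The remapping idea quoted above can be illustrated with device-mapper's
`linear` target, whose table line has the form
`<start> <length> linear <device> <offset>`, with every value counted in
512-byte sectors. This is a hypothetical sketch, not how DM or LVM handle
alignment today: the helper names, the 4096-byte physical sector size, and
the example device path are assumptions. It computes how many sectors to
skip at the head of a misaligned partition so that the mapped device
starts on a physical sector boundary.

```python
LOGICAL = 512    # bytes per logical sector; dm tables also count 512-byte sectors
PHYSICAL = 4096  # bytes per physical sector (assumed 4K drive)

def skip_sectors(part_start_lba: int, alignment_offset: int = 0) -> int:
    """Sectors to skip so the mapping begins on a physical sector boundary.

    `alignment_offset` is the drive's byte offset of its first naturally
    aligned physical sector (0 when there is no DOS compensation).
    """
    start_bytes = part_start_lba * LOGICAL
    return ((alignment_offset - start_bytes) % PHYSICAL) // LOGICAL

def realign_table(device: str, part_start_lba: int, part_sectors: int,
                  alignment_offset: int = 0) -> str:
    """Build a hypothetical dm 'linear' table line hiding the misaligned head."""
    skip = skip_sectors(part_start_lba, alignment_offset)
    if skip >= part_sectors:
        raise ValueError("partition too small to realign")
    # Map sector 0 of the new device to sector `skip` of the partition.
    return f"0 {part_sectors - skip} linear {device} {skip}"

if __name__ == "__main__":
    # Partition starting one sector past a 4K boundary: misaligned by 512 bytes.
    print(realign_table("/dev/sdb1", part_start_lba=1, part_sectors=2_000_000))
```

For a partition misaligned by 512 bytes, the sketch skips 7 sectors
(3584 bytes), which is exactly the "4k-512 bytes" of wasted space in the
quoted proposal. The resulting line could be handed to
`dmsetup create <name> --table '...'`.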

> But the actual problem here is that partitioning tools might create
> partitions that won't allow other operating systems to boot. So it might
> be enough if the partitioning tools just create partitions honoring the
> (mis-)alignment that Windows requires.

I'm not following...

Anyway, 4K drives that are 512b logical and 4K physical may or may not
also have "DOS partition compensation", which uses LBA -1 as the first
naturally (4K) aligned start so that LBA 63 lands on a physical sector
boundary.  This means that the partition tools need to shift the start
of the first primary partition to be offset by 3584 bytes (7 512b
sectors) for use with Linux.  As for Windows, AFAIK Windows Vista and
Windows 7 create all partitions aligned on 1MB boundaries.  Linux's
parted and fdisk create 1MB aligned partitions now too.

So the only outliers are older versions of Windows (XP and earlier)
and Linux (old fdisk, parted, etc., which also use DOS partitioning)
that don't use naturally aligned (e.g. 1MB) partition boundaries.  In
those versions of Windows and Linux there are ways to change the
default partition start of sector 63.  That said, there is an
opportunity to improve documentation on how to work around DOS
partitioning on these operating systems.
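
The alignment arithmetic above can be checked with a few lines of Python.
This is a minimal sketch, assuming 512-byte logical and 4096-byte physical
sectors; the helper names are hypothetical and not taken from parted,
fdisk, or any other tool. A start LBA is treated as aligned when its byte
offset modulo the physical sector size equals the drive's alignment offset
(0 without DOS partition compensation, 3584 with it).

```python
LOGICAL = 512      # logical sector size in bytes (assumed)
PHYSICAL = 4096    # physical sector size in bytes (assumed)
MIB = 1024 * 1024  # 1 MiB partitioning grain

def is_aligned(lba: int, alignment_offset: int) -> bool:
    """True if a partition starting at `lba` begins on a physical sector."""
    return (lba * LOGICAL) % PHYSICAL == alignment_offset % PHYSICAL

def first_aligned_lba(alignment_offset: int) -> int:
    """Lowest LBA that starts a physical sector (7 for a 3584-byte offset)."""
    return (alignment_offset % PHYSICAL) // LOGICAL

def next_aligned_start(min_lba: int, alignment_offset: int, grain: int = MIB) -> int:
    """Smallest LBA >= min_lba lying at a multiple of `grain` plus the offset.

    Assumes `grain` is a multiple of PHYSICAL and the offset is a multiple
    of LOGICAL, so every candidate is also physically aligned.
    """
    min_bytes = min_lba * LOGICAL
    # Ceiling division; max() keeps k >= 0 when min_bytes <= alignment_offset.
    k = max(0, -(-(min_bytes - alignment_offset) // grain))
    return (k * grain + alignment_offset) // LOGICAL

if __name__ == "__main__":
    for label, lba in (("sector 63", 63), ("1 MiB", MIB // LOGICAL)):
        for offset in (0, 3584):
            state = "aligned" if is_aligned(lba, offset) else "misaligned"
            print(f"{label:9} on drive with offset {offset:4}: {state}")
```

Run as a script, it reports that the legacy sector-63 start is aligned only
on a drive with DOS compensation, while a 1 MiB start is aligned only on a
drive without it. That is why tools must account for the 3584-byte offset:
`next_aligned_start(63, 3584)` picks LBA 2055 (1 MiB plus 7 sectors).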

One other piece worth mentioning about "IO Topology" support across the
entire Linux I/O stack is the virt layer.  hch has already extended
the virt-io protocol, and qemu is in the final stages of being
updated to properly consume the "IO Topology" information.  So we
really don't have any gaps in the Linux I/O stack.

mkp in particular, Jens, James, myself, and others implemented and
refined the SCSI and block changes.  kzak, Jim Meyering, Hans de
Goede, hch, Eric Sandeen, Bob Peterson, myself, and others updated all
the other I/O stack layers: DM, LVM, libblkid, fdisk, parted,
anaconda, mkfs.ext[234], mkfs.xfs, mkfs.gfs2, virt-io, and qemu.
FYI, all of these advances will be in Fedora 13 (quite a few are
already in Fedora 12).
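
Userland tools like these can pick up the topology the kernel exports
through sysfs. As a minimal sketch, assuming the standard whole-disk layout
under `/sys/block/<disk>` (`queue/logical_block_size`,
`queue/physical_block_size`, `queue/minimum_io_size`,
`queue/optimal_io_size`, and `alignment_offset`), the values can be read as
below. The `IoTopology` class and `read_topology` helper are hypothetical
names for illustration, not part of libblkid or any tool named above.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class IoTopology:
    logical_block_size: int   # smallest addressable unit, in bytes
    physical_block_size: int  # smallest unit the device writes atomically
    minimum_io_size: int      # preferred minimum I/O size
    optimal_io_size: int      # preferred streaming I/O size (0 if unreported)
    alignment_offset: int     # bytes from device start to natural alignment

def read_topology(disk_dir: Path) -> IoTopology:
    """Read I/O topology hints for a whole disk, e.g. Path("/sys/block/sda")."""
    def read_int(rel: str) -> int:
        # Each sysfs attribute holds a single decimal integer plus newline.
        return int((disk_dir / rel).read_text().strip())

    return IoTopology(
        logical_block_size=read_int("queue/logical_block_size"),
        physical_block_size=read_int("queue/physical_block_size"),
        minimum_io_size=read_int("queue/minimum_io_size"),
        optimal_io_size=read_int("queue/optimal_io_size"),
        alignment_offset=read_int("alignment_offset"),
    )
```

On a 512b logical / 4K physical drive, `read_topology(Path("/sys/block/sda"))`
would be expected to report 512 and 4096 for the two block sizes, and a
non-zero `alignment_offset` (such as 3584) if the drive has DOS partition
compensation.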

There are obviously other Linux systems and userland tools (likely
Xen, other mkfs.* and more) that should be updated.  Hopefully
maintainers and/or contributors of these projects will follow-up to
address those that need updating.

Again please see:
http://oss.oracle.com/~mkp/docs/linux-advanced-storage.pdf
http://people.redhat.com/msnitzer/docs/io-limits.txt

Some omissions include Linux MD, which has been updated as mkp
pointed out, and virt-io and qemu, which I neglected to talk about
(but, like I said, they have been updated too).

Hopefully we're all closer to being on the same page now.

Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
