linux-kernel - Re: Why aren't partitions limited to fit within the device?

Message-ID: <20061015082921.GC22674@vianova.fi>
Date:	Sun, 15 Oct 2006 11:29:21 +0300
From:	Ville Herva <vherva@...nova.fi>
To:	Neil Brown <neilb@...e.de>
Cc:	linux-kernel@...r.kernel.org, aeb@....nl,
	Jens Axboe <jens.axboe@...cle.com>
Subject: Re: Why aren't partitions limited to fit within the device?

On Fri, Oct 13, 2006 at 09:50:49AM +1000, you [Neil Brown] wrote:
> 
> Hi,
>  I was looking into an issue that someone was having with raid5.
> They made an md/raid5 out of 5 whole devices and by luck the data
> that was written to the first block of the 5th device looked
> slightly like a partition table.  fdisk output below for the curious.
> However some partitions were beyond the end of the device.

That reminds me of an old long-standing mystery I had with a machine that
had a RAID-5 of three whole devices. 

I wonder if there's ever a chance the kernel partition detection code could
_write_ to the disk, even when there's really no partition table?

Below is a description of the problem. I'm afraid I only reproduced it on
2.2 and 2.4, but it just might still be present in 2.6. Unfortunately, the
actual raid device is now on different disks that have partition tables on
them. I'm just asking if this rings any bells, since I really spent a long
time debugging it, and never found a real clue.

------------------------------------------------------------

The raid device is:

   md4 : active raid5 hdc[2] hdb[1] hdg[0] 156367872 blocks level 5, 16k chunk, algorithm 0 [3/3] [UUU]

The kernel is 2.2.x + RAID-0.90 patch. Fs is ext2. After unmounting the
filesystem, I can mount it again without problems. I can also raidstop the
raid device in between and all is still fine:

   -> umount /dev/md4; mount /dev/md4 
      - no corruption
   -> umount /dev/md4; raidstop /dev/md4; raidstart /dev/md4; mount /dev/md4
      - no corruption

But after a reboot, the filesystem is corrupted: a few bytes differ near the
beginning of /dev/md4, between 1k and 5k.

cmp -l md4 afterboot/md4
   1083      1    3
   4641     35    0
   4642    205    0
   4643     10    0
   bytepos after  before
           boot   boot

I found out that the difference (corruption) is usually on three bytes on
/dev/hdg, but sometimes on /dev/hdc, too. (/dev/md4 = hdb+hdc+hdg; hdb&hdc
are on i810, hdg is on hpt370).
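
As a rough cross-check of why a changed byte near the start of md4 lands at
the same position on hdg, here is a minimal sketch of the RAID-5 offset
mapping. It assumes that "algorithm 0" is the left-asymmetric layout, that
data starts at offset 0 on each member (the 0.90 superblock sits at the end
of the device), and that the slot number is the bracketed index from the
mdstat line. The function name md_to_member is made up for illustration.

```shell
#!/bin/sh
# Sketch only: map a 0-based byte offset on a RAID-5 md device to
# "<member slot> <byte offset on that member>".
# Assumptions: left-asymmetric parity layout, member data starts at offset 0.
md_to_member() {
    off=$1      # 0-based byte offset on the md device
    chunk=$2    # chunk size in bytes
    disks=$3    # total number of member disks, parity included
    data=$((disks - 1))             # data chunks per stripe
    c=$((off / chunk))              # logical chunk number
    stripe=$((c / data))            # stripe that holds this chunk
    dd=$((c % data))                # data index within the stripe
    pd=$((data - stripe % disks))   # slot holding parity for this stripe
    # Data slots at or after the parity slot shift up by one.
    if [ "$dd" -ge "$pd" ]; then dd=$((dd + 1)); fi
    echo "$dd $((stripe * chunk + off % chunk))"
}

# cmp -l counts from 1, so byte 4641 is offset 4640; 16k chunk, 3 disks.
md_to_member 4640 16384 3    # prints "0 4640"
```

Under these assumptions the whole first 16k of md4 lives on slot 0, which is
hdg (hdg[0]), at the same offset. That matches the three bytes at 4641-4643
showing up on both md4 and hdg.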

First, I did
   umount /dev/md4
   raidstop /dev/md4
   head -c 50k /dev/hdg > /save/hdg
   reboot

To rule out kernel raid autodetect and raid code in general, I
booted 2.2.25 with "single init=/bin/bash raid=noautodetect".

   head -c50k /dev/hdg | cmp -l /save/hdg

Three bytes differed:
   4641    0      35
   4642    0      205
   4643    0      10
   bytepos after  before
           boot   boot

I wrote the original data back:
   dd if=/save/hdg of=/dev/hdg
   sync 
   hdparm -W0 /dev/hdg
   sync
   reboot

Booted 2.2.25 with "single init=/bin/bash raid=noautodetect" again.

Did 
   head -c50k /dev/hdg | cmp -l /save/hdg
The same three bytes differed again.

Wrote the stuff back, synced, did hdparm, and powered off. Still, the
bytes differed on next boot.
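
The save/compare cycle used above can be wrapped in two small helpers so the
same check is repeated identically on each boot. This is only a sketch: the
function names and the 51200-byte default (50k) are illustrative, and the
source can be a block device or an ordinary file.

```shell
#!/bin/sh
# Hypothetical helpers for the before/after-reboot comparison.

# snap_head SOURCE SAVEFILE [BYTES]
# Save the first BYTES bytes (default 51200) of SOURCE into SAVEFILE.
snap_head() {
    head -c "${3:-51200}" "$1" > "$2"
}

# diff_head SOURCE SAVEFILE [BYTES]
# List every differing byte between SAVEFILE and the current start of SOURCE.
# cmp -l columns: 1-based position, byte in SAVEFILE, byte in SOURCE (octal).
diff_head() {
    head -c "${3:-51200}" "$1" | cmp -l "$2" -
}

# Example (as root, array stopped):
#   snap_head /dev/hdg /save/hdg    # before reboot
#   diff_head /dev/hdg /save/hdg    # after reboot; no output means no change
```

With the saved file given as cmp's first argument, the second column is
always the saved ("before") value and the third the current ("after") value,
which keeps the column meaning fixed between runs.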

Then I booted 2.4.21 with "single init=/bin/bash raid=noautodetect" (I
happened to have 2.4.21 compiled with suitable drivers at hand). Wrote the
same stuff back with dd, synced, turned ide cache off.

Booted 2.4.21 with "single init=/bin/bash raid=noautodetect" again. Did the
diff; the three bytes differed again.

Note that sometimes a few bytes on hdc differed, too. Usually it was just the
three hdg bytes.

So this is not a 2.2 kernel issue. I suspect it might not be a kernel issue
at all. Unless it is a bug in kernel partition detection that is still
present in 2.4.x, perhaps in 2.6.

I tried to turn off the ide write cache with hdparm -W0, so it
shouldn't be a write caching issue. 

If it's a bios issue, it's really a strange one, since it affects
both disks on i810 ide and on hpt370. The disks have no partition table,
though, which _could_ confuse the bios. 

Any ideas? Who could write to those three bytes, and why?