Date:	Thu, 16 Jul 2009 13:59:06 -0700 (PDT)
From:	david@...g.hm
To:	James Bottomley <James.Bottomley@...senPartnership.com>
cc:	James Smart <James.Smart@...lex.Com>,
	Boaz Harrosh <bharrosh@...asas.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>
Subject: Re: deterministic scsi order with async scan

On Thu, 16 Jul 2009, James Bottomley wrote:

> On Thu, 2009-07-16 at 12:48 -0700, david@...g.hm wrote:
>> On Thu, 16 Jul 2009, James Bottomley wrote:
>>
>>> On Thu, 2009-07-16 at 11:43 -0700, david@...g.hm wrote:
>>>> On Thu, 16 Jul 2009, James Smart wrote:
>>>>
>>>>> david@...g.hm wrote:
>>>>>> On Thu, 16 Jul 2009, Boaz Harrosh wrote:
>>>>>>
>>>>>>
>>>>>>> It is highly discouraged to set up any kind of system that depends
>>>>>>> on device names for block devices. Mounts have mount by-label
>>>>>>> or mount by-uuid. Any other subsystem should go by the /dev/disk/by-id/*
>>>>>>> symlinks to find a persistent raw block device. The ID is generated
>>>>>>> from characteristics inside the disk itself, so it will be the same
>>>>>>> no matter what host connection or bus it is connected to (almost).
>>>>>>>
>>>>>>> This is because even if the boot order is consistent, the device name
>>>>>>> is so volatile over the life span of a system. Did I boot with a removable
>>>>>>> USB device inserted? Was that camera or printer on or off? Was a disk
>>>>>>> connected to the other port? Any such change will break things and give
>>>>>>> you a very poor user experience.
>>>>>>>
>>>>>>
>>>>>> for a laptop you are probably correct, but for a server or embedded system
>>>>>> that doesn't have its hardware changing all the time you are not correct.
>>>>>>
>>>>>> especially on a system with lots of drives, why should I have to create an
>>>>>> initrd that goes and searches dozens or hundreds of drives to find out
>>>>>> which one to boot from?
>>>>>>
>>>>> Boaz is correct. Many enterprise SCSI subsystems (FC, SAS) do not have hard
>>>>> transport addresses for each device like Parallel SCSI used to.  Thus, any
>>>>> difference in order of appearance of the devices (power-up ordering, FC ALPA
>>>>> assignment based on who's loop master, order that switch reports them, is an
>>>>> array in a failover mode with 1 controller non-existent), or if LUN
>>>>> configuration on an array changes, or as a drive may fail (especially with
>>>>> hundreds), there's no guarantee you will see the same thing in the same order
>>>>> w/o name binding. Same thing is true if one of those adapters fails or is
>>>>> swapped out.
>>>>
>>>> yes, but does your system change the order of your internal direct
>>>> attached drives with your FC/SAN drives?
>>>
>>> Certainly, it can.  The way BIOS booting gets around this is either to
>>> use some type of physical indicator (like phy number for SAS) to find C:
>>> or to use a persistent ID mapping scheme (which is pretty much
>>> equivalent to our /dev/disk/by-id/ udev one).
>>
>> so if I don't use udev but do want the async detection my only option to
>> have it boot from card 1 instead of card 2 is to just keep rebooting the
>> machine until it guesses right?
>
> Well, for multiple cards that's effectively true with or without async
> scanning ... the kernel doesn't know how you've enabled the bios scans
> on the cards, so it takes first bus discovery order, so your boot drive
> can always end up as /dev/sdb etc.

That's what I am attempting to do, but it's not stable.

I fully agree that if you move cards or change the bios scan order things 
will change. I'm not talking about a case like that. I'm talking about a 
case where the hardware and BIOS do not change.

> In theory, async probing shouldn't be racy, but we've likely got a
> problem between async SCSI scanning and async sd driver attachment, so
> when those are sorted out it should be no worse with than without.

So is there something I can do to debug this case where it is racy?
I have a repeatable test case right now. If there is something I can do to
test this and help track down the race, I will do so; otherwise I will need
to disable async scanning as unreliable.
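
For reference, the persistent-ID lookup discussed earlier in the thread can be
sketched as below. This is only a minimal illustration, assuming udev is
running and has populated /dev/disk/by-id (which is not the case on the system
described above). The function name map_persistent_ids is hypothetical and not
part of any existing tool.

```python
import os


def map_persistent_ids(by_id_dir="/dev/disk/by-id"):
    """Map each persistent ID symlink to the device node it currently points at.

    Assumption: by_id_dir holds udev-style symlinks such as
    <id> -> ../../sdb. Returns an empty dict if the directory is missing
    (for example, on a system that does not run udev).
    """
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        # Only symlinks are of interest; skip anything else in the directory.
        if os.path.islink(link):
            # realpath follows the link to the kernel-assigned node (e.g. /dev/sdb).
            mapping[name] = os.path.realpath(link)
    return mapping
```

Calling map_persistent_ids() on two boots and comparing the results would show
which kernel names (sda, sdb, ...) moved between boots while the IDs stayed
fixed, which may help confirm that the reordering comes from the scan and not
from the hardware.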

David Lang
