linux-kernel - Re: Failover root devices

Message-ID: <55FC20E5.1080801@gmail.com>
Date:	Fri, 18 Sep 2015 10:34:13 -0400
From:	Austin S Hemmelgarn <ahferroin7@...il.com>
To:	Drew DeVault <sir@...wn.com>, linux-kernel@...r.kernel.org
Subject: Re: Failover root devices

On 2015-09-17 13:30, Drew DeVault wrote:
>> That said, using the term failover for this is probably not the best
>> idea; many people associate it almost exclusively with online failover
>> and high-availability setups, and trying to do something like that with
>> the root file system is just asking for trouble (I'll be happy to go
>> into specifics as to why if someone asks).
>
> Do you have a suggestion for another name for this feature? Maybe we can
> just call it "multiple root devices". The issue comes with the
> associated command line options, like "rootfailoverdelay". Perhaps it
> could be called "rootcycledelay". "rootdelay" is the obvious one, but
> it's taken for another feature.
Possibly 'multirootdelay'?

However, is there any case you can think of for wanting the values to be 
different between rootdelay and the per-device scan delay other than 
having the per-device scan delay be 0 and rootdelay be >0?

The way I'd probably write it would be:
1. Wait rootdelay seconds
2. Check for 1st device
3. If first device is not there, check for 2nd
4. If second device is not there, check next one
5. Repeat 4 until all devices are checked.
6. If a device wasn't found, check if we were told to loop until one is 
found, and if so, start at 1 again.
And then add an option to tell it to wait 'rootdelay' seconds between 
checking each device.
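
To make the steps above concrete, here is a rough userspace sketch in Python. It is only an illustration of the control flow, not kernel code: the function name, the `is_present` callback, and the `loop`, `delay_between`, and `max_passes` parameters are all hypothetical names made up for this example (`max_passes` exists only so that a looping scan can be bounded in a test).

```python
import time


def find_root(devices, is_present, rootdelay=0.0, loop=False,
              delay_between=False, sleep=time.sleep, max_passes=None):
    """Return the first device that is present, or None if none is found.

    devices       -- candidate root devices, in order of preference
    is_present    -- callback: does this device exist right now? (hypothetical)
    loop          -- keep restarting from step 1 until something is found
    delay_between -- also wait rootdelay seconds before each later device
    max_passes    -- illustration-only bound on the number of passes
    """
    passes = 0
    while True:
        sleep(rootdelay)                       # step 1: wait rootdelay seconds
        for index, dev in enumerate(devices):  # steps 2-5: check each in turn
            if index > 0 and delay_between:
                sleep(rootdelay)               # optional wait between devices
            if is_present(dev):
                return dev
        passes += 1
        # step 6: nothing found; only start over if we were told to loop
        if not loop or (max_passes is not None and passes >= max_passes):
            return None
```

With `loop=False` this goes through the list exactly once and gives up, which matches the halt-if-nothing-is-found behaviour.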
>
>>> 1. Wait rootdelay seconds
>>> 2. Check 1st device, not present
>>> 3. Recheck 1st device until rootfailoverdelay seconds have passed
>>> 4. Move on to 2nd device, present -> boot
>>>
>>> Or:
>>>
>>> 1. Wait rootdelay seconds
>>> 2. Check 1st device, not present
>>> 3. Recheck 1st device until rootfailoverdelay seconds have passed
>>> 4. Move on to 2nd device, not present
>>> 5. Recheck 2nd device until rootfailoverdelay seconds have passed
>>> 6. GOTO 2
>>>
>>> And so on.
>> As for this, I'd say default to the first method, and then provide an
>> option to switch to the second (both have practical uses).
>
> Sorry to cause confusion - these are actually the same method, but
> handling different scenarios. The first is dealing with the first device
> being nonexistent, and the second device existing. The second is dealing
> with both being nonexistent, and cycling between them until one of them
> shows up. After further thought, though, I think the best solution is a
> bit different: a new command line option called "rootmultiwait" or
> similar, which is a maximum amount of time to wait for the user's first
> choice of root device to become available, then testing all devices
> until that time runs out or the first choice becomes available.
I think there's value in being able to tell it to go through each one 
exactly once, and halt like it does now if it can't find the filesystem 
on any of them.  That should probably be the default behavior in fact, 
as it's more similar to what's done now.
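
For comparison, the quoted per-device scheme (recheck each device for up to rootfailoverdelay seconds before moving on) could be combined with a once-through default roughly as follows. Again this is a userspace Python sketch under assumptions, not a proposed implementation: `scan_with_recheck`, `is_present`, `cycle`, and `poll_interval` are hypothetical names, and the injectable `sleep` and `clock` parameters are there only so the timing can be simulated.

```python
import time


def scan_with_recheck(devices, is_present, rootdelay=0.0, failover_delay=0.0,
                      cycle=False, poll_interval=0.1,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll each device for up to failover_delay seconds, in order.

    cycle=False: go through every device exactly once, then give up (None).
    cycle=True:  go back to the first device and keep cycling until one
                 shows up (this never returns if none ever appears).
    """
    sleep(rootdelay)                  # initial rootdelay wait, done only once
    while True:
        for dev in devices:
            deadline = clock() + failover_delay
            while True:
                if is_present(dev):
                    return dev
                if clock() >= deadline:
                    break             # this device's window is over; move on
                sleep(poll_interval)
        if not cycle:
            return None               # single pass finished without a match
```

Leaving `cycle` off by default keeps the behaviour closest to what happens today when the root filesystem cannot be found.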

Secondarily, I've been thinking more about this, and I think it would be 
wonderful to have such functionality in the nfsroot code as well (and 
for that matter, also in any other built-in networked root filesystem 
support).

