Message-ID: <0342fbda-9901-4293-afa7-ba6085eb1688@landley.net>
Date: Mon, 15 Sep 2025 11:43:50 -0500
From: Rob Landley <rob@...dley.net>
To: Askar Safin <safinaskar@...omail.com>, linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Christian Brauner <brauner@...nel.org>, Al Viro <viro@...iv.linux.org.uk>,
 Jan Kara <jack@...e.cz>, Christoph Hellwig <hch@....de>,
 Jens Axboe <axboe@...nel.dk>, Andy Shevchenko <andy.shevchenko@...il.com>,
 Aleksa Sarai <cyphar@...har.com>,
 Thomas Weißschuh <thomas.weissschuh@...utronix.de>,
 Julian Stecklina <julian.stecklina@...erus-technology.de>,
 Gao Xiang <hsiangkao@...ux.alibaba.com>, Art Nikpal <email2tema@...il.com>,
 Andrew Morton <akpm@...ux-foundation.org>, Eric Curtin <ecurtin@...hat.com>,
 Alexander Graf <graf@...zon.com>, Lennart Poettering <mzxreary@...inter.de>,
 linux-arch@...r.kernel.org, linux-alpha@...r.kernel.org,
 linux-snps-arc@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org,
 linux-csky@...r.kernel.org, linux-hexagon@...r.kernel.org,
 loongarch@...ts.linux.dev, linux-m68k@...ts.linux-m68k.org,
 linux-mips@...r.kernel.org, linux-openrisc@...r.kernel.org,
 linux-parisc@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
 linux-riscv@...ts.infradead.org, linux-s390@...r.kernel.org,
 linux-sh@...r.kernel.org, sparclinux@...r.kernel.org,
 linux-um@...ts.infradead.org, x86@...nel.org, Ingo Molnar
 <mingo@...hat.com>, linux-block@...r.kernel.org, initramfs@...r.kernel.org,
 linux-api@...r.kernel.org, linux-doc@...r.kernel.org,
 linux-efi@...r.kernel.org, linux-ext4@...r.kernel.org,
 "Theodore Y . Ts'o" <tytso@....edu>, linux-acpi@...r.kernel.org,
 Michal Simek <monstr@...str.eu>, devicetree@...r.kernel.org,
 Luis Chamberlain <mcgrof@...nel.org>, Kees Cook <kees@...nel.org>,
 Thorsten Blum <thorsten.blum@...ux.dev>, Heiko Carstens <hca@...ux.ibm.com>,
 patches@...ts.linux.dev
Subject: Re: [PATCH 00/62] initrd: remove classic initrd support

On 9/12/25 17:38, Askar Safin wrote:
> Intro
> ====
> This patchset removes classic initrd (initial RAM disk) support,
> which was deprecated in 2020.

Still useful for embedded systems that can memory map flash, but it's 
getting harder to find embedded developers who consider new kernels an 
improvement over older ones...

> Initramfs still stays, and RAM disk itself (brd) still stays, too.

While you're at it, could you fix static/builtin initramfs so PID 1 has 
a valid stdin/stdout/stderr?

A static initramfs won't create /dev/console if the embedded initramfs 
image doesn't contain it, and a non-root build can't mknod it, so the 
kernel plumbing won't see it in the dev directory we point it at unless 
we build with root access. This means the open("/dev/console") fails, so 
init starts with no error reporting, and we have to get far enough to 
mount our own devtmpfs or similar and open our own stdout/stderr before 
we can see any error output from init, which is kinda brittle.
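
To illustrate why the node is awkward for a non-root build but not for the 
archive format itself: a cpio "newc" record stores the device numbers as 
plain header fields, so an unprivileged script can emit a dev/console 
entry without calling mknod. The sketch below is only one possible 
workaround, not the kernel-side fix asked for above. The function names 
are made up for the example; the 5,1 device numbers are the usual ones 
for /dev/console.

```python
import stat


def newc_entry(name, mode, data=b"", rdev=(0, 0), ino=0):
    """Build one cpio 'newc' record: 110-byte header, name, data."""
    name_bytes = name.encode() + b"\0"   # namesize counts the trailing NUL
    fields = [
        ino,                # c_ino
        mode,               # c_mode (file type bits plus permissions)
        0, 0,               # c_uid, c_gid
        1,                  # c_nlink
        0,                  # c_mtime
        len(data),          # c_filesize
        0, 0,               # c_devmajor, c_devminor
        rdev[0], rdev[1],   # c_rdevmajor, c_rdevminor
        len(name_bytes),    # c_namesize
        0,                  # c_check (unused by the 070701 variant)
    ]
    record = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    record += b"\0" * (-len(record) % 4)   # pad header+name to 4 bytes
    record += data
    record += b"\0" * (-len(record) % 4)   # pad data to 4 bytes
    return record


def console_archive():
    """Archive holding only dev/ and dev/console (char device 5,1)."""
    out = newc_entry("dev", stat.S_IFDIR | 0o755, ino=1)
    out += newc_entry("dev/console", stat.S_IFCHR | 0o600, rdev=(5, 1), ino=2)
    out += newc_entry("TRAILER!!!", 0)     # end-of-archive marker
    return out
```

How the resulting bytes get combined with the rest of the image is left 
open here; the point is only that nothing in the record needs root.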

I posted various patches to make CONFIG_DEVTMPFS_MOUNT work for initmpfs 
repeatedly since 2017, which also addressed it, but the kernel 
community's been hermetically sealed against outside intrusion for a 
while now...

https://lkml.iu.edu/hypermail/linux/kernel/2005.1/09399.html

https://lkml.iu.edu/2302.2/05597.html

> init/do_mounts* and init/*initramfs* are listed in VFS entry in
> MAINTAINERS, so I think this patchset should go through VFS tree.
> This patchset touchs every subdirectory in arch/, so I tested it
> on 8 (!!!) archs in Qemu (see details below).

Oh hey, somebody using mkroot. Cool. :)

My current "passes basic automated smoketests" list for 6.16 is:

aarch64 armv4l armv5l armv7l i486 i686 m68k mips64 mipsel mips powerpc 
powerpc64le powerpc64 riscv32 riscv64 s390x sh4 x86_64

I'm assuming that's your 8: arm, x86, m68k, mips, ppc, riscv, s390x, 
superh. (The variants are mostly 32/64 bit and big/little endian, plus a couple of 
architecture generations in there. The old ones go out of patent first, 
you can always tell patents are about to expire and get generic clones 
when corporate shills start insisting that support for something REALLY 
NEEDS TO GO AWAY RIGHT NOW...)

The or1k, microblaze, and sh4eb targets mostly work: sh4eb has broken 
ethernet (never tracked down whether it's kernel or qemu that's wrong, I 
just know they disagree), or1k doesn't know how to exit ala 
https://lists.gnu.org/archive/html/qemu-devel/2024-11/msg04522.html and 
microblaze never wired up -hda to their hard drive emulation 
https://lists.nongnu.org/archive/html/qemu-devel/2025-01/msg01149.html
but I haven't had the spoons to argue with IBM Hat developers about 
procedure compliance auditing.

I need to track down a decent qemu emulation for armv7m, last time I 
tried with vanilla was https://landley.net/notes-2023.html#23-02-2023 
which was not promising, I downloaded a pic32 qemu fork last week, but 
haven't had the spoons to follow up on that either. Or to ship a new 
toybox/mkroot release: I've had 6.16 kernel patches since the week it 
came out, unbreaking powerpc and adding fdpic support to sh4-mmu, but 
hobbyist friendly this community ain't. Sigh, I should get back on the 
(beating a dead) horse...

I had hexagon userspace working for a while ("qemu-hexagon ls -l") but 
no kernel for it: Taylor Simpson said he was going to post a 
qemu-system-hexagon patchset with a comet board emulation, but that 
architecture has no gcc support (there was a gcc fork on code aurora but 
they abandoned it when the FSF went gplv3) so it needs an llvm-only 
toolchain build with a non-vanilla musl libc fork... Honestly the 
problem is compiler-rt sucks rocks: I should cycle back around to 
https://landley.net/notes-2021.html#28-07-2021 but just haven't.

(Although part of the "Just haven't" is that I posted a patch to lkml 
making generic $CROSS_COMPILE prefixes automatically work whether your 
toolchain was gcc or llvm, and the response was literally "we decided to 
manually specify LLVM= on the command line so you must always do that 
and we're refusing your two line fix to NOT need to do that". No really: 
https://lkml.iu.edu/2302.2/08170.html )

> Warning: this patchset renames CONFIG_BLK_DEV_INITRD (!!!) to CONFIG_INITRAMFS
> and CONFIG_RD_* to CONFIG_INITRAMFS_DECOMPRESS_* (for example,
> CONFIG_RD_GZIP to CONFIG_INITRAMFS_DECOMPRESS_GZIP).
> If you still use initrd, see below for workaround.

Which will break existing configs for what benefit?

I'm not convinced the churn improves matters. Presumably the kernel 
command line parameter is still rdinit= and grub still uses the "initrd" 
command to load an external cpio.gz.

But I bisect to find breakage like that every release so I assume the 
other embedded linux developers... are mostly shipping 10+ year old 
kernels that use half the memory of today's.
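
For anyone stuck carrying an existing config across the rename, the 
mapping quoted above is mechanical enough to script. A minimal sketch, 
assuming every CONFIG_RD_* symbol maps the same way as the 
CONFIG_RD_GZIP example in the cover letter (not checked against the 
actual patches); migrate_config is a hypothetical helper name.

```python
import re

# CONFIG_RD_<ALGO> -> CONFIG_INITRAMFS_DECOMPRESS_<ALGO>, per the cover letter
RD_SYMBOL = re.compile(r"\bCONFIG_RD_(\w+)")


def migrate_config(text):
    """Rewrite old initrd-era symbol names in .config text to the new ones."""
    text = re.sub(r"\bCONFIG_BLK_DEV_INITRD\b", "CONFIG_INITRAMFS", text)
    return RD_SYMBOL.sub(r"CONFIG_INITRAMFS_DECOMPRESS_\1", text)
```

Both "CONFIG_FOO=y" lines and "# CONFIG_FOO is not set" comments get 
rewritten, and CONFIG_INITRAMFS_SOURCE is left alone. It doesn't help 
with out-of-tree scripts, docs, or people's memories, of course.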

> Details
> ====
> I not only removed initrd, I also removed a lot of code, which
> became dead, including a lot of code in arch/.
> 
> Still I think the only two architectures I touched in non-trivial
> way are sh and 32-bit arm.
> 
> Also I renamed some files, functions and variables (which became misnomers) to proper names,
> moved some code around, removed a lot of mentions of initrd
> in code and comments. Also I cleaned up some docs.

Now that lkml.iu.edu is back up (yay!) all the links in 
ramfs-rootfs-initramfs.txt can theoretically be fixed just by switching 
the domain name.

> For example, I renamed the following global variables:
> 
> __initramfs_start
> __initramfs_size

That already said initramfs, and you renamed it.

> phys_initrd_start
> phys_initrd_size
> initrd_start
> initrd_end

Which is data delivered through grub's "initrd" command. Here's how I've 
been explaining it to people for years:

1) initrd is the external blob from the bootloader's initrd= option.

2) initramfs is the extractor plumbing, __init code that gets discarded.

3) rootfs is (for some reason) the name of the mounted filesystem in 
/proc/mounts (because letting it say "ramfs" or "tmpfs" like normal in 
/proc/mounts would be consistent and immediately understandable, so they 
couldn't have that).

(No I don't know why it's called rootfs. Having things like df not show 
overmounted filesystems isn't special case logic, why...? The argument 
to special case this because you can't unmount it is like saying PID 1 
shouldn't have a number because it can't exit. I would happily call the 
whole thing initramfs... but it's already not.)
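
A small sketch of where the third name shows up in practice: pulling 
the entries mounted on "/" out of /proc/mounts text. The sample line in 
the usage note is illustrative only (exact options vary by kernel and 
config), and root_mounts is a hypothetical helper name.

```python
def root_mounts(mounts_text):
    """Return (device, fstype) for each /proc/mounts entry mounted on '/'."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == "/":
            found.append((fields[0], fields[2]))
    return found
```

Fed something like "rootfs / rootfs rw 0 0" from a system still running 
out of its initramfs, this reports the type as "rootfs" rather than 
ramfs or tmpfs, which is the naming oddity described above.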

> to:
> 
> __builtin_initramfs_start
> __builtin_initramfs_size
> phys_external_initramfs_start
> phys_external_initramfs_size
> virt_external_initramfs_start
> virt_external_initramfs_end

Do you believe people will understand what the slightly longer names are 
without looking them up?

I'm all for removing obsolete code, but a partial cleanup that still 
leaves various sharp edges around isn't necessarily a net improvement. 
Did you remove the NFS mount code from init/do_mounts.c? Part of the 
initramfs justification back in 2005 was "you can have a tiny initramfs 
set up our root filesystem so most of the init special casing can go"... 
and then they added CONFIG_DEVTMPFS_MOUNT but made it ONLY apply to the 
fallback root after the system has decided NOT to stay on rootfs, and 
ignored my patches to at least make it consistent.

The one config symbol that really seems to bite people in this area is 
BLK_DEV_INITRD because a common thing people running from initramfs want 
to do is yank the block layer entirely (CONFIG_BLOCK=n) and use 
initramfs instead, and needing to enable CONFIG_BLK_DEV_INITRD while

And the INSANE part is they generally want a static initrd to do it so 
they're not using the external loader, but Kconfig has INITRAMFS_SOURCE 
under CONFIG_BLK_DEV_INITRD and it's a mess. Renaming THAT symbol would 
be good.

But then, CONFIG_BLOCK is hidden under CONFIG_EXPERT which selects 
DEBUG_KERNEL (INCREASING KERNEL SIZE!!!) and thus everybody who does 
this patches the kconfig plumbing to be less stupid anyway. So the 
problem isn't JUST renaming the symbol...

(Oh CONFIG_EXPERT is SO STUPID. It's got a menu under it, but 
CONFIG_BLOCK isn't in that menu, it's at the top of menuconfig between 
loadable module support and executable file formats, just invisible 
unless you go down into a menu and switch on a setting and then back out 
to go find it. WHY WOULD YOU DO THAT?)

> New names precisely capture meaning of these variables.

To you. I'm not entirely sure what virt_external means. (Yes I could go 
read the code. No I don't want to. I EXPECT to need context and 
refreshing stuff, but having it change out from under me since the LAST 
time I did that is annoying when it's "same thing, new name, because".)

It makes more sense to YOU because you changed it to smell like you. 
Meanwhile 35 years of installed base expertise in other people's heads 
has been discarded and developed version skew for anyone maintaining an 
existing system. (That's not a "never do this", that's a "be aware 
humans consistently have the wrong weightings in our heads for this".)

Personally I usually have to look it up either way. And am spending more 
and more of my time poking at older kernels instead, because newer stuff 
has either removed support for things I need or grown dependencies. (And 
because there's 20 years of installed base still in various stages of 
use, I'm personally likely to spend more time looking at the old names 
than the new ones.)

> This will break all configs out there (update your configs!).
> Still I think this is okay,

Because you don't have to clean up after it.

> because config names never were part of stable API.

I can forward everyone who asks me questions to you, or just agree when 
they tell me it's yet another reason not to upgrade.

> Other user-visible changes:
> 
> - Removed kernel command line parameters "load_ramdisk" and
> "prompt_ramdisk", which did nothing and were deprecated

Sure.

> - Removed kernel command line parameter "ramdisk_start",
> which was used for initrd only (not for initramfs)

Some bootloaders appended that to the kernel command line to specify 
where in memory they've loaded the initrd image, which could be a 
cpio.gz once upon a time. No idea what regressions happened since though.

(Last new bootloader I was involved with that had to make it work used 
some horrible hack editing a dtb at a fixed offset, like the old "rdev" 
trick but more brittle. Because "device tree better" than human readable 
textual mechanism. Fixing ramdisk_start to work right sounded like a 
more sane approach to me, but...)

> I tested my patchset on many architectures in Qemu using my Rust
> program, heavily based on mkroot [1].

You rewrote a 400 line bash script in rust.

Yeah, that's a rust developer. (And it smells like you now...)

> I used the following cross-compilers:
> 
> aarch64-linux-musleabi
> armv4l-linux-musleabihf
> armv5l-linux-musleabihf
> armv7l-linux-musleabihf
> i486-linux-musl
> i686-linux-musl
> mips-linux-musl
> mips64-linux-musl
> mipsel-linux-musl
> powerpc-linux-musl
> powerpc64-linux-musl
> powerpc64le-linux-musl
> riscv32-linux-musl
> riscv64-linux-musl
> s390x-linux-musl
> sh4-linux-musl
> sh4eb-linux-musl
> x86_64-linux-musl

or1k and microblaze work, they just don't pass the full smoketest for 
reasons that shouldn't affect initramfs testing.

I'm still waiting for Rich to ship the next musl release to do new 
toolchains...

https://www.openwall.com/lists/musl/2025/08/04/1

> Workaround
> ====
> If "retain_initrd" is passed to kernel, then initramfs/initrd,
> passed by bootloader, is retained and becomes available after boot
> as read-only magic file /sys/firmware/initrd [3].

Common use case for eg romfs is memory mapped flash or rom, so the 
address range in question isn't actually ram anyway. Mostly on mmu 
systems you just don't want the mapping to go away, so the kernel can 
still reach out and read it.
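
For reference, the userspace half of the quoted workaround would look 
roughly like the sketch below, assuming the kernel was booted with 
retain_initrd so that /sys/firmware/initrd exists as the cover letter 
describes. save_retained_initrd and its parameters are made-up names, 
and what to do with the copied image afterwards (loop mount, switch_root, 
and so on) is not shown.

```python
import shutil


def save_retained_initrd(dest, src="/sys/firmware/initrd"):
    """Copy the retained boot image to dest; return the bytes written."""
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)   # single streaming pass over the image
        return fout.tell()
```

Note this still copies into RAM, which is exactly what the memory mapped 
flash case above is trying to avoid.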

> This is even better than classic initrd, because:
> - You can use fs not supported by classic initrd, for example erofs

Network block device was the most recent one I saw used, but it had a 
tiny initramfs to set up and switch_root into it...

(Network block device != network filesystem. I have a todo item to 
integrate nbd-server into mkroot/testroot.sh but "-hda works" is one of 
the things it's testing...)

> - One copy is involved (from /sys/firmware/initrd to some file in /)
> as opposed to two when using classic initrd

Embedded developers have always been reaching out and using mappable 
flash directly. Vitaly Wool's ELC talk in 2015 (about running Linux in 
256k of sram, yes one quarter of one megabyte) described the process:

https://elinux.org/images/9/90/Linux_for_Microcontrollers-_From_Marginal_to_Mainstream.pdf

Rob
