[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2a33d478-b7a8-5b3c-7bc5-f33eb27b44fa@rasmusvillemoes.dk>
Date:   Thu, 11 Mar 2021 01:17:56 +0100
From:   Rasmus Villemoes <linux@...musvillemoes.dk>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Luis Chamberlain <mcgrof@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jessica Yu <jeyu@...nel.org>, Borislav Petkov <bp@...en8.de>,
        Jonathan Corbet <corbet@....net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>
Subject: Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking
On 09/03/2021 23.16, Linus Torvalds wrote:
> On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
> <linux@...musvillemoes.dk> wrote:
>>
>> So add an initramfs_async= kernel parameter, allowing the main init
>> process to proceed to handling device_initcall()s without waiting for
>> populate_rootfs() to finish.
> 
> Oh, and a completely unrelated second comment about this: some of the
> initramfs population code seems to be actively written to be slow.
> 
> For example, I'm not sure why that write_buffer() function uses an
> array of indirect function pointer actions. Even ignoring the whole
> "speculation protections make that really slow" issue that came later,
> it seems to always have been actively (a) slower and (b) more complex.
> 
> The normal way to do that is with a simple switch() statement, which
> makes the compiler able to do a much better job. Not just for the
> state selector - maybe it picks that function pointer approach, but
> probably these days just direct comparisons - but simply to do things
> like inline all those "it's used in one place" cases entirely. In
> fact, at that point, a lot of the state machine changes may end up
> turning into pure goto's - compilers are sometimes good at that
> (because state machines have often been very timing-critical).
FWIW, I tried doing
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -401,24 +401,26 @@ static int __init do_symlink(void)
        return 0;
 }
-static __initdata int (*actions[])(void) = {
-       [Start]         = do_start,
-       [Collect]       = do_collect,
-       [GotHeader]     = do_header,
-       [SkipIt]        = do_skip,
-       [GotName]       = do_name,
-       [CopyFile]      = do_copy,
-       [GotSymlink]    = do_symlink,
-       [Reset]         = do_reset,
-};
-
 static long __init write_buffer(char *buf, unsigned long len)
 {
+       int ret;
+
        byte_count = len;
        victim = buf;
-       while (!actions[state]())
-               ;
+       do {
+               switch (state) {
+               case Start: ret = do_start(); break;
+               case Collect: ret = do_collect(); break;
+               case GotHeader: ret = do_header(); break;
+               case SkipIt: ret = do_skip(); break;
+               case GotName: ret = do_name(); break;
+               case CopyFile: ret = do_copy(); break;
+               case GotSymlink: ret = do_symlink(); break;
+               case Reset: ret = do_reset(); break;
+               }
+       } while (!ret);
+
        return len - byte_count;
 }
and yes, all the do_* functions get inlined into write_buffer with some
net space saving:
add/remove: 0/9 grow/shrink: 1/0 up/down: 1696/-2112 (-416)
Function                                     old     new   delta
write_buffer                                 100    1796   +1696
actions                                       32       -     -32
do_start                                      52       -     -52
do_reset                                     112       -    -112
do_skip                                      128       -    -128
do_collect                                   144       -    -144
do_symlink                                   172       -    -172
do_copy                                      304       -    -304
do_header                                    572       -    -572
do_name                                      596       -    -596
Total: Before=5360, After=4944, chg -7.76%
(ppc32 build). But the unpacking still takes the same time. It might be
different on x86.
Rasmus
Powered by blists - more mailing lists
 
