linux-kernel - Re: [PATCH v2 0/8] scheduler tinification

Message-ID: <alpine.LFD.2.20.1706111126300.29475@knanqh.ubzr>
Date:   Sun, 11 Jun 2017 12:45:13 -0400 (EDT)
From:   Nicolas Pitre <nicolas.pitre@...aro.org>
To:     Ingo Molnar <mingo@...nel.org>
cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v2 0/8] scheduler tinification

On Sun, 11 Jun 2017, Ingo Molnar wrote:

> * Nicolas Pitre <nicolas.pitre@...aro.org> wrote:
> 
> > But that's not that simple.  First there is a fundamental constraint 
> > which is power consumption. If you want your device to run for months 
> > (some will hope years) from the same tiny battery then you just cannot 
> > afford SDRAM. So we're talking static RAM here. And to keep costs down 
> > because you want to give away your thingies by the millions for free it 
> > usually means single-chip designs with on-chip sub-megabyte static RAM.  
> > And in that field the 256KB mark is located towards the high end of the 
> > spectrum.  Many IPv6-capable chips available today have less than that.
> > 
> > And the thing is: people already manage to do an awful lot of stuff in 
> > such a constrained device. Some probably did a good job of it, but most 
> > of them likely suck and we don't know about their bugs because we have 
> > no idea what's running inside.
> 
> Ok, let me put it this way: there's no way in hell I see a viable Linux kernel 
> running (no matter how stripped down) in 32K or even 64K of RAM. 256K is a stretch 
> as well - but that RAM size you claim to be already 'high end', so it probably 
> wouldn't be used as a standardized solution anyway...

I never claimed to be making Linux runnable in 32KB of RAM, so we
strongly agree here. I did mention, however, that some 32KB chips are
IPv6 capable, just to give you a different perspective given that you're
more acquainted with multi-gigabyte systems.

And since you mentioned Moore's law previously: 256KB of RAM might be
somewhat high-end in that space today, but it should become pretty
common in the near future. The test board in front of me has 384KB of
SRAM and bigger ones exist.

> Today a Linux 'allnoconfig' kernel, i.e. a kernel with no device drivers and no 
> filesystems whatsoever and with everything optional turned off (including all 
> networking!), is over 2MB large text+data (on x86, which has a compressed 
> instruction set - it would possibly be larger on simpler CPUs):
> 
>  triton:~/tip> size vmlinux
>     text    data     bss     dec  filename
>   926056  208624 1215904 2350584  vmlinux

On ARM, allnoconfig produces:

   text    data     bss     dec     hex filename
 548144   95480   24252  667876   a30e4 vmlinux
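
As a rough cross-check of the two `size vmlinux` outputs above, the sketch below recomputes text+data and the `dec` total for each build. The numbers are copied from those outputs; the dictionary layout and function names are illustrative assumptions, not part of any tool.

```python
# Figures copied from the two `size vmlinux` outputs above (bytes).
# The names and layout here are illustrative only.
SIZES = {
    "x86 allnoconfig": {"text": 926056, "data": 208624, "bss": 1215904},
    "ARM allnoconfig": {"text": 548144, "data": 95480, "bss": 24252},
}


def text_plus_data(sections):
    """Code plus initialized data, excluding zero-initialized bss."""
    return sections["text"] + sections["data"]


def total(sections):
    """The `dec` column: text + data + bss."""
    return text_plus_data(sections) + sections["bss"]


if __name__ == "__main__":
    for name, sections in SIZES.items():
        print(f"{name}: text+data = {text_plus_data(sections)} bytes, "
              f"dec = {total(sections)} bytes")
```

On these numbers, bss accounts for more than half of the x86 total, while on ARM it is under 4%.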

But more realistically, the test system I'm using currently runs the 
kernel XIP from flash, so the text size is an indirect metric. It uses 
external RAM as the 384KB of SRAM still doesn't allow for a successful 
boot. But here's what I get once booted:

/ # free
             total       used       free     shared    buffers     cached
Mem:          7936       1624       6312          0          0        492
-/+ buffers/cache:       1132       6804
/ # uname -a
Linux (none) 4.12.0-rc4-00013-g32352a9367 #35 PREEMPT Sun Jun 11 10:45:02 EDT 2017 armv7ml GNU/Linux

I could make user space XIP from flash as well, but right now it is just 
some initramfs living in RAM.

Obviously you can't use the native Linux networking stack in such small 
systems. But a few IPv6 stacks have been made to work in a few kilobytes 
already.

> A series that shrinks the .text size of the allnoconfig core Linux kernel from 1MB 
> to 9.9MB in isolation is not proof.

I assume you meant 0.9MB.

It is no proof of course. But I'm following the well known and proven 
"release early, release often" mantra here... unless this is no longer 
promoted?

> There will literally have to be two orders of magnitude more patches than that to 
> reach the 32K size envelope, if I (very) optimistically assume that the difficulty 
> to shrink code is constant (which it most certainly is not).

Once again, my goal is _not_ 32KB.

And I don't intend to shrink code. Most of the time I just want to
_remove_ code. Compiling it out, to be precise. The goal of this series
is all about compiling out code. To achieve that with the scheduler, I
simply moved some code to different source files and left those source
files out of the final build. That keeps the number of #ifdefs to a
minimum, but it makes for a big diffstat due to the code movement.

In the TTY layer case, I found that writing a simplistic parallel
equivalent that doesn't have to scale to server-class systems and
remains compatible with existing drivers allowed a 6x reduction in
size. The same strategy could be employed with the VFS, where any kind
of file caching doesn't make sense in a tiny system. Don't worry, I'm
not looking forward to using BTRFS in 256KB of RAM either.

To give you an idea, here's the size breakdown for the booted kernel
above:

$ size */built-in.o
   text    data     bss     dec     hex filename
 290669   41864    3616  336149   52115 drivers/built-in.o
 173275    1189    5472  179936   2bee0 fs/built-in.o
  10135   14084      84   24303    5eef init/built-in.o
 198624   22000   25160  245784   3c018 kernel/built-in.o
  79064     133      53   79250   13592 lib/built-in.o
  97034    6328    3532  106894   1a18e mm/built-in.o
   2135       0       0    2135     857 security/built-in.o
 146046       0       0  146046   23a7e usr/built-in.o
      0       0       0       0       0 virt/built-in.o
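
To make the table easier to read at a glance, here is a minimal sketch that sums the columns and ranks each directory by its share of the total. The values are copied from the `size */built-in.o` output above; the helper names are hypothetical, and the shares are computed over the listed directories only.

```python
# (text, data, bss) in bytes, copied from the `size */built-in.o` table above.
BUILT_IN = {
    "drivers": (290669, 41864, 3616),
    "fs": (173275, 1189, 5472),
    "init": (10135, 14084, 84),
    "kernel": (198624, 22000, 25160),
    "lib": (79064, 133, 53),
    "mm": (97034, 6328, 3532),
    "security": (2135, 0, 0),
    "usr": (146046, 0, 0),
    "virt": (0, 0, 0),
}


def column_totals(table):
    """Return the summed (text, data, bss) over all directories."""
    text = sum(row[0] for row in table.values())
    data = sum(row[1] for row in table.values())
    bss = sum(row[2] for row in table.values())
    return text, data, bss


def share_by_directory(table):
    """Return (directory, percent of summed dec), largest first."""
    grand_total = sum(sum(row) for row in table.values())
    shares = [(name, 100.0 * sum(row) / grand_total)
              for name, row in table.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print("text/data/bss totals:", column_totals(BUILT_IN))
    for name, percent in share_by_directory(BUILT_IN):
        print(f"{name:10s} {percent:5.1f}%")
```

On the table's numbers, drivers/ is the largest contributor at about 30% of the summed total, followed by kernel/ at about 22%.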

That's without LTO (because with LTO there's no way to size individual
parts) and without syscall trimming. From previous experiments, LTO
brings a 20% reduction in the final build size, and LTO together with
syscall trimming provides about a 40% reduction. One nice thing about
LTO is that part of the 75KB of lib code automatically gets discarded
when not referenced, etc. This is not always the case for most of the
core driver infrastructure, despite most of it not being used in my case.
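
For a sense of scale, the sketch below applies the approximate 20% and 40% reductions to the sum of the `dec` column in the table. This is only a rough illustration: the percentages come from earlier experiments on the final build size, so using the table's sum as the baseline is an assumption, and the names are hypothetical.

```python
# Sum of the `dec` column of the `size */built-in.o` table (bytes).
# Treating it as the baseline is an assumption made for illustration:
# the quoted reductions refer to the final build size, not to this sum.
BASELINE_BYTES = 1120497

# Approximate reductions quoted in the text, in percent.
LTO_REDUCTION = 20
LTO_PLUS_SYSCALL_TRIMMING_REDUCTION = 40


def reduced_size(size_bytes, reduction_percent):
    """Size left after removing `reduction_percent` percent, rounded down."""
    return size_bytes * (100 - reduction_percent) // 100


if __name__ == "__main__":
    print("LTO only:", reduced_size(BASELINE_BYTES, LTO_REDUCTION), "bytes")
    print("LTO + syscall trimming:",
          reduced_size(BASELINE_BYTES, LTO_PLUS_SYSCALL_TRIMMING_REDUCTION),
          "bytes")
```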

But there are pieces of the kernel that can't automatically be 
eliminated, such as scheduler classes, because the compiler just can't 
tell if they'll be used at run time.

Some "memory hogs" (relatively speaking) might need a tiny version to 
cope with a handful of processes max and a few static drivers. As Alan 
said, wait queues as they are right now consume a lot of memory. But 
since they're well defined and encapsulated already, it is possible to 
provide a light alternative implemented in a way that uses much less 
memory with the side effect of being much less scalable. But scalability 
is not a huge concern when you have only 256KB of RAM.

So it is a combination of strategies that will make the 256KB goal 
possible. And as you can see from the free output above, this is not 
_that_ far off already.

> But you can prove me wrong: show me a Linux kernel for a real device that fits 
> into 32KB of RAM (or even 256 KB) and _then_ I'll consider the cost/benefit 
> equation.

Your insisting on 32KB in this discussion is simply disingenuous.

So you are basically saying that you want me to work another year on
this project "behind closed doors" and come out with "a final solution"
before you tell me if my approach is worthy of your consideration?
Thanks but no thanks. As I said elsewhere, the value in this proposal is
mainline inclusion as an ongoing process; otherwise there is no gain
over those small OSes out there, and my time is more valuable than that.


Nicolas