linux-kernel - Re: [PATCH v2 0/8] scheduler tinification

Date:   Thu, 8 Jun 2017 16:16:06 -0400 (EDT)
From:   Nicolas Pitre <nicolas.pitre@...aro.org>
To:     Ingo Molnar <mingo@...nel.org>
cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v2 0/8] scheduler tinification

On Thu, 8 Jun 2017, Ingo Molnar wrote:

> 
> Also, let me make it clear at the outset that we do care about RAM footprint all 
> the time, and I've applied countless data structure and .text reducing patches to 
> the kernel. But there's a cost/benefit analysis to be made, and this series fails 
> that test in my view, because it increases the complexity of an already complex 
> code base:
> 
> * Nicolas Pitre <nicolas.pitre@...aro.org> wrote:
> 
> > Most IOT targets are so small that people are rewriting new operating systems 
> > from scratch for them. Lots of fragmentation already exists.
> 
> Let me offer a speculative if somewhat cynical prediction: 90% of those ghastly 
> IOT hardware hacks won't survive the market. The remaining 10% will be successful 
> financially, despite being ghastly hardware hacks and will eventually, in the next 
> iteration or so, get a proper OS.

Your prediction is based on a false premise. There is simply no money to 
be made with IoT hardware, especially in the low end.  Those little 
devices will be given away for free because it is in the service 
subscription that the money is. So the hardware has to, and will be, 
extremely cheap to produce. If a serious bug turns up in one of those 
devices, my own cynical prediction is that no one will bother with field 
upgradability and they will ask you to throw the device away instead 
while they ship you a replacement (field upgradability implies at least 
twice the flash memory size and that comes with a cost so some will 
gamble that obsolescence will happen before a serious bug turns up).

> As users ask for more features the hardware capabilities will increase 
> dramatically and home-grown microcontroller derived code plus minimal OSes will be 
> replaced by a 'real' OS. Because both developers and users will demand IPv6 
> compatibility, or Bluetooth connectivity, or storage support, or any random range 
> of features we have in the Linux kernel.

The "Cloud" is taking care of most of that. For the rest, your cellphone 
or IoT gateway will take over. IPv6 stacks are already used in tiny 
microcontrollers with as low as 32KB of RAM.

> With the stroke of a pen from the CFO: "yes, we can spend more on our next 
> hardware design!" the problem goes away, overnight, and nobody will look back at 
> the hardware hack that had only 1MB of RAM.

Of course hobbyists can already get a Raspberry Pi Zero and run a full 
featured Linux distro on it... for a mere 5 bucks. That comes with 512MB 
of RAM so my patches certainly don't make a difference there.

But it's not that simple.  First there is a fundamental constraint 
which is power consumption. If you want your device to run for months 
(some will hope years) from the same tiny battery then you just cannot 
afford SDRAM. So we're talking static RAM here. And to keep costs down 
because you want to give away your thingies by the millions for free it 
usually means single-chip designs with on-chip sub-megabyte static RAM.  
And in that field the 256KB mark is located towards the high end of the 
spectrum.  Many IPv6-capable chips available today have less than that.

And the thing is: people already manage to do an awful lot of stuff in 
such a constrained device. Some probably did a good job of it, but most 
of them likely suck and we don't know about their bugs because we have 
no idea what's running inside.

And because it is rather easy to write a new OS from scratch for such a 
small environment (and who didn't dream of writing his own OS, right?) 
then about every company in that field did so. That's not counting most 
Open Source ones which usually are close to single-person projects. So 
you get a lot of fragmentation, very very little peer review, and no 
incentive for proper maintenance because the cost saving simply isn't 
significant enough.

It is just like asteroids. Some of them collapse to form bigger objects 
like planets, while others have too weak a gravitational field to gather 
more matter. My vision is about leveraging the Linux gravitational power 
to bring the tiny embedded space together because, on its own, the tiny 
embedded space simply does not have enough community power to actually 
organize itself.

Of course there are important parts of Linux that couldn't be reused as 
is in such a setup, yet many other things can still be reused with 
either some modifications or a tiny parallel subsystem substitution. 
Technically, it is always possible to find ways to make it low on 
maintenance and beneficial to the wider community. But first and 
foremost you have to agree with the fundamental principle of gathering 
more people around a common codebase to make it better for everyone and 
not suggest that they stick to themselves. If you agree to that then we 
can move back to a technical discussion.

> > [...] We're talking about systems with less than one megabyte of RAM, sometimes 
> > much less.
> 
> Two data points:
> 
> Firstly, by the time any Linux kernel change I commit today gets to a typical 
> distro it's at least 0.5-1 years, 2 years for it to get widely used by hardware 
> shops - 5 years to get used by enterprises. More latency in more conservative 
> places.

Don't forget that you are also merging patches today from the Android 
folks that have been deployed into actual products years ago. So the 
enterprise distro comparison simply has no commonalities here.

> Secondly, I don't see Moore's Law reversing:
> 
>    http://nerdfever.com/wp-content/uploads/2015/06/2015-06_Moravec_MIPS.png
> 
> If you combine those two time frames, the consequence of this:
> 
> Even taking the 1MB size at face value (which I don't: a networking enabled system 
> can probably not function very well with just 1MB of RAM) - the RAM-starved 1 MB 
> system today will effectively be a 2 MB system in 2 years.

As surprising as it might be, IPv6 stacks requiring only a few dozen 
kilobytes of memory do exist. Not so surprisingly though, some people 
think that the existing stacks simply suck and they are rewriting yet 
another one ... because they think their own will be better of course.

So there *is* still a huge market for sub-megabyte systems. I was also 
counting on Moore's law so that by the time Linux actually has the 
ability to be tailored for such systems then typical SRAM in those 
10-cents microcontrollers will be 512KB instead of 128 or 32.

> You can already fit a mostly full Linux system into 32 MB just fine, i.e. the 
> problem has solved itself just by waiting a bit or by increasing the hardware 
> capabilities a bit.

You just can't procure SDRAM chips smaller than 32MB on the market 
anymore. That's why Linux didn't get any pressure to fit in smaller than 
that for quite a while. But I've heard of some people having use cases 
for thousands if not millions of Linux VMs on a single server and 
they're looking at 10MB VMs or smaller for their application.

> But the kernel complexity you introduce with this series stays with us! It will be 
> an additional cost added to many scheduler commits going forward. It's an added 
> cost for all the other usecases.

OK, let's talk about that a bit. How is sched/core.c with its 7387 
lines not overly complex already? How is my moving of rt related code to 
rt.c and dl related code to dl.c not helping things? Isn't it easier to 
understand the 3500 lines of code in futex.c when half of it, i.e. the 
PI-specific code, is split into a separate file? I ask you.

If you want to pick only those patches for now then please be my guest. 
At least the first two patches of the series should be mergeable without 
even a doubt.

As to the actual complexity I'm introducing... this is just about not 
compiling some files in and stubbing calls to them out. Isn't that a 
sign of good isolation when you can stub the dl class out with only 9 
insertions and 6 deletions to sched/core.c? I'm not saying the 
complexity is nonexistent here, but just the _ability_ to remove a 
scheduler class enforces code abstractions which should be a good thing 
maintenance wise, no?
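
For readers unfamiliar with the pattern, here is a minimal sketch of what 
"not compiling some files in and stubbing calls to them out" can look 
like. It is not taken from the patches: the CONFIG_DEMO_SCHED_DL macro, 
struct demo_task and the demo_* functions are all hypothetical names 
made up for illustration, and the policy numbers are arbitrary.

```c
/* Illustrative sketch only: every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Define this to build the optional class in; leave it undefined to stub it out. */
/* #define CONFIG_DEMO_SCHED_DL 1 */

struct demo_task {
	int policy;       /* 0 = normal, 1 = deadline (demo values) */
	long runtime_ns;  /* requested runtime, used by the admission check */
};

#ifdef CONFIG_DEMO_SCHED_DL
/* Real implementations; in a larger tree these would live in their own file. */
static inline bool demo_dl_policy(int policy)
{
	return policy == 1;
}

static inline int demo_dl_admit(const struct demo_task *t)
{
	return t->runtime_ns > 0 ? 0 : -1;
}
#else
/* Stubs with constant results, so the compiler can drop dead branches in callers. */
static inline bool demo_dl_policy(int policy)
{
	(void)policy;
	return false;
}

static inline int demo_dl_admit(const struct demo_task *t)
{
	(void)t;
	return -1;
}
#endif

/* Core code calls the hooks unconditionally; it contains no #ifdef of its own. */
int demo_set_policy(struct demo_task *t, int policy)
{
	assert(t != NULL);

	if (demo_dl_policy(policy)) {
		if (demo_dl_admit(t) != 0)
			return -1;  /* admission refused */
	} else if (policy != 0) {
		return -1;          /* unknown or compiled-out policy */
	}

	t->policy = policy;
	return 0;
}
```

With the macro left undefined, a request for the deadline policy is 
simply rejected and the core function needs no conditional compilation 
of its own. The point of the sketch is that the optional code is only 
reached through a couple of narrow hooks.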

> Also, it's not like 20k .text savings will magically enable Linux to fit into 1MB 
> of RAM - it won't. The smallest still practical more or less generic Linux system 
> in existence today is around 16 MB. You can shrink it more, but the effort 
> increases exponentially once you go below a natural minimum size.

Again, I'm not after a tiny-and-generic Linux target. I'm after a 
tiny-and-heavily-tailored Linux subset that shares the same ABI and API 
as the generic Linux. Once you start compiling out pieces of the core 
kernel, it obviously isn't generic anymore, but the potential for size 
reduction becomes much bigger.

Anyway... as I said, you have to agree with the high level goal and 
principle of leveraging the Linux codebase to gather the tiny embedded 
people around it. The tiny embedded community simply will never take 
hold otherwise. If we cannot agree on that then any other point of 
discussion is moot. In which case I'll simply drop this project entirely 
and move on.


Nicolas