linux-kernel - Re: [PATCH v2 0/8] scheduler tinification

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.20.1706071206070.29475@knanqh.ubzr>
Date:   Wed, 7 Jun 2017 13:09:37 -0400 (EDT)
From:   Nicolas Pitre <nicolas.pitre@...aro.org>
To:     Ingo Molnar <mingo@...nel.org>
cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v2 0/8] scheduler tinification

On Wed, 7 Jun 2017, Ingo Molnar wrote:

> 
> * Nicolas Pitre <nicolas.pitre@...aro.org> wrote:
> 
> > Many embedded systems don't need the full scheduler support. Most of the
> > time, user space is tightly controlled and many of the scheduler facilities
> > are simply unused.
> 
> Sorry, NAK:
> 
> >  23 files changed, 3190 insertions(+), 2897 deletions(-)
> 
> That's a lot of extra code plus churn for a code base that is already pretty
> #ifdef heavy.
> 
> Also, the savings are marginal, even with significant functionality disabled:
> 
> >   text    data     bss     dec     hex filename
> >  28623    3404     128   32155    7d9b kernel/sched/built-in.o
> >
> > With this series and dl and rt classes disabled:
> >
> >   text    data     bss     dec     hex filename
> >  20734    3334      40   24108    5e2c kernel/sched/built-in.o
> 
> With 1GHz + 1GB RAM SoCs being well below $10 in bulk we worry about code 
> complexity, predictability, testability, behavioral and ABI uniformity a lot more 
> than about the last 10-20k of kernel text footprint...
> 
> So I think the 'tiny' efforts are fundamentally misguided and are shooting for an 
> ever shrinking market of RAM/ROM starved products whose share is shrinking every 
> month.

I'm rather seeing the opposite: an ever growing market of 
internet-connected coin-cell-battery-powered tiny devices where the 
amount of RAM is counted in kilobytes rather than megabytes.

Let me repeat some background as to what my fundamental motivation is, 
and then maybe you'll understand why I'm doing this.

What is the biggest buzzword in the IT industry besides AI right now?
It is IOT.

Most IOT targets are so small that people are rewriting new operating 
systems from scratch for them. Lots of fragmentation already exists. 
We're talking about systems with less than one megabyte of RAM, 
sometimes much less.  Still, those things are being connected to the 
internet. And this is going to be a total security nightmare.

I wish to be able to leverage the Linux ecosystem for as much of the IOT 
space as possible to avoid the worst of those nightmares.  The Linux 
ecosystem has a *lot* of knowledgeable people around it, a lot of 
testing infrastructure and tooling available already, etc.  If a 
security issue turns up on Linux, it has a greater chance of being 
caught early, or fixed quickly otherwise, and finding people with the 
right knowledge is easier on Linux than it could be on any RTOS out 
there. Still with me so far?

Yes we have tools that can automatically reduce the kernel size. We can 
use LTO with the compiler, etc.  LTO is pretty good already. It can 
typically reduce the kernel size by 20%.  If all system calls are 
disabled except for a few ones, then LTO can get rid of another 20%. The 
minimal kernel I get is still 400-500 KB in size.  That's still too big.

There is this 120 KB of VFS code that is always there even though there 
is no real filesystem at all configured in the kernel. There is that 
other 100 KB of core driver support code despite the fact that the set 
of drivers I'm using are very simple and make no use of most of that 
core driver code. Etc.

There comes a point where there is no option but to explicitly trim out 
parts of the kernel as such decisions cannot be automated, hence this 
patch series. Bringing the scheduler under 20KB in size is therefore 
very useful in that context. Alternatively I could push for a parallel 
implementation as I did with the TTY layer where I obtained a 6x size 
reduction. But in the scheduler case I obtained only a 2x size reduction 
so I thought it could be more profitable to get about the same saving by 
reworking the existing code instead., and eventually contributing a very 
bare scheduler class that would be a smaller alternative to the fair 
scheduler for deployments where that makes sense. Unless you actually 
changed your mind about alternative whole scheduler implementations that 
is...

For Linux to be suitable for small IoT, it has to be small, damn small. 
My target is 256 KB of RAM.  And if you look at the kind of application 
those 256-KB systems are doing, it's basically one main task typically 
acquiring sensor data and sending it in some crypted protocol over a 
wireless network on the internet, and possibly accepting commands back.  
So what do you need from the OS to achieve that?  A few system calls, a 
minimal scheduler, minimal memory management, minimal filesystem 
structure and minimal network stack. And your user app.

So, why not having each of those blocks be created using the existing 
Linux syscall interface and internal API?  At that point, it should be 
possible to take your standard full-featured Linux workstation and 
develop your user app on it, run it there using all the existing native 
debugging tools, etc. In the end you just pick the mini version of 
everything for the final target and you're done.  And you don't have to 
learn a whole new OS, development environment and program model, etc.

Next on my list would be a cache-less, completely serialized VFS bypass 
that has only what's needed to make the link between the read/write 
syscalls, a filesystem driver and a block driver while preserving the 
existing kernel APIs. And by being really small, the maintenance cost of 
a "parallel" implementation isn't very high, certainly much less than 
trying to maintain a single code path that can scale to both extremes 
in that case.

PS: As far as I remember, Linus didn't condemn the idea last time I 
    brought up this topic in his presence. I therefore hope we could 
    find ways for allowing Linux usage into the largest computing device 
    deployment to come.


Nicolas