linux-kernel - Re: setup() and odd Syscalls in Ancient History

Message-ID: <CA+55aFyzwJjoQQoxT2L+mZtSFwssCx9+nex0H+Pqc_SjAC+Rtw@mail.gmail.com>
Date:	Mon, 21 Sep 2015 11:28:40 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Aleksa Sarai <cyphar@...har.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: setup() and odd Syscalls in Ancient History

On Mon, Sep 21, 2015 at 6:07 AM, Aleksa Sarai <cyphar@...har.com> wrote:
>
> I was wondering if you could explain *why* setup() was a syscall in
> early Linux? I understand that it did some ... odd things (one
> function both freeing the initial memory and setting up the
> filesystems, devices and mounting) which you obviously need to do in
> init. But from what I can see (after digging out v0.01 from the tomb),
> it was *never* used by userspace, which begs the question: why was it
> a syscall in the first place?

Heh. Interesting question, and I have to admit I went and looked at
the code to remind me what was going on.

It's not really obvious, because process separation and memory
management in very early Linux were based on segmentation. Yes, it used
paging too, but it originally used one single page table with 64
chunks of 64MB each (if I remember correctly), and then segments would
be used to make each process see a single 64MB slice of the 4GB
address space.

So the code actually goes into user space, but the very *initial* user
space is actually shared with the kernel (until the first fork()). We
do the initial user-mode transition by just switching to user
segments.

So in init/main.c, the magic is that

        move_to_user_mode();
        if (!fork()) {          /* we count on this going ok */
                init();
        }
        for(;;) pause();

where that "move_to_user_mode()" will reload all the segments (some by
hand, but CS/SS by doing an "iret").  So that first fork() will
actually be done in user space, and before that happens the kernel
cannot sleep (because there is no idle task).

That "for (;;) pause()" after the fork() is the idle task, which
allows the "init()" code to sleep.

So "setup()" is a system call because it needs to sleep (to do the
IO), and the kernel couldn't sleep before it got to that user-mode and
first fork thing.

Could it have been done differently? Sure. Obviously we don't do it
that way any more, and we create the idle tasks separately and not
with "fork()" any more. But it kind of made sense at the time.

                  Linus