Message-ID: <CA+55aFyzwJjoQQoxT2L+mZtSFwssCx9+nex0H+Pqc_SjAC+Rtw@mail.gmail.com>
Date:	Mon, 21 Sep 2015 11:28:40 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Aleksa Sarai <cyphar@...har.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: setup() and odd Syscalls in Ancient History

On Mon, Sep 21, 2015 at 6:07 AM, Aleksa Sarai <cyphar@...har.com> wrote:
>
> I was wondering if you could explain *why* setup() was a syscall in
> early Linux? I understand that it did some ... odd things (one
> function both freeing the initial memory and setting up the
> filesystems, devices and mounting) which you obviously need to do in
> init. But from what I can see (after digging out v0.01 from the tomb),
> it was *never* used by userspace, which begs the question: why was it
> a syscall in the first place?

Heh. Interesting question, and I have to admit I went and looked at
the code to remind me what was going on.

It's not really obvious, because process separation and memory
management in very early Linux were based on segmentation. Yes, it used
paging too, but it originally used one single page table with 64
chunks of 64MB each (if I remember correctly), and then segments would
be used to make each process see a single 64MB slice of the 4GB
address space.

So the code actually goes into user space, but the very *initial* user
space is actually shared with the kernel (until the first fork()). We
do the initial user mode transition by just switching to user
segments.

So in init/main.c, the magic is that

        move_to_user_mode();
        if (!fork()) {          /* we count on this going ok */
                init();
        }
        for(;;) pause();

where that "move_to_user_mode()" will reload all the segments (some by
hand, but CS/SS by doing an "iret").  So that first fork() will
actually be done in user space, and before that happens the kernel
cannot sleep (because there is no idle task).

That "for (;;) pause()" after the fork() is the idle task, which
allows the "init()" code to sleep.

So "setup()" is a system call because it needs to sleep (to do the
IO), and the kernel couldn't sleep before it got to that user-mode and
first fork thing.

Could it have been done differently? Sure. Obviously we don't do it
that way any more, and we create the idle tasks separately and not
with "fork()" any more. But it kind of made sense at the time.

                  Linus