linux-kernel - Re: [PATCH 2/4 v0.5] sched/umcg: RFC: add userspace atomic helpers


Message-Id: <f6fdecfe-963d-4669-ae05-1d7192467a19@www.fastmail.com>
Date:   Tue, 14 Sep 2021 11:40:01 -0700
From:   "Andy Lutomirski" <luto@...nel.org>
To:     "Peter Zijlstra (Intel)" <peterz@...radead.org>
Cc:     "Jann Horn" <jannh@...gle.com>, "Peter Oskolkov" <posk@...gle.com>,
        "Peter Oskolkov" <posk@...k.io>, "Ingo Molnar" <mingo@...hat.com>,
        "Thomas Gleixner" <tglx@...utronix.de>,
        "Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
        "Linux API" <linux-api@...r.kernel.org>,
        "Paul Turner" <pjt@...gle.com>, "Ben Segall" <bsegall@...gle.com>,
        "Andrei Vagin" <avagin@...gle.com>,
        "Thierry Delisle" <tdelisle@...terloo.ca>
Subject: Re: [PATCH 2/4 v0.5] sched/umcg: RFC: add userspace atomic helpers



On Tue, Sep 14, 2021, at 11:11 AM, Peter Zijlstra wrote:
> On Tue, Sep 14, 2021 at 09:52:08AM -0700, Andy Lutomirski wrote:
> > With a custom mapping, you don’t need to pin pages at all, I think.
> > As long as you can reconstruct the contents of the shared page and
> > you’re willing to do some slightly careful synchronization, you can
> > detect that the page is missing when you try to update it and skip the
> > update. The vm_ops->fault handler can repopulate the page the next
> > time it’s accessed.
> 
> The point is that the moment we know we need to do this user-poke, is
> schedule(), which could be called while holding mmap_sem (it being a
> preemptable lock). Which means we cannot go and do faults.

That’s fine. The page would be in one of two states: present and writable by the kernel, or completely gone. If it’s present, the scheduler writes it. If it’s gone, the scheduler skips the write and the next fault fills it in.
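
To make the two-state idea concrete, here is a rough userspace model in C11. It is only a sketch under stated assumptions: `shared_slot`, `sched_try_poke`, `page_reclaim`, and `fault_repopulate` are hypothetical names, not anything from the patch set, and a NULL pointer stands in for "the page is gone". The "slightly careful synchronization" between reclaim and the scheduler's write is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: 'page' is NULL while the shared page is gone. */
struct shared_slot {
    _Atomic(uint64_t *) page;
};

/* Scheduler side: must not fault, so it only writes if the page is present. */
static bool sched_try_poke(struct shared_slot *s, uint64_t val)
{
    uint64_t *p = atomic_load_explicit(&s->page, memory_order_acquire);

    if (p == NULL)
        return false;   /* page gone: skip the write */
    /* A real version must also exclude a concurrent reclaim here. */
    *p = val;
    return true;
}

/* Reclaim side: drop the page; its contents can be reconstructed later. */
static void page_reclaim(struct shared_slot *s)
{
    atomic_store_explicit(&s->page, (uint64_t *)NULL, memory_order_release);
}

/* Fault side: rebuild the contents from the current state, then publish. */
static void fault_repopulate(struct shared_slot *s, uint64_t *backing,
                             uint64_t current_state)
{
    assert(backing != NULL);
    *backing = current_state;
    atomic_store_explicit(&s->page, backing, memory_order_release);
}
```

The point of the model is that the scheduler path never blocks or faults: a missed write is harmless as long as the fault path can reconstruct the same state from somewhere else.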

> 
> > All that being said, I feel like I’m missing something. The point of
> > this is to send what the old M:N folks called “scheduler activations”,
> > right?  Wouldn’t it be more efficient to explicitly wake something
> > blockable/pollable and write the message into a more efficient data
> > structure?  Polling one page per task from userspace seems like it
> > will have inherently high latency due to the polling interval and will
> > also have very poor locality.  Or am I missing something?
> 
> The idea was to link the user structures together in a (single) linked
> list. The server structure gets a list of all the blocked tasks. This
> avoids having to do a full N iteration (like Java, they're talking stupid
> number of N).
> 
> Polling should not happen, once we run out of runnable tasks, the server
> task gets run again and it can instantly pick up all the blocked
> notifications.
> 

How does the server task know when to read the linked list?  And what’s wrong with a ring buffer or a syscall?
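
For reference, my reading of the linked-list scheme quoted above is roughly the following C11 sketch. Everything here is an assumption for illustration: `task_node`, `server`, `blocked_push`, and `blocked_take_all` are hypothetical names, and the real proposal may differ in layout and memory ordering. Note that it shows how the server can grab every blocked notification in one step, but not how the server learns that it is time to look.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-task user structure, linked into the server's list. */
struct task_node {
    int tid;
    struct task_node *next;
};

/* Hypothetical server structure: head of a singly linked blocked list. */
struct server {
    _Atomic(struct task_node *) blocked_head;
};

/* Producer side: push a newly blocked task onto the head of the list. */
static void blocked_push(struct server *srv, struct task_node *t)
{
    struct task_node *old;

    assert(t != NULL);
    old = atomic_load_explicit(&srv->blocked_head, memory_order_relaxed);
    do {
        t->next = old;  /* 'old' is refreshed whenever the CAS fails */
    } while (!atomic_compare_exchange_weak_explicit(
                 &srv->blocked_head, &old, t,
                 memory_order_release, memory_order_relaxed));
}

/* Server side: detach the whole list at once, with no per-task scan. */
static struct task_node *blocked_take_all(struct server *srv)
{
    return atomic_exchange_explicit(&srv->blocked_head,
                                    (struct task_node *)NULL,
                                    memory_order_acquire);
}

/* Helper for inspecting a detached list. */
static size_t list_length(const struct task_node *n)
{
    size_t len = 0;

    for (; n != NULL; n = n->next)
        len++;
    return len;
}
```

The detached list comes back in last-blocked-first order, since each push goes to the head. A ring buffer would preserve arrival order, at the cost of having to size it.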