Message-Id: <4D33BE61-16B9-4E70-9781-FB8F3C791FCA@joelfernandes.org>
Date:   Mon, 17 Oct 2022 08:27:48 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Chengming Zhou <zhouchengming@...edance.com>
Cc:     Connor O'Brien <connoro@...gle.com>, linux-kernel@...r.kernel.org,
        kernel-team@...roid.com, John Stultz <jstultz@...gle.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Qais Yousef <qais.yousef@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        "Paul E . McKenney" <paulmck@...nel.org>
Subject: Re: [RFC PATCH 00/11] Reviving the Proxy Execution Series



> On Oct 17, 2022, at 12:26 AM, Chengming Zhou <zhouchengming@...edance.com> wrote:
> 
> On 2022/10/17 11:56, Joel Fernandes wrote:
>> 
>> 
>>>> On Oct 16, 2022, at 11:25 PM, Chengming Zhou <zhouchengming@...edance.com> wrote:
>>> 
>>> Hello,
>>> 
>>>> On 2022/10/4 05:44, Connor O'Brien wrote:
>>>> Proxy execution is an approach to implementing priority inheritance
>>>> based on distinguishing between a task's scheduler context (information
>>>> required in order to make scheduling decisions about when the task gets
>>>> to run, such as its scheduler class and priority) and its execution
>>>> context (information required to actually run the task, such as CPU
>>>> affinity). With proxy execution enabled, a task p1 that blocks on a
>>>> mutex remains on the runqueue, but its "blocked" status and the mutex on
>>>> which it blocks are recorded. If p1 is selected to run while still
>>>> blocked, the lock owner p2 can run "on its behalf", inheriting p1's
>>>> scheduler context. Execution context is not inherited, meaning that
>>>> e.g. the CPUs where p2 can run are still determined by its own affinity
>>>> and not p1's.
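
(To illustrate the mechanism described in the paragraph above, here is a minimal sketch in kernel-style C. Every name in it - find_proxy_task(), task_is_blocked(), blocked_on_mutex_owner() - is a simplified stand-in, not an identifier from the actual series.)

/*
 * Minimal sketch of proxy execution; names are hypothetical
 * stand-ins, not the identifiers used by the patch series.
 */
static struct task_struct *find_proxy_task(struct task_struct *donor)
{
	struct task_struct *owner = donor;

	/*
	 * The donor was picked by the scheduler but is blocked on a
	 * mutex. Walk the blocked-on chain to a task that can make
	 * progress: donor -> mutex -> owner -> mutex -> owner ...
	 */
	while (task_is_blocked(owner))
		owner = blocked_on_mutex_owner(owner);

	/*
	 * Run 'owner' in its own execution context (CPU affinity),
	 * but charge the time to the donor's scheduler context
	 * (class, priority); the donor stays on the runqueue.
	 */
	return owner;
}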
>>> 
>>> This is cool. We have a problem (others should have encountered it too) where
>>> priority inversion happens when an rwsem writer is waiting for many readers
>>> that hold the lock but are throttled by CFS bandwidth control. (In our use case,
>>> the rwsem is mm_struct->mmap_sem.)
>>> 
>>> So I'm curious whether this work can also solve this problem? If we don't dequeue
>>> the rwsem writer when it blocks on the rwsem, then when the CFS scheduler picks it
>>> to run, can we use the blocked chain to find the readers to run?
>> 
>> That seems a lot harder and is unsupported by the current patch set AFAICS (my exposure to this work is about a week, so take it with a grain of salt). You could have multiple readers, so how would you choose which reader to proxy for (round robin?)? Also, you no longer have a chain but a tree of chains, with the leaves being each reader - so you have to track that somehow, then keep migrating the blocked tasks in the chain to each reader's CPU, possibly migrating a lot more than in the case of a single chain. And it's not clear it would be beneficial: proxying for one reader does not improve the situation if it is a different reader that needs the boost.
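
(To make the chain-vs-tree distinction above concrete, a hypothetical sketch follows; these types and fields are illustrative only and exist neither in the kernel nor in the patch set.)

struct mutex_like {
	struct task_struct *owner;	/* exactly one owner: a chain */
};

struct rwsem_like {
	struct list_head readers;	/* r1..rN all hold it for read */
};

/*
 * Following blocked_on through mutexes always yields a single next
 * task. A writer blocked on a read-held rwsem instead fans out into
 * a tree whose leaves are the N readers, and the writer cannot run
 * until every reader releases the lock - so boosting one leaf
 * (round robin or otherwise) may not help at all.
 */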
>> 
> 
> Thanks for your reply; it's indeed more complex than I thought, and proxying for just one reader
> is also less efficient.
> 
> But this rwsem priority inversion problem hurts us so much that we are afraid to use
> CFS bandwidth control now. Imagine 10 readers holding mmap_sem, each throttled for 20ms:
> if their throttle windows don't overlap, the writer has to wait at least 10 * 20ms = 200ms,
> which becomes even worse if the writer itself holds other locks.

I hear you. But on the other hand, with so many readers the writer is bound to starve anyway; rwsem is unfair to the writer by definition. But yes, I agree PE (if extended to do so) can help here. I also suggest looking into the per-VMA locks and maple tree work that Suren et al. are doing to improve the situation.

Thanks.

> 
> Thanks.
> 
