Message-ID: <CANN689HDiJN0GQYyos_-7Vudw9jLkDWEODCVL5PNsO5O4WV3oQ@mail.gmail.com>
Date:	Sat, 9 Mar 2013 18:37:07 -0800
From:	Michel Lespinasse <walken@...gle.com>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	David Howells <dhowells@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/5] tasklist_lock fairness issues

On Sat, Mar 9, 2013 at 10:26 AM, Oleg Nesterov <oleg@...hat.com> wrote:
> Hi Michel,
>
> Well. I can't say I really like this. 4/5 itself looks fine, but other
> complications do not look nice, at least in the long term. Imho, imho,
> I can be wrong.
>
> Everyone seem to agree that tasklist should die, this series doesn't
> even try to solve the fundamental problems with this global lock.

I admit that this series does not make any progress towards removing
tasklist_lock call sites, which has been the direction so far.

However, I think that patches 1-3 are not very complex (they add
inline wrapper functions around the tasklist_lock locking calls, but
the compiled code ends up being the same; see the sketch after the
list below). And by marking which tasklist_lock call sites are taken
from interrupt context, they make it clear which places are imposing
the current tasklist_lock semantics on us:

- sending signals, with the send_sigio() / send_sigurg() / kill_pgrp()
call sites;
- posix_cpu_timer_schedule()
- sysrq debugging features
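
To make that concrete, the wrappers added by patches 1-3 are something
along these lines (the names here are only illustrative, the series may
spell them differently):

        static inline void tasklist_read_lock(void)
        {
                read_lock(&tasklist_lock);
        }

        static inline void tasklist_read_unlock(void)
        {
                read_unlock(&tasklist_lock);
        }

        /* variant for the few call sites that may run in irq context */
        static inline void tasklist_read_lock_any(void)
        {
                read_lock(&tasklist_lock);
        }

So, for example, kill_pgrp() would then take the lock through the
irq-capable wrapper instead of calling read_lock(&tasklist_lock)
directly. Since the wrappers are inline and just forward to the
existing rwlock_t primitives, the generated code does not change; the
only gain at this stage is that the irq-context users are spelled
differently at the call sites.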

These are only 9 call sites, so they should be more easily attackable
than the full tasklist_lock removal problem. And if we could get these
few eliminated, then it would become trivial to make the remaining
tasklist_lock fair (and maybe even remove the unfair rwlock_t
implementation), which would IMO be a significant step forward.

In other words, patch 5 is, in a way, cheating by trying to get some
tasklist_lock fairness benefits right now. There is a bit of
complexity to it since it replaces tasklist_lock with a pair of locks,
one fair and one unfair. But if we don't like patch 5, patches 1-3 can
still give us a closer intermediate goal: removing much of the
tasklist_lock nastiness by eliminating only 9 of the existing locking
sites.
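
To make the "pair of locks" shape a bit more concrete - this is only
my shorthand for the idea, not the actual code from patch 5, and
fair_rwlock / fair_read_lock / fair_write_lock stand in for whatever
the fair primitive from 4/5 ends up being called:

        extern struct fair_rwlock tasklist_fair_lock;  /* fair, process context */
        extern rwlock_t tasklist_unfair_lock;          /* unfair, irq-safe */

        static inline void tasklist_write_lock_irq(void)
        {
                /* a writer must exclude both classes of readers */
                fair_write_lock(&tasklist_fair_lock);
                write_lock_irq(&tasklist_unfair_lock);
        }

        static inline void tasklist_read_lock(void)
        {
                /* process-context readers queue fairly behind writers */
                fair_read_lock(&tasklist_fair_lock);
        }

        static inline void tasklist_read_lock_any(void)
        {
                /* the 9 irq-context readers keep the old unfair rwlock_t */
                read_lock(&tasklist_unfair_lock);
        }

(The unlock side is the mirror image.) The extra complexity is that the
write path now has to take two locks, which is exactly the part we
could drop once the 9 irq-context readers are gone.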

> However, I agree. tasklist rework can take ages, so probably we need
> to solve at least the write-starvation problem right now. Probably this
> series is correct (I didn't read it except 4/5), but too complex.

Regarding the complexity - do you mean the complexity of implementing
a fair rwlock (I would guess not, since you said 4/5 looks fine) or do
you mean that you don't like having inline wrapper functions around
the tasklist_lock locking sites (which is what 1-3 do)?

> Can't we do some simple hack to avoid the user-triggered livelocks?
> Say, can't we add
>
>         // Just for illustration, at least we need CONTENDED/etc
>
>         void read_lock_fair(rwlock_t *lock)
>         {
>                 while (arch_writer_active(lock))
>                         cpu_relax();
>                 read_lock(lock);
>         }
>
> and then turn some of read_lock's (do_wait, setpriority, more) into
> read_lock_fair?
>
> I have to admit, I do not know how rwlock_t actually works, and it seems
> that arch_writer_active() is not trivial even on x86. __write_lock_failed()
> changes the counter back, and afaics you can't detect the writer in the
> window between addl and subl. Perhaps we can change __write_lock_failed()
> to use RW_LOCK_BIAS + HUGE_NUMBER while it spins, but I am not sure...

IMO having a separate fair rw spinlock is less trouble than trying to
build it on top of rwlock_t - especially if we're planning to remove
the tasklist_lock use of rwlock_t, which is currently the main
justification for rwlock_t's existence. I'm hoping we can eventually
get rid of rwlock_t, so I prefer not to build on top of it when we can
easily avoid doing so.
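
For reference, the kind of fair primitive I have in mind does not need
to be fancy. A crude sketch that just keeps new readers out while a
writer is waiting (enough to fix the write-starvation problem, without
full FIFO ordering) could look like the following; memory barriers and
the irq-disabling variants are omitted, and this is not what 4/5
actually implements:

        struct fair_rwlock {
                atomic_t readers;       /* number of active readers */
                atomic_t writer;        /* nonzero if a writer is waiting/active */
        };

        static inline void fair_read_lock(struct fair_rwlock *lock)
        {
                for (;;) {
                        /* don't join while a writer is queued */
                        while (atomic_read(&lock->writer))
                                cpu_relax();
                        atomic_inc(&lock->readers);
                        if (!atomic_read(&lock->writer))
                                return;
                        /* a writer got in between; back off and retry */
                        atomic_dec(&lock->readers);
                }
        }

        static inline void fair_read_unlock(struct fair_rwlock *lock)
        {
                atomic_dec(&lock->readers);
        }

        static inline void fair_write_lock(struct fair_rwlock *lock)
        {
                /* one writer at a time: flip the flag from 0 to 1 */
                while (atomic_cmpxchg(&lock->writer, 0, 1) != 0)
                        cpu_relax();
                /* wait for the readers that are already inside to drain */
                while (atomic_read(&lock->readers))
                        cpu_relax();
        }

        static inline void fair_write_unlock(struct fair_rwlock *lock)
        {
                atomic_set(&lock->writer, 0);
        }

The point is just that such a lock can live next to rwlock_t instead
of being built on top of it.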

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.