linux-kernel - Re: [bisected] system hang after boot

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20171127125455.GC30679@arm.com>
Date:   Mon, 27 Nov 2017 12:54:56 +0000
From:   Will Deacon <will.deacon@....com>
To:     Martin Schwidefsky <schwidefsky@...ibm.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Sebastian Ott <sebott@...ux.vnet.ibm.com>,
        Ingo Molnar <mingo@...nel.org>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [bisected] system hang after boot

Hi Martin,

On Mon, Nov 27, 2017 at 01:49:18PM +0100, Martin Schwidefsky wrote:
> On Mon, 27 Nov 2017 11:49:48 +0000
> Will Deacon <will.deacon@....com> wrote:
> > On Wed, Nov 22, 2017 at 09:22:17PM +0100, Peter Zijlstra wrote:
> > > On Wed, Nov 22, 2017 at 06:26:59PM +0000, Will Deacon wrote:
> > >   
> > > > Now, I can't see what the break_lock is doing here other than causing
> > > > problems. Is there a good reason for it, or can you just try removing it
> > > > altogether? Patch below.  
> > > 
> > > The main use is spin_is_contended(), which in turn ends up used in
> > > __cond_resched_lock() through spin_needbreak().
> > > 
> > > This allows better lock wait times for PREEMPT kernels on platforms
> > > where the lock implementation itself cannot provide 'contended' state.
> > > 
> > > In that capacity the write-write race shouldn't be a problem though.  
> > 
> > I'm not sure why it isn't a problem: given that the break_lock variable
> > can read as 1 for a lock that is no longer contended and 0 for a lock that
> > is currently contended, then the __cond_resched_lock is likely to see a
> > value of 0 (i.e. spin_needbreak always return false) more often than no
> > since it's checked by the lock holder.
> 
> Grepping for 'break_lock' the two locking blueprints are the only places
> where the field is written to. Unless I am blind, the associated unlock
> functions do *not* reset 'break_lock'.
> 
> Without the raw_##op##_can_lock(lock) check the first of the blueprints
> now looks like this:
> 
> void __lockfunc __raw_##op##_lock(locktype##_t *lock)                   \
> {                                                                       \
>         for (;;) {                                                      \
>                 preempt_disable();                                      \
>                 if (likely(do_raw_##op##_trylock(lock)))                \
>                         break;                                          \
>                 preempt_enable();                                       \
>                                                                         \
>                 if (!(lock)->break_lock)                                \
>                         (lock)->break_lock = 1;                         \
>                 while ((lock)->break_lock)                              \
>                         arch_##op##_relax(&lock->raw_lock);             \
>         }                                                               \
>         (lock)->break_lock = 0;                                         \
> }                                                                       \
> 
> All it takes to create an endless loop is two CPUs, the first acquired the
> lock and the second tries to get the lock. After the unsuccessful trylock
> of the second CPU, the first CPU releases the lock and never tries to take
> it again. The second CPU will be stuck in an endless loop.

Yes, it basically relies on the lock holder never winning that race.
However, Peter's use-case just needs the lock-holder to be able to detect
contention (which is always best-effort anyway), so I think we can make that
"work" by removing the while loop above (see my subsequent diff sent to
Sebastian).

It's still questionable, because on a machine with store-buffers you really
want to order writes to break_lock against something else, but it might
happen to fall out depending on the details of the trylock() implementation.

> I guess my best course of action is to remove GENERIC_LOCKBREAK from
> arch/s390/Kconfig to avoid this construct altogether. Let us see what
> breaks if I do that ..

We could just consider ripping out GENERIC_LOCKBREAK entirely, but I was
hoping we could get a simpler fix in for now.

Will