linux-kernel - Re: [PATCH V2] MIPS: implement smp_cond_load


Message-ID: <tencent_26F8B9E004D4512B2225FCE1@qq.com>
Date:   Tue, 10 Jul 2018 19:45:22 +0800
From:   "陈华才" <chenhc@...ote.com>
To:     "Peter Zijlstra" <peterz@...radead.org>
Cc:     "Paul Burton" <paul.burton@...s.com>,
        "Ralf Baechle" <ralf@...ux-mips.org>,
        "James Hogan" <jhogan@...nel.org>,
        "linux-mips" <linux-mips@...ux-mips.org>,
        "Fuxin Zhang" <zhangfx@...ote.com>,
        "wuzhangjin" <wuzhangjin@...il.com>,
        "stable" <stable@...r.kernel.org>,
        "Alan Stern" <stern@...land.harvard.edu>,
        "Andrea Parri" <andrea.parri@...rulasolutions.com>,
        "Will Deacon" <will.deacon@....com>,
        "Boqun Feng" <boqun.feng@...il.com>,
        "Nicholas Piggin" <npiggin@...il.com>,
        "David Howells" <dhowells@...hat.com>,
        "Jade Alglave" <j.alglave@....ac.uk>,
        "Luc Maranget" <luc.maranget@...ia.fr>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        "Akira Yokosawa" <akiyks@...il.com>,
        "LKML" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH V2] MIPS: implement smp_cond_load_acquire() for Loongson-3

Hi, Peter,

I'm afraid that you are missing something.

First, our previous conclusion (that READ_ONCE() needs a barrier to avoid 'reads prioritised over writes') is totally wrong. So defining cpu_relax() as smp_mb(), as ARM11MPCore does, is incorrect, even if it can 'solve' Loongson's problem.
Second, I think the real problem is like this:
 1. CPU0 sets the lock to 0, then does something;
 2. While CPU0 is doing something, CPU1 sets the flag to 1 with WRITE_ONCE(), and then waits for the lock to become 1 in a READ_ONCE() loop;
 3. After CPU0 completes its work, it waits for the flag to become 1, and then sets the lock to 1;
 4. When the lock becomes 1, CPU1 leaves the READ_ONCE() loop.
Without the SFB, everything is OK. But with the SFB, the READ_ONCE() loop in step 2 comes right after WRITE_ONCE(), which keeps the flag cached in the SFB (and so invisible to other CPUs) forever, so both CPU0 and CPU1 wait forever.
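
To make the four steps concrete, here is a minimal single-threaded sketch in plain C. It is only a toy model, not kernel code: the struct, the tick functions and the sync_in_loop switch are all hypothetical names made up for illustration, and the "SFB" is reduced to a single boolean that says whether CPU1's store to the flag is still private.

```c
#include <stdbool.h>

/* Toy model of the handshake; every name here is hypothetical. */
struct machine {
    int  lock;         /* value of the lock as seen by both CPUs */
    int  flag;         /* value of the flag as seen by both CPUs */
    bool flag_in_sfb;  /* CPU1's store to the flag is still buffered */
    int  cpu0_step;    /* 0: clear lock, 1: work, 2: wait for flag, 3: done */
    int  cpu1_step;    /* 0: write flag, 1: spin on lock, 2: done */
};

/* One step of CPU0: steps 1 and 3 of the scenario. */
static void cpu0_tick(struct machine *m)
{
    switch (m->cpu0_step) {
    case 0: m->lock = 0; m->cpu0_step = 1; break;
    case 1: m->cpu0_step = 2; break;            /* "does something" */
    case 2:
        if (m->flag == 1) {                     /* sees only visible stores */
            m->lock = 1;
            m->cpu0_step = 3;
        }
        break;
    default: break;
    }
}

/* One step of CPU1: steps 2 and 4 of the scenario. */
static void cpu1_tick(struct machine *m, bool sync_in_loop)
{
    switch (m->cpu1_step) {
    case 0:
        m->flag_in_sfb = true;                  /* WRITE_ONCE(flag, 1) lands in the buffer */
        m->cpu1_step = 1;
        break;
    case 1:
        if (sync_in_loop && m->flag_in_sfb) {   /* modelled 'sync' drains the buffer */
            m->flag = 1;
            m->flag_in_sfb = false;
        }
        if (m->lock == 1)                       /* READ_ONCE(lock) loop body */
            m->cpu1_step = 2;
        break;
    default: break;
    }
}

/* Returns true if both CPUs finish within max_ticks, false if they are stuck. */
bool handshake_completes(bool sync_in_loop, int max_ticks)
{
    struct machine m = {0};

    for (int i = 0; i < max_ticks; i++) {
        cpu0_tick(&m);
        cpu1_tick(&m, sync_in_loop);
        if (m.cpu0_step == 3 && m.cpu1_step == 2)
            return true;
    }
    return false;
}
```

Under these assumptions, handshake_completes(false, N) never finishes for any N, because nothing in CPU1's loop makes the flag visible, while handshake_completes(true, N) finishes after a few ticks.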

I don't think this is a hardware bug. By design, the SFB is flushed to the L1 cache in three cases:
1. the data in the SFB is full (forms a complete cache line);
2. there is a subsequent read access to the same cache line;
3. a 'sync' instruction is executed.

In this case, there is no other memory access (read or write) between WRITE_ONCE() and the READ_ONCE() loop. So Case 1 and Case 2 will not happen, and the only way to make the flag visible is wbflush (wbflush is sync in Loongson's case).
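
The three flush cases can be written down as a small model in C. This is a sketch under stated assumptions, not a description of the real hardware: it assumes a single-entry buffer, a 64-byte cache line, and that the lock and the flag live in different cache lines; all names (struct sfb, sfb_store, sfb_load, sfb_sync) are hypothetical, and stores to a second line are simply not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 64u  /* assumed cache-line size for this toy model */

/* Hypothetical single-entry store buffer. */
struct sfb {
    bool     pending;   /* buffered data not yet visible to other CPUs */
    uint64_t line;      /* cache-line index of the buffered data */
    uint64_t mask;      /* one bit per byte of the line already written */
};

static void sfb_flush(struct sfb *b)
{
    b->pending = false;
    b->mask = 0;
}

/* Buffer a store of len bytes at addr; case 1 flushes once the line is full. */
void sfb_store(struct sfb *b, uint64_t addr, unsigned len)
{
    uint64_t line = addr / LINE_BYTES;
    unsigned off = (unsigned)(addr % LINE_BYTES);

    if (!b->pending) {
        b->pending = true;
        b->line = line;
        b->mask = 0;
    } else if (b->line != line) {
        return;                         /* other lines are outside this sketch */
    }
    for (unsigned i = off; i < off + len && i < LINE_BYTES; i++)
        b->mask |= UINT64_C(1) << i;
    if (b->mask == UINT64_MAX)
        sfb_flush(b);                   /* case 1: complete cache line */
}

/* A load flushes the buffer only if it hits the buffered line (case 2). */
void sfb_load(struct sfb *b, uint64_t addr)
{
    if (b->pending && addr / LINE_BYTES == b->line)
        sfb_flush(b);
}

/* Case 3: an explicit 'sync' always drains the buffer. */
void sfb_sync(struct sfb *b)
{
    sfb_flush(b);
}

bool sfb_pending(const struct sfb *b)
{
    return b->pending;
}
```

In this model, a 4-byte store to the flag followed by any number of sfb_load() calls on a different line leaves sfb_pending() true, which corresponds to the WRITE_ONCE() plus READ_ONCE() loop above; only sfb_sync(), a load from the flag's own line, or filling the whole line clears it.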

I think this problem does not only happen on Loongson; it will also happen on other CPUs that have a write buffer (unless the write buffer has a 4th case in which it is flushed).

Huacai

------------------ Original ------------------
From:  "Peter Zijlstra"<peterz@...radead.org>;
Date:  Tue, Jul 10, 2018 06:54 PM
To:  "Huacai Chen"<chenhc@...ote.com>;
Cc:  "Paul Burton"<paul.burton@...s.com>; "Ralf Baechle"<ralf@...ux-mips.org>; "James Hogan"<jhogan@...nel.org>; "linux-mips"<linux-mips@...ux-mips.org>; "Fuxin Zhang"<zhangfx@...ote.com>; "wuzhangjin"<wuzhangjin@...il.com>; "stable"<stable@...r.kernel.org>; "Alan Stern"<stern@...land.harvard.edu>; "Andrea Parri"<andrea.parri@...rulasolutions.com>; "Will Deacon"<will.deacon@....com>; "Boqun Feng"<boqun.feng@...il.com>; "Nicholas Piggin"<npiggin@...il.com>; "David Howells"<dhowells@...hat.com>; "Jade Alglave"<j.alglave@....ac.uk>; "Luc Maranget"<luc.maranget@...ia.fr>; "Paul E. McKenney"<paulmck@...ux.vnet.ibm.com>; "Akira Yokosawa"<akiyks@...il.com>; "LKML"<linux-kernel@...r.kernel.org>;
Subject:  Re: [PATCH V2] MIPS: implement smp_cond_load_acquire() for Loongson-3

On Tue, Jul 10, 2018 at 11:36:37AM +0200, Peter Zijlstra wrote:

> So now explain why the cpu_relax() hack that arm did doesn't work for
> you?

So below is the patch I think you want; if not explain in detail how
this is wrong.

diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index af34afbc32d9..e59773de6528 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -386,7 +386,17 @@ unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk) (task_pt_regs(tsk)->regs[29])
 #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)

+#ifdef CONFIG_CPU_LOONGSON3
+/*
+ * Loongson-3 has a CPU bug where the store buffer gets starved when stuck in a
+ * read loop. Since spin loops of any kind should have a cpu_relax() in them,
+ * force a store-buffer flush from cpu_relax() such that any pending writes
+ * will become available as expected.
+ */
+#define cpu_relax()	smp_mb()
+#else
 #define cpu_relax()	barrier()
+#endif

 /*
  * Return_address is a replacement for __builtin_return_address(count)