netdev - Re: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190125095955.GA4500@hirez.programming.kicks-ass.net>
Date:   Fri, 25 Jan 2019 10:59:55 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Alexei Starovoitov <ast@...nel.org>, davem@...emloft.net,
        daniel@...earbox.net, jakub.kicinski@...ronome.com,
        netdev@...r.kernel.org, kernel-team@...com, mingo@...hat.com,
        will.deacon@....com, Paul McKenney <paulmck@...ux.vnet.ibm.com>,
        jannh@...gle.com
Subject: Re: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock

On Thu, Jan 24, 2019 at 03:58:59PM -0800, Alexei Starovoitov wrote:
> On Thu, Jan 24, 2019 at 07:01:09PM +0100, Peter Zijlstra wrote:

> > > - on architectures that don't support queued_spin_lock trivial lock is used.
> > >   Note that arch_spin_lock cannot be used, since not all archs agree that
> > >   zero == unlocked and sizeof(arch_spinlock_t) != sizeof(__u32).
> > 
> > I really don't much like direct usage of qspinlock; esp. not as a
> > surprise.
> > 
> > Why does it matter if 0 means unlocked; that's what
> > __ARCH_SPIN_LOCK_UNLOCKED is for.
> > 
> > I get the sizeof(__u32) thing, but why not key off of that?
> 
> what do you mean by 'key off of that' ?
> to use arch_spinlock_t instead of qspinlock ?
> That was my first attempt, but then I painfully found that
> its size on parisc is 16 bytes and we're not going to penalize bpf
> to waste that much space because of single architecture.
> sizeof(arch_spinlock_t) can be 1 byte too (on sparc).

PowerPC has 8 bytes for some config options IIRC.

> That would fit in __u32, but I figured it's cleaner to use qspinlock
> on all archs that support it and dumb_spin_lock on archs that dont.
> 
> Another option is use to arch_spinlock_t when its sizeof==4

That's what I meant.

> and use dumb_spin_lock otherwise.
> It's doable, but imo still less clean than using qspinlock
> due to zero init. Since zero init is a lot less map work
> that zero inits all elements already.
> 
> If arch_spinlock_t is used than at map init time we would need to
> walk all elements and do __ARCH_SPIN_LOCK_UNLOCKED assignment
> (and maps can have millions of elements).
> Not horrible, but 100% waste of cycles for x86/arm64 where qspinlock
> is used. Such waste can be workaround further by doing ugly
> #idef __ARCH_SPIN_LOCK_UNLOCKED == 0 -> don't do init loop.
> And then add another #ifdef for archs with sizeof(arch_spinlock_t)!=4
> to keep zero init for all map types that support bpf_spin_lock
> via dumb_spin_lock.
> Clearly at that point we're getting into ugliness everywhere.
> Hence I've used qspinlock directly.

OK; I see.. but do these locks really have enough contention to run into
trouble with the simple test-and-set lock?

[ I tried to propose a simple ticket lock, but then realized the virt
  archs (s390,powerpc,etc.) would hate that and deleted everything again
]

Argh, what a mess..