lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 14 Apr 2015 19:02:18 +0000 (UTC)
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org,
	Josh Triplett <josh@...htriplett.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Nicholas Miell <nmiell@...cast.net>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ingo Molnar <mingo@...hat.com>,
	Alan Cox <gnomes@...rguk.ukuu.org.uk>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Stephen Hemminger <stephen@...workplumber.org>,
	Peter Zijlstra <peterz@...radead.org>,
	David Howells <dhowells@...hat.com>
Subject: Re: [PATCH v14 for 4.1] sys_membarrier(): system-wide memory
 barrier (x86)

----- Original Message -----
> On Mon, 13 Apr 2015, Mathieu Desnoyers wrote:
> 
> > [ Andrew, can you take this for the 4.1 merge window ? ]
> 
> You probably mean 4.2, right?

It can wait for 4.2 if you think it still need some work,
of course. I was submitting it for 4.1 because I thought
the discussion had settled down in the prior rounds.

> 
> This fails the basic test for exposure in linux-next, adds syscalls
> without the explicit ack of any x86 maintainer and exposes a user
> space ABI with a magic undocumented flags argument.

I guess for the undocumented part we need to add a manpage,
right ?

> 
> > This patch only adds the system call to x86.
> 
> So the changes to
>  
> >  include/uapi/asm-generic/unistd.h |    4 +-
> 
> are just cosmetic, right?

Hrm, right, those are not. I did not get the full impact
of adding it into this generic header. This means arc, arm64,
c6x, hexagon, metag, nios2, openrisc, score, tile,
unicore32 are also getting this system call.

So the question would be: should we introduce this syscall
in different patches for each architecture, or should
we add them all in one go ? There is nothing fundamentally
x86-specific to the implementation of this system call.

> 
> > +/* System call membarrier "flags" argument. */
> > +enum {
> > +	/*
> > +	 * Query whether the rest of the specified flags are supported,
> > +	 * without performing synchronization.
> > +	 */
> 
> Docbook has support for enums.

OK, how about the following ?

/**
 * enum membarrier_flags - membarrier system call "flags" argument bitmask.
 *
 * Bitmask of flags to be passed to the membarrier system call. The "flags"
 * parameter can be either 0 or many of those flags combined with a bitwise
 * "or".
 */

> 
> > +	MEMBARRIER_QUERY = (1 << 31),
> > +};
> 
> Why is this an anonymous enum?

Because I used a "int" type for the syscall flags argument, so I did
not need to name it. Giving it a name does indeed help making the
documentation clearer anyway, so I'm tempted to give it a name.

> 
> > +#ifdef CONFIG_SMP
> 
> So documentation is SMP only, right?

Yeah, those embedded folks still doing UP don't need documentation. ;-)

I'll fix it.

> 
> > +/*
> 
> Docbook comments start with "/**"

Fixed.

> 
> > + * sys_membarrier - issue memory barrier on all running threads
> > + * @flags: MEMBARRIER_QUERY:
> > + *             Query whether the rest of the specified flags are
> > supported,
> > + *             without performing synchronization.
> 
>       @flags:	Explain what this is for, not what a particular implemented
>       		value is used for. The values should be proper documented
> 		in the enum
> 
> Why is this an int and not a named enum?

In my reading of C99:

ISO/IEC 9899:TC2
6.7.2.2 Enumeration specifiers

"4 Each enumerated type shall be compatible with char, a signed integer type, or an
unsigned integer type. The choice of type is implementation-defined,108) but shall be
capable of representing the values of all the members of the enumeration. The
enumerated type is incomplete until after the } that terminates the list of enumerator
declarations."

Since this type is used as a system call ABI, I'm worried that a compiler
implementation may compile a user-space program with a enum representation
of a "char", whereas the kernel would expect an integer, thus causing an
ABI issue, where userland would fail to clear the high-order bits of the
register, and a more recent kernel would care about those bits (if we add
new flags in future kernel releases).

> 
> Why is this named flags and not given a descriptive name? If I
> understand your changelog correctly you want to implement other
> synchronization modes than the current synchronize_sched. So mode
> might be a proper name.

If we look at other system calls like open(), "flags" describes a
feature configuration, whereas "mode" is used to specify the
permissions to use when creating the file. What we want to specify
here is more a configuration of the syscall, which fits better with
the existing semantic of open() "flags".

> 
> Why is MEMBARRIER_QUERY not a proper operation mode and simply returns
> the supported modes instead of doing it backwards and asking whether
> a specific value is supported?

That's an interesting way to do it. We'd have to be careful not to
conflict with return values like -ENOSYS then. We could define
MEMBARRIER_QUERY as (1 << 31), which effectively reserves all
signed negative numbers for the QUERY flag, and then the return value
of the QUERY flag could be either a negative value (e.g. -ENOSYS)
or the set of flags supported (bits 0 to 30).

> 
> > + * On uniprocessor systems, this system call simply returns 0 after
> > + * validating the arguments, so user-space knows it is implemented.
> 
> And the exact point of knowing this is?

I guess the alternative approach would be to return -ENOSYS, and let
userspace discover that the kernel only support uniprocessor systems
by other means ?

> 
> > + */
> > +SYSCALL_DEFINE1(membarrier, int, flags)
> > +{
> > +	int retval;
> > +
> > +	retval = membarrier_validate_flags(flags);
> > +	if (retval)
> > +		goto end;
> > +	if (unlikely(flags & MEMBARRIER_QUERY) || num_online_cpus() == 1)
> > +		goto end;
> 
> So why not doing the obvious?
> 
> enum modes {
>      QUERY     = 0,
>      FULL_SYNC = 1 <<  0,
> };
> 
> #define IMPLEMENTED_MODES	(FULL_SYNC)
> 
> 	switch (mode) {
> 	case FULL_SYNC:
>        		synchronize_sched_on_smp();
> 		return 0;
> 	case QUERY:
> 		return IMPLEMENTED_MODES;
> 	}
> 	return -EINVAL;
> 
> And later on you do
> 
>  enum modes {
>      QUERY       = 0,
>      FULL_SYNC   = 1 <<  0,
> +    MAGIC_SYNC  = 1 <<  1,
>  };
> 
> -#define IMPLEMENTED_MODES	(FULL_SYNC)
> +#define IMPLEMENTED_MODES	(FULL_SYNC | MAGIC_SYNC)
> 
> All you need to make sure is that any mode is a power of 2.

I really prefer that the "default" of passing "0" does
the obviously correct behavior for the system call:
issuing a memory barrier over the entire system. Otherwise,
making "0" a no-op could introduce hard-to-find races
if people misuse the system call. Moreover, we need to
reserve negative return values for errors like -ENOSYS, so
we can use the high-order bit for the QUERY flag.

In summary, how about the following:

- "0" is the default, obviously correct flags parameter,
  which does a memory barrier on all processors.
- 1 << 31 is the QUERY flag, which returns a bitmask of
  all flags supported.
- We keep negative return values reserved for errors
  (e.g. -ENOSYS), even for the QUERY flag.

Thanks for the feedback!

Mathieu



> 
> Hmm?
> 
> Thanks,
> 
> 	tglx
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ