Message-ID: <7889af4b7bb84823aca1732fb0d14de5@AcuMS.aculab.com>
Date:   Wed, 7 Sep 2022 11:13:54 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Peter Zijlstra' <peterz@...radead.org>
CC:     "Masami Hiramatsu (Google)" <mhiramat@...nel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...nel.org>,
        Suleiman Souhlal <suleiman@...gle.com>,
        bpf <bpf@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Borislav Petkov" <bp@...e.de>,
        Josh Poimboeuf <jpoimboe@...nel.org>,
        "x86@...nel.org" <x86@...nel.org>
Subject: RE: [PATCH] objtool,x86: Teach decode about LOOP* instructions

From: Peter Zijlstra
> Sent: 07 September 2022 10:40
> 
> On Wed, Sep 07, 2022 at 09:06:12AM +0000, David Laight wrote:
> > From: Peter Zijlstra
> > > Sent: 07 September 2022 10:01
> > >
> > > On Wed, Sep 07, 2022 at 09:06:45AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Sep 07, 2022 at 09:55:21AM +0900, Masami Hiramatsu (Google) wrote:
> > > >
> > > > > +/* Return the jump target address or 0 */
> > > > > +static inline unsigned long insn_get_branch_addr(struct insn *insn)
> > > > > +{
> > > > > +	switch (insn->opcode.bytes[0]) {
> > > > > +	case 0xe0:	/* loopne */
> > > > > +	case 0xe1:	/* loope */
> > > > > +	case 0xe2:	/* loop */
> > > >
> > > > Oh cute, objtool doesn't know about those, let me go add them.
> >
> > Do they ever appear in the kernel?
> 
> No; that is, not on any of the random vmlinux.o images I checked this
> morning.
> 
> Still, best to properly decode them anyway.
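
For reference, a minimal sketch of what decoding the branch target of
these three opcodes involves. It assumes the plain two-byte encoding
(opcode followed by a signed rel8, no prefixes). The helper name
loop_branch_target() is made up for illustration; it is not the
function from the patch and not what objtool does internally.

```c
#include <stdint.h>

/*
 * Hypothetical helper (assumption, not kernel code): given the address
 * and the two raw bytes of a LOOPNE/LOOPE/LOOP instruction, return the
 * branch target, or 0 if the opcode is not one of the three.
 */
static uint64_t loop_branch_target(uint64_t insn_addr, const uint8_t bytes[2])
{
	switch (bytes[0]) {
	case 0xe0:	/* loopne */
	case 0xe1:	/* loope */
	case 0xe2:	/* loop */
		/* rel8 is signed and relative to the end of the instruction */
		return insn_addr + 2 + (uint64_t)(int64_t)(int8_t)bytes[1];
	default:
		return 0;
	}
}
```

So "loop 1b" encoded as e2 fe at address 0x1000 branches back to 0x1000.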

It is annoying that CPUs with adox/adcx have a slow loop instruction.
You really want to be able to do:
	1:	adox ...
		adcx ...
		loop	1b
That would never run at one iteration/clock.
But unrolling once would probably be enough.

What you can do (and it gives the fastest IP checksum loop) is:
	1:	jcxz	2f
		....
		lea	%rcx,...
		jmp	1b
	2:
The extra instructions mean that it needs unrolling 4 times.
I've got over 12 bytes/clock that way.
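
To show what the two carry chains buy you, here is a rough C model of
the sum itself. It is only a sketch, not the asm loop: the function
names are made up, it assumes the buffer is a whole number of 64-bit
words, and it returns the folded 16-bit sum without the final
inversion or any byte-order handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Add with end-around carry: the basic step of a ones-complement sum. */
static uint64_t add_carry_wrap(uint64_t sum, uint64_t v)
{
	sum += v;
	return sum + (sum < v);	/* fold the carry-out back in */
}

/*
 * Hypothetical sketch: sum 'nwords' 64-bit words into two independent
 * accumulators, the way an adcx chain and an adox chain can run side
 * by side without waiting on each other's carry flag.
 */
static uint16_t csum_two_chains(const uint64_t *p, size_t nwords)
{
	uint64_t a = 0, b = 0;
	size_t i;

	for (i = 0; i + 1 < nwords; i += 2) {
		a = add_carry_wrap(a, p[i]);		/* chain 1 */
		b = add_carry_wrap(b, p[i + 1]);	/* chain 2 */
	}
	if (i < nwords)				/* odd word left over */
		a = add_carry_wrap(a, p[i]);

	a = add_carry_wrap(a, b);		/* merge the two chains */

	/* fold 64 -> 32 -> 16 bits, twice each to absorb the carry */
	a = (a & 0xffffffff) + (a >> 32);
	a = (a & 0xffffffff) + (a >> 32);
	a = (a & 0xffff) + (a >> 16);
	a = (a & 0xffff) + (a >> 16);
	return (uint16_t)a;
}
```

The two accumulators have no data dependency on each other until the
merge at the end, which is the property the adcx/adox pair gives you
in hardware.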

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
