Message-ID: <20130502083856.GA27380@gmail.com>
Date:	Thu, 2 May 2013 10:38:56 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	mingo@...e.hu, linux-kernel@...r.kernel.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: Basic perf PMU support for Haswell v11


[ FYI, we are still in the merge window when maintainers are very busy, so
  don't expect quick replies to mails that are not about merge window
  related patches and commits. Those issues are typically handled after
  -rc1 has been released, once most of the merge fallout in the upstream
  kernel has been resolved. ]

* Andi Kleen <andi@...stfloor.org> wrote:

> > How well was this 
> > patch-set tested on non-Haswell hardware, which makes up 99.99% of our 
> > installed base?
> 
> I tested on a couple systems now and then: usually Haswell, IvyBridge,
> sometimes also Westmere and Atom. I don't retest every iteration,
> as you know most of the changes you're requesting don't affect
> the binary.
> 
> My test bed is likely to be smaller than yours though and as usual
> as you well know some part of the kernel QA is after release.
> 
> > 
> > In particular, after applying your patches, 'perf top' stopped working on 
> > an Intel testbox of mine:
> > 
> >   processor       : 15
> >   vendor_id       : GenuineIntel
> >   cpu family      : 6
> >   model           : 26
> >   model name      : Intel(R) Xeon(R) CPU           X55600 @ 2.80GHz
> 
> I assume the second 0 is a typo?

Probably a typo in the BIOS.

> >   stepping        : 5
> 
> > 'perf top' just does not produce any profiling output - it says 0 events.
> 
> Thanks for testing.
> 
> I found a similar system (not same stepping, but same model) and tested
> perf top works fine here. Also on a couple of other systems.
> 
> Since I cannot reproduce I would need your help debugging it.
> 
> I assume it worked before my patches.

Yes, obviously.

Here's another easy-to-test symptom of the bug:

 $ perf record ./hackbench 10
 Time: 0.097
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.043 MB perf.data (~1866 samples) ]

 $ perf report --stdio
 Error:
 The perf.data file has no samples!

Expected result is a profile displayed by 'perf report'.
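
[ A minimal sketch of how the check above could be automated, e.g. while
  bisecting. It assumes 'perf' is in PATH and that ./hackbench exists in
  the current directory. The function names, the default arguments and
  the matched substring are illustrative assumptions, not part of perf. ]

```python
import subprocess

# Substring of the error text that 'perf report' printed in the run above.
NO_SAMPLES_MARKER = "has no samples"


def report_has_no_samples(report_text: str) -> bool:
    """Return True if the given 'perf report' output shows the symptom."""
    return NO_SAMPLES_MARKER in report_text


def run_check(workload=("./hackbench", "10"), data_file="perf.data") -> bool:
    """Record the workload, then return True if the report has no samples.

    Hypothetical helper: it runs the same two commands as shown above,
    with explicit -o/-i so the data file name is unambiguous.
    """
    subprocess.run(["perf", "record", "-o", data_file, *workload], check=True)
    report = subprocess.run(
        ["perf", "report", "--stdio", "-i", data_file],
        capture_output=True,
        text=True,
    )
    # The error may go to stderr, so both streams are inspected.
    return report_has_no_samples(report.stdout + report.stderr)
```

[ run_check() returning True would correspond to the broken behaviour
  described above, False to a kernel that produces samples. ]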

> [...]  If you don't know please double check. Also I assume there's no 
> general problem between the user land perf you used and the kernel.
> 
> The only patch I could think of which may affect other systems
> is the moving of the APIC ack.

Btw., I warned you about the delicate placement of the APIC ACK in my 
Haswell patches review feedback mail, months ago:

  https://lkml.org/lkml/2013/2/13/78

which mail you never replied to and which warning you apparently ignored. 

When modifying the PMU ack sequence, please find the relevant Intel SDM 
that recommends a different ACK sequence from what is implemented 
currently, and document this in the changelog.

I'm going to ignore your APIC ACK patch until you do it properly.

> So does it work if you revert 
> 
>  perf, x86: Move NMI clearing to end of PMI handler after ...
> 
> If that is it we could white list it for Haswell.

No, reverting that patch did not fix the bug.

I have bisected it down to this patch of yours:

   "perf/x86: Add Haswell PMU support"

Most of that patch has no effect on non-Haswell machines, so the scope of 
problematic changes should be pretty small.

My quick guess is that your patch broke fixed counters.

If you find the bug or want me to test anything please send a delta patch, 
relative to your last series - as I have parts of your patches applied 
already locally with cleanups, etc.

Thanks,

	Ingo
