linux-kernel - Re: RFC: revert request for cpuidle patches e11538d1 and 69a37bea

Message-ID: <20130729165947.GB31525@jeder.rdu.redhat.com>
Date:	Mon, 29 Jul 2013 12:59:47 -0400
From:	Jeremy Eder <jeder@...hat.com>
To:	Youquan Song <youquan.song@...ux.intel.com>
Cc:	Jeremy Eder <jeder@...hat.com>, linux-kernel@...r.kernel.org,
	rafael.j.wysocki@...el.com, riel@...hat.com,
	youquan.song@...el.com, paulmck@...ux.vnet.ibm.com,
	daniel.lezcano@...aro.org, arjan@...ux.intel.com,
	len.brown@...el.com
Subject: Re: RFC:  revert request for cpuidle patches e11538d1 and 69a37bea

On 130729 23:57:31, Youquan Song wrote:
> Hi Jeremy,
> 
> I tried to reproduce your result so that I could fix the issue, but I
> have not reproduced it yet.
> 
> I ran netperf-2.6.0 with one machine as the server (netserver) and the
> other machine as the client: netperf -t TCP_RR -H $SERVER_IP -l 60. The
> target machine was used as both client and server. I do not reproduce the
> performance drop. I also notice the result is not stable: sometimes it is
> high, sometimes it is low. In summary, it is hard to get a definite result.
> 
> Can you tell me how to reproduce the issue? How do you get the C0
> data?
> 
> What's your kernel config?  Do you enable CONFIG_NO_HZ_FULL=y or
> only CONFIG_NO_HZ=y?
> 
> 
> Thanks
> -Youquan 

Hi,

To answer both your and Daniel's question, those results used only
CONFIG_NO_HZ=y.

These network latency benchmarks are fickle creatures, and need careful
tuning to become reproducible.  Plus there are BIOS implications and tuning
varies by vendor.

Anyway, for the most part it's probably not stable because in order to get
any sort of reproducibility between runs you need to do at least these
steps:

- ensure as little is running in userspace as possible
- determine PCI affinity for the NIC
- on both machines, isolate the socket connected to the NIC from userspace
  tasks
- turn off irqbalance and bind all IRQs for that NIC to a single core on
  the same socket as the NIC
- run netperf with -TX,Y where X,Y are core numbers that you wish
  netperf/netserver to run on, respectively.

For example, if your NIC is attached to socket 0 and socket 0 cores are
enumerated 0-7, then:

- set /proc/irq/NNN/smp_affinity_list to, say, 6 for all vectors on that
  NIC.
- nice -20 netperf -t TCP_RR -H $SERVER_IP -l 60 -T4,4 -s 2
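
The IRQ binding in the first bullet can be scripted.  Below is a minimal
Python sketch, not what was used for the results in this thread.  It assumes
the NIC's vectors show up in /proc/interrupts with the interface name in the
last column (for example "eth0-TxRx-0"); the function names nic_irqs and
pin_irqs, the interface name eth0 and the proc_root parameter are all
hypothetical and only there for illustration.

```python
import re
from pathlib import Path


def nic_irqs(interrupts_text, nic_name):
    """Return the IRQ numbers whose /proc/interrupts line names nic_name."""
    # Match the interface name as a whole word, so "eth0" does not match "eth01".
    name = re.compile(r"\b" + re.escape(nic_name) + r"\b")
    irqs = []
    for line in interrupts_text.splitlines():
        m = re.match(r"\s*(\d+):(.*)", line)
        # Only numbered IRQ lines are considered; NMI/LOC style rows are skipped.
        if m and name.search(m.group(2)):
            irqs.append(int(m.group(1)))
    return irqs


def pin_irqs(irqs, core, proc_root="/proc"):
    """Write core into irq/<N>/smp_affinity_list for every IRQ (needs root)."""
    for irq in irqs:
        path = Path(proc_root, "irq", str(irq), "smp_affinity_list")
        path.write_text(f"{core}\n")
```

Run as root with irqbalance stopped, passing the contents of
/proc/interrupts to nic_irqs() and the result plus core 6 to pin_irqs(),
this would have the same effect as writing 6 into each vector's
smp_affinity_list by hand.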

Those two steps should get you most of the way there.  The -s 2 connects
and waits 2 seconds; I found this helps with the first few seconds' worth
of data.  Or you could just toss the first 2 seconds' worth, since it seems
to take that long to stabilize.  What I mean is, if you're not using the
-D1,1 option to netperf, you might not have seen that netperf tests seem to
take a few seconds to stabilize even when properly tuned.
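
If you do run with -D1,1 and prefer to toss the warm-up period afterwards,
something like the following Python sketch would do it.  It assumes the
interim lines look like "Interim result: 21034.55 Trans/s over 1.000 seconds
ending at ..."; check that against your netperf version.  The function name
steady_state and the 2-second default are illustrative only.

```python
import re

# Assumed shape of a netperf interim line; adjust if your version differs.
INTERIM = re.compile(
    r"Interim result:\s+([\d.]+)\s+\S+\s+over\s+([\d.]+)\s+seconds"
)


def steady_state(lines, warmup_seconds=2.0):
    """Return the interim rates reported after the first warmup_seconds."""
    elapsed = 0.0
    kept = []
    for line in lines:
        m = INTERIM.search(line)
        if not m:
            continue  # not an interim line (banner, final result, ...)
        rate, interval = float(m.group(1)), float(m.group(2))
        # Keep a sample only if it started after the warm-up window ended.
        if elapsed >= warmup_seconds:
            kept.append(rate)
        elapsed += interval
    return kept
```

Averaging or histogramming only the returned values avoids having the
first couple of seconds skew the comparison between kernels.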

I got the C0 data by running turbostat in parallel with each benchmark run,
then grabbing the C-state data for the cores relevant to the test.  In my
case that was cores 4 and 6, where core 4 was where I put netperf/netserver
and core 6 was where I put the NIC IRQs.  Then I parsed that output into a
format that this could interpret:

https://github.com/bitly/data_hacks/blob/master/data_hacks/histogram.py
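
For anyone wanting to do the same, here is a rough Python sketch of that
parsing step; it is not the script I used.  It assumes the turbostat output
is a whitespace-separated table with a header row containing "CPU" and "%c0"
columns, and that every per-CPU row has a value in each column up to %c0.
Column names and layout vary between turbostat versions, so the defaults
below are assumptions, and c0_for_cpus is a made-up name.

```python
def c0_for_cpus(turbostat_text, cpus, cpu_col="CPU", c0_col="%c0"):
    """Collect the c0_col value from every row whose cpu_col is in cpus."""
    wanted = {str(c) for c in cpus}
    cpu_idx = c0_idx = None
    values = []
    for line in turbostat_text.splitlines():
        fields = line.split()
        if cpu_col in fields and c0_col in fields:
            # Header row (turbostat repeats it); remember the column positions.
            cpu_idx, c0_idx = fields.index(cpu_col), fields.index(c0_col)
            continue
        if cpu_idx is None or len(fields) <= max(cpu_idx, c0_idx):
            continue  # no header seen yet, or row too short to hold both columns
        if fields[cpu_idx] in wanted:
            values.append(float(fields[c0_idx]))
    return values
```

Printing the values returned for cores 4 and 6, one per line, gives a
stream of numbers that, as far as I know, histogram.py can read from
standard input.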

I'm building a kernel from Rafael's tree and will try to confirm what Len
already sent.  Thanks everyone for looking into it.