---

Platform : 8 socket (80 Core) Westmere with 1TB RAM.

Workload: AIM7-highsystime microbenchmark - 2000 users & 100 jobs per user.

Values reported are Jobs Per Minute (Higher is better). The values
are the average of 3 runs.

1) Native run:
--------------

Config 1: 3.7 kernel
Config 2: 3.7 + Rik's 1-4 patches

------------------------------------------------------------
               20way         40way         80way
------------------------------------------------------------
Config 1       ~179K         ~159K         ~146K
------------------------------------------------------------
Config 2       ~180K         ~134K         ~21K-43K <- high variation!
------------------------------------------------------------

(Note: Used numactl to restrict workload to
2 sockets (20way) and 4 sockets (40way))

------

2) KVM run :
------------

Single guest of different sizes (No over commit, NUMA enabled in the guest).

Note: This kernel-intensive microbenchmark exposes the PLE handler issue,
esp. for large guests. Since Raghu's PLE changes are not yet upstream, I
have just run with the current PLE handler & then with PLE disabled
(ple_gap=0).

Config 1 : Host & Guest at 3.7
Config 2 : Host & Guest are at 3.7 + Rik's 1-4 patches

--------------------------------------------------------------------------
               20vcpu/128G       40vcpu/256G       80vcpu/512G
              (on 2 sockets)    (on 4 sockets)    (on 8 sockets)
--------------------------------------------------------------------------
Config 1       ~144K             ~39K              ~10K
--------------------------------------------------------------------------
Config 2       ~143K             ~37.5K            ~11K
--------------------------------------------------------------------------

Config 3 : Host & Guest at 3.7 AND ple_gap=0
Config 4 : Host & Guest are at 3.7 + Rik's 1-4 patches AND ple_gap=0

--------------------------------------------------------------------------
Config 3       ~154K             ~131K             ~116K
--------------------------------------------------------------------------
Config 4       ~151K             ~130K             ~115K
--------------------------------------------------------------------------

(Note: Used numactl to restrict qemu to
2 sockets (20way) and 4 sockets (40way))
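
To make the deltas in the tables above easier to read, the relative changes can be computed directly from the reported averages. The following is only an illustrative sketch: the helper names (pct_change, speedup) and the dictionary layout are made up here and are not part of the benchmark setup. The inputs are the rounded "~" values from the tables, so the results are approximate.

```python
# Throughput in thousands of jobs per minute, copied from the tables above.
# These are rounded (~) averages of 3 runs, so all results are approximate.

def pct_change(base, new):
    """Percent change of `new` relative to `base` (negative = regression)."""
    return (new - base) / base * 100.0


def speedup(base, new):
    """How many times higher `new` is than `base`."""
    return new / base


# Native: Config 1 (3.7) vs Config 2 (3.7 + Rik's 1-4 patches).
native_base = {"20way": 179.0, "40way": 159.0, "80way": 146.0}
native_patched = {"20way": 180.0, "40way": 134.0}
native_patched_80way_range = (21.0, 43.0)  # reported with high variation

# KVM: Config 1 (3.7, PLE enabled) vs Config 3 (3.7, ple_gap=0).
kvm_ple_on = {"20vcpu": 144.0, "40vcpu": 39.0, "80vcpu": 10.0}
kvm_ple_off = {"20vcpu": 154.0, "40vcpu": 131.0, "80vcpu": 116.0}

print("Native, patched vs. unpatched:")
for size, patched in native_patched.items():
    print(f"  {size}: {pct_change(native_base[size], patched):+.1f}%")
low, high = native_patched_80way_range
print(f"  80way: {pct_change(native_base['80way'], low):+.1f}% "
      f"to {pct_change(native_base['80way'], high):+.1f}%")

print("KVM, ple_gap=0 vs. default PLE handler (unpatched 3.7):")
for size, on in kvm_ple_on.items():
    print(f"  {size}: {speedup(on, kvm_ple_off[size]):.1f}x")
```

The same two helpers can be pointed at the Config 2 and Config 4 rows to compare the patched kernels with and without PLE.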