Message-ID: <CAPY=qRSbJaqOZ30556SWTgrTMegsmdb9-DoM7R66fb=rLWP6eQ@mail.gmail.com>
Date:   Mon, 31 Aug 2020 21:52:47 +0530
From:   Subhashini Rao Beerisetty <subhashbeerisetty@...il.com>
To:     dedekind1@...il.com
Cc:     linux-pm@...r.kernel.org,
        kernelnewbies <kernelnewbies@...nelnewbies.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: cpu-freq: running the perf increases the data rate?

On Fri, Aug 28, 2020 at 6:04 PM Artem Bityutskiy <dedekind1@...il.com> wrote:
>
> On Thu, 2020-08-27 at 22:25 +0530, Subhashini Rao Beerisetty wrote:
> > I have an application which measures the data rate over the PCIe
> > interface. I’m getting a lower data rate on one of my Linux x86
> > systems.
>
> Some more description, maybe? Do you have a PCIe device reading one
> RAM buffer and then writing to another RAM buffer? Or does it generate
> some data and write it to a RAM buffer? Presumably it uses DMA? How
> much is the CPU involved in the process? Are we talking about
> transferring a few kilobytes or gigabytes?
Thanks a lot for your help and reply.
Regarding the hardware setup: a Xilinx PCIe FPGA endpoint is connected
to the host CPU via the PCIe bus.
The endpoint contains a DMA-REF block, which provides a mechanism to
DMA data at the maximum rate between host CPU memory and a FIFO in the
DMA-REF block.
The host software sets up some data in its memory, transfers that data
to the DMA-REF's FIFO, and then reads it back into a different
location in host memory. This is repeated in a loop. A register in the
DMA-REF block gives an indication of the transfer speed.
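To make the flow concrete, here is a rough user-space sketch of that
loop (heavily hypothetical: the PCI address, BAR and register offset
below are placeholders, not the real DMA-REF register map, and the
actual DMA kick-off is driver specific):

    /* Hypothetical sketch: map BAR0 of the endpoint through sysfs and
     * poll a (made-up) speed register while the DMA loop runs. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SPEED_REG_OFF 0x100  /* placeholder register offset */

    int main(void)
    {
            /* placeholder PCI address; see lspci for the real one */
            int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0",
                          O_RDWR);
            if (fd < 0) { perror("open"); return 1; }

            volatile uint32_t *bar = mmap(NULL, 4096,
                                          PROT_READ | PROT_WRITE,
                                          MAP_SHARED, fd, 0);
            if (bar == MAP_FAILED) { perror("mmap"); return 1; }

            for (int i = 0; i < 100; i++) {
                    /* write buffer to FIFO, read it back (driver
                     * specific), then sample the speed register */
                    printf("speed reg: 0x%08x\n",
                           (unsigned)bar[SPEED_REG_OFF / 4]);
                    sleep(1);
            }

            munmap((void *)bar, 4096);
            close(fd);
            return 0;
    }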


>
> > When I change the scaling_governor from "powersave" to "performance"
> > for each CPU, there is a slight improvement in the PCIe data
> > rate.
>
> This definitely makes your CPU(s) run at max speed, but depending on
> the platform and settings, it may also affect C-states. Are the CPU(s)
> generally idle while you measure, or busy (involved in the test)? You
> could run 'turbostat' while measuring the bandwidth, to get some CPU
> statistics (e.g., do C-states happen during the PCIe test, and how
> busy are the CPUs?).
>
> > In parallel, I started profiling the workload with perf. Whenever I
> > run the profiling command “perf stat -a -d -p <PID>”, the
> > application surprisingly achieves an excellent data rate over PCIe,
> > but when I kill the perf command the PCIe data rate drops again. I
> > am really confused by this behavior. Any clues?
>
> Well, one possible reason comes to mind: you get rid of C-states when
> you run perf, and this increases the PCIe bandwidth. You can just try
> disabling C-states (there are sysfs knobs) and check. Turbostat could
> be useful to verify this: with and without perf, run 'turbostat sleep
> 10' or something like it (that measures for 10 seconds), and do this
> while running your PCIe test.
Disabling the C-states improved the throughput a lot; thanks for
pointing this out. Could you give some more explanation of how
disabling C-states improves the throughput?
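For reference, what I did is roughly equivalent to this sketch
(assuming the standard cpuidle sysfs knobs you mentioned; state0 is
the POLL state and is left enabled, and writing 0 re-enables a state):

    /* Sketch: disable all cpuidle states except state0 by writing "1"
     * to the standard sysfs "disable" knobs. Needs root. */
    #include <glob.h>
    #include <stdio.h>

    int main(void)
    {
            glob_t g;

            if (glob("/sys/devices/system/cpu/cpu*/cpuidle/state[1-9]*/disable",
                     0, NULL, &g) != 0) {
                    fprintf(stderr, "no cpuidle 'disable' knobs found\n");
                    return 1;
            }
            for (size_t i = 0; i < g.gl_pathc; i++) {
                    FILE *f = fopen(g.gl_pathv[i], "w");
                    if (!f) { perror(g.gl_pathv[i]); continue; }
                    fputs("1", f);  /* 1 = disable this idle state */
                    fclose(f);
            }
            globfree(&g);
            return 0;
    }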
As you suggested, I collected and attached turbostat logs, with and
without perf, while running the PCIe test.
On my system, only 'performance' and 'powersave' are listed in
scaling_available_governors. The other governors (userspace, ondemand,
schedutil) are not listed there. What might be the reason for this?

>
> But I am really just guessing here; I do not know enough about your
> test and the system (e.g., "a Linux x86 system" can be so many
> things, like an Intel or AMD server or a mobile device)…
It's an Intel Atom processor.

View attachment "trubostat_with_perf.txt" of type "text/plain" (2633 bytes)

View attachment "trubostat_without_perf.txt" of type "text/plain" (2642 bytes)
