Message-ID: <9d5fa4ff-9666-6475-7f61-2b45cbd83456@amd.com>
Date:   Tue, 20 Apr 2021 15:38:54 +0700
From:   "Suthikulpanit, Suravee" <suravee.suthikulpanit@....com>
To:     David Coe <david.coe@...e.co.uk>, linux-kernel@...r.kernel.org,
        iommu@...ts.linux-foundation.org
Cc:     joro@...tes.org, will@...nel.org, jsnitsel@...hat.com,
        pmenzel@...gen.mpg.de, Jon.Grimm@....com,
        Tj <ml.linux@...oe.vision>,
        Shuah Khan <skhan@...uxfoundation.org>,
        Alexander Monakov <amonakov@...ras.ru>,
        Alex Hung <1917203@...s.launchpad.net>
Subject: Re: [PATCH 2/2] iommu/amd: Remove performance counter
 pre-initialization test

David / Joerg,

On 4/10/2021 5:03 PM, David Coe wrote:
> 
> The immediately obvious difference is the enormous count seen on mem_dte_mis on the older Ryzen 2400G. Will do some RTFM, but does anyone have comments or insight?
> 
> 841,689,151,202,939       amd_iommu_0/mem_dte_mis/              (33.44%)
> 
> Otherwise, all seems to be running smoothly (especially for a distribution still in β). Bravo and many thanks all!

The initial hypothesis is that the issue happens only when users specify more events than
there are available counters, in which case Perf time-multiplexes the events onto the counters.

The Perf and AMD IOMMU PMU multiplexing logic requires the following steps:
  1. Stop the counter (i.e. set CSOURCE to zero to stop counting)
  2. Save the counter value of the current event
  3. Reload the counter value of the new event (previously saved)
  4. Start the counter (i.e. set CSOURCE to count new events)

The problem here is that when the driver writes zero to the CSOURCE register in step 1, this enables power-gating,
which prevents access to the counter, so the counter value read and written in steps 2 and 3 is not reliable.

I have found a system that reproduces this case (with an unusually large count), and debugged the issue further.
As a hack, I have tried skipping step 1, and it seems to eliminate this issue. However, this is logically incorrect,
and might result in inaccurate data depending on the events.
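
To make the sequence concrete, here is a toy user-space model of the four steps and of the
skip-step-1 hack. It is only a sketch and not the amd_iommu driver code: every toy_* name is
made up, TOY_GARBAGE is an arbitrary placeholder for whatever a gated access returns, and
"CSOURCE == 0 means gated and inaccessible" is the assumption from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one counter; hypothetical, not the real hardware layout. */
struct toy_counter {
	uint64_t value;   /* count register */
	uint32_t csource; /* 0 = stopped, nonzero = event being counted */
};

/* Arbitrary placeholder for what a read returns while gated (assumption). */
#define TOY_GARBAGE 0xFFFFFFFFFFFFULL

/* Assumption: writing zero to CSOURCE power-gates the counter. */
static bool toy_gated(const struct toy_counter *c)
{
	return c->csource == 0;
}

static uint64_t toy_read(const struct toy_counter *c)
{
	return toy_gated(c) ? TOY_GARBAGE : c->value;
}

static void toy_write(struct toy_counter *c, uint64_t v)
{
	if (!toy_gated(c)) /* the write is lost while gated */
		c->value = v;
}

/*
 * Switch the counter from the current event to new_src.
 * Returns the value saved for the outgoing event.
 * skip_stop models the hack of skipping step 1.
 */
static uint64_t toy_switch_event(struct toy_counter *c, uint32_t new_src,
				 uint64_t new_saved, bool skip_stop)
{
	uint64_t old;

	if (!skip_stop)
		c->csource = 0;      /* step 1: stop the counter */
	old = toy_read(c);           /* step 2: save the current event's count */
	toy_write(c, new_saved);     /* step 3: reload the new event's count */
	c->csource = new_src;        /* step 4: start counting the new event */
	return old;
}
```

In this model, the normal path saves TOY_GARBAGE and loses the reload, while skip_stop = true
saves and reloads correctly. The model does not capture the downside of the hack, which is that
the counter keeps counting the old event during steps 2 and 3.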

Here are the options:
1. Continue to look for a workaround for this issue.
2. Find a way to disable event time-multiplexing (e.g. limit the number of counters to 8)
    if power gating is enabled on the platform.
3. Go back to the original logic, where we had the pre-init check of the counter values, which is still the safest choice
    at the moment unless
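
For illustration only, option 2 could take the shape of an admission check like the one below.
This is a hypothetical sketch: toy_admit_events and TOY_NUM_COUNTERS are invented names, the
value 8 is simply the number mentioned in option 2 rather than something read from hardware,
and how the driver would detect power gating is left open.

```c
#include <errno.h>
#include <stdbool.h>

/* Taken from the "8" in option 2; a real driver would query the hardware. */
#define TOY_NUM_COUNTERS 8

/*
 * Hypothetical check: refuse to accept more events than counters when
 * power gating is enabled, so that no time-multiplexing is ever needed.
 * Returns 0 if the request is acceptable, -ENOSPC otherwise.
 */
static int toy_admit_events(unsigned int requested, bool power_gating)
{
	if (power_gating && requested > TOY_NUM_COUNTERS)
		return -ENOSPC;
	return 0;
}
```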

Regards,
Suravee
