linux-kernel - Re: [RFC 1/1] mm: Add per-task struct tlb counters


Message-ID: <20220914142313.GB4422@fastly.com>
Date:   Wed, 14 Sep 2022 07:23:14 -0700
From:   Joe Damato <jdamato@...tly.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Dave Hansen <dave.hansen@...el.com>, x86@...nel.org,
        linux-mm@...ck.org, Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [RFC 1/1] mm: Add per-task struct tlb counters

On Wed, Sep 14, 2022 at 01:58:27PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 14, 2022 at 12:40:55AM -0700, Dave Hansen wrote:
> >  Why didn't the tracepoints work for you?
> 
> This; perf should be able to get you per-task slices of those events.

Thanks for taking a look; I replied to Dave with a longer-form response,
but IMHO, tracepoints are helpful only in specific circumstances.

On a heavily loaded system with O(10,000) or O(100,000) tasks, tracepoints
can be difficult to use... especially if the TLB shootdown events are
anomalous events that happen in large bursts at unknown intervals and are
difficult to reproduce.

IMHO, being able to periodically scrape /proc and see that a particular
process is in the middle of a large TLB shootdown storm tells you when to
apply perf (and to which specific tasks) in order to debug the issue.
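
To make the scrape-then-perf workflow above concrete, here is a minimal
sketch of the detection step. It assumes the per-task counter is exposed as
a line of the form "tlb_shootdowns: N" in some per-PID /proc file; that
field name and format are hypothetical and are not taken from the patch.
The helper names are illustrative as well.

```python
import re
from typing import Dict, List, Optional

# Hypothetical field name: the actual /proc format added by the RFC is not
# shown in this message, so this pattern is an assumption.
COUNTER_RE = re.compile(r"^tlb_shootdowns:\s*(\d+)\s*$", re.MULTILINE)


def parse_counter(text: str) -> Optional[int]:
    """Extract the (assumed) shootdown counter from one scraped file."""
    match = COUNTER_RE.search(text)
    return int(match.group(1)) if match else None


def find_storms(prev: Dict[int, int], curr: Dict[int, int],
                threshold: int) -> List[int]:
    """Return PIDs whose counter grew by at least `threshold` between
    two scrapes. PIDs missing from the earlier scrape are skipped, since
    there is no baseline to compare against."""
    storms = []
    for pid, now in curr.items():
        before = prev.get(pid)
        if before is not None and now - before >= threshold:
            storms.append(pid)
    return sorted(storms)
```

A scraper would fill `prev` and `curr` from two passes over /proc taken some
interval apart. The PIDs returned by find_storms() are the candidates to
attach perf to, for example with the existing tlb:tlb_flush tracepoint.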