lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ff17a13f-ccc2-fc39-7731-6d794c7dd980@collabora.com>
Date:   Mon, 22 May 2023 15:24:56 +0500
From:   Muhammad Usama Anjum <usama.anjum@...labora.com>
To:     Peter Xu <peterx@...hat.com>, linux-mm@...ck.org
Cc:     Muhammad Usama Anjum <usama.anjum@...labora.com>,
        Paul Gofman <pgofman@...eweavers.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Shuah Khan <shuah@...nel.org>,
        Christian Brauner <brauner@...nel.org>,
        Yang Shi <shy828301@...il.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Yun Zhou <yun.zhou@...driver.com>,
        Cyrill Gorcunov <gorcunov@...il.com>,
        Michał Mirosław <emmir@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Andrei Vagin <avagin@...il.com>,
        Alex Sierra <alex.sierra@....com>,
        Matthew Wilcox <willy@...radead.org>,
        Pasha Tatashin <pasha.tatashin@...een.com>,
        Danylo Mocherniuk <mdanylo@...gle.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        "Gustavo A . R . Silva" <gustavoars@...nel.org>,
        David Hildenbrand <david@...hat.com>,
        Dan Williams <dan.j.williams@...el.com>,
        linux-kernel@...r.kernel.org, Mike Rapoport <rppt@...nel.org>,
        linux-fsdevel@...r.kernel.org, linux-kselftest@...r.kernel.org,
        Greg KH <gregkh@...uxfoundation.org>, kernel@...labora.com,
        Nadav Amit <namit@...are.com>
Subject: Re: [PATCH RESEND v15 2/5] fs/proc/task_mmu: Implement IOCTL to get
 and optionally clear info about PTEs

On 4/26/23 7:13 PM, Peter Xu wrote:
> Hi, Muhammad,
> 
> On Wed, Apr 26, 2023 at 12:06:23PM +0500, Muhammad Usama Anjum wrote:
>> On 4/20/23 11:01 AM, Muhammad Usama Anjum wrote:
>>> +/* Supported flags */
>>> +#define PM_SCAN_OP_GET	(1 << 0)
>>> +#define PM_SCAN_OP_WP	(1 << 1)
>> We have only these flag options available in PAGEMAP_SCAN IOCTL.
>> PM_SCAN_OP_GET must always be specified for this IOCTL. PM_SCAN_OP_WP can
>> be specified as need. But PM_SCAN_OP_WP cannot be specified without
>> PM_SCAN_OP_GET. (This was removed after you had asked me to not duplicate
>> functionality which can be achieved by UFFDIO_WRITEPROTECT.)
>>
>> 1) PM_SCAN_OP_GET | PM_SCAN_OP_WP
>> vs
>> 2) UFFDIO_WRITEPROTECT
>>
>> After removing the usage of uffd_wp_range() from PAGEMAP_SCAN IOCTL, we are
>> getting really good performance which is comparable just like we are
>> depending on SOFT_DIRTY flags in the PTE. But when we want to perform wp,
>> PM_SCAN_OP_GET | PM_SCAN_OP_WP is more desirable than UFFDIO_WRITEPROTECT
>> performance and behavior wise.
>>
>> I've got the results from someone else that UFFDIO_WRITEPROTECT block
>> pagefaults somehow which PAGEMAP_IOCTL doesn't. I still need to verify this
>> as I don't have tests comparing them one-to-one.
>>
>> What are your thoughts about it? Have you thought about making
>> UFFDIO_WRITEPROTECT perform better?
>>
>> I'm sorry to mention the word "performance" here. Actually we want better
>> performance to emulate Windows syscall. That is why we are adding this
>> functionality. So either we need to see what can be improved in
>> UFFDIO_WRITEPROTECT or can I please add only PM_SCAN_OP_WP back in
>> pagemap_ioctl?
> 
> I'm fine if you want to add it back if it works for you.  Though before
> that, could you remind me why there can be a difference on performance?
I've looked at the code again and I think I've found something. Lets look
at exact performance numbers:

I've run 2 different tests. In first test UFFDIO_WRITEPROTECT is being used
for engaging WP. In second test PM_SCAN_OP_WP is being used. I've measured
the average write time to the same memory which is being WP-ed and total
time of execution of these APIs:

**avg write time:**
| No of pages            | 2000 | 8192 | 100000 | 500000 |
|------------------------|------|------|--------|--------|
| UFFDIO_WRITEPROTECT    | 2200 | 2300 | 4100   | 4200   |
| PM_SCAN_OP_WP          | 2000 | 2300 | 2500   | 2800   |

**Execution time measured in rdtsc:**
| No of pages            | 2000 | 8192  | 100000 | 500000 |
|------------------------|------|-------|--------|--------|
| UFFDIO_WRITEPROTECT    | 3200 | 14000 | 59000  | 58000  |
| PM_SCAN_OP_WP          | 1900 | 7000  | 38000  | 40000  |

Avg write time for UFFDIO_WRITEPROTECT is 1.3 times slow. The execution
time is 1.5 times slower in the case of UFFDIO_WRITEPROTECT. So
UFFDIO_WRITEPROTECT is making writes slower to the pages and execution time
is also slower.

This proves that PM_SCAN_OP_WP is better than UFFDIO_WRITEPROTECT. Although
PM_SCAN_OP_WP and UFFDIO_WRITEPROTECT have been implemented differently. We
should have seen no difference in performance. But we have quite a lot of
difference in performance here. PM_SCAN_OP_WP takes read mm lock, uses
walk_page_range() to walk over pages which finds VMAs from address ranges
to walk over them and pagemap_scan_pmd_entry() is handling most of the work
including tlb flushing. UFFDIO_WRITEPROTECT is also taking the mm lock and
iterating from all the different page directories until a pte is found and
then flags are updated there and tlb is flushed for every pte.

My next deduction would be that we are getting worse performance as we are
flushing tlb for one page at a time in case of UFFDIO_WRITEPROTECT. While
we flush tlb for 512 pages (moslty) at a time in case of PM_SCAN_OP_WP.
I've just verified this by adding some logs to the change_pte_range() and
pagemap_scan_pmd_entry(). Logs are attached. I've allocated memory of 1000
pages and write-protected it with UFFDIO_WRITEPROTECT and PM_SCAN_OP_WP.
The logs show that UFFDIO_WRITEPROTECT has flushed tlb 1000 times of size 1
page each time. While PM_SCAN_OP_WP has flushed only 3 times of bigger
sizes. I've learned over my last experience that tlb flush is very
expensive. Probably this is what we need to improve if we don't want to add
PM_SCAN_OP_WP?

The UFFDIO_WRITEPROTECT uses change_pte_range() which is very generic
function and I'm not sure if can try to not do tlb flushes if uffd_wp is
true. We can try to do flush somewhere else and hopefully we should do only
one flush if possible. It will not be so straight forward to move away from
generic fundtion. Thoughts?


> Thanks,
> 

-- 
BR,
Muhammad Usama Anjum
View attachment "log" of type "text/plain" (50192 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ