linux-kernel - Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files


Date:	22 Jan 2008 05:39:43 +0100
From:	Andi Kleen <andi@...stfloor.org>
To:	Anton Salikhmetov <salikhmetov@...il.com>
Cc:	linux-mm@...ck.org, jakob@...hought.net,
	linux-kernel@...r.kernel.org, valdis.kletnieks@...edu,
	riel@...hat.com, ksm@...dk, staubach@...hat.com,
	jesper.juhl@...il.com, torvalds@...l.org
Subject: Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

Anton Salikhmetov <salikhmetov@...il.com> writes:

You should probably put your design document somewhere under
Documentation/ and submit it with the patch.

> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> +	unsigned long addr;
> +
> +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> +		spinlock_t *ptl;
> +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> +		pud_t *pud = pud_offset(pgd, addr);
> +		pmd_t *pmd = pmd_offset(pud, addr);
> +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

This means that on i386 with highmem PTEs you will map, flush the TLB,
and unmap for each PTE individually. That is 512 times as much work as
is really needed per PTE leaf page.

The performance-critical address space walkers use a different design
pattern that avoids this: they map each PTE leaf page once and then
step through all of its entries before unmapping it.
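
To make the difference concrete, here is a rough userspace model in C,
not kernel code. It only counts how often a PTE leaf page would be
mapped by each walk style. The MODEL_* constants and both function names
are made up for this sketch, and the 512 entries per leaf page is an
assumption taken from the figure above. Addresses are assumed to be
page aligned and well away from the top of the address space.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE    4096UL
#define MODEL_PTRS_PER_PTE 512UL  /* assumed entries per PTE leaf page */
#define MODEL_PMD_SIZE     (MODEL_PAGE_SIZE * MODEL_PTRS_PER_PTE)

/*
 * Style of the quoted loop: every address walks down from the top and
 * maps the leaf page again. Each iteration stands in for one
 * pte_offset_map_lock() call and its matching unmap.
 */
unsigned long maps_per_address(unsigned long start, unsigned long end)
{
	unsigned long addr, maps = 0;

	assert(start <= end);
	for (addr = start; addr < end; addr += MODEL_PAGE_SIZE)
		maps++;
	return maps;
}

/*
 * Nested style: map a leaf page once, then visit every PTE in it that
 * falls inside [start, end) without mapping it again.
 */
unsigned long maps_per_leaf_page(unsigned long start, unsigned long end)
{
	unsigned long addr = start, maps = 0;

	assert(start <= end);
	while (addr < end) {
		/* First address covered by the next leaf page. */
		unsigned long next =
			(addr + MODEL_PMD_SIZE) & ~(MODEL_PMD_SIZE - 1);

		/* Clamp to the range end, also if the sum wrapped. */
		if (next > end || next < addr)
			next = end;

		maps++;	/* one map/unmap pair for the whole leaf page */
		for (; addr < next; addr += MODEL_PAGE_SIZE)
			;	/* per-PTE work goes here, no extra mapping */
	}
	return maps;
}
```

For a range covering two full leaf pages the first function counts one
mapping per page in the range, while the second counts two mappings in
total.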

> +		if (pte_dirty(*pte) && pte_write(*pte)) {
> +			pte_t entry = ptep_clear_flush(vma, addr, pte);

Flushing TLBs unbatched can also be very expensive because if the MM is
shared by several CPUs you'll have an inter-processor interrupt for
each iteration. They are quite costly even on smaller systems.

It would be better if you did a single flush_tlb_range() at the end.
On x86 that will currently always do a full flush, but that's still
better than a severe slowdown in the heavily multithreaded case.
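
The batching idea can be sketched the same way. This is again a
userspace model in C, not kernel code: the flag values, struct
flush_stats, and the function names are invented for illustration, and
model_flush() merely counts calls where the kernel would send a round
of inter-processor interrupts to the other CPUs sharing the MM.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PTE_DIRTY 0x1u	/* hypothetical flag bits for the model */
#define MODEL_PTE_WRITE 0x2u

struct flush_stats {
	unsigned long flushes;	/* rounds of cross-CPU TLB flushing */
};

static void model_flush(struct flush_stats *st)
{
	st->flushes++;
}

/* Write-protect dirty writable entries, flushing after each change. */
size_t wrprotect_unbatched(unsigned int *ptes, size_t n,
			   struct flush_stats *st)
{
	size_t i, changed = 0;

	for (i = 0; i < n; i++) {
		if ((ptes[i] & MODEL_PTE_DIRTY) &&
		    (ptes[i] & MODEL_PTE_WRITE)) {
			ptes[i] &= ~MODEL_PTE_WRITE;
			model_flush(st);	/* one round per PTE */
			changed++;
		}
	}
	return changed;
}

/* Same update, but with a single flush once the loop is done. */
size_t wrprotect_batched(unsigned int *ptes, size_t n,
			 struct flush_stats *st)
{
	size_t i, changed = 0;

	for (i = 0; i < n; i++) {
		if ((ptes[i] & MODEL_PTE_DIRTY) &&
		    (ptes[i] & MODEL_PTE_WRITE)) {
			ptes[i] &= ~MODEL_PTE_WRITE;
			changed++;
		}
	}
	if (changed)
		model_flush(st);	/* stands in for one range flush */
	return changed;
}
```

Both variants leave the entries in the same state. Only the number of
flush rounds differs: one per modified entry in the first, at most one
per call in the second.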

-Andi