From: Roland McGrath

get_user_pages can force a fault to succeed despite the vma's protections.
This is used in access_process_vm, so that ptrace can write an unwritable
page, or read an unreadable but present page.  The problem is that the page
table entry is then left with the escalated permissions, so a benign ptrace
call can cause a normal user access that should fault not to fault at all.

A simple test case is to run this program under gdb:

	main()
	{
	  puts ("hi there");
	  *(volatile int *) &main = 0;
	  puts ("should be dead now");
	  return 22;
	}

Then do:

	$ gdb ./foo
	(gdb) break main
	(gdb) run
	(gdb) cont

With existing Linux kernels, the program will print:

	hi there
	should be dead now

and exit normally.  However, the second line of main obviously should
fault.  The following patch makes it do so.

There are real-world scenarios where this has bitten people.  For example,
some garbage collectors use mprotect to ensure that they get known SIGSEGV
faults for accessing certain pages during collection.  When dealing with
such a program (e.g. a JVM), just examining the data at certain addresses
in the debugger can completely break the normal behavior of the program.
No fun for the poor bastard trying to debug such a garbage collector.
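As an illustration of that write-barrier pattern, here is a minimal,
hypothetical user-space sketch (not taken from any real collector): a page
is write-protected with mprotect, and a SIGSEGV handler stands in for the
collector's dirty-page bookkeeping.  On a kernel without this patch, if a
debugger has already poked the page via ptrace, the PTE is left writable,
the handler never runs, and the barrier is silently lost.

	/* Minimal, hypothetical sketch of an mprotect-based write barrier.
	   For illustration only; not taken from any real collector. */
	#include <signal.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	static char *page;
	static size_t pagesize;

	static void barrier_handler(int sig)
	{
		(void) sig;
		/* A real collector would record the dirty page here. */
		write(1, "barrier hit\n", 12);
		/* Unprotect so the faulting store can be restarted. */
		mprotect(page, pagesize, PROT_READ | PROT_WRITE);
	}

	int main(void)
	{
		struct sigaction sa;

		pagesize = getpagesize();
		memset(&sa, 0, sizeof sa);
		sa.sa_handler = barrier_handler;
		sigemptyset(&sa.sa_mask);
		sigaction(SIGSEGV, &sa, NULL);

		page = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (page == MAP_FAILED)
			return 1;

		/* Arm the barrier: stores to this page must now fault. */
		mprotect(page, pagesize, PROT_READ);

		/* This store should invoke barrier_handler, which prints
		   "barrier hit" and unprotects the page so the store
		   succeeds when restarted.  If a debugger poked the page
		   via ptrace first, on an unpatched kernel no fault is
		   taken and the barrier is missed. */
		page[0] = 1;

		printf("page[0] = %d\n", page[0]);
		return 0;
	}

With the patch below applied, a ptrace access to such a page still
succeeds, but the protections are put back afterwards, so the barrier
keeps working.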
As the comment says, this is not airtight, i.e. it is not atomic with
respect to racing accesses.  But it is perfectly adequate for the
reasonable debugger scenario, where everything else that might use that
user page is stopped while access_process_vm gets called.  Previously
there was a window where the protections behaved wrong, from after the
first ptrace-induced fault until perhaps forever (or until page-out).
Now there is a window from after the fault until a tiny bit later.  I
think that's a satisfactory improvement.

---

 25-akpm/mm/memory.c |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+)

diff -puN mm/memory.c~get_user_pages-restore-protections mm/memory.c
--- 25/mm/memory.c~get_user_pages-restore-protections	Mon Feb  2 13:03:06 2004
+++ 25-akpm/mm/memory.c	Mon Feb  2 13:04:32 2004
@@ -690,6 +690,28 @@ static inline struct page *get_page_map(
 	return page;
 }
 
+/*
+ * Reset the page table entry for the given address after faulting in a page.
+ * We restore the protections indicated by its vma.
+ */
+static void
+restore_page_prot(struct mm_struct *mm, struct vm_area_struct *vma,
+		  unsigned long address)
+{
+	pgd_t *pgd = pgd_offset(mm, address);
+	pmd_t *pmd = pmd_alloc(mm, pgd, address);
+	pte_t *pte;
+
+	if (!pmd)
+		return;
+	pte = pte_alloc_map(mm, pmd, address);
+	if (!pte)
+		return;
+	flush_cache_page(vma, address);
+	ptep_establish(vma, address, pte, pte_modify(*pte, vma->vm_page_prot));
+	update_mmu_cache(vma, address, *pte);
+	pte_unmap(pte);
+}
+
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, int len, int write, int force,
@@ -769,6 +791,22 @@ int get_user_pages(struct task_struct *t
 				}
 				spin_lock(&mm->page_table_lock);
 			}
+			if (force &&
+			    !(vma->vm_flags & (write ? VM_WRITE : VM_READ)))
+				/*
+				 * We forced a fault for protections the
+				 * page does not ordinarily allow.  Restore
+				 * the proper protections so that improper
+				 * user accesses are not allowed just
+				 * because we accessed it from the side.
+				 * Note that there is a window here where
+				 * user access could exploit the temporary
+				 * page protections.  Tough beans.  You just
+				 * have to prevent user code from running while
+				 * you are accessing memory via ptrace if you
+				 * want it to get the proper faults.
+				 */
+				restore_page_prot(mm, vma, start);
 			if (pages) {
 				pages[i] = get_page_map(map);
 				if (!pages[i]) {
_