summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2020-09-24 08:41:32 -0700
committerGeorg Veichtlbauer <georg@vware.at>2023-07-16 12:47:43 +0200
commit45df1516d04a155794e56e6ded1c4813a3e56048 (patch)
tree57ca84ffa1a48da1b23e23cebb02d948ff73f25b
parent899def5edcd48ff16c0f41624ecf8be77c81a18d (diff)
UPSTREAM: mm: fix misplaced unlock_page in do_wp_page()
Commit 09854ba94c6a ("mm: do_wp_page() simplification") reorganized all the code around the page re-use vs copy, but in the process also moved the final unlock_page() around to after the wp_page_reuse() call. That normally doesn't matter - but it means that the unlock_page() is now done after releasing the page table lock. Again, not a big deal, you'd think. But it turns out that it's very wrong indeed, because once we've released the page table lock, we've basically lost our only reference to the page - the page tables - and it could now be free'd at any time. We do hold the mmap_sem, so no actual unmap() can happen, but madvise can come in and a MADV_DONTNEED will zap the page range - and free the page. So now the page may be free'd just as we're unlocking it, which in turn will usually trigger a "Bad page state" error in the freeing path. To make matters more confusing, by the time the debug code prints out the page state, the unlock has typically completed and everything looks fine again. This all doesn't happen in any normal situations, but it does trigger with the dirtyc0w_child LTP test. And it seems to trigger much more easily (but not expclusively) on s390 than elsewhere, probably because s390 doesn't do the "batch pages up for freeing after the TLB flush" that gives the unlock_page() more time to complete and makes the race harder to hit. Fixes: 09854ba94c6a ("mm: do_wp_page() simplification") Link: https://lore.kernel.org/lkml/a46e9bbef2ed4e17778f5615e818526ef848d791.camel@redhat.com/ Link: https://lore.kernel.org/linux-mm/c41149a8-211e-390b-af1d-d5eee690fecb@linux.alibaba.com/ Reported-by: Qian Cai <cai@redhat.com> Reported-by: Alex Shi <alex.shi@linux.alibaba.com> Bisected-and-analyzed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: Idae0eb8d7478ab61de567ca9e446df4cc4dfc2a0
-rw-r--r--mm/memory.c2
1 files changed, 1 insertions, 1 deletions
diff --git a/mm/memory.c b/mm/memory.c
index 8e44f5d2d5f4..0c69908d3eed 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2414,9 +2414,9 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
* page count reference, and the page is locked,
* it's dark out, and we're wearing sunglasses. Hit it.
*/
+ unlock_page(old_page);
wp_page_reuse(mm, vma, address, page_table, ptl,
orig_pte, old_page, 0, 0);
- unlock_page(old_page);
return VM_FAULT_WRITE;
} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
(VM_WRITE|VM_SHARED))) {