From 5e9113731a3ce616e8b5aa128ffc1aeaa4942571 Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Tue, 8 Sep 2015 15:01:28 -0700 Subject: mm/hugetlb: add cache of descriptors to resv_map for region_add hugetlbfs is used today by applications that want a high degree of control over huge page usage. Often, large hugetlbfs files are used to map a large number huge pages into the application processes. The applications know when page ranges within these large files will no longer be used, and ideally would like to release them back to the subpool or global pools for other uses. The fallocate() system call provides an interface for preallocation and hole punching within files. This patch set adds fallocate functionality to hugetlbfs. fallocate hole punch will want to remove a specific range of pages. When pages are removed, their associated entries in the region/reserve map will also be removed. This will break an assumption in the region_chg/region_add calling sequence. If a new region descriptor must be allocated, it is done as part of the region_chg processing. In this way, region_add can not fail because it does not need to attempt an allocation. To prepare for fallocate hole punch, create a "cache" of descriptors that can be used by region_add if necessary. region_chg will ensure there are sufficient entries in the cache. It will be necessary to track the number of in progress add operations to know a sufficient number of descriptors reside in the cache. A new routine region_abort is added to adjust this in progress count when add operations are aborted. vma_abort_reservation is also added for callers creating reservations with vma_needs_reservation/vma_commit_reservation. [akpm@linux-foundation.org: fix typo in comment, use more cols] Signed-off-by: Mike Kravetz Reviewed-by: Naoya Horiguchi Acked-by: Hillf Danton Cc: Dave Hansen Cc: David Rientjes Cc: Hugh Dickins Cc: Davidlohr Bueso Cc: Aneesh Kumar Cc: Christoph Hellwig Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/hugetlb.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux/hugetlb.h') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index d891f949466a..e2d94960b38b 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -35,6 +35,9 @@ struct resv_map { struct kref refs; spinlock_t lock; struct list_head regions; + long adds_in_progress; + struct list_head region_cache; + long region_cache_count; }; extern struct resv_map *resv_map_alloc(void); void resv_map_release(struct kref *ref); -- cgit v1.2.3 From c672c7f29f2fdb73e1f72911bf499675c81fcdbb Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Tue, 8 Sep 2015 15:01:35 -0700 Subject: mm/hugetlb: expose hugetlb fault mutex for use by fallocate hugetlb page faults are currently synchronized by the table of mutexes (htlb_fault_mutex_table). fallocate code will need to synchronize with the page fault code when it allocates or deletes pages. Expose interfaces so that fallocate operations can be synchronized with page faults. Minor name changes to be more consistent with other global hugetlb symbols. Signed-off-by: Mike Kravetz Reviewed-by: Naoya Horiguchi Acked-by: Hillf Danton Cc: Dave Hansen Cc: David Rientjes Cc: Hugh Dickins Cc: Davidlohr Bueso Cc: Aneesh Kumar Cc: Christoph Hellwig Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/hugetlb.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux/hugetlb.h') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index e2d94960b38b..530cf6fc24c7 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -88,6 +88,11 @@ int dequeue_hwpoisoned_huge_page(struct page *page); bool isolate_huge_page(struct page *page, struct list_head *list); void putback_active_hugepage(struct page *page); void free_huge_page(struct page *page); +extern struct mutex *hugetlb_fault_mutex_table; +u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm, + struct vm_area_struct *vma, + struct address_space *mapping, + pgoff_t idx, unsigned long address); #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud); -- cgit v1.2.3 From b5cec28d36f5ee6b4e6f68a0a40aa1e4045d6d99 Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Tue, 8 Sep 2015 15:01:41 -0700 Subject: hugetlbfs: truncate_hugepages() takes a range of pages Modify truncate_hugepages() to take a range of pages (start, end) instead of simply start. If an end value of LLONG_MAX is passed, the current "truncate" functionality is maintained. Existing callers are modified to pass LLONG_MAX as end of range. By keying off end == LLONG_MAX, the routine behaves differently for truncate and hole punch. Page removal is now synchronized with page allocation via faults by using the fault mutex table. The hole punch case can experience the rare region_del error and must handle accordingly. Add the routine hugetlb_fix_reserve_counts to fix up reserve counts in the case where region_del returns an error. Since the routine handles more than just the truncate case, it is renamed to remove_inode_hugepages(). To be consistent, the routine truncate_huge_page() is renamed remove_huge_page(). Downstream of remove_inode_hugepages(), the routine hugetlb_unreserve_pages() is also modified to take a range of pages. hugetlb_unreserve_pages is modified to detect an error from region_del and pass it back to the caller. Signed-off-by: Mike Kravetz Reviewed-by: Naoya Horiguchi Acked-by: Hillf Danton Cc: Dave Hansen Cc: David Rientjes Cc: Hugh Dickins Cc: Davidlohr Bueso Cc: Aneesh Kumar Cc: Christoph Hellwig Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/hugetlb.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux/hugetlb.h') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 530cf6fc24c7..35afca1692fb 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -83,11 +83,13 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, int hugetlb_reserve_pages(struct inode *inode, long from, long to, struct vm_area_struct *vma, vm_flags_t vm_flags); -void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed); +long hugetlb_unreserve_pages(struct inode *inode, long start, long end, + long freed); int dequeue_hwpoisoned_huge_page(struct page *page); bool isolate_huge_page(struct page *page, struct list_head *list); void putback_active_hugepage(struct page *page); void free_huge_page(struct page *page); +void hugetlb_fix_reserve_counts(struct inode *inode, bool restore_reserve); extern struct mutex *hugetlb_fault_mutex_table; u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm, struct vm_area_struct *vma, -- cgit v1.2.3 From ab76ad540a50191308e5bb6b5e2d9e26c78616d3 Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Tue, 8 Sep 2015 15:01:50 -0700 Subject: hugetlbfs: New huge_add_to_page_cache helper routine Currently, there is only a single place where hugetlbfs pages are added to the page cache. The new fallocate code be adding a second one, so break the functionality out into its own helper. Signed-off-by: Dave Hansen Signed-off-by: Mike Kravetz Reviewed-by: Naoya Horiguchi Acked-by: Hillf Danton Cc: David Rientjes Cc: Hugh Dickins Cc: Davidlohr Bueso Cc: Aneesh Kumar Cc: Christoph Hellwig Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/hugetlb.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux/hugetlb.h') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 35afca1692fb..1222fb07a746 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -333,6 +333,8 @@ struct huge_bootmem_page { struct page *alloc_huge_page_node(struct hstate *h, int nid); struct page *alloc_huge_page_noerr(struct vm_area_struct *vma, unsigned long addr, int avoid_reserve); +int huge_add_to_page_cache(struct page *page, struct address_space *mapping, + pgoff_t idx); /* arch callback */ int __init alloc_bootmem_huge_page(struct hstate *h); -- cgit v1.2.3 From 70c3547e36f5c9fbc4caecfeca98f0effa6932c5 Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Tue, 8 Sep 2015 15:01:54 -0700 Subject: hugetlbfs: add hugetlbfs_fallocate() This is based on the shmem version, but it has diverged quite a bit. We have no swap to worry about, nor the new file sealing. Add synchronication via the fault mutex table to coordinate page faults, fallocate allocation and fallocate hole punch. What this allows us to do is move physical memory in and out of a hugetlbfs file without having it mapped. This also gives us the ability to support MADV_REMOVE since it is currently implemented using fallocate(). MADV_REMOVE lets madvise() remove pages from the middle of a hugetlbfs file, which wasn't possible before. hugetlbfs fallocate only operates on whole huge pages. Based on code by Dave Hansen. Signed-off-by: Mike Kravetz Reviewed-by: Naoya Horiguchi Acked-by: Hillf Danton Cc: Dave Hansen Cc: David Rientjes Cc: Hugh Dickins Cc: Davidlohr Bueso Cc: Aneesh Kumar Cc: Christoph Hellwig Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/hugetlb.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux/hugetlb.h') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 1222fb07a746..5e35379f58a5 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -330,6 +330,8 @@ struct huge_bootmem_page { #endif }; +struct page *alloc_huge_page(struct vm_area_struct *vma, + unsigned long addr, int avoid_reserve); struct page *alloc_huge_page_node(struct hstate *h, int nid); struct page *alloc_huge_page_noerr(struct vm_area_struct *vma, unsigned long addr, int avoid_reserve); @@ -483,6 +485,7 @@ static inline spinlock_t *huge_pte_lockptr(struct hstate *h, #else /* CONFIG_HUGETLB_PAGE */ struct hstate {}; +#define alloc_huge_page(v, a, r) NULL #define alloc_huge_page_node(h, nid) NULL #define alloc_huge_page_noerr(v, a, r) NULL #define alloc_bootmem_huge_page(h) NULL -- cgit v1.2.3