From aac34df11791d25417f7d756dc277b6f95996b47 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 9 Sep 2013 07:16:41 -0700 Subject: fs: remove vfs_follow_link For a long time no filesystem has been using vfs_follow_link, and as seen by recent filesystem submissions any new use is accidental as well. Remove vfs_follow_link, document the replacement in Documentation/filesystems/porting and also rename __vfs_follow_link to match its only caller better. Signed-off-by: Christoph Hellwig Signed-off-by: Al Viro --- include/linux/fs.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 529d8711baba..49e71b0f0e9f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2494,7 +2494,6 @@ extern const struct file_operations generic_ro_fops; #define special_file(m) (S_ISCHR(m)||S_ISBLK(m)||S_ISFIFO(m)||S_ISSOCK(m)) extern int vfs_readlink(struct dentry *, char __user *, int, const char *); -extern int vfs_follow_link(struct nameidata *, const char *); extern int page_readlink(struct dentry *, char __user *, int); extern void *page_follow_link_light(struct dentry *, struct nameidata *); extern void page_put_link(struct dentry *, struct nameidata *, void *); -- cgit v1.2.3 From 3942c07ccf98e66b8893f396dca98f5b076f905f Mon Sep 17 00:00:00 2001 From: Glauber Costa Date: Wed, 28 Aug 2013 10:17:53 +1000 Subject: fs: bump inode and dentry counters to long MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This series reworks our current object cache shrinking infrastructure in two main ways: * Noticing that a lot of users copy and paste their own version of LRU lists for objects, we put some effort in providing a generic version. It is modeled after the filesystem users: dentries, inodes, and xfs (for various tasks), but we expect that other users could benefit in the near future with little or no modification. Let us know if you have any issues. * The underlying list_lru being proposed automatically and transparently keeps the elements in per-node lists, and is able to manipulate the node lists individually. Given this infrastructure, we are able to modify the up-to-now hammer called shrink_slab to proceed with node-reclaim instead of always searching memory from all over like it has been doing. Per-node lru lists are also expected to lead to less contention in the lru locks on multi-node scans, since we are now no longer fighting for a global lock. The locks usually disappear from the profilers with this change. Although we have no official benchmarks for this version - be our guest to independently evaluate this - earlier versions of this series were performance tested (details at http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no visible performance regressions while yielding a better qualitative behavior in NUMA machines. With this infrastructure in place, we can use the list_lru entry point to provide memcg isolation and per-memcg targeted reclaim. Historically, those two pieces of work have been posted together. This version presents only the infrastructure work, deferring the memcg work for a later time, so we can focus on getting this part tested. You can see more about the history of such work at http://lwn.net/Articles/552769/ Dave Chinner (18): dcache: convert dentry_stat.nr_unused to per-cpu counters dentry: move to per-sb LRU locks dcache: remove dentries from LRU before putting on dispose list mm: new shrinker API shrinker: convert superblock shrinkers to new API list: add a new LRU list type inode: convert inode lru list to generic lru list code. dcache: convert to use new lru list infrastructure list_lru: per-node list infrastructure shrinker: add node awareness fs: convert inode and dentry shrinking to be node aware xfs: convert buftarg LRU to generic code xfs: rework buffer dispose list tracking xfs: convert dquot cache lru to list_lru fs: convert fs shrinkers to new scan/count API drivers: convert shrinkers to new count/scan API shrinker: convert remaining shrinkers to count/scan API shrinker: Kill old ->shrink API. Glauber Costa (7): fs: bump inode and dentry counters to long super: fix calculation of shrinkable objects for small numbers list_lru: per-node API vmscan: per-node deferred work i915: bail out earlier when shrinker cannot acquire mutex hugepage: convert huge zero page shrinker to new shrinker API list_lru: dynamically adjust node arrays This patch: There are situations in very large machines in which we can have a large quantity of dirty inodes, unused dentries, etc. This is particularly true when umounting a filesystem, where eventually since every live object will eventually be discarded. Dave Chinner reported a problem with this while experimenting with the shrinker revamp patchset. So we believe it is time for a change. This patch just moves int to longs. Machines where it matters should have a big long anyway. Signed-off-by: Glauber Costa Cc: Dave Chinner Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: Dave Chinner Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/dcache.h | 10 +++++----- include/linux/fs.h | 4 ++-- 2 files changed, 7 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dcache.h b/include/linux/dcache.h index feaa8d88eef7..844a1ef387e4 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -55,11 +55,11 @@ struct qstr { #define hashlen_len(hashlen) ((u32)((hashlen) >> 32)) struct dentry_stat_t { - int nr_dentry; - int nr_unused; - int age_limit; /* age in seconds */ - int want_pages; /* pages requested by system */ - int dummy[2]; + long nr_dentry; + long nr_unused; + long age_limit; /* age in seconds */ + long want_pages; /* pages requested by system */ + long dummy[2]; }; extern struct dentry_stat_t dentry_stat; diff --git a/include/linux/fs.h b/include/linux/fs.h index 49e71b0f0e9f..3b3edac75df2 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1271,12 +1271,12 @@ struct super_block { struct list_head s_mounts; /* list of mounts; _not_ for fs use */ /* s_dentry_lru, s_nr_dentry_unused protected by dcache.c lru locks */ struct list_head s_dentry_lru; /* unused dentry lru */ - int s_nr_dentry_unused; /* # of dentry on lru */ + long s_nr_dentry_unused; /* # of dentry on lru */ /* s_inode_lru_lock protects s_inode_lru and s_nr_inodes_unused */ spinlock_t s_inode_lru_lock ____cacheline_aligned_in_smp; struct list_head s_inode_lru; /* unused inode lru */ - int s_nr_inodes_unused; /* # of inodes on lru */ + long s_nr_inodes_unused; /* # of inodes on lru */ struct block_device *s_bdev; struct backing_dev_info *s_bdi; -- cgit v1.2.3 From 55f841ce9395a72c6285fbcc4c403c0c786e1c74 Mon Sep 17 00:00:00 2001 From: Glauber Costa Date: Wed, 28 Aug 2013 10:17:53 +1000 Subject: super: fix calculation of shrinkable objects for small numbers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The sysctl knob sysctl_vfs_cache_pressure is used to determine which percentage of the shrinkable objects in our cache we should actively try to shrink. It works great in situations in which we have many objects (at least more than 100), because the aproximation errors will be negligible. But if this is not the case, specially when total_objects < 100, we may end up concluding that we have no objects at all (total / 100 = 0, if total < 100). This is certainly not the biggest killer in the world, but may matter in very low kernel memory situations. Signed-off-by: Glauber Costa Reviewed-by: Carlos Maiolino Acked-by: KAMEZAWA Hiroyuki Acked-by: Mel Gorman Cc: Dave Chinner Cc: Al Viro Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/dcache.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dcache.h b/include/linux/dcache.h index 844a1ef387e4..59066e0b4ff1 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -395,4 +395,8 @@ static inline bool d_mountpoint(const struct dentry *dentry) extern int sysctl_vfs_cache_pressure; +static inline unsigned long vfs_pressure_ratio(unsigned long val) +{ + return mult_frac(val, sysctl_vfs_cache_pressure, 100); +} #endif /* __LINUX_DCACHE_H */ -- cgit v1.2.3 From 19156840e33a23eeb1a749c0f991dab6588b077d Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:17:55 +1000 Subject: dentry: move to per-sb LRU locks MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit With the dentry LRUs being per-sb structures, there is no real need for a global dentry_lru_lock. The locking can be made more fine-grained by moving to a per-sb LRU lock, isolating the LRU operations of different filesytsems completely from each other. The need for this is independent of any performance consideration that may arise: in the interest of abstracting the lru operations away, it is mandatory that each lru works around its own lock instead of a global lock for all of them. [glommer@openvz.org: updated changelog ] Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Reviewed-by: Christoph Hellwig Acked-by: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/fs.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 3b3edac75df2..14a90f6886fa 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1269,7 +1269,9 @@ struct super_block { struct list_head s_files; #endif struct list_head s_mounts; /* list of mounts; _not_ for fs use */ - /* s_dentry_lru, s_nr_dentry_unused protected by dcache.c lru locks */ + + /* s_dentry_lru_lock protects s_dentry_lru and s_nr_dentry_unused */ + spinlock_t s_dentry_lru_lock ____cacheline_aligned_in_smp; struct list_head s_dentry_lru; /* unused dentry lru */ long s_nr_dentry_unused; /* # of dentry on lru */ -- cgit v1.2.3 From 24f7c6b981fb70084757382da464ea85d72af300 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:17:56 +1000 Subject: mm: new shrinker API MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The current shrinker callout API uses an a single shrinker call for multiple functions. To determine the function, a special magical value is passed in a parameter to change the behaviour. This complicates the implementation and return value specification for the different behaviours. Separate the two different behaviours into separate operations, one to return a count of freeable objects in the cache, and another to scan a certain number of objects in the cache for freeing. In defining these new operations, ensure the return values and resultant behaviours are clearly defined and documented. Modify shrink_slab() to use the new API and implement the callouts for all the existing shrinkers. Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Acked-by: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/shrinker.h | 38 +++++++++++++++++++++++++++++--------- 1 file changed, 29 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h index ac6b8ee07825..884e76222e1b 100644 --- a/include/linux/shrinker.h +++ b/include/linux/shrinker.h @@ -4,6 +4,12 @@ /* * This struct is used to pass information from page reclaim to the shrinkers. * We consolidate the values for easier extention later. + * + * The 'gfpmask' refers to the allocation we are currently trying to + * fulfil. + * + * Note that 'shrink' will be passed nr_to_scan == 0 when the VM is + * querying the cache size, so a fastpath for that case is appropriate. */ struct shrink_control { gfp_t gfp_mask; @@ -12,23 +18,37 @@ struct shrink_control { unsigned long nr_to_scan; }; +#define SHRINK_STOP (~0UL) /* * A callback you can register to apply pressure to ageable caches. * - * 'sc' is passed shrink_control which includes a count 'nr_to_scan' - * and a 'gfpmask'. It should look through the least-recently-used - * 'nr_to_scan' entries and attempt to free them up. It should return - * the number of objects which remain in the cache. If it returns -1, it means - * it cannot do any scanning at this time (eg. there is a risk of deadlock). + * @shrink() should look through the least-recently-used 'nr_to_scan' entries + * and attempt to free them up. It should return the number of objects which + * remain in the cache. If it returns -1, it means it cannot do any scanning at + * this time (eg. there is a risk of deadlock). * - * The 'gfpmask' refers to the allocation we are currently trying to - * fulfil. + * @count_objects should return the number of freeable items in the cache. If + * there are no objects to free or the number of freeable items cannot be + * determined, it should return 0. No deadlock checks should be done during the + * count callback - the shrinker relies on aggregating scan counts that couldn't + * be executed due to potential deadlocks to be run at a later call when the + * deadlock condition is no longer pending. * - * Note that 'shrink' will be passed nr_to_scan == 0 when the VM is - * querying the cache size, so a fastpath for that case is appropriate. + * @scan_objects will only be called if @count_objects returned a non-zero + * value for the number of freeable objects. The callout should scan the cache + * and attempt to free items from the cache. It should then return the number + * of objects freed during the scan, or SHRINK_STOP if progress cannot be made + * due to potential deadlocks. If SHRINK_STOP is returned, then no further + * attempts to call the @scan_objects will be made from the current reclaim + * context. */ struct shrinker { int (*shrink)(struct shrinker *, struct shrink_control *sc); + unsigned long (*count_objects)(struct shrinker *, + struct shrink_control *sc); + unsigned long (*scan_objects)(struct shrinker *, + struct shrink_control *sc); + int seeks; /* seeks to recreate an obj */ long batch; /* reclaim batch size, 0 = default */ -- cgit v1.2.3 From 0a234c6dcb79a270803f5c9773ed650b78730962 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:17:57 +1000 Subject: shrinker: convert superblock shrinkers to new API MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Convert superblock shrinker to use the new count/scan API, and propagate the API changes through to the filesystem callouts. The filesystem callouts already use a count/scan API, so it's just changing counters to longs to match the VM API. This requires the dentry and inode shrinker callouts to be converted to the count/scan API. This is mainly a mechanical change. [glommer@openvz.org: use mult_frac for fractional proportions, build fixes] Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Acked-by: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/fs.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 14a90f6886fa..0ae0bc3c1fde 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1335,10 +1335,6 @@ struct super_block { struct workqueue_struct *s_dio_done_wq; }; -/* superblock cache pruning functions */ -extern void prune_icache_sb(struct super_block *sb, int nr_to_scan); -extern void prune_dcache_sb(struct super_block *sb, int nr_to_scan); - extern struct timespec current_fs_time(struct super_block *sb); /* @@ -1631,8 +1627,8 @@ struct super_operations { ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); #endif int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t); - int (*nr_cached_objects)(struct super_block *); - void (*free_cached_objects)(struct super_block *, int); + long (*nr_cached_objects)(struct super_block *); + long (*free_cached_objects)(struct super_block *, long); }; /* -- cgit v1.2.3 From a38e40824844a5ec85f3ea95632be953477d2afa Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:17:58 +1000 Subject: list: add a new LRU list type MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Several subsystems use the same construct for LRU lists - a list head, a spin lock and and item count. They also use exactly the same code for adding and removing items from the LRU. Create a generic type for these LRU lists. This is the beginning of generic, node aware LRUs for shrinkers to work with. [glommer@openvz.org: enum defined constants for lru. Suggested by gthelen, don't relock over retry] Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Reviewed-by: Greg Thelen Acked-by: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/list_lru.h | 115 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 include/linux/list_lru.h (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h new file mode 100644 index 000000000000..1a548b0b7578 --- /dev/null +++ b/include/linux/list_lru.h @@ -0,0 +1,115 @@ +/* + * Copyright (c) 2013 Red Hat, Inc. and Parallels Inc. All rights reserved. + * Authors: David Chinner and Glauber Costa + * + * Generic LRU infrastructure + */ +#ifndef _LRU_LIST_H +#define _LRU_LIST_H + +#include + +/* list_lru_walk_cb has to always return one of those */ +enum lru_status { + LRU_REMOVED, /* item removed from list */ + LRU_ROTATE, /* item referenced, give another pass */ + LRU_SKIP, /* item cannot be locked, skip */ + LRU_RETRY, /* item not freeable. May drop the lock + internally, but has to return locked. */ +}; + +struct list_lru { + spinlock_t lock; + struct list_head list; + /* kept as signed so we can catch imbalance bugs */ + long nr_items; +}; + +int list_lru_init(struct list_lru *lru); + +/** + * list_lru_add: add an element to the lru list's tail + * @list_lru: the lru pointer + * @item: the item to be added. + * + * If the element is already part of a list, this function returns doing + * nothing. Therefore the caller does not need to keep state about whether or + * not the element already belongs in the list and is allowed to lazy update + * it. Note however that this is valid for *a* list, not *this* list. If + * the caller organize itself in a way that elements can be in more than + * one type of list, it is up to the caller to fully remove the item from + * the previous list (with list_lru_del() for instance) before moving it + * to @list_lru + * + * Return value: true if the list was updated, false otherwise + */ +bool list_lru_add(struct list_lru *lru, struct list_head *item); + +/** + * list_lru_del: delete an element to the lru list + * @list_lru: the lru pointer + * @item: the item to be deleted. + * + * This function works analogously as list_lru_add in terms of list + * manipulation. The comments about an element already pertaining to + * a list are also valid for list_lru_del. + * + * Return value: true if the list was updated, false otherwise + */ +bool list_lru_del(struct list_lru *lru, struct list_head *item); + +/** + * list_lru_count: return the number of objects currently held by @lru + * @lru: the lru pointer. + * + * Always return a non-negative number, 0 for empty lists. There is no + * guarantee that the list is not updated while the count is being computed. + * Callers that want such a guarantee need to provide an outer lock. + */ +static inline unsigned long list_lru_count(struct list_lru *lru) +{ + return lru->nr_items; +} + +typedef enum lru_status +(*list_lru_walk_cb)(struct list_head *item, spinlock_t *lock, void *cb_arg); +/** + * list_lru_walk: walk a list_lru, isolating and disposing freeable items. + * @lru: the lru pointer. + * @isolate: callback function that is resposible for deciding what to do with + * the item currently being scanned + * @cb_arg: opaque type that will be passed to @isolate + * @nr_to_walk: how many items to scan. + * + * This function will scan all elements in a particular list_lru, calling the + * @isolate callback for each of those items, along with the current list + * spinlock and a caller-provided opaque. The @isolate callback can choose to + * drop the lock internally, but *must* return with the lock held. The callback + * will return an enum lru_status telling the list_lru infrastructure what to + * do with the object being scanned. + * + * Please note that nr_to_walk does not mean how many objects will be freed, + * just how many objects will be scanned. + * + * Return value: the number of objects effectively removed from the LRU. + */ +unsigned long list_lru_walk(struct list_lru *lru, list_lru_walk_cb isolate, + void *cb_arg, unsigned long nr_to_walk); + +typedef void (*list_lru_dispose_cb)(struct list_head *dispose_list); +/** + * list_lru_dispose_all: forceably flush all elements in an @lru + * @lru: the lru pointer + * @dispose: callback function to be called for each lru list. + * + * This function will forceably isolate all elements into the dispose list, and + * call the @dispose callback to flush the list. Please note that the callback + * should expect items in any state, clean or dirty, and be able to flush all of + * them. + * + * Return value: how many objects were freed. It should be equal to all objects + * in the list_lru. + */ +unsigned long +list_lru_dispose_all(struct list_lru *lru, list_lru_dispose_cb dispose); +#endif /* _LRU_LIST_H */ -- cgit v1.2.3 From bc3b14cb2d505dda969dbe3a31038dbb24aca945 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:17:58 +1000 Subject: inode: convert inode lru list to generic lru list code. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [glommer@openvz.org: adapted for new LRU return codes] Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/fs.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 0ae0bc3c1fde..e04786569c28 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -1275,10 +1276,7 @@ struct super_block { struct list_head s_dentry_lru; /* unused dentry lru */ long s_nr_dentry_unused; /* # of dentry on lru */ - /* s_inode_lru_lock protects s_inode_lru and s_nr_inodes_unused */ - spinlock_t s_inode_lru_lock ____cacheline_aligned_in_smp; - struct list_head s_inode_lru; /* unused inode lru */ - long s_nr_inodes_unused; /* # of inodes on lru */ + struct list_lru s_inode_lru ____cacheline_aligned_in_smp; struct block_device *s_bdev; struct backing_dev_info *s_bdi; -- cgit v1.2.3 From f604156751db77e08afe47ce29fe8f3d51ad9b04 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:18:00 +1000 Subject: dcache: convert to use new lru list infrastructure MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [glommer@openvz.org: don't reintroduce double decrement of nr_unused_dentries, adapted for new LRU return codes] Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/fs.h | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index e04786569c28..36e45df87f6e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1270,14 +1270,6 @@ struct super_block { struct list_head s_files; #endif struct list_head s_mounts; /* list of mounts; _not_ for fs use */ - - /* s_dentry_lru_lock protects s_dentry_lru and s_nr_dentry_unused */ - spinlock_t s_dentry_lru_lock ____cacheline_aligned_in_smp; - struct list_head s_dentry_lru; /* unused dentry lru */ - long s_nr_dentry_unused; /* # of dentry on lru */ - - struct list_lru s_inode_lru ____cacheline_aligned_in_smp; - struct block_device *s_bdev; struct backing_dev_info *s_bdi; struct mtd_info *s_mtd; @@ -1331,6 +1323,13 @@ struct super_block { /* AIO completions deferred from interrupt context */ struct workqueue_struct *s_dio_done_wq; + + /* + * Keep the lru lists last in the structure so they always sit on their + * own individual cachelines. + */ + struct list_lru s_dentry_lru ____cacheline_aligned_in_smp; + struct list_lru s_inode_lru ____cacheline_aligned_in_smp; }; extern struct timespec current_fs_time(struct super_block *sb); -- cgit v1.2.3 From 3b1d58a4c96799eb4c92039e1b851b86f853548a Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:18:00 +1000 Subject: list_lru: per-node list infrastructure MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Now that we have an LRU list API, we can start to enhance the implementation. This splits the single LRU list into per-node lists and locks to enhance scalability. Items are placed on lists according to the node the memory belongs to. To make scanning the lists efficient, also track whether the per-node lists have entries in them in a active nodemask. Note: We use a fixed-size array for the node LRU, this struct can be very big if MAX_NUMNODES is big. If this becomes a problem this is fixable by turning this into a pointer and dynamically allocating this to nr_node_ids. This quantity is firwmare-provided, and still would provide room for all nodes at the cost of a pointer lookup and an extra allocation. Because that allocation will most likely come from a may very well fail. [glommer@openvz.org: fix warnings, added note about node lru] Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Reviewed-by: Greg Thelen Acked-by: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/list_lru.h | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index 1a548b0b7578..f4d4cb608c02 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -8,6 +8,7 @@ #define _LRU_LIST_H #include +#include /* list_lru_walk_cb has to always return one of those */ enum lru_status { @@ -18,11 +19,26 @@ enum lru_status { internally, but has to return locked. */ }; -struct list_lru { +struct list_lru_node { spinlock_t lock; struct list_head list; /* kept as signed so we can catch imbalance bugs */ long nr_items; +} ____cacheline_aligned_in_smp; + +struct list_lru { + /* + * Because we use a fixed-size array, this struct can be very big if + * MAX_NUMNODES is big. If this becomes a problem this is fixable by + * turning this into a pointer and dynamically allocating this to + * nr_node_ids. This quantity is firwmare-provided, and still would + * provide room for all nodes at the cost of a pointer lookup and an + * extra allocation. Because that allocation will most likely come from + * a different slab cache than the main structure holding this + * structure, we may very well fail. + */ + struct list_lru_node node[MAX_NUMNODES]; + nodemask_t active_nodes; }; int list_lru_init(struct list_lru *lru); @@ -66,10 +82,7 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item); * guarantee that the list is not updated while the count is being computed. * Callers that want such a guarantee need to provide an outer lock. */ -static inline unsigned long list_lru_count(struct list_lru *lru) -{ - return lru->nr_items; -} +unsigned long list_lru_count(struct list_lru *lru); typedef enum lru_status (*list_lru_walk_cb)(struct list_head *item, spinlock_t *lock, void *cb_arg); -- cgit v1.2.3 From 6a4f496fd2fc74fa036732ae52c184952d6e3e37 Mon Sep 17 00:00:00 2001 From: Glauber Costa Date: Wed, 28 Aug 2013 10:18:02 +1000 Subject: list_lru: per-node API MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This patch adapts the list_lru API to accept an optional node argument, to be used by NUMA aware shrinking functions. Code that does not care about the NUMA placement of objects can still call into the very same functions as before. They will simply iterate over all nodes. Signed-off-by: Glauber Costa Cc: Dave Chinner Cc: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/list_lru.h | 39 ++++++++++++++++++++++++++++++++++----- 1 file changed, 34 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index f4d4cb608c02..2fe13e1a809a 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -75,20 +75,32 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item); bool list_lru_del(struct list_lru *lru, struct list_head *item); /** - * list_lru_count: return the number of objects currently held by @lru + * list_lru_count_node: return the number of objects currently held by @lru * @lru: the lru pointer. + * @nid: the node id to count from. * * Always return a non-negative number, 0 for empty lists. There is no * guarantee that the list is not updated while the count is being computed. * Callers that want such a guarantee need to provide an outer lock. */ -unsigned long list_lru_count(struct list_lru *lru); +unsigned long list_lru_count_node(struct list_lru *lru, int nid); +static inline unsigned long list_lru_count(struct list_lru *lru) +{ + long count = 0; + int nid; + + for_each_node_mask(nid, lru->active_nodes) + count += list_lru_count_node(lru, nid); + + return count; +} typedef enum lru_status (*list_lru_walk_cb)(struct list_head *item, spinlock_t *lock, void *cb_arg); /** - * list_lru_walk: walk a list_lru, isolating and disposing freeable items. + * list_lru_walk_node: walk a list_lru, isolating and disposing freeable items. * @lru: the lru pointer. + * @nid: the node id to scan from. * @isolate: callback function that is resposible for deciding what to do with * the item currently being scanned * @cb_arg: opaque type that will be passed to @isolate @@ -106,8 +118,25 @@ typedef enum lru_status * * Return value: the number of objects effectively removed from the LRU. */ -unsigned long list_lru_walk(struct list_lru *lru, list_lru_walk_cb isolate, - void *cb_arg, unsigned long nr_to_walk); +unsigned long list_lru_walk_node(struct list_lru *lru, int nid, + list_lru_walk_cb isolate, void *cb_arg, + unsigned long *nr_to_walk); + +static inline unsigned long +list_lru_walk(struct list_lru *lru, list_lru_walk_cb isolate, + void *cb_arg, unsigned long nr_to_walk) +{ + long isolated = 0; + int nid; + + for_each_node_mask(nid, lru->active_nodes) { + isolated += list_lru_walk_node(lru, nid, isolate, + cb_arg, &nr_to_walk); + if (nr_to_walk <= 0) + break; + } + return isolated; +} typedef void (*list_lru_dispose_cb)(struct list_head *dispose_list); /** -- cgit v1.2.3 From 4e717f5c1083995c334ced639cc77a75e9972567 Mon Sep 17 00:00:00 2001 From: Glauber Costa Date: Wed, 28 Aug 2013 10:18:03 +1000 Subject: list_lru: remove special case function list_lru_dispose_all. The list_lru implementation has one function, list_lru_dispose_all, with only one user (the dentry code). At first, such function appears to make sense because we are really not interested in the result of isolating each dentry separately - all of them are going away anyway. However, it's implementation is buggy in the following way: When we call list_lru_dispose_all in fs/dcache.c, we scan all dentries marking them with DCACHE_SHRINK_LIST. However, this is done without the nlru->lock taken. The imediate result of that is that someone else may add or remove the dentry from the LRU at the same time. When list_lru_del happens in that scenario we will see an element that is not yet marked with DCACHE_SHRINK_LIST (even though it will be in the future) and obviously remove it from an lru where the element no longer is. Since list_lru_dispose_all will in effect count down nlru's nr_items and list_lru_del will do the same, this will lead to an imbalance. The solution for this would not be so simple: we can obviously just keep the lru_lock taken, but then we have no guarantees that we will be able to acquire the dentry lock (dentry->d_lock). To properly solve this, we need a communication mechanism between the lru and dentry code, so they can coordinate this with each other. Such mechanism already exists in the form of the list_lru_walk_cb callback. So it is possible to construct a dcache-side prune function that does the right thing only by calling list_lru_walk in a loop until no more dentries are available. With only one user, plus the fact that a sane solution for the problem would involve boucing between dcache and list_lru anyway, I see little justification to keep the special case list_lru_dispose_all in tree. Signed-off-by: Glauber Costa Cc: Michal Hocko Acked-by: Dave Chinner Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/list_lru.h | 17 ----------------- 1 file changed, 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index 2fe13e1a809a..4d02ad3badab 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -137,21 +137,4 @@ list_lru_walk(struct list_lru *lru, list_lru_walk_cb isolate, } return isolated; } - -typedef void (*list_lru_dispose_cb)(struct list_head *dispose_list); -/** - * list_lru_dispose_all: forceably flush all elements in an @lru - * @lru: the lru pointer - * @dispose: callback function to be called for each lru list. - * - * This function will forceably isolate all elements into the dispose list, and - * call the @dispose callback to flush the list. Please note that the callback - * should expect items in any state, clean or dirty, and be able to flush all of - * them. - * - * Return value: how many objects were freed. It should be equal to all objects - * in the list_lru. - */ -unsigned long -list_lru_dispose_all(struct list_lru *lru, list_lru_dispose_cb dispose); #endif /* _LRU_LIST_H */ -- cgit v1.2.3 From 0ce3d74450815500e31f16a0b65f6bab687985c3 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:18:03 +1000 Subject: shrinker: add node awareness MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Pass the node of the current zone being reclaimed to shrink_slab(), allowing the shrinker control nodemask to be set appropriately for node aware shrinkers. Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Acked-by: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/shrinker.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h index 884e76222e1b..76f520c4c394 100644 --- a/include/linux/shrinker.h +++ b/include/linux/shrinker.h @@ -16,6 +16,9 @@ struct shrink_control { /* How many slab objects shrinker() should scan and try to reclaim */ unsigned long nr_to_scan; + + /* shrink from these nodes */ + nodemask_t nodes_to_scan; }; #define SHRINK_STOP (~0UL) -- cgit v1.2.3 From 1d3d4437eae1bb2963faab427f65f90663c64aa1 Mon Sep 17 00:00:00 2001 From: Glauber Costa Date: Wed, 28 Aug 2013 10:18:04 +1000 Subject: vmscan: per-node deferred work MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The list_lru infrastructure already keeps per-node LRU lists in its node-specific list_lru_node arrays and provide us with a per-node API, and the shrinkers are properly equiped with node information. This means that we can now focus our shrinking effort in a single node, but the work that is deferred from one run to another is kept global at nr_in_batch. Work can be deferred, for instance, during direct reclaim under a GFP_NOFS allocation, where situation, all the filesystem shrinkers will be prevented from running and accumulate in nr_in_batch the amount of work they should have done, but could not. This creates an impedance problem, where upon node pressure, work deferred will accumulate and end up being flushed in other nodes. The problem we describe is particularly harmful in big machines, where many nodes can accumulate at the same time, all adding to the global counter nr_in_batch. As we accumulate more and more, we start to ask for the caches to flush even bigger numbers. The result is that the caches are depleted and do not stabilize. To achieve stable steady state behavior, we need to tackle it differently. In this patch we keep the deferred count per-node, in the new array nr_deferred[] (the name is also a bit more descriptive) and will never accumulate that to other nodes. Signed-off-by: Glauber Costa Cc: Dave Chinner Cc: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/shrinker.h | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h index 76f520c4c394..8f80f243fed9 100644 --- a/include/linux/shrinker.h +++ b/include/linux/shrinker.h @@ -19,6 +19,8 @@ struct shrink_control { /* shrink from these nodes */ nodemask_t nodes_to_scan; + /* current node being shrunk (for NUMA aware shrinkers) */ + int nid; }; #define SHRINK_STOP (~0UL) @@ -44,6 +46,8 @@ struct shrink_control { * due to potential deadlocks. If SHRINK_STOP is returned, then no further * attempts to call the @scan_objects will be made from the current reclaim * context. + * + * @flags determine the shrinker abilities, like numa awareness */ struct shrinker { int (*shrink)(struct shrinker *, struct shrink_control *sc); @@ -54,12 +58,18 @@ struct shrinker { int seeks; /* seeks to recreate an obj */ long batch; /* reclaim batch size, 0 = default */ + unsigned long flags; /* These are for internal use */ struct list_head list; - atomic_long_t nr_in_batch; /* objs pending delete */ + /* objs pending delete, per node */ + atomic_long_t *nr_deferred; }; #define DEFAULT_SEEKS 2 /* A good number if you don't know better. */ -extern void register_shrinker(struct shrinker *); + +/* Flags */ +#define SHRINKER_NUMA_AWARE (1 << 0) + +extern int register_shrinker(struct shrinker *); extern void unregister_shrinker(struct shrinker *); #endif -- cgit v1.2.3 From 9b17c62382dd2e7507984b9890bf44e070cdd8bb Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:18:05 +1000 Subject: fs: convert inode and dentry shrinking to be node aware MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Now that the shrinker is passing a node in the scan control structure, we can pass this to the the generic LRU list code to isolate reclaim to the lists on matching nodes. Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Acked-by: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/fs.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 36e45df87f6e..a4acd3c61190 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1624,8 +1624,8 @@ struct super_operations { ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); #endif int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t); - long (*nr_cached_objects)(struct super_block *); - long (*free_cached_objects)(struct super_block *, long); + long (*nr_cached_objects)(struct super_block *, int); + long (*free_cached_objects)(struct super_block *, long, int); }; /* -- cgit v1.2.3 From a0b02131c5fcd8545b867db72224b3659e813f10 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:18:16 +1000 Subject: shrinker: Kill old ->shrink API. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There are no more users of this API, so kill it dead, dead, dead and quietly bury the corpse in a shallow, unmarked grave in a dark forest deep in the hills... [glommer@openvz.org: added flowers to the grave] Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Reviewed-by: Greg Thelen Acked-by: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/shrinker.h | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h index 8f80f243fed9..68c097077ef0 100644 --- a/include/linux/shrinker.h +++ b/include/linux/shrinker.h @@ -7,14 +7,15 @@ * * The 'gfpmask' refers to the allocation we are currently trying to * fulfil. - * - * Note that 'shrink' will be passed nr_to_scan == 0 when the VM is - * querying the cache size, so a fastpath for that case is appropriate. */ struct shrink_control { gfp_t gfp_mask; - /* How many slab objects shrinker() should scan and try to reclaim */ + /* + * How many objects scan_objects should scan and try to reclaim. + * This is reset before every call, so it is safe for callees + * to modify. + */ unsigned long nr_to_scan; /* shrink from these nodes */ @@ -27,11 +28,6 @@ struct shrink_control { /* * A callback you can register to apply pressure to ageable caches. * - * @shrink() should look through the least-recently-used 'nr_to_scan' entries - * and attempt to free them up. It should return the number of objects which - * remain in the cache. If it returns -1, it means it cannot do any scanning at - * this time (eg. there is a risk of deadlock). - * * @count_objects should return the number of freeable items in the cache. If * there are no objects to free or the number of freeable items cannot be * determined, it should return 0. No deadlock checks should be done during the @@ -50,7 +46,6 @@ struct shrink_control { * @flags determine the shrinker abilities, like numa awareness */ struct shrinker { - int (*shrink)(struct shrinker *, struct shrink_control *sc); unsigned long (*count_objects)(struct shrinker *, struct shrink_control *sc); unsigned long (*scan_objects)(struct shrinker *, -- cgit v1.2.3 From 5ca302c8e502ca53b7d75f12127ec0289904003a Mon Sep 17 00:00:00 2001 From: Glauber Costa Date: Wed, 28 Aug 2013 10:18:18 +1000 Subject: list_lru: dynamically adjust node arrays MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We currently use a compile-time constant to size the node array for the list_lru structure. Due to this, we don't need to allocate any memory at initialization time. But as a consequence, the structures that contain embedded list_lru lists can become way too big (the superblock for instance contains two of them). This patch aims at ameliorating this situation by dynamically allocating the node arrays with the firmware provided nr_node_ids. Signed-off-by: Glauber Costa Cc: Dave Chinner Cc: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- include/linux/list_lru.h | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index 4d02ad3badab..3ce541753c88 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -27,20 +27,11 @@ struct list_lru_node { } ____cacheline_aligned_in_smp; struct list_lru { - /* - * Because we use a fixed-size array, this struct can be very big if - * MAX_NUMNODES is big. If this becomes a problem this is fixable by - * turning this into a pointer and dynamically allocating this to - * nr_node_ids. This quantity is firwmare-provided, and still would - * provide room for all nodes at the cost of a pointer lookup and an - * extra allocation. Because that allocation will most likely come from - * a different slab cache than the main structure holding this - * structure, we may very well fail. - */ - struct list_lru_node node[MAX_NUMNODES]; + struct list_lru_node *node; nodemask_t active_nodes; }; +void list_lru_destroy(struct list_lru *lru); int list_lru_init(struct list_lru *lru); /** -- cgit v1.2.3