summaryrefslogtreecommitdiff
path: root/include/linux/dcache.h (follow)
Commit message (Collapse)AuthorAge
* f2fs: backport from (4c1fad64 - Merge tag 'for-f2fs-4.9' of ↵Jaegeuk Kim2017-09-25
| | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* Merge 4.4.80 into android-4.4Greg Kroah-Hartman2017-08-07
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.80 af_key: Add lock to key dump pstore: Make spinlock per zone instead of global net: reduce skb_warn_bad_offload() noise powerpc/pseries: Fix of_node_put() underflow during reconfig remove crypto: authencesn - Fix digest_null crash md/raid5: add thread_group worker async_tx_issue_pending_all drm/vmwgfx: Fix gcc-7.1.1 warning drm/nouveau/bar/gf100: fix access to upper half of BAR2 KVM: PPC: Book3S HV: Context-switch EBB registers properly KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit KVM: PPC: Book3S HV: Reload HTM registers explicitly KVM: PPC: Book3S HV: Save/restore host values of debug registers Revert "powerpc/numa: Fix percpu allocations to be NUMA aware" Staging: comedi: comedi_fops: Avoid orphaned proc entry drm/rcar: Nuke preclose hook drm: rcar-du: Perform initialization/cleanup at probe/remove time drm: rcar-du: Simplify and fix probe error handling perf intel-pt: Fix ip compression perf intel-pt: Fix last_ip usage perf intel-pt: Use FUP always when scanning for an IP perf intel-pt: Ensure never to set 'last_ip' when packet 'count' is zero xfs: don't BUG() on mixed direct and mapped I/O nfc: fdp: fix NULL pointer dereference net: phy: Do not perform software reset for Generic PHY isdn: Fix a sleep-in-atomic bug isdn/i4l: fix buffer overflow ath10k: fix null deref on wmi-tlv when trying spectral scan wil6210: fix deadlock when using fw_no_recovery option mailbox: always wait in mbox_send_message for blocking Tx mode mailbox: skip complete wait event if timer expired mailbox: handle empty message in tx_tick mpt3sas: Don't overreach ioc->reply_post[] during initialization kaweth: fix firmware download kaweth: fix oops upon failed memory allocation sched/cgroup: Move sched_online_group() back into css_online() to fix crash PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present RDMA/uverbs: Fix the check for port number libnvdimm, btt: fix btt_rw_page not returning errors ipmi/watchdog: fix watchdog timeout set on reboot dentry name snapshots v4l: s5c73m3: fix negation operator Make file credentials available to the seqfile interfaces /proc/iomem: only expose physical resource addresses to privileged users vlan: Propagate MAC address to VLANs pstore: Allow prz to control need for locking pstore: Correctly initialize spinlock and flags pstore: Use dynamic spinlock initializer net: skb_needs_check() accepts CHECKSUM_NONE for tx sched/cputime: Fix prev steal time accouting during CPU hotplug xen/blkback: don't free be structure too early xen/blkback: don't use xen_blkif_get() in xen-blkback kthread tpm: fix a kernel memory leak in tpm-sysfs.c tpm: Replace device number bitmap with IDR x86/mce/AMD: Make the init code more robust r8169: add support for RTL8168 series add-on card. ARM: dts: n900: Mark eMMC slot with no-sdio and no-sd flags ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output net/mlx4: Remove BUG_ON from ICM allocation routine drm/msm: Ensure that the hardware write pointer is valid drm/msm: Verify that MSM_SUBMIT_BO_FLAGS are set vfio-pci: use 32-bit comparisons for register address for gcc-4.5 irqchip/keystone: Fix "scheduling while atomic" on rt ASoC: tlv320aic3x: Mark the RESET register as volatile spi: dw: Make debugfs name unique between instances ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL irqchip/mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND openrisc: Add _text symbol to fix ksym build error dmaengine: ioatdma: Add Skylake PCI Dev ID dmaengine: ioatdma: workaround SKX ioatdma version dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path. ARM64: zynqmp: Fix W=1 dtc 1.4 warnings ARM64: zynqmp: Fix i2c node's compatible string ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_* ACPI / scan: Prefer devices without _HID/_CID for _ADR matching usb: gadget: Fix copy/pasted error message Btrfs: adjust outstanding_extents counter properly when dio write is split tools lib traceevent: Fix prev/next_prio for deadline tasks xfrm: Don't use sk_family for socket policy lookups perf tools: Install tools/lib/traceevent plugins with install-bin perf symbols: Robustify reading of build-id from sysfs video: fbdev: cobalt_lcdfb: Handle return NULL error from devm_ioremap vfio-pci: Handle error from pci_iomap arm64: mm: fix show_pte KERN_CONT fallout nvmem: imx-ocotp: Fix wrong register size sh_eth: enable RX descriptor word 0 shift on SH7734 ALSA: usb-audio: test EP_FLAG_RUNNING at urb completion HID: ignore Petzl USB headlamp scsi: fnic: Avoid sending reset to firmware when another reset is in progress scsi: snic: Return error code on memory allocation failure ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused Linux 4.4.80 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * dentry name snapshotsAl Viro2017-08-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 49d31c2f389acfe83417083e1208422b4091cd9e upstream. take_dentry_name_snapshot() takes a safe snapshot of dentry name; if the name is a short one, it gets copied into caller-supplied structure, otherwise an extra reference to external name is grabbed (those are never modified). In either case the pointer to stable string is stored into the same structure. dentry must be held by the caller of take_dentry_name_snapshot(), but may be freely dropped afterwards - the snapshot will stay until destroyed by release_dentry_name_snapshot(). Intended use: struct name_snapshot s; take_dentry_name_snapshot(&s, dentry); ... access s.name ... release_dentry_name_snapshot(&s); Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name to pass down with event. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge tag 'v4.4.16' into android-4.4.yDmitry Shmidt2016-08-01
|\| | | | | | | | | | | This is the 4.4.16 stable release Change-Id: Ibaf7b7e03695e1acebc654a2ca1a4bfcc48fcea4
| * vfs: add d_real_inode() helperMiklos Szeredi2016-07-27
| | | | | | | | | | | | | | | | | | | | commit a118084432d642eeccb961c7c8cc61525a941fcb upstream. Needed by the following fix. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * vfs: add vfs_select_inode() helperMiklos Szeredi2016-05-18
| | | | | | | | | | | | | | | | commit 54d5ca871e72f2bb172ec9323497f01cd5091ec7 upstream. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * fs: add file_dentry()Miklos Szeredi2016-04-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d101a125954eae1d397adda94ca6319485a50493 upstream. This series fixes bugs in nfs and ext4 due to 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay"). Regular files opened on overlayfs will result in the file being opened on the underlying filesystem, while f_path points to the overlayfs mount/dentry. This confuses filesystems which get the dentry from struct file and assume it's theirs. Add a new helper, file_dentry() [*], to get the filesystem's own dentry from the file. This checks file->f_path.dentry->d_flags against DCACHE_OP_REAL, and returns file->f_path.dentry if DCACHE_OP_REAL is not set (this is the common, non-overlayfs case). In the uncommon case it will call into overlayfs's ->d_real() to get the underlying dentry, matching file_inode(file). The reason we need to check against the inode is that if the file is copied up while being open, d_real() would return the upper dentry, while the open file comes from the lower dentry. [*] If possible, it's better simply to use file_inode() instead. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * use ->d_seq to get coherency between ->d_inode and ->d_flagsAl Viro2016-03-09
| | | | | | | | | | | | | | | | | | | | | | | | | | commit a528aca7f359f4b0b1d72ae406097e491a5ba9ea upstream. Games with ordering and barriers are way too brittle. Just bump ->d_seq before and after updating ->d_inode and ->d_flags type bits, so that verifying ->d_seq would guarantee they are coherent. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | vfs: change d_canonical_path to take two pathsDaniel Rosenberg2016-04-25
| | | | | | | | | | | | bug: 23904372 Change-Id: I4a686d64b6de37decf60019be1718e1d820193e6 Signed-off-by: Daniel Rosenberg <drosen@google.com>
* | vfs: add d_canonical_path for stacked filesystem supportDaniel Rosenberg2016-03-22
|/ | | | | | | | | | | | | | | Inotify does not currently know when a filesystem is acting as a wrapper around another fs. This means that inotify watchers will miss any modifications to the base file, as well as any made in a separate stacked fs that points to the same file. d_canonical_path solves this problem by allowing the fs to map a dentry to a path in the lower fs. Inotify can use it to find the appropriate place to watch to be informed of all changes to a file. Change-Id: I09563baffad1711a045e45c1bd0bd8713c2cc0b6 Signed-off-by: Daniel Rosenberg <drosen@google.com>
* include, lib: add __printf attributes to several function prototypesNicolas Iooss2015-07-17
| | | | | | | | | | | | | | | | | | | | | | Using __printf attributes helps to detect several format string issues at compile time (even though -Wformat-security is currently disabled in Makefile). For example it can detect when formatting a pointer as a number, like the issue fixed in commit a3fa71c40f18 ("wl18xx: show rx_frames_per_rates as an array as it really is"), or when the arguments do not match the format string, c.f. for example commit 5ce1aca81435 ("reiserfs: fix __RASSERT format string"). To prevent similar bugs in the future, add a __printf attribute to every function prototype which needs one in include/linux/ and lib/. These functions were mostly found by using gcc's -Wsuggest-attribute=format flag. Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Felipe Balbi <balbi@ti.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* make simple_positive() publicAl Viro2015-06-23
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* overlayfs: Make f_path always point to the overlay and f_inode to the underlayDavid Howells2015-06-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make file->f_path always point to the overlay dentry so that the path in /proc/pid/fd is correct and to ensure that label-based LSMs have access to the overlay as well as the underlay (path-based LSMs probably don't need it). Using my union testsuite to set things up, before the patch I see: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:38 5 -> /a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 13381 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 13381 Links: 1 ... After the patch: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:22 5 -> /mnt/a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 40346 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 40346 Links: 1 ... Note the change in where /proc/$$/fd/5 points to in the ls command. It was pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107 (which is correct). The inode accessed, however, is the lower layer. The union layer is on device 25h/37d and the upper layer on 24h/36d. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Impose ordering on accesses of d_inode and d_flagsDavid Howells2015-04-15
| | | | | | | | | | | | | | | | | | | | | | | | Impose ordering on accesses of d_inode and d_flags to avoid the need to do this: if (!dentry->d_inode || d_is_negative(dentry)) { when this: if (d_is_negative(dentry)) { should suffice. This check is especially problematic if a dentry can have its type field set to something other than DENTRY_MISS_TYPE when d_inode is NULL (as in unionmount). What we really need to do is stick a write barrier between setting d_inode and setting d_flags and a read barrier between reading d_flags and reading d_inode. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Add owner-filesystem positive/negative dentry checksDavid Howells2015-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Supply two functions to test whether a filesystem's own dentries are positive or negative (d_really_is_positive() and d_really_is_negative()). The problem is that the DCACHE_ENTRY_TYPE field of dentry->d_flags may be overridden by the union part of a layered filesystem and isn't thus necessarily indicative of the type of dentry. Normally, this would involve a negative dentry (ie. ->d_inode == NULL) having ->d_layer.lower pointed to a lower layer dentry, DCACHE_PINNING_LOWER set and the DCACHE_ENTRY_TYPE field set to something other than DCACHE_MISS_TYPE - but it could also involve, say, a DCACHE_SPECIAL_TYPE being overridden to DCACHE_WHITEOUT_TYPE if a 0,0 chardev is detected in the top layer. However, inside a filesystem, when that fs is looking at its own dentries, it probably wants to know if they are really negative or not - and doesn't care about the fallthrough bits used by the union. To this end, a filesystem should normally use d_really_is_positive/negative() when looking at its own dentries rather than d_is_positive/negative() and should use d_inode() to get at the inode. Anyone looking at someone else's dentries (this includes pathwalk) should use d_is_xxx() and d_backing_inode(). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Split DCACHE_FILE_TYPE into regular and special typesDavid Howells2015-02-22
| | | | | | | | | | | | | | | Split DCACHE_FILE_TYPE into DCACHE_REGULAR_TYPE (dentries representing regular files) and DCACHE_SPECIAL_TYPE (representing blockdev, chardev, FIFO and socket files). d_is_reg() and d_is_special() are added to detect these subtypes and d_is_file() is left as the union of the two. This allows a number of places that use S_ISREG(dentry->d_inode->i_mode) to use d_is_reg(dentry) instead. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Add a fallthrough flag for marking virtual dentriesDavid Howells2015-02-22
| | | | | | | | | | | | | Add a DCACHE_FALLTHRU flag to indicate that, in a layered filesystem, this is a virtual dentry that covers another one in a lower layer that should be used instead. This may be recorded on medium if directory integration is stored there. The flag can be set with d_set_fallthru() and tested with d_is_fallthru(). Original-author: Valerie Aurora <vaurora@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Add a whiteout dentry typeDavid Howells2015-02-22
| | | | | | | | | | Add DCACHE_WHITEOUT_TYPE and provide a d_is_whiteout() accessor function. A d_is_miss() accessor is also added for ordinary cache misses and d_is_negative() is modified to indicate either an ordinary miss or an enforced miss (whiteout). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Introduce inode-getting helpers for layered/unioned fs environmentsDavid Howells2015-02-22
| | | | | | | | | | | | | | | | Introduce some function for getting the inode (and also the dentry) in an environment where layered/unioned filesystems are in operation. The problem is that we have places where we need *both* the union dentry and the lower source or workspace inode or dentry available, but we can only have a handle on one of them. Therefore we need to derive the handle to the other from that. The idea is to introduce an extra field in struct dentry that allows the union dentry to refer to and pin the lower dentry. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* kill d_validate()Al Viro2015-01-25
| | | | | | no users left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch d_materialise_unique() users to d_splice_alias()Al Viro2014-11-19
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* merge d_materialise_unique() into d_splice_alias()Al Viro2014-11-19
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* move d_rcu from overlapping d_child to overlapping d_aliasAl Viro2014-11-03
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* don't need that forward declaration of struct nameidata in dcache.h anymoreAl Viro2014-10-12
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* take dname_external() into fs/dcache.cAl Viro2014-10-12
| | | | | | | never used outside and it's too low-level for legitimate uses outside of fs/dcache.c anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Make d_invalidate return voidEric W. Biederman2014-10-09
| | | | | | | | | | Now that d_invalidate can no longer fail, stop returning a useless return code. For the few callers that checked the return code update remove the handling of d_invalidate failure. Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Merge check_submounts_and_drop and d_invalidateEric W. Biederman2014-10-09
| | | | | | | | | Now that d_invalidate is the only caller of check_submounts_and_drop, expand check_submounts_and_drop inline in d_invalidate. Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: avoid non-forwarding large load after small store in path lookupLinus Torvalds2014-09-14
| | | | | | | | | | | | | | | | | | | | | | | | | | The performance regression that Josef Bacik reported in the pathname lookup (see commit 99d263d4c5b2 "vfs: fix bad hashing of dentries") made me look at performance stability of the dcache code, just to verify that the problem was actually fixed. That turned up a few other problems in this area. There are a few cases where we exit RCU lookup mode and go to the slow serializing case when we shouldn't, Al has fixed those and they'll come in with the next VFS pull. But my performance verification also shows that link_path_walk() turns out to have a very unfortunate 32-bit store of the length and hash of the name we look up, followed by a 64-bit read of the combined hash_len field. That screws up the processor store to load forwarding, causing an unnecessary hickup in this critical routine. It's caused by the ugly calling convention for the "hash_name()" function, and easily fixed by just making hash_name() fill in the whole 'struct qstr' rather than passing it a pointer to just the hash value. With that, the profile for this function looks much smoother. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* dcache: d_obtain_alias callers don't all want DISCONNECTEDJ. Bruce Fields2014-08-07
| | | | | | | | | | | | | | | | | | | | | There are a few d_obtain_alias callers that are using it to get the root of a filesystem which may already have an alias somewhere else. This is not the same as the filehandle-lookup case, and none of them actually need DCACHE_DISCONNECTED set. It isn't really a serious problem, but it would really be clearer if we reserved DCACHE_DISCONNECTED for those cases where it's actually needed. In the btrfs case this was causing a spurious printk from nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED dentry. Josef worked around this by unsetting DCACHE_DISCONNECTED manually in 3a0dfa6a12e "Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol", and this replaces that workaround. Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* dentry_kill(): don't try to remove from shrink listAl Viro2014-05-01
| | | | | | | | | | | | | | | | | | | If the victim in on the shrink list, don't remove it from there. If shrink_dentry_list() manages to remove it from the list before we are done - fine, we'll just free it as usual. If not - mark it with new flag (DCACHE_MAY_FREE) and leave it there. Eventually, shrink_dentry_list() will get to it, remove the sucker from shrink list and call dentry_kill(dentry, 0). Which is where we'll deal with freeing. Since now dentry_kill(dentry, 0) may happen after or during dentry_kill(dentry, 1), we need to recognize that (by seeing DCACHE_DENTRY_KILLED already set), unlock everything and either free the sucker (in case DCACHE_MAY_FREE has been set) or leave it for ongoing dentry_kill(dentry, 1) to deal with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: add cross-renameMiklos Szeredi2014-04-01
| | | | | | | | | | If flags contain RENAME_EXCHANGE then exchange source and destination files. There's no restriction on the type of the files; e.g. a directory can be exchanged with a symlink. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>
* vfs: add d_is_dir()Miklos Szeredi2014-04-01
| | | | | | | | | Add d_is_dir(dentry) helper which is analogous to S_ISDIR(). To avoid confusion, rename d_is_directory() to d_can_lookup(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>
* dcache: allow word-at-a-time name hashing with big-endian CPUsWill Deacon2013-12-12
| | | | | | | | | | | | | | | | When explicitly hashing the end of a string with the word-at-a-time interface, we have to be careful which end of the word we pick up. On big-endian CPUs, the upper-bits will contain the data we're after, so ensure we generate our masks accordingly (and avoid hashing whatever random junk may have been sitting after the string). This patch adds a new dcache helper, bytemask_from_count, which creates a mask appropriate for the CPU endianness. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* VFS: Put a small type field into struct dentry::d_flagsDavid Howells2013-11-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Put a type field into struct dentry::d_flags to indicate if the dentry is one of the following types that relate particularly to pathwalk: Miss (negative dentry) Directory "Automount" directory (defective - no i_op->lookup()) Symlink Other (regular, socket, fifo, device) The type field is set to one of the first five types on a dentry by calls to __d_instantiate() and d_obtain_alias() from information in the inode (if one is given). The type is cleared by dentry_unlink_inode() when it reconstitutes an existing dentry as a negative dentry. Accessors provided are: d_set_type(dentry, type) d_is_directory(dentry) d_is_autodir(dentry) d_is_symlink(dentry) d_is_file(dentry) d_is_negative(dentry) d_is_positive(dentry) A bunch of checks in pathname resolution switched to those. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: introduce d_instantiate_no_diralias()Miklos Szeredi2013-10-24
| | | | | | | | | | ...which just returns -EBUSY if a directory alias would be created. This is to be used by fuse mkdir to make sure that a buggy or malicious userspace filesystem doesn't do anything nasty. Previously fuse used a private mutex for this purpose, which can now go away. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* super: fix calculation of shrinkable objects for small numbersGlauber Costa2013-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sysctl knob sysctl_vfs_cache_pressure is used to determine which percentage of the shrinkable objects in our cache we should actively try to shrink. It works great in situations in which we have many objects (at least more than 100), because the aproximation errors will be negligible. But if this is not the case, specially when total_objects < 100, we may end up concluding that we have no objects at all (total / 100 = 0, if total < 100). This is certainly not the biggest killer in the world, but may matter in very low kernel memory situations. Signed-off-by: Glauber Costa <glommer@openvz.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: bump inode and dentry counters to longGlauber Costa2013-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This series reworks our current object cache shrinking infrastructure in two main ways: * Noticing that a lot of users copy and paste their own version of LRU lists for objects, we put some effort in providing a generic version. It is modeled after the filesystem users: dentries, inodes, and xfs (for various tasks), but we expect that other users could benefit in the near future with little or no modification. Let us know if you have any issues. * The underlying list_lru being proposed automatically and transparently keeps the elements in per-node lists, and is able to manipulate the node lists individually. Given this infrastructure, we are able to modify the up-to-now hammer called shrink_slab to proceed with node-reclaim instead of always searching memory from all over like it has been doing. Per-node lru lists are also expected to lead to less contention in the lru locks on multi-node scans, since we are now no longer fighting for a global lock. The locks usually disappear from the profilers with this change. Although we have no official benchmarks for this version - be our guest to independently evaluate this - earlier versions of this series were performance tested (details at http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no visible performance regressions while yielding a better qualitative behavior in NUMA machines. With this infrastructure in place, we can use the list_lru entry point to provide memcg isolation and per-memcg targeted reclaim. Historically, those two pieces of work have been posted together. This version presents only the infrastructure work, deferring the memcg work for a later time, so we can focus on getting this part tested. You can see more about the history of such work at http://lwn.net/Articles/552769/ Dave Chinner (18): dcache: convert dentry_stat.nr_unused to per-cpu counters dentry: move to per-sb LRU locks dcache: remove dentries from LRU before putting on dispose list mm: new shrinker API shrinker: convert superblock shrinkers to new API list: add a new LRU list type inode: convert inode lru list to generic lru list code. dcache: convert to use new lru list infrastructure list_lru: per-node list infrastructure shrinker: add node awareness fs: convert inode and dentry shrinking to be node aware xfs: convert buftarg LRU to generic code xfs: rework buffer dispose list tracking xfs: convert dquot cache lru to list_lru fs: convert fs shrinkers to new scan/count API drivers: convert shrinkers to new count/scan API shrinker: convert remaining shrinkers to count/scan API shrinker: Kill old ->shrink API. Glauber Costa (7): fs: bump inode and dentry counters to long super: fix calculation of shrinkable objects for small numbers list_lru: per-node API vmscan: per-node deferred work i915: bail out earlier when shrinker cannot acquire mutex hugepage: convert huge zero page shrinker to new shrinker API list_lru: dynamically adjust node arrays This patch: There are situations in very large machines in which we can have a large quantity of dirty inodes, unused dentries, etc. This is particularly true when umounting a filesystem, where eventually since every live object will eventually be discarded. Dave Chinner reported a problem with this while experimenting with the shrinker revamp patchset. So we believe it is time for a change. This patch just moves int to longs. Machines where it matters should have a big long anyway. Signed-off-by: Glauber Costa <glommer@openvz.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Chinner <dchinner@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: reorganize dput() memory accessesLinus Torvalds2013-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is me being a bit OCD after all the dentry optimization work this merge window: profiles end up showing 'dput()' as a rather expensive operation, and there were two unrelated bad reasons for that. The first reason was reading d_lockref.count for debugging purposes, which touches the lockref cacheline (for reads) before really need to. More importantly, the debugging test in question is _wrong_, and has hidden bugs. It's true that we can only sleep when the count goes down to zero, but the test as-is hides the much more subtle bug that happens if we race with somebody else deleting the file. Anyway we _will_ touch that cacheline, but let's do it for a write and in the right routine (ie in "lockref_put_or_lock()") which annotates the costs better. So remove the misleading debug code. The other was an unnecessary access to the cacheline that contains the d_lru list, just to check whether we already were on the LRU list or not. This is exactly what we have d_flags for, so that we can avoid touching extra cache lines for the common case. So just add another bit for "is this dentry on the LRU". Finally, mark the tests properly likely/unlikely, so that the common fast-paths are dense in the instruction stream. This makes the profiles look much saner. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* constify dcache.c inlined helpers where possibleAl Viro2013-09-05
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: check submounts and drop atomicallyMiklos Szeredi2013-09-05
| | | | | | | | | | | | | | | | | | | | | | | | | | We check submounts before doing d_drop() on a non-empty directory dentry in NFS (have_submounts()), but we do not exclude a racing mount. Process A: have_submounts() -> returns false Process B: mount() -> success Process A: d_drop() This patch prepares the ground for the fix by doing the following operations all under the same rename lock: have_submounts() shrink_dcache_parent() d_drop() This is actually an optimization since have_submounts() and shrink_dcache_parent() both traverse the same dentry tree separately. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: David Howells <dhowells@redhat.com> CC: Steven Whitehouse <swhiteho@redhat.com> CC: Trond Myklebust <Trond.Myklebust@netapp.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock()Linus Torvalds2013-09-02
| | | | | | | | | | | | | | | | | | | | | This moves __d_rcu_to_refcount() from <linux/dcache.h> into fs/namei.c and re-implements it using the lockref infrastructure instead. It also adds a lot of comments about what is actually going on, because turning a dentry that was looked up using RCU into a long-lived reference counted entry is one of the more subtle parts of the rcu walk. We also used to be _particularly_ subtle in unlazy_walk() where we re-validate both the dentry and its parent using the same sequence count. We used to do it by nesting the locks and then verifying the sequence count just once. That was silly, because nested locking is expensive, but the sequence count check is not. So this just re-validates the dentry and the parent separately, avoiding the nested locking, and making the lockref lookup possible. Acked-by: Waiman Long <waiman.long@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vfs: make the dentry cache use the lockref infrastructureWaiman Long2013-08-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | This just replaces the dentry count/lock combination with the lockref structure that contains both a count and a spinlock, and does the mechanical conversion to use the lockref infrastructure. There are no semantic changes here, it's purely syntactic. The reference lockref implementation uses the spinlock exactly the same way that the old dcache code did, and the bulk of this patch is just expanding the internal "d_count" use in the dcache code to use "d_lockref.count" instead. This is purely preparation for the real change to make the reference count updates be lockless during the 3.12 merge window. [ As with the previous commit, this is a rewritten version of a concept originally from Waiman, so credit goes to him, blame for any errors goes to me. Waiman's patch had some semantic differences for taking advantage of the lockless update in dget_parent(), while this patch is intentionally a pure search-and-replace change with no semantic changes. - Linus ] Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cope with potentially long ->d_dname() output for shmem/hugetlbAl Viro2013-08-24
| | | | | | | | | | dynamic_dname() is both too much and too little for those - the output may be well in excess of 64 bytes dynamic_dname() assumes to be enough (thanks to ashmem feeding really long names to shmem_file_setup()) and vsnprintf() is an overkill for those guys. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: constify dentry parameter in d_count()Peng Tao2013-07-20
| | | | | | | | so that it can be used in places like d_compare/d_hash without causing a compiler warning. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* helper for reading ->d_countAl Viro2013-07-05
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Don't pass inode to ->d_hash() and ->d_compare()Linus Torvalds2013-06-29
| | | | | | | | | | | | Instances either don't look at it at all (the majority of cases) or only want it to find the superblock (which can be had as dentry->d_sb). A few cases that want more are actually safe with dentry->d_inode - the only precaution needed is the check that it hadn't been replaced with NULL by rmdir() or by overwriting rename(), which case should be simply treated as cache miss. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [O_TMPFILE] it's still short a few helpers, but infrastructure should be OK ↵Al Viro2013-06-29
| | | | | | now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry opJeff Layton2013-02-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following set of operations on a NFS client and server will cause server# mkdir a client# cd a server# mv a a.bak client# sleep 30 # (or whatever the dir attrcache timeout is) client# stat . stat: cannot stat `.': Stale NFS file handle Obviously, we should not be getting an ESTALE error back there since the inode still exists on the server. The problem is that the lookup code will call d_revalidate on the dentry that "." refers to, because NFS has FS_REVAL_DOT set. nfs_lookup_revalidate will see that the parent directory has changed and will try to reverify the dentry by redoing a LOOKUP. That of course fails, so the lookup code returns ESTALE. The problem here is that d_revalidate is really a bad fit for this case. What we really want to know at this point is whether the inode is still good or not, but we don't really care what name it goes by or whether the dcache is still valid. Add a new d_op->d_weak_revalidate operation and have complete_walk call that instead of d_revalidate. The intent there is to allow for a "weaker" d_revalidate that just checks to see whether the inode is still good. This is also gives us an opportunity to kill off the FS_REVAL_DOT special casing. [AV: changed method name, added note in porting, fixed confusion re having it possibly called from RCU mode (it won't be)] Cc: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* constify d_lookup() argumentsAl Viro2013-02-22
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* constify __d_lookup() argumentsAl Viro2013-02-22
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>