summaryrefslogtreecommitdiff
path: root/include/linux/fs.h (follow)
Commit message (Collapse)AuthorAge
* Merge 4.4.123 into android-4.4Greg Kroah-Hartman2018-03-22
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.123 blkcg: fix double free of new_blkg in blkcg_init_queue Input: tsc2007 - check for presence and power down tsc2007 during probe staging: speakup: Replace BUG_ON() with WARN_ON(). staging: wilc1000: add check for kmalloc allocation failure. HID: reject input outside logical range only if null state is set drm: qxl: Don't alloc fbdev if emulation is not supported ath10k: fix a warning during channel switch with multiple vaps PCI/MSI: Stop disabling MSI/MSI-X in pci_device_shutdown() selinux: check for address length in selinux_socket_bind() perf sort: Fix segfault with basic block 'cycles' sort dimension i40e: Acquire NVM lock before reads on all devices i40e: fix ethtool to get EEPROM data from X722 interface perf tools: Make perf_event__synthesize_mmap_events() scale drivers: net: xgene: Fix hardware checksum setting drm: Defer disabling the vblank IRQ until the next interrupt (for instant-off) ath10k: disallow DFS simulation if DFS channel is not enabled perf probe: Return errno when not hitting any event HID: clamp input to logical range if no null state net/8021q: create device with all possible features in wanted_features ARM: dts: Adjust moxart IRQ controller and flags batman-adv: handle race condition for claims between gateways of: fix of_device_get_modalias returned length when truncating buffers solo6x10: release vb2 buffers in solo_stop_streaming() scsi: ipr: Fix missed EH wakeup media: i2c/soc_camera: fix ov6650 sensor getting wrong clock timers, sched_clock: Update timeout for clock wrap sysrq: Reset the watchdog timers while displaying high-resolution timers Input: qt1070 - add OF device ID table sched: act_csum: don't mangle TCP and UDP GSO packets ASoC: rcar: ssi: don't set SSICR.CKDV = 000 with SSIWSR.CONT spi: omap2-mcspi: poll OMAP2_MCSPI_CHSTAT_RXS for PIO transfer tcp: sysctl: Fix a race to avoid unexpected 0 window from space dmaengine: imx-sdma: add 1ms delay to ensure SDMA channel is stopped driver: (adm1275) set the m,b and R coefficients correctly for power mm: Fix false-positive VM_BUG_ON() in page_cache_{get,add}_speculative() blk-throttle: make sure expire time isn't too big f2fs: relax node version check for victim data in gc bonding: refine bond_fold_stats() wrap detection braille-console: Fix value returned by _braille_console_setup drm/vmwgfx: Fixes to vmwgfx_fb vxlan: vxlan dev should inherit lowerdev's gso_max_size NFC: nfcmrvl: Include unaligned.h instead of access_ok.h NFC: nfcmrvl: double free on error path ARM: dts: r8a7790: Correct parent of SSI[0-9] clocks ARM: dts: r8a7791: Correct parent of SSI[0-9] clocks powerpc: Avoid taking a data miss on every userspace instruction miss net/faraday: Add missing include of of.h ARM: dts: koelsch: Correct clock frequency of X2 DU clock input reiserfs: Make cancel_old_flush() reliable ALSA: firewire-digi00x: handle all MIDI messages on streaming packets fm10k: correctly check if interface is removed scsi: ses: don't get power status of SES device slot on probe apparmor: Make path_max parameter readonly iommu/iova: Fix underflow bug in __alloc_and_insert_iova_range video: ARM CLCD: fix dma allocation size drm/radeon: Fail fb creation from imported dma-bufs. drm/amdgpu: Fail fb creation from imported dma-bufs. (v2) coresight: Fixes coresight DT parse to get correct output port ID. MIPS: BPF: Quit clobbering callee saved registers in JIT code. MIPS: BPF: Fix multiple problems in JIT skb access helpers. MIPS: r2-on-r6-emu: Fix BLEZL and BGTZL identification MIPS: r2-on-r6-emu: Clear BLTZALL and BGEZALL debugfs counters regulator: isl9305: fix array size md/raid6: Fix anomily when recovering a single device in RAID6. usb: dwc2: Make sure we disconnect the gadget state usb: gadget: dummy_hcd: Fix wrong power status bit clear/reset in dummy_hub_control() drivers/perf: arm_pmu: handle no platform_device perf inject: Copy events when reordering events in pipe mode perf session: Don't rely on evlist in pipe mode scsi: sg: check for valid direction before starting the request scsi: sg: close race condition in sg_remove_sfp_usercontext() kprobes/x86: Fix kprobe-booster not to boost far call instructions kprobes/x86: Set kprobes pages read-only pwm: tegra: Increase precision in PWM rate calculation wil6210: fix memory access violation in wil_memcpy_from/toio_32 drm/edid: set ELD connector type in drm_edid_to_eld() video/hdmi: Allow "empty" HDMI infoframes HID: elo: clear BTN_LEFT mapping ARM: dts: exynos: Correct Trats2 panel reset line sched: Stop switched_to_rt() from sending IPIs to offline CPUs sched: Stop resched_cpu() from sending IPIs to offline CPUs test_firmware: fix setting old custom fw path back on exit net: xfrm: allow clearing socket xfrm policies. mtd: nand: fix interpretation of NAND_CMD_NONE in nand_command[_lp]() ARM: dts: am335x-pepper: Fix the audio CODEC's reset pin ARM: dts: omap3-n900: Fix the audio CODEC's reset pin ath10k: update tdls teardown state to target cpufreq: Fix governor module removal race clk: qcom: msm8916: fix mnd_width for codec_digcodec ath10k: fix invalid STS_CAP_OFFSET_MASK tools/usbip: fixes build with musl libc toolchain spi: sun6i: disable/unprepare clocks on remove scsi: core: scsi_get_device_flags_keyed(): Always return device flags scsi: devinfo: apply to HP XP the same flags as Hitachi VSP scsi: dh: add new rdac devices media: cpia2: Fix a couple off by one bugs veth: set peer GSO values drm/amdkfd: Fix memory leaks in kfd topology agp/intel: Flush all chipset writes after updating the GGTT mac80211_hwsim: enforce PS_MANUAL_POLL to be set after PS_ENABLED mac80211: remove BUG() when interface type is invalid ASoC: nuc900: Fix a loop timeout test ipvlan: add L2 check for packets arriving via virtual devices rcutorture/configinit: Fix build directory error message ima: relax requiring a file signature for new files with zero length selftests/x86/entry_from_vm86: Exit with 1 if we fail selftests/x86: Add tests for User-Mode Instruction Prevention selftests/x86: Add tests for the STR and SLDT instructions selftests/x86/entry_from_vm86: Add test cases for POPF x86/vm86/32: Fix POPF emulation x86/mm: Fix vmalloc_fault to use pXd_large ALSA: pcm: Fix UAF in snd_pcm_oss_get_formats() ALSA: hda - Revert power_save option default value ALSA: seq: Fix possible UAF in snd_seq_check_queue() ALSA: seq: Clear client entry before deleting else at closing drm/amdgpu/dce: Don't turn off DP sink when disconnected fs: Teach path_connected to handle nfs filesystems with multiple roots. lock_parent() needs to recheck if dentry got __dentry_kill'ed under it fs/aio: Add explicit RCU grace period when freeing kioctx fs/aio: Use RCU accessors for kioctx_table->table[] irqchip/gic-v3-its: Ensure nr_ites >= nr_lpis scsi: sg: fix SG_DXFER_FROM_DEV transfers scsi: sg: fix static checker warning in sg_is_valid_dxfer scsi: sg: only check for dxfer_len greater than 256M ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux btrfs: alloc_chunk: fix DUP stripe size handling btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device USB: gadget: udc: Add missing platform_device_put() on error in bdc_pci_probe() usb: gadget: bdc: 64-bit pointer capability check bpf: fix incorrect sign extension in check_alu_op() Linux 4.4.123 Change-Id: Ieb89411248f93522dde29edb8581f8ece22e33a7 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * fs: Teach path_connected to handle nfs filesystems with multiple roots.Eric W. Biederman2018-03-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 95dd77580ccd66a0da96e6d4696945b8cea39431 upstream. On nfsv2 and nfsv3 the nfs server can export subsets of the same filesystem and report the same filesystem identifier, so that the nfs client can know they are the same filesystem. The subsets can be from disjoint directory trees. The nfsv2 and nfsv3 filesystems provides no way to find the common root of all directory trees exported form the server with the same filesystem identifier. The practical result is that in struct super s_root for nfs s_root is not necessarily the root of the filesystem. The nfs mount code sets s_root to the root of the first subset of the nfs filesystem that the kernel mounts. This effects the dcache invalidation code in generic_shutdown_super currently called shrunk_dcache_for_umount and that code for years has gone through an additional list of dentries that might be dentry trees that need to be freed to accomodate nfs. When I wrote path_connected I did not realize nfs was so special, and it's hueristic for avoiding calling is_subdir can fail. The practical case where this fails is when there is a move of a directory from the subtree exposed by one nfs mount to the subtree exposed by another nfs mount. This move can happen either locally or remotely. With the remote case requiring that the move directory be cached before the move and that after the move someone walks the path to where the move directory now exists and in so doing causes the already cached directory to be moved in the dcache through the magic of d_splice_alias. If someone whose working directory is in the move directory or a subdirectory and now starts calling .. from the initial mount of nfs (where s_root == mnt_root), then path_connected as a heuristic will not bother with the is_subdir check. As s_root really is not the root of the nfs filesystem this heuristic is wrong, and the path may actually not be connected and path_connected can fail. The is_subdir function might be cheap enough that we can call it unconditionally. Verifying that will take some benchmarking and the result may not be the same on all kernels this fix needs to be backported to. So I am avoiding that for now. Filesystems with snapshots such as nilfs and btrfs do something similar. But as the directory tree of the snapshots are disjoint from one another and from the main directory tree rename won't move things between them and this problem will not occur. Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Fixes: 397d425dc26d ("vfs: Test for and handle paths that are unreachable from their mnt_root") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.116 into android-4.4Greg Kroah-Hartman2018-02-20
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.116 powerpc/bpf/jit: Disable classic BPF JIT on ppc64le powerpc/64: Fix flush_(d|i)cache_range() called from modules powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC powerpc: Simplify module TOC handling powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper powerpc/64: Add macros for annotating the destination of rfid/hrfid powerpc/64s: Simple RFI macro conversions powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL powerpc/64s: Add support for RFI flush of L1-D cache powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti powerpc/pseries: Query hypervisor for RFI flush settings powerpc/powernv: Check device-tree for RFI flush settings powerpc/64s: Wire up cpu_show_meltdown() powerpc/64s: Allow control of RFI flush via debugfs ASoC: pcm512x: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE usbip: vhci_hcd: clear just the USB_PORT_STAT_POWER bit usbip: fix 3eee23c3ec14 tcp_socket address still in the status file net: cdc_ncm: initialize drvflags before usage ASoC: simple-card: Fix misleading error message ASoC: rsnd: don't call free_irq() on Parent SSI ASoC: rsnd: avoid duplicate free_irq() drm: rcar-du: Use the VBK interrupt for vblank events drm: rcar-du: Fix race condition when disabling planes at CRTC stop x86/asm: Fix inline asm call constraints for GCC 4.4 ip6mr: fix stale iterator net: igmp: add a missing rcu locking section qlcnic: fix deadlock bug r8169: fix RTL8168EP take too long to complete driver initialization. tcp: release sk_frag.page in tcp_disconnect vhost_net: stop device during reset owner media: soc_camera: soc_scale_crop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE KEYS: encrypted: fix buffer overread in valid_master_desc() don't put symlink bodies in pagecache into highmem crypto: tcrypt - fix S/G table for test_aead_speed() x86/microcode/AMD: Do not load when running on a hypervisor x86/microcode: Do the family check first powerpc/pseries: include linux/types.h in asm/hvcall.h cifs: Fix missing put_xid in cifs_file_strict_mmap cifs: Fix autonegotiate security settings mismatch CIFS: zero sensitive data when freeing dmaengine: dmatest: fix container_of member in dmatest_callback x86/kaiser: fix build error with KASAN && !FUNCTION_GRAPH_TRACER kaiser: fix compile error without vsyscall netfilter: nf_queue: Make the queue_handler pernet posix-timer: Properly check sigevent->sigev_notify usb: gadget: uvc: Missing files for configfs interface sched/rt: Use container_of() to get root domain in rto_push_irq_work_func() sched/rt: Up the root domain ref count when passing it around via IPIs dccp: CVE-2017-8824: use-after-free in DCCP code media: dvb-usb-v2: lmedm04: Improve logic checking of warm start media: dvb-usb-v2: lmedm04: move ts2020 attach to dm04_lme2510_tuner mtd: cfi: convert inline functions to macros mtd: nand: brcmnand: Disable prefetch by default mtd: nand: Fix nand_do_read_oob() return value mtd: nand: sunxi: Fix ECC strength choice ubi: block: Fix locking for idr_alloc/idr_remove nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds NFS: Add a cond_resched() to nfs_commit_release_pages() NFS: commit direct writes even if they fail partially NFS: reject request for id_legacy key without auxdata kernfs: fix regression in kernfs_fop_write caused by wrong type ahci: Annotate PCI ids for mobile Intel chipsets as such ahci: Add PCI ids for Intel Bay Trail, Cherry Trail and Apollo Lake AHCI ahci: Add Intel Cannon Lake PCH-H PCI ID crypto: hash - introduce crypto_hash_alg_has_setkey() crypto: cryptd - pass through absence of ->setkey() crypto: poly1305 - remove ->setkey() method nsfs: mark dentry with DCACHE_RCUACCESS media: v4l2-ioctl.c: don't copy back the result for -ENOTTY vb2: V4L2_BUF_FLAG_DONE is set after DQBUF media: v4l2-compat-ioctl32.c: add missing VIDIOC_PREPARE_BUF media: v4l2-compat-ioctl32.c: fix the indentation media: v4l2-compat-ioctl32.c: move 'helper' functions to __get/put_v4l2_format32 media: v4l2-compat-ioctl32.c: avoid sizeof(type) media: v4l2-compat-ioctl32.c: copy m.userptr in put_v4l2_plane32 media: v4l2-compat-ioctl32.c: fix ctrl_is_pointer media: v4l2-compat-ioctl32.c: make ctrl_is_pointer work for subdevs media: v4l2-compat-ioctl32: Copy v4l2_window->global_alpha media: v4l2-compat-ioctl32.c: copy clip list in put_v4l2_window32 media: v4l2-compat-ioctl32.c: drop pr_info for unknown buffer type media: v4l2-compat-ioctl32.c: don't copy back the result for certain errors media: v4l2-compat-ioctl32.c: refactor compat ioctl32 logic crypto: caam - fix endless loop when DECO acquire fails arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2 watchdog: imx2_wdt: restore previous timeout after suspend+resume media: ts2020: avoid integer overflows on 32 bit machines media: cxusb, dib0700: ignore XC2028_I2C_FLUSH kernel/async.c: revert "async: simplify lowest_in_progress()" HID: quirks: Fix keyboard + touchpad on Toshiba Click Mini not working Bluetooth: btsdio: Do not bind to non-removable BCM43341 Revert "Bluetooth: btusb: fix QCA Rome suspend/resume" Bluetooth: btusb: Restore QCA Rome suspend/resume fix with a "rewritten" version signal/openrisc: Fix do_unaligned_access to send the proper signal signal/sh: Ensure si_signo is initialized in do_divide_error alpha: fix crash if pthread_create races with signal delivery alpha: fix reboot on Avanti platform xtensa: fix futex_atomic_cmpxchg_inatomic EDAC, octeon: Fix an uninitialized variable warning pktcdvd: Fix pkt_setup_dev() error path btrfs: Handle btrfs_set_extent_delalloc failure in fixup worker nvme: Fix managing degraded controllers ACPI: sbshc: remove raw pointer from printk() message ovl: fix failure to fsync lower dir mn10300/misalignment: Use SIGSEGV SEGV_MAPERR to report a failed user copy ftrace: Remove incorrect setting of glob search field Linux 4.4.116 Change-Id: Id000cb8d59b74de063902e9ad24dd07fe1b1694b Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * don't put symlink bodies in pagecache into highmemAl Viro2018-02-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 21fc61c73c3903c4c312d0802da01ec2b323d174 upstream. kmap() in page_follow_link_light() needed to go - allowing to hold an arbitrary number of kmaps for long is a great way to deadlocking the system. new helper (inode_nohighmem(inode)) needs to be used for pagecache symlinks inodes; done for all in-tree cases. page_follow_link_light() instrumented to yell about anything missed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jin Qian <jinqian@google.com> Signed-off-by: Jin Qian <jinqian@android.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | fscrypt: updates on 4.15-rc4Jaegeuk Kim2018-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cherry-picked from origin/upstream-f2fs-stable-linux-4.4.y: ba1ade71012d fscrypt: resolve some cherry-pick bugs 9e32f17d241b fscrypt: move to generic async completion 4ecacbed6e1c crypto: introduce crypto wait for async op 42d89da82b25 fscrypt: lock mutex before checking for bounce page pool 2286508d17c2 fscrypt: new helper function - fscrypt_prepare_setattr() 5cbdd42ad248 fscrypt: new helper function - fscrypt_prepare_lookup() a31feba5c18f fscrypt: new helper function - fscrypt_prepare_rename() 95efafb6239d fscrypt: new helper function - fscrypt_prepare_link() 2b4b4f98dddf fscrypt: new helper function - fscrypt_file_open() 8c815f381cd6 fscrypt: new helper function - fscrypt_require_key() 272e43502577 fscrypt: remove unneeded empty fscrypt_operations structs 1034eeec516a fscrypt: remove ->is_encrypted() 32c0d3ae9d66 fscrypt: switch from ->is_encrypted() to IS_ENCRYPTED() a4781dd1f175 fs, fscrypt: add an S_ENCRYPTED inode flag ff0a3dbc9392 fscrypt: clean up include file mess bc4a61c60bea fscrypt: fix dereference of NULL user_key_payload a53dc7e00559 fscrypt: make ->dummy_context() return bool Change-Id: I461d742adc7b77177df91429a1fd9c8624a698d6 Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
* | f2fs: backport from (4c1fad64 - Merge tag 'for-f2fs-4.9' of ↵Jaegeuk Kim2017-09-25
| | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | Merge 4.4.72 into android-4.4Greg Kroah-Hartman2017-06-14
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.72 bnx2x: Fix Multi-Cos ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt() cxgb4: avoid enabling napi twice to the same queue tcp: disallow cwnd undo when switching congestion control vxlan: fix use-after-free on deletion ipv6: Fix leak in ipv6_gso_segment(). net: ping: do not abuse udp_poll() net: ethoc: enable NAPI before poll may be scheduled net: bridge: start hello timer only if device is up sparc64: mm: fix copy_tsb to correctly copy huge page TSBs sparc: Machine description indices can vary sparc64: reset mm cpumask after wrap sparc64: combine activate_mm and switch_mm sparc64: redefine first version sparc64: add per-cpu mm of secondary contexts sparc64: new context wrap sparc64: delete old wrap code arch/sparc: support NR_CPUS = 4096 serial: ifx6x60: fix use-after-free on module unload ptrace: Properly initialize ptracer_cred on fork KEYS: fix dereferencing NULL payload with nonzero length KEYS: fix freeing uninitialized memory in key_update() crypto: gcm - wait for crypto op not signal safe drm/amdgpu/ci: disable mclk switching for high refresh rates (v2) nfsd4: fix null dereference on replay nfsd: Fix up the "supattr_exclcreat" attributes kvm: async_pf: fix rcu_irq_enter() with irqs enabled KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation arm: KVM: Allow unaligned accesses at HYP KVM: async_pf: avoid async pf injection when in guest mode dmaengine: usb-dmac: Fix DMAOR AE bit definition dmaengine: ep93xx: Always start from BASE0 xen/privcmd: Support correctly 64KB page granularity when mapping memory xen-netfront: do not cast grant table reference to signed short xen-netfront: cast grant table reference first to type int ext4: fix SEEK_HOLE ext4: keep existing extra fields when inode expands ext4: fix fdatasync(2) after extent manipulation operations usb: gadget: f_mass_storage: Serialize wake and sleep execution usb: chipidea: udc: fix NULL pointer dereference if udc_start failed usb: chipidea: debug: check before accessing ci_role staging/lustre/lov: remove set_fs() call from lov_getstripe() iio: light: ltr501 Fix interchanged als/ps register field iio: proximity: as3935: fix AS3935_INT mask drivers: char: random: add get_random_long() random: properly align get_random_int_hash stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms cpufreq: cpufreq_register_driver() should return -ENODEV if init fails target: Re-add check to reject control WRITEs with overflow data drm/msm: Expose our reservation object when exporting a dmabuf. Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled cpuset: consider dying css as offline fs: add i_blocksize() ufs: restore proper tail allocation fix ufs_isblockset() ufs: restore maintaining ->i_blocks ufs: set correct ->s_maxsize ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments() ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path cxl: Fix error path on bad ioctl btrfs: use correct types for page indices in btrfs_page_exists_in_range btrfs: fix memory leak in update_space_info failure path KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages scsi: qla2xxx: don't disable a not previously enabled PCI device powerpc/eeh: Avoid use after free in eeh_handle_special_event() powerpc/numa: Fix percpu allocations to be NUMA aware powerpc/hotplug-mem: Fix missing endian conversion of aa_index perf/core: Drop kernel samples even though :u is specified drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve() drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl() drm/vmwgfx: Make sure backup_handle is always valid drm/nouveau/tmr: fully separate alarm execution/pending lists ALSA: timer: Fix race between read and ioctl ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT ASoC: Fix use-after-free at card unregistration drivers: char: mem: Fix wraparound check to allow mappings up to the end tty: Drop krefs for interrupted tty lock serial: sh-sci: Fix panic when serial console and DMA are enabled net: better skb->sender_cpu and skb->napi_id cohabitation mm: consider memblock reservations for deferred memory initialization sizing NFS: Ensure we revalidate attributes before using execute_ok() NFSv4: Don't perform cached access checks before we've OPENed the file Make __xfs_xattr_put_listen preperly report errors. arm64: hw_breakpoint: fix watchpoint matching for tagged pointers arm64: entry: improve data abort handling of tagged pointers RDMA/qib,hfi1: Fix MR reference count leak on write with immediate usercopy: Adjust tests to deal with SMAP/PAN arm64: armv8_deprecated: ensure extension of addr arm64: ensure extension of smp_store_release value Linux 4.4.72 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * fs: add i_blocksize()Fabian Frederick2017-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 93407472a21b82f39c955ea7787e5bc7da100642 upstream. Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs branch. This patch also fixes multiple checkpatch warnings: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' Thanks to Andrew Morton for suggesting more appropriate function instead of macro. [geliangtang@gmail.com: truncate: use i_blocksize()] Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge remote-tracking branch 'common/android-4.4' into android-4.4.yDmitry Shmidt2017-02-15
|\ \ | |/ |/| | | Change-Id: Icf907f5067fb6da5935ab0d3271df54b8d5df405
| * ANDROID: vfs: Add setattr2 for filesystems with per mount permissionsDaniel Rosenberg2017-01-26
| | | | | | | | | | | | | | | | | | | | This allows filesystems to use their mount private data to influence the permssions they use in setattr2. It has been separated into a new call to avoid disrupting current setattr users. Change-Id: I19959038309284448f1b7f232d579674ef546385 Signed-off-by: Daniel Rosenberg <drosen@google.com>
| * ANDROID: vfs: Add permission2 for filesystems with per mount permissionsDaniel Rosenberg2017-01-26
| | | | | | | | | | | | | | | | | | | | This allows filesystems to use their mount private data to influence the permssions they return in permission2. It has been separated into a new call to avoid disrupting current permission users. Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca Signed-off-by: Daniel Rosenberg <drosen@google.com>
| * ANDROID: vfs: Allow filesystems to access their private mount dataDaniel Rosenberg2017-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Now we pass the vfsmount when mounting and remounting. This allows the filesystem to actually set up the mount specific data, although we can't quite do anything with it yet. show_options is expanded to include data that lives with the mount. To avoid changing existing filesystems, these have been added as new vfs functions. Change-Id: If80670bfad9f287abb8ac22457e1b034c9697097 Signed-off-by: Daniel Rosenberg <drosen@google.com>
| * ANDROID: mnt: Add filesystem private data to mount pointsDaniel Rosenberg2017-01-26
| | | | | | | | | | | | | | | | | | | | | | This starts to add private data associated directly to mount points. The intent is to give filesystems a sense of where they have come from, as a means of letting a filesystem take different actions based on this information. Change-Id: Ie769d7b3bb2f5972afe05c1bf16cf88c91647ab2 Signed-off-by: Daniel Rosenberg <drosen@google.com>
* | vfs: move permission checking into notify_change() for utimes(NULL)Miklos Szeredi2016-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f2b20f6ee842313a0d681dbbf7f87b70291a6a3b upstream. This fixes a bug where the permission was not properly checked in overlayfs. The testcase is ltp/utimensat01. It is also cleaner and safer to do the permission checking in the vfs helper instead of the caller. This patch introduces an additional ia_valid flag ATTR_TOUCH (since touch(1) is the most obvious user of utimes(NULL)) that is passed into notify_change whenever the conditions for this special permission checking mode are met. Reported-by: Aihua Zhang <zhangaihua1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Aihua Zhang <zhangaihua1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | wrappers for ->i_mutex accessAl Viro2016-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5955102c9984fa081b2d570cfac75c97eecf8f3b upstream parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [only the fs.h change included to make backports easier - gregkh] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | fs: add file_dentry()Miklos Szeredi2016-04-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d101a125954eae1d397adda94ca6319485a50493 upstream. This series fixes bugs in nfs and ext4 due to 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay"). Regular files opened on overlayfs will result in the file being opened on the underlying filesystem, while f_path points to the overlayfs mount/dentry. This confuses filesystems which get the dentry from struct file and assume it's theirs. Add a new helper, file_dentry() [*], to get the filesystem's own dentry from the file. This checks file->f_path.dentry->d_flags against DCACHE_OP_REAL, and returns file->f_path.dentry if DCACHE_OP_REAL is not set (this is the common, non-overlayfs case). In the uncommon case it will call into overlayfs's ->d_real() to get the underlying dentry, matching file_inode(file). The reason we need to check against the inode is that if the file is copied up while being open, d_real() would return the upper dentry, while the open file comes from the lower dentry. [*] If possible, it's better simply to use file_inode() instead. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | fs/coredump: prevent fsuid=0 dumps into user-controlled directoriesJann Horn2016-04-12
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 378c6520e7d29280f400ef2ceaf155c86f05a71a upstream. This commit fixes the following security hole affecting systems where all of the following conditions are fulfilled: - The fs.suid_dumpable sysctl is set to 2. - The kernel.core_pattern sysctl's value starts with "/". (Systems where kernel.core_pattern starts with "|/" are not affected.) - Unprivileged user namespace creation is permitted. (This is true on Linux >=3.8, but some distributions disallow it by default using a distro patch.) Under these conditions, if a program executes under secure exec rules, causing it to run with the SUID_DUMP_ROOT flag, then unshares its user namespace, changes its root directory and crashes, the coredump will be written using fsuid=0 and a path derived from kernel.core_pattern - but this path is interpreted relative to the root directory of the process, allowing the attacker to control where a coredump will be written with root privileges. To fix the security issue, always interpret core_pattern for dumps that are written under SUID_DUMP_ROOT relative to the root directory of init. Signed-off-by: Jann Horn <jann@thejh.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge branch 'for-linus-2' of ↵Linus Torvalds2015-11-11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - misc stable fixes - trivial kernel-doc and comment fixups - remove never-used block_page_mkwrite() wrapper function, and rename the function that is _actually_ used to not have double underscores. * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: 9p: cache.h: Add #define of include guard vfs: remove stale comment in inode_operations vfs: remove unused wrapper block_page_mkwrite() binfmt_elf: Correct `arch_check_elf's description fs: fix writeback.c kernel-doc warnings fs: fix inode.c kernel-doc warning fs/pipe.c: return error code rather than 0 in pipe_write() fs/pipe.c: preserve alloc_file() error code binfmt_elf: Don't clobber passed executable's file header FS-Cache: Handle a write to the page immediately beyond the EOF marker cachefiles: perform test on s_blocksize when opening cache file. FS-Cache: Don't override netfs's primary_index if registering failed FS-Cache: Increase reference of parent after registering, netfs success debugfs: fix refcount imbalance in start_creating
| * vfs: remove stale comment in inode_operationsRoss Zwisler2015-11-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The big warning comment that is currently at the end of struct inode_operations was added as part of this commit: 4aa7c6346be3 ("vfs: add i_op->dentry_open()") It was added to warn people not to use the newly added 'dentry_open' function pointer. This function pointer was removed as part of this commit: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") The comment was left behind and now refers to nothing, so remove it. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-blockLinus Torvalds2015-11-10
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block IO poll support from Jens Axboe: "Various groups have been doing experimentation around IO polling for (really) fast devices. The code has been reviewed and has been sitting on the side for a few releases, but this is now good enough for coordinated benchmarking and further experimentation. Currently O_DIRECT sync read/write are supported. A framework is in the works that allows scalable stats tracking so we can auto-tune this. And we'll add libaio support as well soon. Fow now, it's an opt-in feature for test purposes" * 'for-4.4/io-poll' of git://git.kernel.dk/linux-block: direct-io: be sure to assign dio->bio_bdev for both paths directio: add block polling support NVMe: add blk polling support block: add block polling support blk-mq: return tag/queue combo in the make_request_fn handlers block: change ->make_request_fn() and users to return a queue cookie
| * block: change ->make_request_fn() and users to return a queue cookieJens Axboe2015-11-07
| | | | | | | | | | | | | | | | | | No functional changes in this patch, but it prepares us for returning a more useful cookie related to the IO that was queued up. Signed-off-by: Jens Axboe <axboe@fb.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com>
* | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2015-11-05
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge patch-bomb from Andrew Morton: - inotify tweaks - some ocfs2 updates (many more are awaiting review) - various misc bits - kernel/watchdog.c updates - Some of mm. I have a huge number of MM patches this time and quite a lot of it is quite difficult and much will be held over to next time. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits) selftests: vm: add tests for lock on fault mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage mm: introduce VM_LOCKONFAULT mm: mlock: add new mlock system call mm: mlock: refactor mlock, munlock, and munlockall code kasan: always taint kernel on report mm, slub, kasan: enable user tracking by default with KASAN=y kasan: use IS_ALIGNED in memory_is_poisoned_8() kasan: Fix a type conversion error lib: test_kasan: add some testcases kasan: update reference to kasan prototype repo kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile kasan: various fixes in documentation kasan: update log messages kasan: accurately determine the type of the bad access kasan: update reported bug types for kernel memory accesses kasan: update reported bug types for not user nor kernel memory accesses mm/kasan: prevent deadlock in kasan reporting mm/kasan: don't use kasan shadow pointer in generic functions mm/kasan: MODULE_VADDR is not available on all archs ...
| * | mm/filemap.c: make global sync not clear error status of individual inodesJunichi Nomura2015-11-05
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | filemap_fdatawait() is a function to wait for on-going writeback to complete but also consume and clear error status of the mapping set during writeback. The latter functionality is critical for applications to detect writeback error with system calls like fsync(2)/fdatasync(2). However filemap_fdatawait() is also used by sync(2) or FIFREEZE ioctl, which don't check error status of individual mappings. As a result, fsync() may not be able to detect writeback error if events happen in the following order: Application System admin ---------------------------------------------------------- write data on page cache Run sync command writeback completes with error filemap_fdatawait() clears error fsync returns success (but the data is not on disk) This patch adds filemap_fdatawait_keep_errors() for call sites where writeback error is not handled so that they don't clear error status. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: Fengguang Wu <fengguang.wu@gmail.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | locks: cleanup posix_lock_inode_wait and flock_lock_inode_waitBenjamin Coddington2015-10-22
| | | | | | | | | | | | | | All callers use locks_lock_inode_wait() instead. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* | locks: introduce locks_lock_inode_wait()Benjamin Coddington2015-10-22
|/ | | | | | | | | | Users of the locks API commonly call either posix_lock_file_wait() or flock_lock_file_wait() depending upon the lock type. Add a new function locks_lock_inode_wait() which will check and call the correct function for the type of lock passed in. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* dax: move DAX-related functions to a new headerMatthew Wilcox2015-09-08
| | | | | | | | | | | | | | | | | | | In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is defined in linux/mm.h. Given that we don't want to include <linux/mm.h> in <linux/fs.h>, the easiest solution is to move the DAX-related functions to a new header, <linux/dax.h>. We could also have moved VM_FAULT_* definitions to a new header, or a different header that isn't quite such a boil-the-ocean header as <linux/mm.h>, but this felt like the best option. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2015-09-05
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "In this one: - d_move fixes (Eric Biederman) - UFS fixes (me; locking is mostly sane now, a bunch of bugs in error handling ought to be fixed) - switch of sb_writers to percpu rwsem (Oleg Nesterov) - superblock scalability (Josef Bacik and Dave Chinner) - swapon(2) race fix (Hugh Dickins)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits) vfs: Test for and handle paths that are unreachable from their mnt_root dcache: Reduce the scope of i_lock in d_splice_alias dcache: Handle escaped paths in prepend_path mm: fix potential data race in SyS_swapon inode: don't softlockup when evicting inodes inode: rename i_wb_list to i_io_list sync: serialise per-superblock sync operations inode: convert inode_sb_list_lock to per-sb inode: add hlist_fake to avoid the inode hash lock in evict writeback: plug writeback at a high level change sb_writers to use percpu_rw_semaphore shift percpu_counter_destroy() into destroy_super_work() percpu-rwsem: kill CONFIG_PERCPU_RWSEM percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire() percpu-rwsem: introduce percpu_down_read_trylock() document rwsem_release() in sb_wait_write() fix the broken lockdep logic in __sb_start_write() introduce __sb_writers_{acquired,release}() helpers ufs_inode_get{frag,block}(): get rid of 'phys' argument ufs_getfrag_block(): tidy up a bit ...
| * Merge branch 'superblock-scaling' of ↵Al Viro2015-08-21
| |\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-next Conflicts: include/linux/fs.h
| | * inode: rename i_wb_list to i_io_listDave Chinner2015-08-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a small consistency problem between the inode and writeback naming. Writeback calls the "for IO" inode queues b_io and b_more_io, but the inode calls these the "writeback list" or i_wb_list. This makes it hard to an new "under writeback" list to the inode, or call it an "under IO" list on the bdi because either way we'll have writeback on IO and IO on writeback and it'll just be confusing. I'm getting confused just writing this! So, rename the inode "for IO" list variable to i_io_list so we can add a new "writeback list" in a subsequent patch. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com>
| | * sync: serialise per-superblock sync operationsDave Chinner2015-08-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When competing sync(2) calls walk the same filesystem, they need to walk the list of inodes on the superblock to find all the inodes that we need to wait for IO completion on. However, when multiple wait_sb_inodes() calls do this at the same time, they contend on the the inode_sb_list_lock and the contention causes system wide slowdowns. In effect, concurrent sync(2) calls can take longer and burn more CPU than if they were serialised. Stop the worst of the contention by adding a per-sb mutex to wrap around wait_sb_inodes() so that we only execute one sync(2) IO completion walk per superblock superblock at a time and hence avoid contention being triggered by concurrent sync(2) calls. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com>
| | * inode: convert inode_sb_list_lock to per-sbDave Chinner2015-08-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The process of reducing contention on per-superblock inode lists starts with moving the locking to match the per-superblock inode list. This takes the global lock out of the picture and reduces the contention problems to within a single filesystem. This doesn't get rid of contention as the locks still have global CPU scope, but it does isolate operations on different superblocks form each other. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com>
| | * inode: add hlist_fake to avoid the inode hash lock in evictJosef Bacik2015-08-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some filesystems don't use the VFS inode hash and fake the fact they are hashed so that all the writeback code works correctly. However, this means the evict() path still tries to remove the inode from the hash, meaning that the inode_hash_lock() needs to be taken unnecessarily. Hence under certain workloads the inode_hash_lock can be contended even if the inode is never actually hashed. To avoid this add hlist_fake to test if the inode isn't actually hashed to avoid taking the hash lock on inodes that have never been hashed. Based on Dave Chinner's inode: add IOP_NOTHASHED to avoid inode hash lock in evict basd on Al's suggestions. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com>
| * | change sb_writers to use percpu_rw_semaphoreOleg Nesterov2015-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can remove everything from struct sb_writers except frozen and add the array of percpu_rw_semaphore's instead. This patch doesn't remove sb_writers->wait_unfrozen yet, we keep it for get_super_thawed(). We will probably remove it later. This change tries to address the following problems: - Firstly, __sb_start_write() looks simply buggy. It does __sb_end_write() if it sees ->frozen, but if it migrates to another CPU before percpu_counter_dec(), sb_wait_write() can wrongly succeed if there is another task which holds the same "semaphore": sb_wait_write() can miss the result of the previous percpu_counter_inc() but see the result of this percpu_counter_dec(). - As Dave Hansen reports, it is suboptimal. The trivial microbenchmark that writes to a tmpfs file in a loop runs 12% faster if we change this code to rely on RCU and kill the memory barriers. - This code doesn't look simple. It would be better to rely on the generic locking code. According to Dave, this change adds the same performance improvement. Note: with this change both freeze_super() and thaw_super() will do synchronize_sched_expedited() 3 times. This is just ugly. But: - This will be "fixed" by the rcu_sync changes we are going to merge. After that freeze_super()->percpu_down_write() will use synchronize_sched(), and thaw_super() won't use synchronize() at all. This doesn't need any changes in fs/super.c. - Once we merge rcu_sync changes, we can also change super.c so that all wb_write->rw_sem's will share the single ->rss in struct sb_writes, then freeze_super() will need only one synchronize_sched(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jan Kara <jack@suse.com>
| * | shift percpu_counter_destroy() into destroy_super_work()Oleg Nesterov2015-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Of course, this patch is ugly as hell. It will be (partially) reverted later. We add it to ensure that other WIP changes in percpu_rw_semaphore won't break fs/super.c. We do not even need this change right now, percpu_free_rwsem() is fine in atomic context. But we are going to change this, it will be might_sleep() after we merge the rcu_sync() patches. And even after that we do not really need destroy_super_work(), we will kill it in any case. Instead, destroy_super_rcu() should just check that rss->cb_state == CB_IDLE and do call_rcu() again in the (very unlikely) case this is not true. So this is just the temporary kludge which helps us to avoid the conflicts with the changes which will be (hopefully) routed via rcu tree. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jan Kara <jack@suse.com>
| * | introduce __sb_writers_{acquired,release}() helpersOleg Nesterov2015-08-15
| |/ | | | | | | | | | | | | | | | | Preparation to hide the sb->s_writers internals from xfs and btrfs. Add 2 trivial define's they can use rather than play with ->s_writers directly. No changes in btrfs/transaction.o and xfs/xfs_aops.o. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jan Kara <jack@suse.com>
* | Merge tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2015-09-05
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "Nothing major, but: - Add Jeff Layton as an nfsd co-maintainer: no change to existing practice, just an acknowledgement of the status quo. - Two patches ("nfsd: ensure that...") for a race overlooked by the state locking rewrite, causing a crash noticed by multiple users. - Lots of smaller bugfixes all over from Kinglong Mee. - From Jeff, some cleanup of server rpc code in preparation for possible shift of nfsd threads to workqueues" * tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux: (52 commits) nfsd: deal with DELEGRETURN racing with CB_RECALL nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM case nfsd: ensure that delegation stateid hash references are only put once nfsd: ensure that the ol stateid hash reference is only put once net: sunrpc: fix tracepoint Warning: unknown op '->' nfsd: allow more than one laundry job to run at a time nfsd: don't WARN/backtrace for invalid container deployment. fs: fix fs/locks.c kernel-doc warning nfsd: Add Jeff Layton as co-maintainer NFSD: Return word2 bitmask if setting security label in OPEN/CREATE NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1 nfsd: SUPPATTR_EXCLCREAT must be encoded before SECURITY_LABEL. nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug NFSD: Store parent's stat in a separate value nfsd: Fix two typos in comments lockd: NLM grace period shouldn't block NFSv4 opens nfsd: include linux/nfs4.h in export.h sunrpc: Switch to using hash list instead single list sunrpc/nfsd: Remove redundant code by exports seq_operations functions sunrpc: Store cache_detail in seq_file's private directly ...
| * | lockd: NLM grace period shouldn't block NFSv4 opensJ. Bruce Fields2015-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | NLM locks don't conflict with NFSv4 share reservations, so we're not going to learn anything new by watiting for them. They do conflict with NFSv4 locks and with delegations. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | mm: move ->mremap() from file_operations to vm_operations_structOleg Nesterov2015-09-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vma->vm_ops->mremap() looks more natural and clean in move_vma(), and this way ->mremap() can have more users. Say, vdso. While at it, s/aio_ring_remap/aio_ring_mremap/. Note: this is the minimal change before ->mremap() finds another user in file_operations; this method should have more arguments, and it can be used to kill arch_remap(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2015-09-01
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace updates from Eric Biederman: "This finishes up the changes to ensure proc and sysfs do not start implementing executable files, as the there are application today that are only secure because such files do not exist. It akso fixes a long standing misfeature of /proc/<pid>/mountinfo that did not show the proper source for files bind mounted from /proc/<pid>/ns/*. It also straightens out the handling of clone flags related to user namespaces, fixing an unnecessary failure of unshare(CLONE_NEWUSER) when files such as /proc/<pid>/environ are read while <pid> is calling unshare. This winds up fixing a minor bug in unshare flag handling that dates back to the first version of unshare in the kernel. Finally, this fixes a minor regression caused by the introduction of sysfs_create_mount_point, which broke someone's in house application, by restoring the size of /sys/fs/cgroup to 0 bytes. Apparently that application uses the directory size to determine if a tmpfs is mounted on /sys/fs/cgroup. The bind mount escape fixes are present in Al Viros for-next branch. and I expect them to come from there. The bind mount escape is the last of the user namespace related security bugs that I am aware of" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: fs: Set the size of empty dirs to 0. userns,pidns: Force thread group sharing, not signal handler sharing. unshare: Unsharing a thread does not require unsharing a vm nsfs: Add a show_path method to fix mountinfo mnt: fs_fully_visible enforce noexec and nosuid if !SB_I_NOEXEC vfs: Commit to never having exectuables on proc and sysfs.
| * | vfs: Commit to never having exectuables on proc and sysfs.Eric W. Biederman2015-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Today proc and sysfs do not contain any executable files. Several applications today mount proc or sysfs without noexec and nosuid and then depend on there being no exectuables files on proc or sysfs. Having any executable files show on proc or sysfs would cause a user space visible regression, and most likely security problems. Therefore commit to never allowing executables on proc and sysfs by adding a new flag to mark them as filesystems without executables and enforce that flag. Test the flag where MNT_NOEXEC is tested today, so that the only user visible effect will be that exectuables will be treated as if the execute bit is cleared. The filesystems proc and sysfs do not currently incoporate any executable files so this does not result in any user visible effects. This makes it unnecessary to vet changes to proc and sysfs tightly for adding exectuable files or changes to chattr that would modify existing files, as no matter what the individual file say they will not be treated as exectuable files by the vfs. Not having to vet changes to closely is important as without this we are only one proc_create call (or another goof up in the implementation of notify_change) from having problematic executables on proc. Those mistakes are all too easy to make and would create a situation where there are security issues or the assumptions of some program having to be broken (and cause userspace regressions). Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* | | fs, file table: reinit files_stat.max_files after deferred memory initialisationMel Gorman2015-08-07
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave Hansen reported the following; My laptop has been behaving strangely with 4.2-rc2. Once I log in to my X session, I start getting all kinds of strange errors from applications and see this in my dmesg: VFS: file-max limit 8192 reached The problem is that the file-max is calculated before memory is fully initialised and miscalculates how much memory the kernel is using. This patch recalculates file-max after deferred memory initialisation. Note that using memory hotplug infrastructure would not have avoided this problem as the value is not recalculated after memory hot-add. 4.1: files_stat.max_files = 6582781 4.2-rc2: files_stat.max_files = 8192 4.2-rc2 patched: files_stat.max_files = 6562467 Small differences with the patch applied and 4.1 but not enough to matter. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Dave Hansen <dave.hansen@intel.com> Cc: Nicolai Stange <nicstange@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Alex Ng <alexng@microsoft.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | locks: inline posix_lock_file_wait and flock_lock_file_waitJeff Layton2015-07-13
| | | | | | | | | | | | | | They just call file_inode and then the corresponding *_inode_file_wait function. Just make them static inlines instead. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
* | locks: new helpers - flock_lock_inode_wait and posix_lock_inode_waitJeff Layton2015-07-13
|/ | | | | | | | Allow callers to pass in an inode instead of a filp. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: "J. Bruce Fields" <bfields@fieldses.org> Tested-by: "J. Bruce Fields" <bfields@fieldses.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2015-07-04
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...
| * fs: Call security_ops->inode_killpriv on truncateJan Kara2015-06-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Comment in include/linux/security.h says that ->inode_killpriv() should be called when setuid bit is being removed and that similar security labels (in fact this applies only to file capabilities) should be removed at this time as well. However we don't call ->inode_killpriv() when we remove suid bit on truncate. We fix the problem by calling ->inode_need_killpriv() and subsequently ->inode_killpriv() on truncate the same way as we do it on file write. After this patch there's only one user of should_remove_suid() - ocfs2 - and indeed it's buggy because it doesn't call ->inode_killpriv() on write. However fixing it is difficult because of special locking constraints. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * fs: Provide function telling whether file_remove_privs() will do anythingJan Kara2015-06-23
| | | | | | | | | | | | | | | | | | Provide function telling whether file_remove_privs() will do anything. Currently we only have should_remove_suid() and that does something slightly different. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * fs: Rename file_remove_suid() to file_remove_privs()Jan Kara2015-06-23
| | | | | | | | | | | | | | | | | | | | file_remove_suid() is a misnomer since it removes also file capabilities stored in xattrs and sets S_NOSEC flag. Also should_remove_suid() tells something else than whether file_remove_suid() call is necessary which leads to bugs. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * vfs: add file_path() helperMiklos Szeredi2015-06-23
| | | | | | | | | | | | | | | | | | | | Turn d_path(&file->f_path, ...); into file_path(file, ...); Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * overlayfs: Make f_path always point to the overlay and f_inode to the underlayDavid Howells2015-06-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make file->f_path always point to the overlay dentry so that the path in /proc/pid/fd is correct and to ensure that label-based LSMs have access to the overlay as well as the underlay (path-based LSMs probably don't need it). Using my union testsuite to set things up, before the patch I see: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:38 5 -> /a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 13381 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 13381 Links: 1 ... After the patch: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:22 5 -> /mnt/a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 40346 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 40346 Links: 1 ... Note the change in where /proc/$$/fd/5 points to in the ls command. It was pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107 (which is correct). The inode accessed, however, is the lower layer. The union layer is on device 25h/37d and the upper layer on 24h/36d. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'for-linus' of ↵Linus Torvalds2015-07-03
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace updates from Eric Biederman: "Long ago and far away when user namespaces where young it was realized that allowing fresh mounts of proc and sysfs with only user namespace permissions could violate the basic rule that only root gets to decide if proc or sysfs should be mounted at all. Some hacks were put in place to reduce the worst of the damage could be done, and the common sense rule was adopted that fresh mounts of proc and sysfs should allow no more than bind mounts of proc and sysfs. Unfortunately that rule has not been fully enforced. There are two kinds of gaps in that enforcement. Only filesystems mounted on empty directories of proc and sysfs should be ignored but the test for empty directories was insufficient. So in my tree directories on proc, sysctl and sysfs that will always be empty are created specially. Every other technique is imperfect as an ordinary directory can have entries added even after a readdir returns and shows that the directory is empty. Special creation of directories for mount points makes the code in the kernel a smidge clearer about it's purpose. I asked container developers from the various container projects to help test this and no holes were found in the set of mount points on proc and sysfs that are created specially. This set of changes also starts enforcing the mount flags of fresh mounts of proc and sysfs are consistent with the existing mount of proc and sysfs. I expected this to be the boring part of the work but unfortunately unprivileged userspace winds up mounting fresh copies of proc and sysfs with noexec and nosuid clear when root set those flags on the previous mount of proc and sysfs. So for now only the atime, read-only and nodev attributes which userspace happens to keep consistent are enforced. Dealing with the noexec and nosuid attributes remains for another time. This set of changes also addresses an issue with how open file descriptors from /proc/<pid>/ns/* are displayed. Recently readlink of /proc/<pid>/fd has been triggering a WARN_ON that has not been meaningful since it was added (as all of the code in the kernel was converted) and is not now actively wrong. There is also a short list of issues that have not been fixed yet that I will mention briefly. It is possible to rename a directory from below to above a bind mount. At which point any directory pointers below the renamed directory can be walked up to the root directory of the filesystem. With user namespaces enabled a bind mount of the bind mount can be created allowing the user to pick a directory whose children they can rename to outside of the bind mount. This is challenging to fix and doubly so because all obvious solutions must touch code that is in the performance part of pathname resolution. As mentioned above there is also a question of how to ensure that developers by accident or with purpose do not introduce exectuable files on sysfs and proc and in doing so introduce security regressions in the current userspace that will not be immediately obvious and as such are likely to require breaking userspace in painful ways once they are recognized" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: vfs: Remove incorrect debugging WARN in prepend_path mnt: Update fs_fully_visible to test for permanently empty directories sysfs: Create mountpoints with sysfs_create_mount_point sysfs: Add support for permanently empty directories to serve as mount points. kernfs: Add support for always empty directories. proc: Allow creating permanently empty directories that serve as mount points sysctl: Allow creating permanently empty directories that serve as mountpoints. fs: Add helper functions for permanently empty directories. vfs: Ignore unlocked mounts in fs_fully_visible mnt: Modify fs_fully_visible to deal with locked ro nodev and atime mnt: Refactor the logic for mounting sysfs and proc in a user namespace