summaryrefslogtreecommitdiff
path: root/fs/open.c (follow)
Commit message (Collapse)AuthorAge
* Merge 4.4.187 into android-4.4Greg Kroah-Hartman2019-08-04
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.187 MIPS: ath79: fix ar933x uart parity mode MIPS: fix build on non-linux hosts dmaengine: imx-sdma: fix use-after-free on probe error path ath10k: Do not send probe response template for mesh ath9k: Check for errors when reading SREV register ath6kl: add some bounds checking ath: DFS JP domain W56 fixed pulse type 3 RADAR detection batman-adv: fix for leaked TVLV handler. media: dvb: usb: fix use after free in dvb_usb_device_exit crypto: talitos - fix skcipher failure due to wrong output IV media: marvell-ccic: fix DMA s/g desc number calculation media: vpss: fix a potential NULL pointer dereference net: stmmac: dwmac1000: Clear unused address entries signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig af_key: fix leaks in key_pol_get_resp and dump_sp. xfrm: Fix xfrm sel prefix length validation media: staging: media: davinci_vpfe: - Fix for memory leak if decoder initialization fails. net: phy: Check against net_device being NULL tua6100: Avoid build warnings. locking/lockdep: Fix merging of hlocks with non-zero references media: wl128x: Fix some error handling in fm_v4l2_init_video_device() cpupower : frequency-set -r option misses the last cpu in related cpu list net: fec: Do not use netdev messages too early net: axienet: Fix race condition causing TX hang s390/qdio: handle PENDING state for QEBSM devices perf test 6: Fix missing kvm module load for s390 gpio: omap: fix lack of irqstatus_raw0 for OMAP4 gpio: omap: ensure irq is enabled before wakeup regmap: fix bulk writes on paged registers bpf: silence warning messages in core rcu: Force inlining of rcu_read_lock() xfrm: fix sa selector validation perf evsel: Make perf_evsel__name() accept a NULL argument vhost_net: disable zerocopy by default EDAC/sysfs: Fix memory leak when creating a csrow object media: i2c: fix warning same module names ntp: Limit TAI-UTC offset timer_list: Guard procfs specific code acpi/arm64: ignore 5.1 FADTs that are reported as 5.0 media: coda: fix mpeg2 sequence number handling media: coda: increment sequence offset for the last returned frame mt7601u: do not schedule rx_tasklet when the device has been disconnected x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c mt7601u: fix possible memory leak when the device is disconnected ath10k: fix PCIE device wake up failed rslib: Fix decoding of shortened codes rslib: Fix handling of of caller provided syndrome ixgbe: Check DDM existence in transceiver before access EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msec bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush() Bluetooth: hci_bcsp: Fix memory leak in rx_skb Bluetooth: 6lowpan: search for destination address in all peers Bluetooth: Check state in l2cap_disconnect_rsp Bluetooth: validate BLE connection interval updates crypto: ghash - fix unaligned memory access in ghash_setkey() crypto: arm64/sha1-ce - correct digest for empty data in finup crypto: arm64/sha2-ce - correct digest for empty data in finup Input: gtco - bounds check collection indent level regulator: s2mps11: Fix buck7 and buck8 wrong voltages tracing/snapshot: Resize spare buffer if size changed NFSv4: Handle the special Linux file open access mode lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE ALSA: seq: Break too long mutex context in the write loop media: v4l2: Test type instead of cfg->type in v4l2_ctrl_new_custom() media: coda: Remove unbalanced and unneeded mutex unlock KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed drm/nouveau/i2c: Enable i2c pads & busses during preinit padata: use smp_mb in padata_reorder to avoid orphaned padata jobs 9p/virtio: Add cleanup path in p9_virtio_init PCI: Do not poll for PME if the device is in D3cold take floppy compat ioctls to sodding floppy.c floppy: fix div-by-zero in setup_format_params floppy: fix out-of-bounds read in next_valid_format floppy: fix invalid pointer dereference in drive_name floppy: fix out-of-bounds read in copy_buffer coda: pass the host file in vma->vm_file on mmap gpu: ipu-v3: ipu-ic: Fix saturation bit offset in TPMEM parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1 powerpc/32s: fix suspend/resume when IBATs 4-7 are used powerpc/watchpoint: Restore NV GPRs while returning from exception eCryptfs: fix a couple type promotion bugs intel_th: msu: Fix single mode with disabled IOMMU Bluetooth: Add SMP workaround Microsoft Surface Precision Mouse bug usb: Handle USB3 remote wakeup for LPM enabled devices correctly dm bufio: fix deadlock with loop device bnx2x: Prevent load reordering in tx completion processing caif-hsi: fix possible deadlock in cfhsi_exit_module() ipv4: don't set IPv6 only flags to IPv4 addresses net: bcmgenet: use promisc for unsupported filters net: neigh: fix multiple neigh timer scheduling nfc: fix potential illegal memory access sky2: Disable MSI on ASUS P6T netrom: fix a memory leak in nr_rx_frame() netrom: hold sock when setting skb->destructor tcp: Reset bytes_acked and bytes_received when disconnecting bonding: validate ip header before check IPPROTO_IGMP net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query net: bridge: stp: don't cache eth dest pointer before skb pull elevator: fix truncation of icq_cache_name NFSv4: Fix open create exclusive when the server reboots nfsd: increase DRC cache limit nfsd: give out fewer session slots as limit approaches nfsd: fix performance-limiting session calculation nfsd: Fix overflow causing non-working mounts on 1 TB machines drm/panel: simple: Fix panel_simple_dsi_probe usb: core: hub: Disable hub-initiated U1/U2 tty: max310x: Fix invalid baudrate divisors calculator pinctrl: rockchip: fix leaked of_node references tty: serial: cpm_uart - fix init when SMC is relocated memstick: Fix error cleanup path of memstick_init tty/serial: digicolor: Fix digicolor-usart already registered warning tty: serial: msm_serial: avoid system lockup condition drm/virtio: Add memory barriers for capset cache. phy: renesas: rcar-gen2: Fix memory leak at error paths usb: gadget: Zero ffs_io_data powerpc/pci/of: Fix OF flags parsing for 64bit BARs PCI: sysfs: Ignore lockdep for remove attribute iio: iio-utils: Fix possible incorrect mask calculation recordmcount: Fix spurious mcount entries on powerpc mfd: core: Set fwnode for created devices mfd: arizona: Fix undefined behavior um: Silence lockdep complaint about mmap_sem powerpc/4xx/uic: clear pending interrupt after irq type/pol change serial: sh-sci: Fix TX DMA buffer flushing and workqueue races kallsyms: exclude kasan local symbols on s390 perf test mmap-thread-lookup: Initialize variable to suppress memory sanitizer warning f2fs: avoid out-of-range memory access mailbox: handle failed named mailbox channel request powerpc/eeh: Handle hugepages in ioremap space sh: prevent warnings when using iounmap mm/kmemleak.c: fix check for softirq context 9p: pass the correct prototype to read_cache_page mm/mmu_notifier: use hlist_add_head_rcu() locking/lockdep: Fix lock used or unused stats error locking/lockdep: Hide unused 'class' variable usb: wusbcore: fix unbalanced get/put cluster_id usb: pci-quirks: Correct AMD PLL quirk detection x86/sysfb_efi: Add quirks for some devices with swapped width and height x86/speculation/mds: Apply more accurate check on hypervisor platform hpet: Fix division by zero in hpet_time_div() ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1 ALSA: hda - Add a conexant codec entry to let mute led work powerpc/tm: Fix oops on sigreturn on systems without TM access: avoid the RCU grace period for the temporary subjective credentials vmstat: Remove BUG_ON from vmstat_update mm, vmstat: make quiet_vmstat lighter ipv6: check sk sk_type and protocol early in ip_mroute_set/getsockopt tcp: reset sk_send_head in tcp_write_queue_purge ISDN: hfcsusb: checking idx of ep configuration media: cpia2_usb: first wake up, then free in disconnect media: radio-raremono: change devm_k*alloc to k*alloc Bluetooth: hci_uart: check for missing tty operations sched/fair: Don't free p->numa_faults with concurrent readers drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl ceph: hold i_ceph_lock when removing caps for freeing inode Linux 4.4.187 Change-Id: Id03e619b24750a6b3faaff02166469569f5deb4f Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * access: avoid the RCU grace period for the temporary subjective credentialsLinus Torvalds2019-08-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d7852fbd0f0423937fa287a598bfde188bb68c22 upstream. It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU work because it installs a temporary credential that gets allocated and freed for each system call. The allocation and freeing overhead is mostly benign, but because credentials can be accessed under the RCU read lock, the freeing involves a RCU grace period. Which is not a huge deal normally, but if you have a lot of access() calls, this causes a fair amount of seconday damage: instead of having a nice alloc/free patterns that hits in hot per-CPU slab caches, you have all those delayed free's, and on big machines with hundreds of cores, the RCU overhead can end up being enormous. But it turns out that all of this is entirely unnecessary. Exactly because access() only installs the credential as the thread-local subjective credential, the temporary cred pointer doesn't actually need to be RCU free'd at all. Once we're done using it, we can just free it synchronously and avoid all the RCU overhead. So add a 'non_rcu' flag to 'struct cred', which can be set by users that know they only use it in non-RCU context (there are other potential users for this). We can make it a union with the rcu freeing list head that we need for the RCU case, so this doesn't need any extra storage. Note that this also makes 'get_current_cred()' clear the new non_rcu flag, in case we have filesystems that take a long-term reference to the cred and then expect the RCU delayed freeing afterwards. It's not entirely clear that this is required, but it makes for clear semantics: the subjective cred remains non-RCU as long as you only access it synchronously using the thread-local accessors, but you _can_ use it as a generic cred if you want to. It is possible that we should just remove the whole RCU markings for ->cred entirely. Only ->real_cred is really supposed to be accessed through RCU, and the long-term cred copies that nfs uses might want to explicitly re-enable RCU freeing if required, rather than have get_current_cred() do it implicitly. But this is a "minimal semantic changes" change for the immediate problem. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jan Glauber <jglauber@marvell.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com> Cc: Greg KH <greg@kroah.com> Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.181 into android-4.4Greg Kroah-Hartman2019-06-11
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.181 x86/speculation/mds: Revert CPU buffer clear on double fault exit x86/speculation/mds: Improve CPU buffer clear documentation ARM: exynos: Fix a leaked reference by adding missing of_node_put crypto: vmx - fix copy-paste error in CTR mode crypto: crct10dif-generic - fix use via crypto_shash_digest() crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest() ALSA: usb-audio: Fix a memory leak bug ALSA: hda/hdmi - Consider eld_valid when reporting jack event ALSA: hda/realtek - EAPD turn on later ASoC: max98090: Fix restore of DAPM Muxes ASoC: RT5677-SPI: Disable 16Bit SPI Transfers mm/mincore.c: make mincore() more conservative ocfs2: fix ocfs2 read inode data panic in ocfs2_iget mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L tty/vt: fix write/write race in ioctl(KDSKBSENT) handler ext4: actually request zeroing of inode table after grow ext4: fix ext4_show_options for file systems w/o journal Btrfs: do not start a transaction at iterate_extent_inodes() bcache: fix a race between cache register and cacheset unregister bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim() ipmi:ssif: compare block number correctly for multi-part return messages crypto: gcm - Fix error return code in crypto_gcm_create_common() crypto: gcm - fix incompatibility between "gcm" and "gcm_base" crypto: chacha20poly1305 - set cra_name correctly crypto: salsa20 - don't access already-freed walk.iv crypto: arm/aes-neonbs - don't access already-freed walk.iv writeback: synchronize sync(2) against cgroup writeback membership switches fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount ext4: zero out the unused memory region in the extent tree block ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes net: avoid weird emergency message net/mlx4_core: Change the error print to info print ppp: deflate: Fix possible crash in deflate_init tipc: switch order of device registration to fix a crash tipc: fix modprobe tipc failed after switch order of device registration stm class: Fix channel free in stm output free path md: add mddev->pers to avoid potential NULL pointer dereference intel_th: msu: Fix single mode with IOMMU of: fix clang -Wunsequenced for be32_to_cpu() cifs: fix strcat buffer overflow and reduce raciness in smb21_set_oplock_level() media: ov6650: Fix sensor possibly not detected on probe NFS4: Fix v4.0 client state corruption when mount clk: tegra: Fix PLLM programming on Tegra124+ when PMC overrides divider fuse: fix writepages on 32bit fuse: honor RLIMIT_FSIZE in fuse_file_fallocate iommu/tegra-smmu: Fix invalid ASID bits on Tegra30/114 ceph: flush dirty inodes before proceeding with remount tracing: Fix partial reading of trace event's id file memory: tegra: Fix integer overflow on tick value calculation perf intel-pt: Fix instructions sampling rate perf intel-pt: Fix improved sample timestamp perf intel-pt: Fix sample timestamp wrt non-taken branches fbdev: sm712fb: fix brightness control on reboot, don't set SR30 fbdev: sm712fb: fix VRAM detection, don't set SR70/71/74/75 fbdev: sm712fb: fix white screen of death on reboot, don't set CR3B-CR3F fbdev: sm712fb: fix boot screen glitch when sm712fb replaces VGA fbdev: sm712fb: fix crashes during framebuffer writes by correctly mapping VRAM fbdev: sm712fb: fix support for 1024x768-16 mode fbdev: sm712fb: use 1024x768 by default on non-MIPS, fix garbled display fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting PCI: Mark Atheros AR9462 to avoid bus reset dm delay: fix a crash when invalid device is specified xfrm: policy: Fix out-of-bound array accesses in __xfrm_policy_unlink xfrm6_tunnel: Fix potential panic when unloading xfrm6_tunnel module vti4: ipip tunnel deregistration fixes. xfrm4: Fix uninitialized memory read in _decode_session4 KVM: arm/arm64: Ensure vcpu target is unset on reset failure power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour perf bench numa: Add define for RUSAGE_THREAD if not present Revert "Don't jump to compute_result state from check_result state" md/raid: raid5 preserve the writeback action after the parity check btrfs: Honour FITRIM range constraints during free space trim fbdev: sm712fb: fix memory frequency by avoiding a switch/case fallthrough ext4: do not delete unlinked inode from orphan list on failed truncate KVM: x86: fix return value for reserved EFER bio: fix improper use of smp_mb__before_atomic() Revert "scsi: sd: Keep disk read-only when re-reading partition" crypto: vmx - CTR: always increment IV as quadword gfs2: Fix sign extension bug in gfs2_update_stats Btrfs: fix race between ranged fsync and writeback of adjacent ranges btrfs: sysfs: don't leak memory when failing add fsid fbdev: fix divide error in fb_var_to_videomode hugetlb: use same fault hash key for shared and private mappings fbdev: fix WARNING in __alloc_pages_nodemask bug media: cpia2: Fix use-after-free in cpia2_exit media: vivid: use vfree() instead of kfree() for dev->bitmap_cap ssb: Fix possible NULL pointer dereference in ssb_host_pcmcia_exit at76c50x-usb: Don't register led_trigger if usb_register_driver failed perf tools: No need to include bitops.h in util.h tools include: Adopt linux/bits.h gfs2: Fix lru_count going negative cxgb4: Fix error path in cxgb4_init_module mmc: core: Verify SD bus width powerpc/boot: Fix missing check of lseek() return value ASoC: imx: fix fiq dependencies spi: pxa2xx: fix SCR (divisor) calculation brcm80211: potential NULL dereference in brcmf_cfg80211_vndr_cmds_dcmd_handler() rtc: 88pm860x: prevent use-after-free on device remove w1: fix the resume command API dmaengine: pl330: _stop: clear interrupt status mac80211/cfg80211: update bss channel on channel switch ASoC: fsl_sai: Update is_slave_mode with correct value mwifiex: prevent an array overflow net: cw1200: fix a NULL pointer dereference bcache: return error immediately in bch_journal_replay() bcache: fix failure in journal relplay bcache: add failure check to run_cache_set() for journal replay bcache: avoid clang -Wunintialized warning x86/build: Move _etext to actual end of .text smpboot: Place the __percpu annotation correctly x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault() mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions HID: logitech-hidpp: use RAP instead of FAP to get the protocol version pinctrl: pistachio: fix leaked of_node references dmaengine: at_xdmac: remove BUG_ON macro in tasklet media: coda: clear error return value before picture run media: ov6650: Move v4l2_clk_get() to ov6650_video_probe() helper media: au0828: stop video streaming only when last user stops media: ov2659: make S_FMT succeed even if requested format doesn't match audit: fix a memory leak bug media: au0828: Fix NULL pointer dereference in au0828_analog_stream_enable() media: pvrusb2: Prevent a buffer overflow powerpc/numa: improve control of topology updates sched/core: Check quota and period overflow at usec to nsec conversion sched/core: Handle overflow in cpu_shares_write_u64 USB: core: Don't unbind interfaces following device reset failure x86/irq/64: Limit IST stack overflow check to #DB stack i40e: don't allow changes to HW VLAN stripping on active port VLANs RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure hwmon: (vt1211) Use request_muxed_region for Super-IO accesses hwmon: (smsc47m1) Use request_muxed_region for Super-IO accesses hwmon: (smsc47b397) Use request_muxed_region for Super-IO accesses hwmon: (pc87427) Use request_muxed_region for Super-IO accesses hwmon: (f71805f) Use request_muxed_region for Super-IO accesses scsi: libsas: Do discovery on empty PHY to update PHY info mmc_spi: add a status check for spi_sync_locked mmc: sdhci-of-esdhc: add erratum eSDHC5 support mmc: sdhci-of-esdhc: add erratum eSDHC-A001 and A-008358 support PM / core: Propagate dev->power.wakeup_path when no callbacks extcon: arizona: Disable mic detect if running when driver is removed s390: cio: fix cio_irb declaration cpufreq: ppc_cbe: fix possible object reference leak cpufreq/pasemi: fix possible object reference leak cpufreq: pmac32: fix possible object reference leak x86/build: Keep local relocations with ld.lld iio: ad_sigma_delta: Properly handle SPI bus locking vs CS assertion iio: hmc5843: fix potential NULL pointer dereferences iio: common: ssp_sensors: Initialize calculated_time in ssp_common_process_data rtlwifi: fix a potential NULL pointer dereference brcmfmac: fix missing checks for kmemdup b43: shut up clang -Wuninitialized variable warning brcmfmac: convert dev_init_lock mutex to completion brcmfmac: fix race during disconnect when USB completion is in progress scsi: ufs: Fix regulator load and icc-level configuration scsi: ufs: Avoid configuring regulator with undefined voltage range arm64: cpu_ops: fix a leaked reference by adding missing of_node_put x86/ia32: Fix ia32_restore_sigcontext() AC leak chardev: add additional check for minor range overlap HID: core: move Usage Page concatenation to Main item ASoC: eukrea-tlv320: fix a leaked reference by adding missing of_node_put ASoC: fsl_utils: fix a leaked reference by adding missing of_node_put cxgb3/l2t: Fix undefined behaviour spi: tegra114: reset controller on probe media: wl128x: prevent two potential buffer overflows virtio_console: initialize vtermno value for ports tty: ipwireless: fix missing checks for ioremap rcutorture: Fix cleanup path for invalid torture_type strings usb: core: Add PM runtime calls to usb_hcd_platform_shutdown scsi: qla4xxx: avoid freeing unallocated dma memory media: m88ds3103: serialize reset messages in m88ds3103_set_frontend media: go7007: avoid clang frame overflow warning with KASAN media: saa7146: avoid high stack usage with clang scsi: lpfc: Fix SLI3 commands being issued on SLI4 devices spi : spi-topcliff-pch: Fix to handle empty DMA buffers spi: rspi: Fix sequencer reset during initialization spi: Fix zero length xfer bug ASoC: davinci-mcasp: Fix clang warning without CONFIG_PM ipv6: Consider sk_bound_dev_if when binding a raw socket to an address llc: fix skb leak in llc_build_and_send_ui_pkt() net-gro: fix use-after-free read in napi_gro_frags() net: stmmac: fix reset gpio free missing usbnet: fix kernel crash after disconnect tipc: Avoid copying bytes beyond the supplied data bnxt_en: Fix aggregation buffer leak under OOM condition. net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value crypto: vmx - ghash: do nosimd fallback manually xen/pciback: Don't disable PCI_COMMAND on PCI device reset. Revert "tipc: fix modprobe tipc failed after switch order of device registration" tipc: fix modprobe tipc failed after switch order of device registration -v2 sparc64: Fix regression in non-hypervisor TLB flush xcall include/linux/bitops.h: sanitize rotate primitives xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic() usb: xhci: avoid null pointer deref when bos field is NULL USB: Fix slab-out-of-bounds write in usb_get_bos_descriptor USB: sisusbvga: fix oops in error path of sisusb_probe USB: Add LPM quirk for Surface Dock GigE adapter USB: rio500: refuse more than one device at a time USB: rio500: fix memory leak in close after disconnect media: usb: siano: Fix general protection fault in smsusb media: usb: siano: Fix false-positive "uninitialized variable" warning media: smsusb: better handle optional alignment scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs) Btrfs: fix race updating log root item during fsync ALSA: hda/realtek - Set default power save node to 0 drm/nouveau/i2c: Disable i2c bus access after ->fini() tty: serial: msm_serial: Fix XON/XOFF tty: max310x: Fix external crystal register setup memcg: make it work on sparse non-0-node systems kernel/signal.c: trace_signal_deliver when signal_group_exit CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEM binder: Replace "%p" with "%pK" for stable binder: replace "%p" with "%pK" net: create skb_gso_validate_mac_len() bnx2x: disable GSO where gso_size is too big for hardware brcmfmac: Add length checks on firmware events brcmfmac: screening firmware event packet brcmfmac: revise handling events in receive path brcmfmac: fix incorrect event channel deduction brcmfmac: add length checks in scheduled scan result handler brcmfmac: add subtype check for event handling in data path userfaultfd: don't pin the user memory in userfaultfd_file_create() Revert "x86/build: Move _etext to actual end of .text" net: cdc_ncm: GetNtbFormat endian fix usb: gadget: fix request length error for isoc transfer media: uvcvideo: Fix uvc_alloc_entity() allocation alignment ethtool: fix potential userspace buffer overflow neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query net: rds: fix memory leak in rds_ib_flush_mr_pool pktgen: do not sleep with the thread lock held. rcu: locking and unlocking need to always be at least barriers parisc: Use implicit space register selection for loading the coherence index of I/O pdirs fuse: fallocate: fix return with locked inode MIPS: pistachio: Build uImage.gz by default genwqe: Prevent an integer overflow in the ioctl drm/gma500/cdv: Check vbt config bits when detecting lvds panels fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock fuse: Add FOPEN_STREAM to use stream_open() ipv4: Define __ipv4_neigh_lookup_noref when CONFIG_INET is disabled ethtool: check the return value of get_regs_len Linux 4.4.181 Change-Id: Ibadc58ab76330698ff36ffdc0ca8c9d52ce36f9e Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * fs: stream_open - opener for stream-like files so that read and write can ↵Kirill Smelkov2019-06-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | run simultaneously without deadlock commit 10dce8af34226d90fa56746a934f8da5dcdba3df upstream. Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added locking for file.f_pos access and in particular made concurrent read and write not possible - now both those functions take f_pos lock for the whole run, and so if e.g. a read is blocked waiting for data, write will deadlock waiting for that read to complete. This caused regression for stream-like files where previously read and write could run simultaneously, but after that patch could not do so anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") which fixes such regression for particular case of /proc/xen/xenbus. The patch that added f_pos lock in 2014 did so to guarantee POSIX thread safety for read/write/lseek and added the locking to file descriptors of all regular files. In 2014 that thread-safety problem was not new as it was already discussed earlier in 2006. However even though 2006'th version of Linus's patch was adding f_pos locking "only for files that are marked seekable with FMODE_LSEEK (thus avoiding the stream-like objects like pipes and sockets)", the 2014 version - the one that actually made it into the tree as 9c225f2655e3 - is doing so irregardless of whether a file is seekable or not. See https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/ https://lwn.net/Articles/180387 https://lwn.net/Articles/180396 for historic context. The reason that it did so is, probably, that there are many files that are marked non-seekable, but e.g. their read implementation actually depends on knowing current position to correctly handle the read. Some examples: kernel/power/user.c snapshot_read fs/debugfs/file.c u32_array_read fs/fuse/control.c fuse_conn_waiting_read + ... drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read arch/s390/hypfs/inode.c hypfs_read_iter ... Despite that, many nonseekable_open users implement read and write with pure stream semantics - they don't depend on passed ppos at all. And for those cases where read could wait for something inside, it creates a situation similar to xenbus - the write could be never made to go until read is done, and read is waiting for some, potentially external, event, for potentially unbounded time -> deadlock. Besides xenbus, there are 14 such places in the kernel that I've found with semantic patch (see below): drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write() drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write() drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write() drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write() net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write() drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write() drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write() drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write() net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write() drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write() drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write() drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write() drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write() drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write() In addition to the cases above another regression caused by f_pos locking is that now FUSE filesystems that implement open with FOPEN_NONSEEKABLE flag, can no longer implement bidirectional stream-like files - for the same reason as above e.g. read can deadlock write locking on file.f_pos in the kernel. FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse: implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and write routines not depending on current position at all, and with both read and write being potentially blocking operations: See https://github.com/libfuse/osspd https://lwn.net/Articles/308445 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510 Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as "somewhat pipe-like files ..." with read handler not using offset. However that test implements only read without write and cannot exercise the deadlock scenario: https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131 https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163 https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216 I've actually hit the read vs write deadlock for real while implementing my FUSE filesystem where there is /head/watch file, for which open creates separate bidirectional socket-like stream in between filesystem and its user with both read and write being later performed simultaneously. And there it is semantically not easy to split the stream into two separate read-only and write-only channels: https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169 Let's fix this regression. The plan is: 1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS - doing so would break many in-kernel nonseekable_open users which actually use ppos in read/write handlers. 2. Add stream_open() to kernel to open stream-like non-seekable file descriptors. Read and write on such file descriptors would never use nor change ppos. And with that property on stream-like files read and write will be running without taking f_pos lock - i.e. read and write could be running simultaneously. 3. With semantic patch search and convert to stream_open all in-kernel nonseekable_open users for which read and write actually do not depend on ppos and where there is no other methods in file_operations which assume @offset access. 4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via steam_open if that bit is present in filesystem open reply. It was tempting to change fs/fuse/ open handler to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we would do such a change it will break a real user. 5. Add stream_open and FOPEN_STREAM handling to stable kernels starting from v3.14+ (the kernel where 9c225f2655 first appeared). This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in their open handler and this way avoid the deadlock on all kernel versions. This should work because fs/fuse/ ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. This patch adds stream_open, converts /proc/xen/xenbus to it and adds semantic patch to automatically locate in-kernel places that are either required to be converted due to read vs write deadlock, or that are just safe to be converted because read and write do not use ppos and there are no other funky methods in file_operations. Regarding semantic patch I've verified each generated change manually - that it is correct to convert - and each other nonseekable_open instance left - that it is either not correct to convert there, or that it is not converted due to current stream_open.cocci limitations. The script also does not convert files that should be valid to convert, but that currently have .llseek = noop_llseek or generic_file_llseek for unknown reason despite file being opened with nonseekable_open (e.g. drivers/input/mousedev.c) Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yongzhi Pan <panyongzhi@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Juergen Gross <jgross@suse.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Tejun Heo <tj@kernel.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Nikolaus Rath <Nikolaus@rath.org> Cc: Han-Wen Nienhuys <hanwen@google.com> [ backport to 4.4: actually fixed deadlock on /proc/xen/xenbus as 581d21a2d02a was not backported to 4.4 ] Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.77 into android-4.4Greg Kroah-Hartman2017-07-15
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.77 fs: add a VALID_OPEN_FLAGS fs: completely ignore unknown open flags driver core: platform: fix race condition with driver_override bgmac: reset & enable Ethernet core before using it mm: fix classzone_idx underflow in shrink_zones() tracing/kprobes: Allow to create probe with a module name starting with a digit drm/virtio: don't leak bo on drm_gem_object_init failure usb: dwc3: replace %p with %pK USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick Add USB quirk for HVR-950q to avoid intermittent device resets usb: usbip: set buffer pointers to NULL after free usb: Fix typo in the definition of Endpoint[out]Request mac80211_hwsim: Replace bogus hrtimer clockid sysctl: don't print negative flag for proc_douintvec sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec pinctrl: sh-pfc: r8a7791: Fix SCIF2 pinmux data pinctrl: meson: meson8b: fix the NAND DQS pins pinctrl: sunxi: Fix SPDIF function name for A83T pinctrl: mxs: atomically switch mux and drive strength config pinctrl: sh-pfc: Update info pointer after SoC-specific init USB: serial: option: add two Longcheer device ids USB: serial: qcserial: new Sierra Wireless EM7305 device ID gfs2: Fix glock rhashtable rcu bug x86/tools: Fix gcc-7 warning in relocs.c x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings ath10k: override CE5 config for QCA9377 KEYS: Fix an error code in request_master_key() RDMA/uverbs: Check port number supplied by user verbs cmds mqueue: fix a use-after-free in sys_mq_notify() tools include: Add a __fallthrough statement tools string: Use __fallthrough in perf_atoll() tools strfilter: Use __fallthrough perf top: Use __fallthrough perf intel-pt: Use __fallthrough perf thread_map: Correctly size buffer used with dirent->dt_name perf scripting perl: Fix compile error with some perl5 versions perf tests: Avoid possible truncation with dirent->d_name + snprintf perf bench numa: Avoid possible truncation when using snprintf() perf tools: Use readdir() instead of deprecated readdir_r() perf thread_map: Use readdir() instead of deprecated readdir_r() perf script: Use readdir() instead of deprecated readdir_r() perf tools: Remove duplicate const qualifier perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed perf pmu: Fix misleadingly indented assignment (whitespace) perf dwarf: Guard !x86_64 definitions under #ifdef else clause perf trace: Do not process PERF_RECORD_LOST twice perf tests: Remove wrong semicolon in while loop in CQM test perf tools: Use readdir() instead of deprecated readdir_r() again md: fix incorrect use of lexx_to_cpu in does_sb_need_changing md: fix super_offset endianness in super_1_rdev_size_change tcp: fix tcp_mark_head_lost to check skb len before fragmenting staging: vt6556: vnt_start Fix missing call to vnt_key_init_table. staging: comedi: fix clean-up of comedi_class in comedi_init() ext4: check return value of kstrtoull correctly in reserved_clusters_store x86/mm/pat: Don't report PAT on CPUs that don't support it saa7134: fix warm Medion 7134 EEPROM read Linux 4.4.77 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * fs: completely ignore unknown open flagsChristoph Hellwig2017-07-15
| | | | | | | | | | | | | | | | | | | | | | | | | | commit 629e014bb8349fcf7c1e4df19a842652ece1c945 upstream. Currently we just stash anything we got into file->f_flags, and the report it in fcntl(F_GETFD). This patch just clears out all unknown flags so that we don't pass them to the fs or report them. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge remote-tracking branch 'common/android-4.4' into android-4.4.yDmitry Shmidt2017-02-15
|\ \ | |/ |/| | | Change-Id: Icf907f5067fb6da5935ab0d3271df54b8d5df405
| * ANDROID: vfs: Add setattr2 for filesystems with per mount permissionsDaniel Rosenberg2017-01-26
| | | | | | | | | | | | | | | | | | | | This allows filesystems to use their mount private data to influence the permssions they use in setattr2. It has been separated into a new call to avoid disrupting current setattr users. Change-Id: I19959038309284448f1b7f232d579674ef546385 Signed-off-by: Daniel Rosenberg <drosen@google.com>
| * ANDROID: vfs: Add permission2 for filesystems with per mount permissionsDaniel Rosenberg2017-01-26
| | | | | | | | | | | | | | | | | | | | This allows filesystems to use their mount private data to influence the permssions they return in permission2. It has been separated into a new call to avoid disrupting current permission users. Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca Signed-off-by: Daniel Rosenberg <drosen@google.com>
* | vfs: add vfs_select_inode() helperMiklos Szeredi2016-05-18
| | | | | | | | | | | | | | | | commit 54d5ca871e72f2bb172ec9323497f01cd5091ec7 upstream. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | fs/coredump: prevent fsuid=0 dumps into user-controlled directoriesJann Horn2016-04-12
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 378c6520e7d29280f400ef2ceaf155c86f05a71a upstream. This commit fixes the following security hole affecting systems where all of the following conditions are fulfilled: - The fs.suid_dumpable sysctl is set to 2. - The kernel.core_pattern sysctl's value starts with "/". (Systems where kernel.core_pattern starts with "|/" are not affected.) - Unprivileged user namespace creation is permitted. (This is true on Linux >=3.8, but some distributions disallow it by default using a distro patch.) Under these conditions, if a program executes under secure exec rules, causing it to run with the SUID_DUMP_ROOT flag, then unshares its user namespace, changes its root directory and crashes, the coredump will be written using fsuid=0 and a path derived from kernel.core_pattern - but this path is interpreted relative to the root directory of the process, allowing the attacker to control where a coredump will be written with root privileges. To fix the security issue, always interpret core_pattern for dumps that are written under SUID_DUMP_ROOT relative to the root directory of init. Signed-off-by: Jann Horn <jann@thejh.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vfs: Commit to never having exectuables on proc and sysfs.Eric W. Biederman2015-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Today proc and sysfs do not contain any executable files. Several applications today mount proc or sysfs without noexec and nosuid and then depend on there being no exectuables files on proc or sysfs. Having any executable files show on proc or sysfs would cause a user space visible regression, and most likely security problems. Therefore commit to never allowing executables on proc and sysfs by adding a new flag to mark them as filesystems without executables and enforce that flag. Test the flag where MNT_NOEXEC is tested today, so that the only user visible effect will be that exectuables will be treated as if the execute bit is cleared. The filesystems proc and sysfs do not currently incoporate any executable files so this does not result in any user visible effects. This makes it unnecessary to vet changes to proc and sysfs tightly for adding exectuable files or changes to chattr that would modify existing files, as no matter what the individual file say they will not be treated as exectuable files by the vfs. Not having to vet changes to closely is important as without this we are only one proc_create call (or another goof up in the implementation of notify_change) from having problematic executables on proc. Those mistakes are all too easy to make and would create a situation where there are security issues or the assumptions of some program having to be broken (and cause userspace regressions). Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* fs: Call security_ops->inode_killpriv on truncateJan Kara2015-06-23
| | | | | | | | | | | | | | | | | | | Comment in include/linux/security.h says that ->inode_killpriv() should be called when setuid bit is being removed and that similar security labels (in fact this applies only to file capabilities) should be removed at this time as well. However we don't call ->inode_killpriv() when we remove suid bit on truncate. We fix the problem by calling ->inode_need_killpriv() and subsequently ->inode_killpriv() on truncate the same way as we do it on file write. After this patch there's only one user of should_remove_suid() - ocfs2 - and indeed it's buggy because it doesn't call ->inode_killpriv() on write. However fixing it is difficult because of special locking constraints. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: add file_path() helperMiklos Szeredi2015-06-23
| | | | | | | | | | Turn d_path(&file->f_path, ...); into file_path(file, ...); Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* overlayfs: Make f_path always point to the overlay and f_inode to the underlayDavid Howells2015-06-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make file->f_path always point to the overlay dentry so that the path in /proc/pid/fd is correct and to ensure that label-based LSMs have access to the overlay as well as the underlay (path-based LSMs probably don't need it). Using my union testsuite to set things up, before the patch I see: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:38 5 -> /a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 13381 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 13381 Links: 1 ... After the patch: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:22 5 -> /mnt/a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 40346 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 40346 Links: 1 ... Note the change in where /proc/$$/fd/5 points to in the ls command. It was pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107 (which is correct). The inode accessed, however, is the lower layer. The union layer is on device 25h/37d and the upper layer on 24h/36d. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Handle lower layer dentry/inode in pathwalkDavid Howells2015-05-11
| | | | | | | Make use of d_backing_inode() in pathwalk to gain access to an inode or dentry that's on a lower layer. Signed-off-by: David Howells <dhowells@redhat.com>
* Merge tag 'xfs-for-linus-4.1-rc1' of ↵Linus Torvalds2015-04-24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs update from Dave Chinner: "This update contains: - RENAME_WHITEOUT support - conversion of per-cpu superblock accounting to use generic counters - new inode mmap lock so that we can lock page faults out of truncate, hole punch and other direct extent manipulation functions to avoid racing mmap writes from causing data corruption - rework of direct IO submission and completion to solve data corruption issue when running concurrent extending DIO writes. Also solves problem of running IO completion transactions in interrupt context during size extending AIO writes. - FALLOC_FL_INSERT_RANGE support for inserting holes into a file via direct extent manipulation to avoid needing to copy data within the file - attribute block header field overflow fix for 64k block size filesystems - Lots of changes to log messaging to be more informative and concise when errors occur. Also prevent a lot of unnecessary log spamming due to cascading failures in error conditions. - lots of cleanups and bug fixes One thing of note is the direct IO fixes that we merged last week after the window opened. Even though a little late, they fix a user reported data corruption and have been pretty well tested. I figured there was not much point waiting another 2 weeks for -rc1 to be released just so I could send them to you..." * tag 'xfs-for-linus-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (49 commits) xfs: using generic_file_direct_write() is unnecessary xfs: direct IO EOF zeroing needs to drain AIO xfs: DIO write completion size updates race xfs: DIO writes within EOF don't need an ioend xfs: handle DIO overwrite EOF update completion correctly xfs: DIO needs an ioend for writes xfs: move DIO mapping size calculation xfs: factor DIO write mapping from get_blocks xfs: unlock i_mutex in xfs_break_layouts xfs: kill unnecessary firstused overflow check on attr3 leaf removal xfs: use larger in-core attr firstused field and detect overflow xfs: pass attr geometry to attr leaf header conversion functions xfs: disallow ro->rw remount on norecovery mount xfs: xfs_shift_file_space can be static xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate fs: Add support FALLOC_FL_INSERT_RANGE for fallocate xfs: Fix incorrect positive ENOMEM return xfs: xfs_mru_cache_insert() should use GFP_NOFS xfs: %pF is only for function pointers xfs: fix shadow warning in xfs_da3_root_split() ...
| * fs: Add support FALLOC_FL_INSERT_RANGE for fallocateNamjae Jeon2015-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | FALLOC_FL_INSERT_RANGE command is the opposite command of FALLOC_FL_COLLAPSE_RANGE that is needed for someone who wants to add some data in the middle of file. FALLOC_FL_INSERT_RANGE will create space for writing new data within a file after shifting extents to right as given length. This command also has same limitations as FALLOC_FL_COLLAPSE_RANGE in that operations need to be filesystem block boundary aligned and cannot cross the current EOF. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | ->aio_read and ->aio_write removedAl Viro2015-04-11
| | | | | | | | | | | | no remaining users Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | NFS: fix BUG() crash in notify_change() with patch to chown_common()Andrew Elble2015-04-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have observed a BUG() crash in fs/attr.c:notify_change(). The crash occurs during an rsync into a filesystem that is exported via NFS. 1.) fs/attr.c:notify_change() modifies the caller's version of attr. 2.) 6de0ec00ba8d ("VFS: make notify_change pass ATTR_KILL_S*ID to setattr operations") introduced a BUG() restriction such that "no function will ever call notify_change() with both ATTR_MODE and ATTR_KILL_S*ID set". Under some circumstances though, it will have assisted in setting the caller's version of attr to this very combination. 3.) 27ac0ffeac80 ("locks: break delegations on any attribute modification") introduced code to handle breaking delegations. This can result in notify_change() being re-called. attr _must_ be explicitly reset to avoid triggering the BUG() established in #2. 4.) The path that that triggers this is via fs/open.c:chmod_common(). The combination of attr flags set here and in the first call to notify_change() along with a later failed break_deleg_wait() results in notify_change() being called again via retry_deleg without resetting attr. Solution is to move retry_deleg in chmod_common() a bit further up to ensure attr is completely reset. There are other places where this seemingly could occur, such as fs/utimes.c:utimes_common(), but the attr flags are not initially set in such a way to trigger this. Fixes: 27ac0ffeac80 ("locks: break delegations on any attribute modification") Reported-by: Eric Meddaugh <etmsys@rit.edu> Tested-by: Eric Meddaugh <etmsys@rit.edu> Signed-off-by: Andrew Elble <aweits@rit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | drop bogus check in file_open_root()Al Viro2015-04-11
|/ | | | | | | | For one thing, LOOKUP_DIRECTORY will be dealt with in do_last(). For another, name can be an empty string, but not NULL - no callers pass that and it would oops immediately if they would. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'getname2' of ↵Linus Torvalds2015-02-17
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull getname/putname updates from Al Viro: "Rework of getname/getname_kernel/etc., mostly from Paul Moore. Gets rid of quite a pile of kludges between namei and audit..." * 'getname2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: audit: replace getname()/putname() hacks with reference counters audit: fix filename matching in __audit_inode() and __audit_inode_child() audit: enable filename recording via getname_kernel() simpler calling conventions for filename_mountpoint() fs: create proper filename objects using getname_kernel() fs: rework getname_kernel to handle up to PATH_MAX sized filenames cut down the number of do_path_lookup() callers
| * fs: create proper filename objects using getname_kernel()Paul Moore2015-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several areas in the kernel that create temporary filename objects using the following pattern: int func(const char *name) { struct filename *file = { .name = name }; ... return 0; } ... which for the most part works okay, but it causes havoc within the audit subsystem as the filename object does not persist beyond the lifetime of the function. This patch converts all of these temporary filename objects into proper filename objects using getname_kernel() and putname() which ensure that the filename object persists until the audit subsystem is finished with it. Also, a special thanks to Al Viro, Guenter Roeck, and Sabrina Dubroca for helping resolve a difficult kernel panic on boot related to a use-after-free problem in kern_path_create(); the thread can be seen at the link below: * https://lkml.org/lkml/2015/1/20/710 This patch includes code that was either based on, or directly written by Al in the above thread. CC: viro@zeniv.linux.org.uk CC: linux@roeck-us.net CC: sd@queasysnail.net CC: linux-fsdevel@vger.kernel.org Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | vfs: remove get_xip_memMatthew Wilcox2015-02-16
|/ | | | | | | | | | | | | | | | | | | | | | All callers of get_xip_mem() are now gone. Remove checks for it, initialisers of it, documentation of it and the only implementation of it. Also remove mm/filemap_xip.c as it is now empty. Also remove documentation of the long-gone get_xip_page(). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Andreas Dilger <andreas.dilger@intel.com> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2014-12-16
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "A comparatively quieter cycle for nfsd this time, but still with two larger changes: - RPC server scalability improvements from Jeff Layton (using RCU instead of a spinlock to find idle threads). - server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna Schumaker, enabling fallocate on new clients" * 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits) nfsd4: fix xdr4 count of server in fs_location4 nfsd4: fix xdr4 inclusion of escaped char sunrpc/cache: convert to use string_escape_str() sunrpc: only call test_bit once in svc_xprt_received fs: nfsd: Fix signedness bug in compare_blob sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt sunrpc: convert to lockless lookup of queued server threads sunrpc: fix potential races in pool_stats collection sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it sunrpc: require svc_create callers to pass in meaningful shutdown routine sunrpc: have svc_wake_up only deal with pool 0 sunrpc: convert sp_task_pending flag to use atomic bitops sunrpc: move rq_cachetype field to better optimize space sunrpc: move rq_splice_ok flag into rq_flags sunrpc: move rq_dropme flag into rq_flags sunrpc: move rq_usedeferral flag to rq_flags sunrpc: move rq_local field to rq_flags sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it nfsd: minor off by one checks in __write_versions() sunrpc: release svc_pool_map reference when serv allocation fails ...
| * merge nfs bugfixes into nfsd for-3.19 branchJ. Bruce Fields2014-11-19
| |\ | | | | | | | | | | | | In addition to nfsd bugfixes, there are some fixes in -rc5 for client bugs that can interfere with my testing.
| * | VFS: Rename do_fallocate() to vfs_fallocate()Anna Schumaker2014-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function needs to be exported so it can be used by the NFSD module when responding to the new ALLOCATE and DEALLOCATE operations in NFS v4.2. Christoph Hellwig suggested renaming the function to stay consistent with how other vfs functions are named. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | fallocate: create FAN_MODIFY and IN_MODIFY eventsHeinrich Schuchardt2014-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fanotify and the inotify API can be used to monitor changes of the file system. System call fallocate() modifies files. Hence it should trigger the corresponding fanotify (FAN_MODIFY) and inotify (IN_MODIFY) events. The most interesting case is FALLOC_FL_COLLAPSE_RANGE because this value allows to create arbitrary file content from random data. This patch adds the missing call to fsnotify_modify(). The FAN_MODIFY and IN_MODIFY event will be created when fallocate() succeeds. It will even be created if the file length remains unchanged, e.g. when calling fanotify with flag FALLOC_FL_KEEP_SIZE. This logic was primarily chosen to keep the coding simple. It resembles the logic of the write() system call. When we call write() we always create a FAN_MODIFY event, even in the case of overwriting with identical data. Events FAN_MODIFY and IN_MODIFY do not provide any guarantee that data was actually changed. Furthermore even if if the filesize remains unchanged, fallocate() may influence whether a subsequent write() will succeed and hence the fallocate() call may be considered a modification. The fallocate(2) man page teaches: After a successful call, subsequent writes into the range specified by offset and len are guaranteed not to fail because of lack of disk space. So calling fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, len) may result in different outcomes of a subsequent write depending on the values of offset and len. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@parisplace.org> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | new helper: audit_file()Al Viro2014-11-19
| |/ |/| | | | | | | | | | | | | | | ... for situations when we don't have any candidate in pathnames - basically, in descriptor-based syscalls. [Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | vfs: add i_op->dentry_open()Miklos Szeredi2014-10-24
|/ | | | | | | Add a new inode operation i_op->dentry_open(). This is for stacked filesystems that want to return a struct file from a different filesystem. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* vfs: fix check for fallocate on active swapfileEric Biggers2014-08-01
| | | | | | | | | Fix the broken check for calling sys_fallocate() on an active swapfile, introduced by commit 0790b31b69374ddadefe ("fs: disallow all fallocate operation on active swapfile"). Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new methods: ->read_iter() and ->write_iter()Al Viro2014-05-06
| | | | | | | | | | | | | | | | | | Beginning to introduce those. Just the callers for now, and it's clumsier than it'll eventually become; once we finish converting aio_read and aio_write instances, the things will get nicer. For now, these guys are in parallel to ->aio_read() and ->aio_write(); they take iocb and iov_iter, with everything in iov_iter already validated. File offset is passed in iocb->ki_pos, iov/nr_segs - in iov_iter. Main concerns in that series are stack footprint and ability to split the damn thing cleanly. [fix from Peter Ujfalusi <peter.ujfalusi@ti.com> folded] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* replace checking for ->read/->aio_read presence with check in ->f_modeAl Viro2014-05-06
| | | | | | | | | | | Since we are about to introduce new methods (read_iter/write_iter), the tests in a bunch of places would have to grow inconveniently. Check once (at open() time) and store results in ->f_mode as FMODE_CAN_READ and FMODE_CAN_WRITE resp. It might end up being a temporary measure - once everything switches from ->aio_{read,write} to ->{read,write}_iter it might make sense to return to open-coded checks. We'll see... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds2014-04-20
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "These are regression and bug fixes for ext4. We had a number of new features in ext4 during this merge window (ZERO_RANGE and COLLAPSE_RANGE fallocate modes, renameat, etc.) so there were many more regression and bug fixes this time around. It didn't help that xfstests hadn't been fully updated to fully stress test COLLAPSE_RANGE until after -rc1" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (31 commits) ext4: disable COLLAPSE_RANGE for bigalloc ext4: fix COLLAPSE_RANGE failure with 1KB block size ext4: use EINVAL if not a regular file in ext4_collapse_range() ext4: enforce we are operating on a regular file in ext4_zero_range() ext4: fix extent merging in ext4_ext_shift_path_extents() ext4: discard preallocations after removing space ext4: no need to truncate pagecache twice in collapse range ext4: fix removing status extents in ext4_collapse_range() ext4: use filemap_write_and_wait_range() correctly in collapse range ext4: use truncate_pagecache() in collapse range ext4: remove temporary shim used to merge COLLAPSE_RANGE and ZERO_RANGE ext4: fix ext4_count_free_clusters() with EXT4FS_DEBUG and bigalloc enabled ext4: always check ext4_ext_find_extent result ext4: fix error handling in ext4_ext_shift_extents ext4: silence sparse check warning for function ext4_trim_extent ext4: COLLAPSE_RANGE only works on extent-based files ext4: fix byte order problems introduced by the COLLAPSE_RANGE patches ext4: use i_size_read in ext4_unaligned_aio() fs: disallow all fallocate operation on active swapfile fs: move falloc collapse range check into the filesystem methods ...
| * fs: disallow all fallocate operation on active swapfileLukas Czerner2014-04-12
| | | | | | | | | | | | | | | | | | | | Currently some file system have IS_SWAPFILE check in their fallocate implementations and some do not. However we should really prevent any fallocate operation on swapfile so move the check to vfs and remove the redundant checks from the file systems fallocate implementations. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * fs: move falloc collapse range check into the filesystem methodsLukas Czerner2014-04-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently in do_fallocate in collapse range case we're checking whether offset + len is not bigger than i_size. However there is nothing which would prevent i_size from changing so the check is pointless. It should be done in the file system itself and the file system needs to make sure that i_size is not going to change. The i_size check for the other fallocate modes are also done in the filesystems. As it is now we can easily crash the kernel by having two processes doing truncate and fallocate collapse range at the same time. This can be reproduced on ext4 and it is theoretically possible on xfs even though I was not able to trigger it with this simple test. This commit removes the check from do_fallocate and adds it to the file system. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * fs: prevent doing FALLOC_FL_ZERO_RANGE on append only fileLukas Czerner2014-04-12
| | | | | | | | | | | | | | | | | | Currently punch hole and collapse range fallocate operation are not allowed on append only file. This should be case for zero range as well. Fix it by allowing only pure fallocate (possibly with keep size set). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* | Merge branch 'for-linus' of ↵Linus Torvalds2014-04-12
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "The first vfs pile, with deep apologies for being very late in this window. Assorted cleanups and fixes, plus a large preparatory part of iov_iter work. There's a lot more of that, but it'll probably go into the next merge window - it *does* shape up nicely, removes a lot of boilerplate, gets rid of locking inconsistencie between aio_write and splice_write and I hope to get Kent's direct-io rewrite merged into the same queue, but some of the stuff after this point is having (mostly trivial) conflicts with the things already merged into mainline and with some I want more testing. This one passes LTP and xfstests without regressions, in addition to usual beating. BTW, readahead02 in ltp syscalls testsuite has started giving failures since "mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages" - might be a false positive, might be a real regression..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) missing bits of "splice: fix racy pipe->buffers uses" cifs: fix the race in cifs_writev() ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure kill generic_file_buffered_write() ocfs2_file_aio_write(): switch to generic_perform_write() ceph_aio_write(): switch to generic_perform_write() xfs_file_buffered_aio_write(): switch to generic_perform_write() export generic_perform_write(), start getting rid of generic_file_buffer_write() generic_file_direct_write(): get rid of ppos argument btrfs_file_aio_write(): get rid of ppos kill the 5th argument of generic_file_buffered_write() kill the 4th argument of __generic_file_aio_write() lustre: don't open-code kernel_recvmsg() ocfs2: don't open-code kernel_recvmsg() drbd: don't open-code kernel_recvmsg() constify blk_rq_map_user_iov() and friends lustre: switch to kernel_sendmsg() ocfs2: don't open-code kernel_sendmsg() take iov_iter stuff to mm/iov_iter.c process_vm_access: tidy up a bit ...
| * tidy do_dentry_open() up a bitAl Viro2014-04-01
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * mark struct file that had write access grabbed by open()Al Viro2014-04-01
| | | | | | | | | | | | | | | | | | new flag in ->f_mode - FMODE_WRITER. Set by do_dentry_open() in case when it has grabbed write access, checked by __fput() to decide whether it wants to drop the sucker. Allows to stop bothering with mnt_clone_write() in alloc_file(), along with fewer special_file() checks. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * fold __get_file_write_access() into its only callerAl Viro2014-04-01
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * get rid of DEBUG_WRITECOUNTAl Viro2014-04-01
| | | | | | | | | | | | it only makes control flow in __fput() and friends more convoluted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * don't bother with {get,put}_write_access() on non-regular filesAl Viro2014-04-01
| | | | | | | | | | | | | | | | | | it's pointless and actually leads to wrong behaviour in at least one moderately convoluted case (pipe(), close one end, try to get to another via /proc/*/fd and run into ETXTBUSY). Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge tag 'xfs-for-linus-3.15-rc1' of git://oss.sgi.com/xfs/xfsLinus Torvalds2014-04-04
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull xfs update from Dave Chinner: "There are a couple of new fallocate features in this request - it was decided that it was easiest to push them through the XFS tree using topic branches and have the ext4 support be based on those branches. Hence you may see some overlap with the ext4 tree merge depending on how they including those topic branches into their tree. Other than that, there is O_TMPFILE support, some cleanups and bug fixes. The main changes in the XFS tree for 3.15-rc1 are: - O_TMPFILE support - allowing AIO+DIO writes beyond EOF - FALLOC_FL_COLLAPSE_RANGE support for fallocate syscall and XFS implementation - FALLOC_FL_ZERO_RANGE support for fallocate syscall and XFS implementation - IO verifier cleanup and rework - stack usage reduction changes - vm_map_ram NOIO context fixes to remove lockdep warings - various bug fixes and cleanups" * tag 'xfs-for-linus-3.15-rc1' of git://oss.sgi.com/xfs/xfs: (34 commits) xfs: fix directory hash ordering bug xfs: extra semi-colon breaks a condition xfs: Add support for FALLOC_FL_ZERO_RANGE fs: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate xfs: inode log reservations are still too small xfs: xfs_check_page_type buffer checks need help xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation xfs: use NOIO contexts for vm_map_ram xfs: don't leak EFSBADCRC to userspace xfs: fix directory inode iolock lockdep false positive xfs: allocate xfs_da_args to reduce stack footprint xfs: always do log forces via the workqueue xfs: modify verifiers to differentiate CRC from other errors xfs: print useful caller information in xfs_error_report xfs: add xfs_verifier_error() xfs: add helper for updating checksums on xfs_bufs xfs: add helper for verifying checksums on xfs_bufs xfs: Use defines for CRC offsets in all cases xfs: skip pointless CRC updates after verifier failures xfs: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate ...
| * fs: Introduce FALLOC_FL_ZERO_RANGE flag for fallocateLukas Czerner2014-03-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same functionality as xfs ioctl XFS_IOC_ZERO_RANGE. It can be used to convert a range of file to zeros preferably without issuing data IO. Blocks should be preallocated for the regions that span holes in the file, and the entire range is preferable converted to unwritten extents - even though file system may choose to zero out the extent or do whatever which will result in reading zeros from the range while the range remains allocated for the file. This can be also used to preallocate blocks past EOF in the same way as with fallocate. Flag FALLOC_FL_KEEP_SIZE which should cause the inode size to remain the same. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * fs: Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocateNamjae Jeon2014-02-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is in response of the following post: http://lwn.net/Articles/556136/ "ext4: introduce two new ioctls" Dave chinner suggested that truncate_block_range (which was one of the ioctls name) should be a fallocate operation and not any fs specific ioctl, hence we add this functionality to new flags of fallocate. This new functionality of collapsing range could be used by media editing tools which does non linear editing to quickly purge and edit parts of a media file. This will immensely improve the performance of these operations. The limitation of fs block size aligned offsets can be easily handled by media codecs which are encapsulated in a conatiner as they have to just change the offset to next keyframe value to match the proper alignment. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | vfs: atomic f_pos accesses as per POSIXLinus Torvalds2014-03-10
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our write() system call has always been atomic in the sense that you get the expected thread-safe contiguous write, but we haven't actually guaranteed that concurrent writes are serialized wrt f_pos accesses, so threads (or processes) that share a file descriptor and use "write()" concurrently would quite likely overwrite each others data. This violates POSIX.1-2008/SUSv4 Section XSI 2.9.7 that says: "2.9.7 Thread Interactions with Regular File Operations All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links: [...]" and one of the effects is the file position update. This unprotected file position behavior is not new behavior, and nobody has ever cared. Until now. Yongzhi Pan reported unexpected behavior to Michael Kerrisk that was due to this. This resolves the issue with a f_pos-specific lock that is taken by read/write/lseek on file descriptors that may be shared across threads or processes. Reported-by: Yongzhi Pan <panyongzhi@gmail.com> Reported-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* locks: break delegations on any attribute modificationJ. Bruce Fields2013-11-09
| | | | | | | | | | | | | NFSv4 uses leases to guarantee that clients can cache metadata as well as data. Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: David Howells <dhowells@redhat.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Dustin Kirkland <dustin.kirkland@gazzang.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of s_files and files_lockAl Viro2013-11-09
| | | | | | | | The only thing we need it for is alt-sysrq-r (emergency remount r/o) and these days we can do just as well without going through the list of files. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* file->f_op is never NULL...Al Viro2013-10-24
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>