summaryrefslogtreecommitdiff
path: root/mm/backing-dev.c (follow)
Commit message (Collapse)AuthorAge
* Merge remote-tracking branch 'common/android-4.4-p' into ↵Michael Bestas2021-12-27
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lineage-18.1-caf-msm8998 * common/android-4.4-p: Linux 4.4.296 xen/netback: don't queue unlimited number of packages xen/console: harden hvc_xen against event channel storms xen/netfront: harden netfront against event channel storms xen/blkfront: harden blkfront against event channel storms Input: touchscreen - avoid bitwise vs logical OR warning ARM: 8805/2: remove unneeded naked function usage net: lan78xx: Avoid unnecessary self assignment net: systemport: Add global locking for descriptor lifecycle timekeeping: Really make sure wall_to_monotonic isn't positive USB: serial: option: add Telit FN990 compositions PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error USB: gadget: bRequestType is a bitfield, not a enum igbvf: fix double free in `igbvf_probe` soc/tegra: fuse: Fix bitwise vs. logical OR warning nfsd: fix use-after-free due to delegation race dm btree remove: fix use after free in rebalance_children() recordmcount.pl: look for jgnop instruction as well as bcrl on s390 mac80211: send ADDBA requests using the tid/queue of the aggregation session hwmon: (dell-smm) Fix warning on /proc/i8k creation error net: netlink: af_netlink: Prevent empty skb by adding a check on len. i2c: rk3x: Handle a spurious start completion interrupt flag parisc/agp: Annotate parisc agp init functions with __init nfc: fix segfault in nfc_genl_dump_devices_done FROMGIT: USB: gadget: bRequestType is a bitfield, not a enum Linux 4.4.295 irqchip: nvic: Fix offset for Interrupt Priority Offsets irqchip/irq-gic-v3-its.c: Force synchronisation when issuing INVALL iio: accel: kxcjk-1013: Fix possible memory leak in probe and remove iio: itg3200: Call iio_trigger_notify_done() on error iio: ltr501: Don't return error code in trigger handler iio: mma8452: Fix trigger reference couting iio: stk3310: Don't return error code in interrupt handler usb: core: config: fix validation of wMaxPacketValue entries USB: gadget: zero allocate endpoint 0 buffers USB: gadget: detect too-big endpoint 0 requests net/qla3xxx: fix an error code in ql_adapter_up() net, neigh: clear whole pneigh_entry at alloc time net: fec: only clear interrupt of handling queue in fec_enet_rx_queue() net: altera: set a couple error code in probe() net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2) tracefs: Set all files to the same group ownership as the mount option signalfd: use wake_up_pollfree() binder: use wake_up_pollfree() wait: add wake_up_pollfree() libata: add horkage for ASMedia 1092 can: pch_can: pch_can_rx_normal: fix use after free tracefs: Have new files inherit the ownership of their parent ALSA: pcm: oss: Handle missing errors in snd_pcm_oss_change_params*() ALSA: pcm: oss: Limit the period size to 16MB ALSA: pcm: oss: Fix negative period/buffer sizes ALSA: ctl: Fix copy of updated id with element read/write mm: bdi: initialize bdi_min_ratio when bdi is unregistered nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done can: sja1000: fix use after free in ems_pcmcia_add_card() HID: check for valid USB device for many HID drivers HID: wacom: fix problems when device is not a valid USB device HID: add USB_HID dependancy on some USB HID drivers HID: add USB_HID dependancy to hid-chicony HID: add USB_HID dependancy to hid-prodikeys HID: add hid_is_usb() function to make it simpler for USB detection HID: introduce hid_is_using_ll_driver UPSTREAM: USB: gadget: zero allocate endpoint 0 buffers UPSTREAM: USB: gadget: detect too-big endpoint 0 requests Linux 4.4.294 serial: pl011: Add ACPI SBSA UART match id tty: serial: msm_serial: Deactivate RX DMA for polling support vgacon: Propagate console boot parameters before calling `vc_resize' parisc: Fix "make install" on newer debian releases siphash: use _unaligned version by default net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings() natsemi: xtensa: fix section mismatch warnings fget: check that the fd still exists after getting a ref to it fs: add fget_many() and fput_many() sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl kprobes: Limit max data_size of the kretprobe instances net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock() net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound scsi: iscsi: Unblock session then wake up error handler s390/setup: avoid using memblock_enforce_memory_limit platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep net: return correct error code hugetlb: take PMD sharing into account when flushing tlb/caches tty: hvc: replace BUG_ON() with negative return value xen/netfront: don't trust the backend response data blindly xen/netfront: disentangle tx_skb_freelist xen/netfront: don't read data from request on the ring page xen/netfront: read response from backend only once xen/blkfront: don't trust the backend response data blindly xen/blkfront: don't take local copy of a request from the ring page xen/blkfront: read response from backend only once xen: sync include/xen/interface/io/ring.h with Xen's newest version shm: extend forced shm destroy to support objects from several IPC nses fuse: release pipe buf after last use fuse: fix page stealing NFC: add NCI_UNREG flag to eliminate the race proc/vmcore: fix clearing user buffer by properly using clear_user() hugetlbfs: flush TLBs correctly after huge_pmd_unshare tracing: Check pid filtering when creating events tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows scsi: mpt3sas: Fix kernel panic during drive powercycle test ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE NFSv42: Don't fail clone() unless the OP_CLONE operation failed net: ieee802154: handle iftypes as u32 ASoC: topology: Add missing rwsem around snd_ctl_remove() calls ARM: dts: BCM5301X: Add interrupt properties to GPIO node xen: detect uninitialized xenbus in xenbus_init xen: don't continue xenstore initialization in case of errors staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect() ALSA: ctxfi: Fix out-of-range access binder: fix test regression due to sender_euid change usb: hub: Fix locking issues with address0_mutex usb: hub: Fix usb enumeration issue due to address0 race USB: serial: option: add Fibocom FM101-GL variants USB: serial: option: add Telit LE910S1 0x9200 composition staging: ion: Prevent incorrect reference counting behavour Change-Id: Iadf9f213915d2a02b27ceb3b2144eac827ade329
| * mm: bdi: initialize bdi_min_ratio when bdi is unregisteredManjong Lee2021-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3c376dfafbf7a8ea0dea212d095ddd83e93280bb upstream. Initialize min_ratio if it is set during bdi unregistration. This can prevent problems that may occur a when bdi is removed without resetting min_ratio. For example. 1) insert external sdcard 2) set external sdcard's min_ratio 70 3) remove external sdcard without setting min_ratio 0 4) insert external sdcard 5) set external sdcard's min_ratio 70 << error occur(can't set) Because when an sdcard is removed, the present bdi_min_ratio value will remain. Currently, the only way to reset bdi_min_ratio is to reboot. [akpm@linux-foundation.org: tweak comment and coding style] Link: https://lkml.kernel.org/r/20211021161942.5983-1-mj0123.lee@samsung.com Signed-off-by: Manjong Lee <mj0123.lee@samsung.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Changheun Lee <nanich.lee@samsung.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: <seunghwan.hyun@samsung.com> Cc: <sookwan7.kim@samsung.com> Cc: <yt0928.kim@samsung.com> Cc: <junho89.kim@samsung.com> Cc: <jisoo2146.oh@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge branch 'android-4.4-p' of ↵Michael Bestas2021-02-28
|\| | | | | | | | | | | | | | | | | | | https://android.googlesource.com/kernel/common into lineage-18.1-caf-msm8998 This brings LA.UM.9.2.r1-02500-SDMxx0.0 up to date with https://android.googlesource.com/kernel/common/ android-4.4-p at commit: 4fd124d1546d8 Merge 4.4.258 into android-4.4-p Change-Id: Idbae7489bc1d831a378dd60993f46139e5e28c4c
| * memcg: fix a crash in wb_workfn when a device disappearsTheodore Ts'o2021-02-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 68f23b89067fdf187763e75a56087550624fdbee ] Without memcg, there is a one-to-one mapping between the bdi and bdi_writeback structures. In this world, things are fairly straightforward; the first thing bdi_unregister() does is to shutdown the bdi_writeback structure (or wb), and part of that writeback ensures that no other work queued against the wb, and that the wb is fully drained. With memcg, however, there is a one-to-many relationship between the bdi and bdi_writeback structures; that is, there are multiple wb objects which can all point to a single bdi. There is a refcount which prevents the bdi object from being released (and hence, unregistered). So in theory, the bdi_unregister() *should* only get called once its refcount goes to zero (bdi_put will drop the refcount, and when it is zero, release_bdi gets called, which calls bdi_unregister). Unfortunately, del_gendisk() in block/gen_hd.c never got the memo about the Brave New memcg World, and calls bdi_unregister directly. It does this without informing the file system, or the memcg code, or anything else. This causes the root wb associated with the bdi to be unregistered, but none of the memcg-specific wb's are shutdown. So when one of these wb's are woken up to do delayed work, they try to dereference their wb->bdi->dev to fetch the device name, but unfortunately bdi->dev is now NULL, thanks to the bdi_unregister() called by del_gendisk(). As a result, *boom*. Fortunately, it looks like the rest of the writeback path is perfectly happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to create a bdi_dev_name() function which can handle bdi->dev being NULL. This also allows us to bulletproof the writeback tracepoints to prevent them from dereferencing a NULL pointer and crashing the kernel if one is tracing with memcg's enabled, and an iSCSI device dies or a USB storage stick is pulled. The most common way of triggering this will be hotremoval of a device while writeback with memcg enabled is going on. It was triggering several times a day in a heavily loaded production environment. Google Bug Id: 145475544 Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* | Merge android-4.4.181 (bd858d7) into msm-4.4Srinivasarao P2019-06-12
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * refs/heads/tmp-bd858d7 Linux 4.4.181 ethtool: check the return value of get_regs_len ipv4: Define __ipv4_neigh_lookup_noref when CONFIG_INET is disabled fuse: Add FOPEN_STREAM to use stream_open() fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock drm/gma500/cdv: Check vbt config bits when detecting lvds panels genwqe: Prevent an integer overflow in the ioctl MIPS: pistachio: Build uImage.gz by default fuse: fallocate: fix return with locked inode parisc: Use implicit space register selection for loading the coherence index of I/O pdirs rcu: locking and unlocking need to always be at least barriers pktgen: do not sleep with the thread lock held. net: rds: fix memory leak in rds_ib_flush_mr_pool net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit ethtool: fix potential userspace buffer overflow media: uvcvideo: Fix uvc_alloc_entity() allocation alignment usb: gadget: fix request length error for isoc transfer net: cdc_ncm: GetNtbFormat endian fix Revert "x86/build: Move _etext to actual end of .text" userfaultfd: don't pin the user memory in userfaultfd_file_create() brcmfmac: add subtype check for event handling in data path brcmfmac: add length checks in scheduled scan result handler brcmfmac: fix incorrect event channel deduction brcmfmac: revise handling events in receive path brcmfmac: screening firmware event packet brcmfmac: Add length checks on firmware events bnx2x: disable GSO where gso_size is too big for hardware net: create skb_gso_validate_mac_len() binder: replace "%p" with "%pK" binder: Replace "%p" with "%pK" for stable CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEM kernel/signal.c: trace_signal_deliver when signal_group_exit memcg: make it work on sparse non-0-node systems tty: max310x: Fix external crystal register setup tty: serial: msm_serial: Fix XON/XOFF drm/nouveau/i2c: Disable i2c bus access after ->fini() ALSA: hda/realtek - Set default power save node to 0 Btrfs: fix race updating log root item during fsync scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs) scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove media: smsusb: better handle optional alignment media: usb: siano: Fix false-positive "uninitialized variable" warning media: usb: siano: Fix general protection fault in smsusb USB: rio500: fix memory leak in close after disconnect USB: rio500: refuse more than one device at a time USB: Add LPM quirk for Surface Dock GigE adapter USB: sisusbvga: fix oops in error path of sisusb_probe USB: Fix slab-out-of-bounds write in usb_get_bos_descriptor usb: xhci: avoid null pointer deref when bos field is NULL xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic() include/linux/bitops.h: sanitize rotate primitives sparc64: Fix regression in non-hypervisor TLB flush xcall tipc: fix modprobe tipc failed after switch order of device registration -v2 Revert "tipc: fix modprobe tipc failed after switch order of device registration" xen/pciback: Don't disable PCI_COMMAND on PCI device reset. crypto: vmx - ghash: do nosimd fallback manually net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value bnxt_en: Fix aggregation buffer leak under OOM condition. tipc: Avoid copying bytes beyond the supplied data usbnet: fix kernel crash after disconnect net: stmmac: fix reset gpio free missing net-gro: fix use-after-free read in napi_gro_frags() llc: fix skb leak in llc_build_and_send_ui_pkt() ipv6: Consider sk_bound_dev_if when binding a raw socket to an address ASoC: davinci-mcasp: Fix clang warning without CONFIG_PM spi: Fix zero length xfer bug spi: rspi: Fix sequencer reset during initialization spi : spi-topcliff-pch: Fix to handle empty DMA buffers scsi: lpfc: Fix SLI3 commands being issued on SLI4 devices media: saa7146: avoid high stack usage with clang media: go7007: avoid clang frame overflow warning with KASAN media: m88ds3103: serialize reset messages in m88ds3103_set_frontend scsi: qla4xxx: avoid freeing unallocated dma memory usb: core: Add PM runtime calls to usb_hcd_platform_shutdown rcutorture: Fix cleanup path for invalid torture_type strings tty: ipwireless: fix missing checks for ioremap virtio_console: initialize vtermno value for ports media: wl128x: prevent two potential buffer overflows spi: tegra114: reset controller on probe cxgb3/l2t: Fix undefined behaviour ASoC: fsl_utils: fix a leaked reference by adding missing of_node_put ASoC: eukrea-tlv320: fix a leaked reference by adding missing of_node_put HID: core: move Usage Page concatenation to Main item chardev: add additional check for minor range overlap x86/ia32: Fix ia32_restore_sigcontext() AC leak arm64: cpu_ops: fix a leaked reference by adding missing of_node_put scsi: ufs: Avoid configuring regulator with undefined voltage range scsi: ufs: Fix regulator load and icc-level configuration brcmfmac: fix race during disconnect when USB completion is in progress brcmfmac: convert dev_init_lock mutex to completion b43: shut up clang -Wuninitialized variable warning brcmfmac: fix missing checks for kmemdup rtlwifi: fix a potential NULL pointer dereference iio: common: ssp_sensors: Initialize calculated_time in ssp_common_process_data iio: hmc5843: fix potential NULL pointer dereferences iio: ad_sigma_delta: Properly handle SPI bus locking vs CS assertion x86/build: Keep local relocations with ld.lld cpufreq: pmac32: fix possible object reference leak cpufreq/pasemi: fix possible object reference leak cpufreq: ppc_cbe: fix possible object reference leak s390: cio: fix cio_irb declaration extcon: arizona: Disable mic detect if running when driver is removed PM / core: Propagate dev->power.wakeup_path when no callbacks mmc: sdhci-of-esdhc: add erratum eSDHC-A001 and A-008358 support mmc: sdhci-of-esdhc: add erratum eSDHC5 support mmc_spi: add a status check for spi_sync_locked scsi: libsas: Do discovery on empty PHY to update PHY info hwmon: (f71805f) Use request_muxed_region for Super-IO accesses hwmon: (pc87427) Use request_muxed_region for Super-IO accesses hwmon: (smsc47b397) Use request_muxed_region for Super-IO accesses hwmon: (smsc47m1) Use request_muxed_region for Super-IO accesses hwmon: (vt1211) Use request_muxed_region for Super-IO accesses RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure i40e: don't allow changes to HW VLAN stripping on active port VLANs x86/irq/64: Limit IST stack overflow check to #DB stack USB: core: Don't unbind interfaces following device reset failure sched/core: Handle overflow in cpu_shares_write_u64 sched/core: Check quota and period overflow at usec to nsec conversion powerpc/numa: improve control of topology updates media: pvrusb2: Prevent a buffer overflow media: au0828: Fix NULL pointer dereference in au0828_analog_stream_enable() audit: fix a memory leak bug media: ov2659: make S_FMT succeed even if requested format doesn't match media: au0828: stop video streaming only when last user stops media: ov6650: Move v4l2_clk_get() to ov6650_video_probe() helper media: coda: clear error return value before picture run dmaengine: at_xdmac: remove BUG_ON macro in tasklet pinctrl: pistachio: fix leaked of_node references HID: logitech-hidpp: use RAP instead of FAP to get the protocol version mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault() smpboot: Place the __percpu annotation correctly x86/build: Move _etext to actual end of .text bcache: avoid clang -Wunintialized warning bcache: add failure check to run_cache_set() for journal replay bcache: fix failure in journal relplay bcache: return error immediately in bch_journal_replay() net: cw1200: fix a NULL pointer dereference mwifiex: prevent an array overflow ASoC: fsl_sai: Update is_slave_mode with correct value mac80211/cfg80211: update bss channel on channel switch dmaengine: pl330: _stop: clear interrupt status w1: fix the resume command API rtc: 88pm860x: prevent use-after-free on device remove brcm80211: potential NULL dereference in brcmf_cfg80211_vndr_cmds_dcmd_handler() spi: pxa2xx: fix SCR (divisor) calculation ASoC: imx: fix fiq dependencies powerpc/boot: Fix missing check of lseek() return value mmc: core: Verify SD bus width cxgb4: Fix error path in cxgb4_init_module gfs2: Fix lru_count going negative tools include: Adopt linux/bits.h perf tools: No need to include bitops.h in util.h at76c50x-usb: Don't register led_trigger if usb_register_driver failed ssb: Fix possible NULL pointer dereference in ssb_host_pcmcia_exit media: vivid: use vfree() instead of kfree() for dev->bitmap_cap media: cpia2: Fix use-after-free in cpia2_exit fbdev: fix WARNING in __alloc_pages_nodemask bug hugetlb: use same fault hash key for shared and private mappings fbdev: fix divide error in fb_var_to_videomode btrfs: sysfs: don't leak memory when failing add fsid Btrfs: fix race between ranged fsync and writeback of adjacent ranges gfs2: Fix sign extension bug in gfs2_update_stats crypto: vmx - CTR: always increment IV as quadword Revert "scsi: sd: Keep disk read-only when re-reading partition" bio: fix improper use of smp_mb__before_atomic() KVM: x86: fix return value for reserved EFER ext4: do not delete unlinked inode from orphan list on failed truncate fbdev: sm712fb: fix memory frequency by avoiding a switch/case fallthrough btrfs: Honour FITRIM range constraints during free space trim md/raid: raid5 preserve the writeback action after the parity check Revert "Don't jump to compute_result state from check_result state" perf bench numa: Add define for RUSAGE_THREAD if not present ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG KVM: arm/arm64: Ensure vcpu target is unset on reset failure xfrm4: Fix uninitialized memory read in _decode_session4 vti4: ipip tunnel deregistration fixes. xfrm6_tunnel: Fix potential panic when unloading xfrm6_tunnel module xfrm: policy: Fix out-of-bound array accesses in __xfrm_policy_unlink dm delay: fix a crash when invalid device is specified PCI: Mark Atheros AR9462 to avoid bus reset fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting fbdev: sm712fb: use 1024x768 by default on non-MIPS, fix garbled display fbdev: sm712fb: fix support for 1024x768-16 mode fbdev: sm712fb: fix crashes during framebuffer writes by correctly mapping VRAM fbdev: sm712fb: fix boot screen glitch when sm712fb replaces VGA fbdev: sm712fb: fix white screen of death on reboot, don't set CR3B-CR3F fbdev: sm712fb: fix VRAM detection, don't set SR70/71/74/75 fbdev: sm712fb: fix brightness control on reboot, don't set SR30 perf intel-pt: Fix sample timestamp wrt non-taken branches perf intel-pt: Fix improved sample timestamp perf intel-pt: Fix instructions sampling rate memory: tegra: Fix integer overflow on tick value calculation tracing: Fix partial reading of trace event's id file ceph: flush dirty inodes before proceeding with remount iommu/tegra-smmu: Fix invalid ASID bits on Tegra30/114 fuse: honor RLIMIT_FSIZE in fuse_file_fallocate fuse: fix writepages on 32bit clk: tegra: Fix PLLM programming on Tegra124+ when PMC overrides divider NFS4: Fix v4.0 client state corruption when mount media: ov6650: Fix sensor possibly not detected on probe cifs: fix strcat buffer overflow and reduce raciness in smb21_set_oplock_level() of: fix clang -Wunsequenced for be32_to_cpu() intel_th: msu: Fix single mode with IOMMU md: add mddev->pers to avoid potential NULL pointer dereference stm class: Fix channel free in stm output free path tipc: fix modprobe tipc failed after switch order of device registration tipc: switch order of device registration to fix a crash ppp: deflate: Fix possible crash in deflate_init net/mlx4_core: Change the error print to info print net: avoid weird emergency message KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug ext4: zero out the unused memory region in the extent tree block fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount writeback: synchronize sync(2) against cgroup writeback membership switches crypto: arm/aes-neonbs - don't access already-freed walk.iv crypto: salsa20 - don't access already-freed walk.iv crypto: chacha20poly1305 - set cra_name correctly crypto: gcm - fix incompatibility between "gcm" and "gcm_base" crypto: gcm - Fix error return code in crypto_gcm_create_common() ipmi:ssif: compare block number correctly for multi-part return messages bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim() bcache: fix a race between cache register and cacheset unregister Btrfs: do not start a transaction at iterate_extent_inodes() ext4: fix ext4_show_options for file systems w/o journal ext4: actually request zeroing of inode table after grow tty/vt: fix write/write race in ioctl(KDSKBSENT) handler mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L ocfs2: fix ocfs2 read inode data panic in ocfs2_iget mm/mincore.c: make mincore() more conservative ASoC: RT5677-SPI: Disable 16Bit SPI Transfers ASoC: max98090: Fix restore of DAPM Muxes ALSA: hda/realtek - EAPD turn on later ALSA: hda/hdmi - Consider eld_valid when reporting jack event ALSA: usb-audio: Fix a memory leak bug crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest() crypto: crct10dif-generic - fix use via crypto_shash_digest() crypto: vmx - fix copy-paste error in CTR mode ARM: exynos: Fix a leaked reference by adding missing of_node_put x86/speculation/mds: Improve CPU buffer clear documentation x86/speculation/mds: Revert CPU buffer clear on double fault exit f2fs: link f2fs quota ops for sysfile fs: sdcardfs: Add missing option to show_options Conflicts: drivers/scsi/sd.c drivers/scsi/ufs/ufshcd.c Change-Id: If6679c7cc8c3fee323c749ac359353fbebfd12d9 Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
| * writeback: synchronize sync(2) against cgroup writeback membership switchesTejun Heo2019-06-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 7fc5854f8c6efae9e7624970ab49a1eac2faefb1 upstream. sync_inodes_sb() can race against cgwb (cgroup writeback) membership switches and fail to writeback some inodes. For example, if an inode switches to another wb while sync_inodes_sb() is in progress, the new wb might not be visible to bdi_split_work_to_wbs() at all or the inode might jump from a wb which hasn't issued writebacks yet to one which already has. This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb switch path against sync_inodes_sb() so that sync_inodes_sb() is guaranteed to see all the target wbs and inodes can't jump wbs to escape syncing. v2: Fixed misplaced rwsem init. Spotted by Jiufei. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jiufei Xue <xuejiufei@gmail.com> Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge android-4.4.127 (d6bbe8b) into msm-4.4Srinivasarao P2018-04-20
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * refs/heads/tmp-d6bbe8b Linux 4.4.127 Revert "ip6_vti: adjust vti mtu according to mtu of lower device" net: cavium: liquidio: fix up "Avoid dma_unmap_single on uninitialized ndata" spi: davinci: fix up dma_mapping_error() incorrect patch Revert "mtip32xx: use runtime tag to initialize command header" Revert "cpufreq: Fix governor module removal race" Revert "ARM: dts: omap3-n900: Fix the audio CODEC's reset pin" Revert "ARM: dts: am335x-pepper: Fix the audio CODEC's reset pin" Revert "PCI/MSI: Stop disabling MSI/MSI-X in pci_device_shutdown()" nospec: Kill array_index_nospec_mask_check() nospec: Move array_index_nospec() parameter checking into separate macro net: hns: Fix ethtool private flags md/raid10: reset the 'first' at the end of loop ARM: dts: am57xx-beagle-x15-common: Add overide powerhold property ARM: dts: dra7: Add power hold and power controller properties to palmas Documentation: pinctrl: palmas: Add ti,palmas-powerhold-override property definition vt: change SGR 21 to follow the standards Input: i8042 - enable MUX on Sony VAIO VGN-CS series to fix touchpad Input: i8042 - add Lenovo ThinkPad L460 to i8042 reset list staging: comedi: ni_mio_common: ack ai fifo error interrupts. fs/proc: Stop trying to report thread stacks crypto: x86/cast5-avx - fix ECB encryption when long sg follows short one crypto: ahash - Fix early termination in hash walk parport_pc: Add support for WCH CH382L PCI-E single parallel port card. media: usbtv: prevent double free in error case mei: remove dev_err message on an unsupported ioctl USB: serial: cp210x: add ELDAT Easywave RX09 id USB: serial: ftdi_sio: add support for Harman FirmwareHubEmulator USB: serial: ftdi_sio: add RT Systems VX-8 cable usb: dwc2: Improve gadget state disconnection handling scsi: virtio_scsi: always read VPD pages for multiqueue too llist: clang: introduce member_address_is_nonnull() Bluetooth: Fix missing encryption refresh on Security Request netfilter: x_tables: add and use xt_check_proc_name netfilter: bridge: ebt_among: add more missing match size checks xfrm: Refuse to insert 32 bit userspace socket policies on 64 bit systems net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms() RDMA/ucma: Introduce safer rdma_addr_size() variants RDMA/ucma: Don't allow join attempts for unsupported AF family RDMA/ucma: Check that device exists prior to accessing it RDMA/ucma: Check that device is connected prior to access it RDMA/ucma: Ensure that CM_ID exists prior to access it RDMA/ucma: Fix use-after-free access in ucma_close RDMA/ucma: Check AF family prior resolving address xfrm_user: uncoditionally validate esn replay attribute struct arm64: avoid overflow in VA_START and PAGE_OFFSET selinux: Remove redundant check for unknown labeling behavior netfilter: ctnetlink: Make some parameters integer to avoid enum mismatch tty: provide tty_name() even without CONFIG_TTY audit: add tty field to LOGIN event frv: declare jiffies to be located in the .data section jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp fs: compat: Remove warning from COMPATIBLE_IOCTL selinux: Remove unnecessary check of array base in selinux_set_mapping() cpumask: Add helper cpumask_available() genirq: Use cpumask_available() for check of cpumask variable netfilter: nf_nat_h323: fix logical-not-parentheses warning Input: mousedev - fix implicit conversion warning dm ioctl: remove double parentheses PCI: Make PCI_ROM_ADDRESS_MASK a 32-bit constant writeback: fix the wrong congested state variable definition ACPI, PCI, irq: remove redundant check for null string pointer kprobes/x86: Fix to set RWX bits correctly before releasing trampoline usb: gadget: f_hid: fix: Prevent accessing released memory usb: gadget: align buffer size when allocating for OUT endpoint usb: gadget: fix usb_ep_align_maybe endianness and new usb_ep_align usb: gadget: change len to size_t on alloc_ep_req() usb: gadget: define free_ep_req as universal function partitions/msdos: Unable to mount UFS 44bsd partitions perf/hwbp: Simplify the perf-hwbp code, fix documentation ALSA: pcm: potential uninitialized return values ALSA: pcm: Use dma_bytes as size parameter in dma_mmap_coherent() mtd: jedec_probe: Fix crash in jedec_read_mfr() Replace #define with enum for better compilation errors. Add missing include to drivers/tty/goldfish.c Fix whitespace in drivers/tty/goldfish.c ANDROID: fuse: Add null terminator to path in canonical path to avoid issue ANDROID: sdcardfs: Fix sdcardfs to stop creating cases-sensitive duplicate entries. ANDROID: add missing include to pdev_bus ANDROID: pdev_bus: replace writel with gf_write_ptr ANDROID: Cleanup type casting in goldfish.h ANDROID: Include missing headers in goldfish.h ANDROID: cpufreq: times: skip printing invalid frequencies ANDROID: xt_qtaguid: Remove unnecessary null checks to device's name ANDROID: xt_qtaguid: Remove unnecessary null checks to ifa_label ANDROID: cpufreq: times: allocate enough space for a uid_entry Linux 4.4.126 net: systemport: Rewrite __bcm_sysport_tx_reclaim() net: fec: Fix unbalanced PM runtime calls ieee802154: 6lowpan: fix possible NULL deref in lowpan_device_event() s390/qeth: on channel error, reject further cmd requests s390/qeth: lock read device while queueing next buffer s390/qeth: when thread completes, wake up all waiters s390/qeth: free netdevice when removing a card team: Fix double free in error path skbuff: Fix not waking applications when errors are enqueued net: Only honor ifindex in IP_PKTINFO if non-0 netlink: avoid a double skb free in genlmsg_mcast() net/iucv: Free memory obtained by kzalloc net: ethernet: ti: cpsw: add check for in-band mode setting with RGMII PHY interface net: ethernet: arc: Fix a potential memory leak if an optional regulator is deferred l2tp: do not accept arbitrary sockets ipv6: fix access to non-linear packet in ndisc_fill_redirect_hdr_option() dccp: check sk for closed state in dccp_sendmsg() net: Fix hlist corruptions in inet_evict_bucket() Revert "genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs" scsi: sg: don't return bogus Sg_requests Revert "genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs" UPSTREAM: drm: virtio-gpu: set atomic flag UPSTREAM: drm: virtio-gpu: transfer dumb buffers to host on plane update UPSTREAM: drm: virtio-gpu: ensure plane is flushed to host on atomic update UPSTREAM: drm: virtio-gpu: get the fb from the plane state for atomic updates Linux 4.4.125 bpf, x64: increase number of passes bpf: skip unnecessary capability check kbuild: disable clang's default use of -fmerge-all-constants staging: lustre: ptlrpc: kfree used instead of kvfree perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period() x86/entry/64: Don't use IST entry for #BP stack x86/boot/64: Verify alignment of the LOAD segment x86/build/64: Force the linker to use 2MB page size kvm/x86: fix icebp instruction handling tty: vt: fix up tabstops properly can: cc770: Fix use after free in cc770_tx_interrupt() can: cc770: Fix queue stall & dropped RTR reply can: cc770: Fix stalls on rt-linux, remove redundant IRQ ack staging: ncpfs: memory corruption in ncp_read_kernel() mtd: nand: fsl_ifc: Fix nand waitfunc return value tracing: probeevent: Fix to support minus offset from symbol rtlwifi: rtl8723be: Fix loss of signal brcmfmac: fix P2P_DEVICE ethernet address generation acpi, numa: fix pxm to online numa node associations drm: udl: Properly check framebuffer mmap offsets drm/radeon: Don't turn off DP sink when disconnected drm/vmwgfx: Fix a destoy-while-held mutex problem. x86/mm: implement free pmd/pte page interfaces mm/vmalloc: add interfaces to free unmapped page table libata: Modify quirks for MX100 to limit NCQ_TRIM quirk to MU01 version libata: Make Crucial BX100 500GB LPM quirk apply to all firmware versions libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDs libata: Enable queued TRIM for Samsung SSD 860 libata: disable LPM for Crucial BX100 SSD 500GB drive libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs libata: remove WARN() for DMA or PIO command without data libata: fix length validation of ATAPI-relayed SCSI commands Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174 clk: bcm2835: Protect sections updating shared registers ahci: Add PCI-id for the Highpoint Rocketraid 644L card PCI: Add function 1 DMA alias quirk for Highpoint RocketRAID 644L mmc: dw_mmc: fix falling from idmac to PIO mode when dw_mci_reset occurs ALSA: hda/realtek - Always immediately update mute LED with pin VREF ALSA: aloop: Fix access to not-yet-ready substream via cable ALSA: aloop: Sync stale timer before release ALSA: usb-audio: Fix parsing descriptor of UAC2 processing unit iio: st_pressure: st_accel: pass correct platform data to init MIPS: ralink: Remove ralink_halt() ANDROID: cpufreq: times: fix proc_time_in_state_show dtc: turn off dtc unit address warnings by default Linux 4.4.124 RDMA/ucma: Fix access to non-initialized CM_ID object dmaengine: ti-dma-crossbar: Fix event mapping for TPCC_EVT_MUX_60_63 clk: si5351: Rename internal plls to avoid name collisions nfsd4: permit layoutget of executable-only files RDMA/ocrdma: Fix permissions for OCRDMA_RESET_STATS ip6_vti: adjust vti mtu according to mtu of lower device iommu/vt-d: clean up pr_irq if request_threaded_irq fails pinctrl: Really force states during suspend/resume coresight: Fix disabling of CoreSight TPIU pty: cancel pty slave port buf's work in tty_release drm/omap: DMM: Check for DMM readiness after successful transaction commit vgacon: Set VGA struct resource types IB/umem: Fix use of npages/nmap fields RDMA/cma: Use correct size when writing netlink stats IB/ipoib: Avoid memory leak if the SA returns a different DGID mmc: avoid removing non-removable hosts during suspend platform/chrome: Use proper protocol transfer function cros_ec: fix nul-termination for firmware build info media: [RESEND] media: dvb-frontends: Add delay to Si2168 restart media: bt8xx: Fix err 'bt878_probe()' rtlwifi: rtl_pci: Fix the bug when inactiveps is enabled. RDMA/iwpm: Fix uninitialized error code in iwpm_send_mapinfo() drm/msm: fix leak in failed get_pages media: c8sectpfe: fix potential NULL pointer dereference in c8sectpfe_timer_interrupt Bluetooth: hci_qca: Avoid setup failure on missing rampatch perf tests kmod-path: Don't fail if compressed modules aren't supported rtc: ds1374: wdt: Fix stop/start ioctl always returning -EINVAL rtc: ds1374: wdt: Fix issue with timeout scaling from secs to wdt ticks cifs: small underflow in cnvrtDosUnixTm() net: hns: fix ethtool_get_strings overflow in hns driver sm501fb: don't return zero on failure path in sm501fb_start() video: fbdev: udlfb: Fix buffer on stack tcm_fileio: Prevent information leak for short reads ia64: fix module loading for gcc-5.4 md/raid10: skip spare disk as 'first' disk Input: twl4030-pwrbutton - use correct device for irq request power: supply: pda_power: move from timer to delayed_work bnx2x: Align RX buffers drm/nouveau/kms: Increase max retries in scanout position queries. ACPI / PMIC: xpower: Fix power_table addresses ipmi/watchdog: fix wdog hang on panic waiting for ipmi response ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP mmc: sdhci-of-esdhc: limit SD clock for ls1012a/ls1046a staging: wilc1000: fix unchecked return value staging: unisys: visorhba: fix s-Par to boot with option CONFIG_VMAP_STACK set to y mtip32xx: use runtime tag to initialize command header mfd: palmas: Reset the POWERHOLD mux during power off mac80211: don't parse encrypted management frames in ieee80211_frame_acked Btrfs: send, fix file hole not being preserved due to inline extent rndis_wlan: add return value validation mt7601u: check return value of alloc_skb iio: st_pressure: st_accel: Initialise sensor platform data properly NFS: don't try to cross a mountpount when there isn't one there. infiniband/uverbs: Fix integer overflows scsi: mac_esp: Replace bogus memory barrier with spinlock qlcnic: fix unchecked return value wan: pc300too: abort path on failure mmc: host: omap_hsmmc: checking for NULL instead of IS_ERR() openvswitch: Delete conntrack entry clashing with an expectation. netfilter: xt_CT: fix refcnt leak on error path Fix driver usage of 128B WQEs when WQ_CREATE is V1. ASoC: Intel: Skylake: Uninitialized variable in probe_codec() IB/mlx4: Change vma from shared to private IB/mlx4: Take write semaphore when changing the vma struct HSI: ssi_protocol: double free in ssip_pn_xmit() IB/ipoib: Update broadcast object if PKey value was changed in index 0 IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow ALSA: hda - Fix headset microphone detection for ASUS N551 and N751 e1000e: fix timing for 82579 Gigabit Ethernet controller tcp: remove poll() flakes with FastOpen NFS: Fix missing pg_cleanup after nfs_pageio_cond_complete() md/raid10: wait up frozen array in handle_write_completed iommu/omap: Register driver before setting IOMMU ops ARM: 8668/1: ftrace: Fix dynamic ftrace with DEBUG_RODATA and !FRAME_POINTER KVM: PPC: Book3S PR: Exit KVM on failed mapping scsi: virtio_scsi: Always try to read VPD pages clk: ns2: Correct SDIO bits ath: Fix updating radar flags for coutry code India spi: dw: Disable clock after unregistering the host media/dvb-core: Race condition when writing to CAM net: ipv6: send unsolicited NA on admin up i2c: i2c-scmi: add a MS HID genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs cpufreq/sh: Replace racy task affinity logic ACPI/processor: Replace racy task affinity logic ACPI/processor: Fix error handling in __acpi_processor_start() time: Change posix clocks ops interfaces to use timespec64 Input: ar1021_i2c - fix too long name in driver's device table rtc: cmos: Do not assume irq 8 for rtc when there are no legacy irqs x86: i8259: export legacy_pic symbol regulator: anatop: set default voltage selector for pcie platform/x86: asus-nb-wmi: Add wapf4 quirk for the X302UA staging: android: ashmem: Fix possible deadlock in ashmem_ioctl CIFS: Enable encryption during session setup phase SMB3: Validate negotiate request must always be signed tpm_tis: fix potential buffer overruns caused by bit glitches on the bus tpm: fix potential buffer overruns caused by bit glitches on the bus BACKPORT, FROMLIST: crypto: arm64/speck - add NEON-accelerated implementation of Speck-XTS Linux 4.4.123 bpf: fix incorrect sign extension in check_alu_op() usb: gadget: bdc: 64-bit pointer capability check USB: gadget: udc: Add missing platform_device_put() on error in bdc_pci_probe() btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device btrfs: alloc_chunk: fix DUP stripe size handling ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux scsi: sg: only check for dxfer_len greater than 256M scsi: sg: fix static checker warning in sg_is_valid_dxfer scsi: sg: fix SG_DXFER_FROM_DEV transfers irqchip/gic-v3-its: Ensure nr_ites >= nr_lpis fs/aio: Use RCU accessors for kioctx_table->table[] fs/aio: Add explicit RCU grace period when freeing kioctx lock_parent() needs to recheck if dentry got __dentry_kill'ed under it fs: Teach path_connected to handle nfs filesystems with multiple roots. drm/amdgpu/dce: Don't turn off DP sink when disconnected ALSA: seq: Clear client entry before deleting else at closing ALSA: seq: Fix possible UAF in snd_seq_check_queue() ALSA: hda - Revert power_save option default value ALSA: pcm: Fix UAF in snd_pcm_oss_get_formats() x86/mm: Fix vmalloc_fault to use pXd_large x86/vm86/32: Fix POPF emulation selftests/x86/entry_from_vm86: Add test cases for POPF selftests/x86: Add tests for the STR and SLDT instructions selftests/x86: Add tests for User-Mode Instruction Prevention selftests/x86/entry_from_vm86: Exit with 1 if we fail ima: relax requiring a file signature for new files with zero length rcutorture/configinit: Fix build directory error message ipvlan: add L2 check for packets arriving via virtual devices ASoC: nuc900: Fix a loop timeout test mac80211: remove BUG() when interface type is invalid mac80211_hwsim: enforce PS_MANUAL_POLL to be set after PS_ENABLED agp/intel: Flush all chipset writes after updating the GGTT drm/amdkfd: Fix memory leaks in kfd topology veth: set peer GSO values media: cpia2: Fix a couple off by one bugs scsi: dh: add new rdac devices scsi: devinfo: apply to HP XP the same flags as Hitachi VSP scsi: core: scsi_get_device_flags_keyed(): Always return device flags spi: sun6i: disable/unprepare clocks on remove tools/usbip: fixes build with musl libc toolchain ath10k: fix invalid STS_CAP_OFFSET_MASK clk: qcom: msm8916: fix mnd_width for codec_digcodec cpufreq: Fix governor module removal race ath10k: update tdls teardown state to target ARM: dts: omap3-n900: Fix the audio CODEC's reset pin ARM: dts: am335x-pepper: Fix the audio CODEC's reset pin mtd: nand: fix interpretation of NAND_CMD_NONE in nand_command[_lp]() net: xfrm: allow clearing socket xfrm policies. test_firmware: fix setting old custom fw path back on exit sched: Stop resched_cpu() from sending IPIs to offline CPUs sched: Stop switched_to_rt() from sending IPIs to offline CPUs ARM: dts: exynos: Correct Trats2 panel reset line HID: elo: clear BTN_LEFT mapping video/hdmi: Allow "empty" HDMI infoframes drm/edid: set ELD connector type in drm_edid_to_eld() wil6210: fix memory access violation in wil_memcpy_from/toio_32 pwm: tegra: Increase precision in PWM rate calculation kprobes/x86: Set kprobes pages read-only kprobes/x86: Fix kprobe-booster not to boost far call instructions scsi: sg: close race condition in sg_remove_sfp_usercontext() scsi: sg: check for valid direction before starting the request perf session: Don't rely on evlist in pipe mode perf inject: Copy events when reordering events in pipe mode drivers/perf: arm_pmu: handle no platform_device usb: gadget: dummy_hcd: Fix wrong power status bit clear/reset in dummy_hub_control() usb: dwc2: Make sure we disconnect the gadget state md/raid6: Fix anomily when recovering a single device in RAID6. regulator: isl9305: fix array size MIPS: r2-on-r6-emu: Clear BLTZALL and BGEZALL debugfs counters MIPS: r2-on-r6-emu: Fix BLEZL and BGTZL identification MIPS: BPF: Fix multiple problems in JIT skb access helpers. MIPS: BPF: Quit clobbering callee saved registers in JIT code. coresight: Fixes coresight DT parse to get correct output port ID. drm/amdgpu: Fail fb creation from imported dma-bufs. (v2) drm/radeon: Fail fb creation from imported dma-bufs. video: ARM CLCD: fix dma allocation size iommu/iova: Fix underflow bug in __alloc_and_insert_iova_range apparmor: Make path_max parameter readonly scsi: ses: don't get power status of SES device slot on probe fm10k: correctly check if interface is removed ALSA: firewire-digi00x: handle all MIDI messages on streaming packets reiserfs: Make cancel_old_flush() reliable ARM: dts: koelsch: Correct clock frequency of X2 DU clock input net/faraday: Add missing include of of.h powerpc: Avoid taking a data miss on every userspace instruction miss ARM: dts: r8a7791: Correct parent of SSI[0-9] clocks ARM: dts: r8a7790: Correct parent of SSI[0-9] clocks NFC: nfcmrvl: double free on error path NFC: nfcmrvl: Include unaligned.h instead of access_ok.h vxlan: vxlan dev should inherit lowerdev's gso_max_size drm/vmwgfx: Fixes to vmwgfx_fb braille-console: Fix value returned by _braille_console_setup bonding: refine bond_fold_stats() wrap detection f2fs: relax node version check for victim data in gc blk-throttle: make sure expire time isn't too big mm: Fix false-positive VM_BUG_ON() in page_cache_{get,add}_speculative() driver: (adm1275) set the m,b and R coefficients correctly for power dmaengine: imx-sdma: add 1ms delay to ensure SDMA channel is stopped tcp: sysctl: Fix a race to avoid unexpected 0 window from space spi: omap2-mcspi: poll OMAP2_MCSPI_CHSTAT_RXS for PIO transfer ASoC: rcar: ssi: don't set SSICR.CKDV = 000 with SSIWSR.CONT sched: act_csum: don't mangle TCP and UDP GSO packets Input: qt1070 - add OF device ID table sysrq: Reset the watchdog timers while displaying high-resolution timers timers, sched_clock: Update timeout for clock wrap media: i2c/soc_camera: fix ov6650 sensor getting wrong clock scsi: ipr: Fix missed EH wakeup solo6x10: release vb2 buffers in solo_stop_streaming() of: fix of_device_get_modalias returned length when truncating buffers batman-adv: handle race condition for claims between gateways ARM: dts: Adjust moxart IRQ controller and flags net/8021q: create device with all possible features in wanted_features HID: clamp input to logical range if no null state perf probe: Return errno when not hitting any event ath10k: disallow DFS simulation if DFS channel is not enabled drm: Defer disabling the vblank IRQ until the next interrupt (for instant-off) drivers: net: xgene: Fix hardware checksum setting perf tools: Make perf_event__synthesize_mmap_events() scale i40e: fix ethtool to get EEPROM data from X722 interface i40e: Acquire NVM lock before reads on all devices perf sort: Fix segfault with basic block 'cycles' sort dimension selinux: check for address length in selinux_socket_bind() PCI/MSI: Stop disabling MSI/MSI-X in pci_device_shutdown() ath10k: fix a warning during channel switch with multiple vaps drm: qxl: Don't alloc fbdev if emulation is not supported HID: reject input outside logical range only if null state is set staging: wilc1000: add check for kmalloc allocation failure. staging: speakup: Replace BUG_ON() with WARN_ON(). Input: tsc2007 - check for presence and power down tsc2007 during probe blkcg: fix double free of new_blkg in blkcg_init_queue ANDROID: cpufreq: times: avoid prematurely freeing uid_entry ANDROID: Use standard logging functions in goldfish_pipe ANDROID: Fix whitespace in goldfish staging: android: ashmem: Fix possible deadlock in ashmem_ioctl llist: clang: introduce member_address_is_nonnull() Linux 4.4.122 fixup: sctp: verify size of a new chunk in _sctp_make_chunk() serial: 8250_pci: Add Brainboxes UC-260 4 port serial device usb: gadget: f_fs: Fix use-after-free in ffs_fs_kill_sb() usb: usbmon: Read text within supplied buffer size USB: usbmon: remove assignment from IS_ERR argument usb: quirks: add control message delay for 1b1c:1b20 USB: storage: Add JMicron bridge 152d:2567 to unusual_devs.h staging: android: ashmem: Fix lockdep issue during llseek staging: comedi: fix comedi_nsamples_left. uas: fix comparison for error code tty/serial: atmel: add new version check for usart serial: sh-sci: prevent lockup on full TTY buffers x86: Treat R_X86_64_PLT32 as R_X86_64_PC32 x86/module: Detect and skip invalid relocations Revert "ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux" NFS: Fix an incorrect type in struct nfs_direct_req scsi: qla2xxx: Replace fcport alloc with qla2x00_alloc_fcport ubi: Fix race condition between ubi volume creation and udev ext4: inplace xattr block update fails to deduplicate blocks netfilter: x_tables: pack percpu counter allocations netfilter: x_tables: pass xt_counters struct to counter allocator netfilter: x_tables: pass xt_counters struct instead of packet counter netfilter: use skb_to_full_sk in ip_route_me_harder netfilter: ipv6: fix use-after-free Write in nf_nat_ipv6_manip_pkt netfilter: bridge: ebt_among: add missing match size checks netfilter: ebtables: CONFIG_COMPAT: don't trust userland offsets netfilter: IDLETIMER: be syzkaller friendly netfilter: nat: cope with negative port range netfilter: x_tables: fix missing timer initialization in xt_LED netfilter: add back stackpointer size checks tc358743: fix register i2c_rd/wr function fix Input: tca8418_keypad - remove double read of key event register ARM: omap2: hide omap3_save_secure_ram on non-OMAP3 builds netfilter: nfnetlink_queue: fix timestamp attribute watchdog: hpwdt: fix unused variable warning watchdog: hpwdt: Check source of NMI watchdog: hpwdt: SMBIOS check nospec: Include <asm/barrier.h> dependency ALSA: hda: add dock and led support for HP ProBook 640 G2 ALSA: hda: add dock and led support for HP EliteBook 820 G3 ALSA: seq: More protection for concurrent write and ioctl races ALSA: seq: Don't allow resizing pool in use ALSA: hda/realtek - Fix dock line-out volume on Dell Precision 7520 x86/MCE: Serialize sysfs changes bcache: don't attach backing with duplicate UUID kbuild: Handle builtin dtb file names containing hyphens loop: Fix lost writes caused by missing flag Input: matrix_keypad - fix race when disabling interrupts MIPS: OCTEON: irq: Check for null return on kzalloc allocation MIPS: ath25: Check for kzalloc allocation failure MIPS: BMIPS: Do not mask IPIs during suspend drm/amdgpu: fix KV harvesting drm/radeon: fix KV harvesting drm/amdgpu: Notify sbios device ready before send request drm/amdgpu: Fix deadlock on runtime suspend drm/radeon: Fix deadlock on runtime suspend drm/nouveau: Fix deadlock on runtime suspend drm: Allow determining if current task is output poll worker workqueue: Allow retrieval of current task's work struct scsi: qla2xxx: Fix NULL pointer crash due to active timer for ABTS RDMA/mlx5: Fix integer overflow while resizing CQ RDMA/ucma: Check that user doesn't overflow QP state RDMA/ucma: Limit possible option size ANDROID: ranchu: 32 bit framebuffer support ANDROID: Address checkpatch warnings in goldfishfb ANDROID: Address checkpatch.pl warnings in goldfish_pipe ANDROID: sdcardfs: fix lock issue on 32 bit/SMP architectures ANDROID: goldfish: Fix typo in goldfish_cmd_locked() call ANDROID: Address checkpatch.pl warnings in goldfish_pipe_v2 FROMLIST: f2fs: don't put dentry page in pagecache into highmem Linux 4.4.121 btrfs: preserve i_mode if __btrfs_set_acl() fails bpf, x64: implement retpoline for tail call dm io: fix duplicate bio completion due to missing ref count mpls, nospec: Sanitize array index in mpls_label_ok() net: mpls: Pull common label check into helper sctp: verify size of a new chunk in _sctp_make_chunk() s390/qeth: fix IPA command submission race s390/qeth: fix SETIP command handling sctp: fix dst refcnt leak in sctp_v6_get_dst() sctp: fix dst refcnt leak in sctp_v4_get_dst udplite: fix partial checksum initialization ppp: prevent unregistered channels from connecting to PPP units netlink: ensure to loop over all netns in genlmsg_multicast_allns() net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68 net: fix race on decreasing number of TX queues ipv6 sit: work around bogus gcc-8 -Wrestrict warning hdlc_ppp: carrier detect ok, don't turn off negotiation fib_semantics: Don't match route with mismatching tclassid bridge: check brport attr show in brport_show Revert "led: core: Fix brightness setting when setting delay_off=0" x86/spectre: Fix an error message leds: do not overflow sysfs buffer in led_trigger_show x86/apic/vector: Handle legacy irq data correctly ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux btrfs: Don't clear SGID when inheriting ACLs x86/syscall: Sanitize syscall table de-references under speculation fix KVM: mmu: Fix overlap between public and private memslots ARM: mvebu: Fix broken PL310_ERRATA_753970 selects nospec: Allow index argument to have const-qualified type media: m88ds3103: don't call a non-initalized function cpufreq: s3c24xx: Fix broken s3c_cpufreq_init() ALSA: hda: Add a power_save blacklist ALSA: usb-audio: Add a quirck for B&W PX headphones tpm_i2c_nuvoton: fix potential buffer overruns caused by bit glitches on the bus tpm_i2c_infineon: fix potential buffer overruns caused by bit glitches on the bus tpm: st33zp24: fix potential buffer overruns caused by bit glitches on the bus ANDROID: Delete the goldfish_nand driver. ANDROID: Add input support for Android Wear. ANDROID: proc: fix config & includes for /proc/uid FROMLIST: ARM: amba: Don't read past the end of sysfs "driver_override" buffer UPSTREAM: ANDROID: binder: remove WARN() for redundant txn error ANDROID: cpufreq: times: Add missing includes ANDROID: cpufreq: Add time_in_state to /proc/uid directories ANDROID: proc: Add /proc/uid directory ANDROID: cpufreq: times: track per-uid time in state ANDROID: cpufreq: track per-task time in state Conflicts: drivers/gpu/drm/msm/msm_gem.c drivers/net/wireless/ath/regd.c kernel/sched/core.c Change-Id: I9bb7b5a062415da6925a5a56a34e6eb066a53320 Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
| * writeback: fix the wrong congested state variable definitionKaixu Xia2018-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c877ef8ae7b8edaedccad0fc8c23d4d1de7e2480 upstream. The right variable definition should be wb_congested_state that include WB_async_congested and WB_sync_congested. So fix it. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com> Cc: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | block: Dynamically allocate and refcount backing_dev_infoJan Kara2017-10-18
|/ | | | | | | | | | | | | | | | Instead of storing backing_dev_info inside struct request_queue, allocate it dynamically, reference count it, and free it when the last reference is dropped. Currently only request_queue holds the reference but in the following patch we add other users referencing backing_dev_info. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> Change-Id: Ibcee7b4c014018f9243cd3edbfd9c4a8877c3862 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git Git-commit: d03f6cdc1fc422accb734c7c07a661a0018d8631 [riteshh@codeaurora.org: resolved merge conflicts] Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
* block: fix double-free in the failure path of cgwb_bdi_init()Tejun Heo2017-02-26
| | | | | | | | | | | | | | | | | | | | | | | | | commit 5f478e4ea5c5560b4e40eb136991a09f9389f331 upstream. When !CONFIG_CGROUP_WRITEBACK, bdi has single bdi_writeback_congested at bdi->wb_congested. cgwb_bdi_init() allocates it with kzalloc() and doesn't do further initialization. This usually works fine as the reference count gets bumped to 1 by wb_init() and the put from wb_exit() releases it. However, when wb_init() fails, it puts the wb base ref automatically freeing the wb and the explicit kfree() in cgwb_bdi_init() error path ends up trying to free the same pointer the second time causing a double-free. Fix it by explicitly initilizing the refcnt to 1 and putting the base ref from cgwb_bdi_destroy(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback") Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* block: fix bdi vs gendisk lifetime mismatchDan Williams2016-08-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit df08c32ce3be5be138c1dbfcba203314a3a7cd6f upstream. The name for a bdi of a gendisk is derived from the gendisk's devt. However, since the gendisk is destroyed before the bdi it leaves a window where a new gendisk could dynamically reuse the same devt while a bdi with the same name is still live. Arrange for the bdi to hold a reference against its "owner" disk device while it is registered. Otherwise we can hit sysfs duplicate name collisions like the following: WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80 sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1' Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015 0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351 0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8 Call Trace: [<ffffffff8134caec>] dump_stack+0x63/0x87 [<ffffffff8108c351>] __warn+0xd1/0xf0 [<ffffffff8108c3cf>] warn_slowpath_fmt+0x5f/0x80 [<ffffffff812a0d34>] sysfs_warn_dup+0x64/0x80 [<ffffffff812a0e1e>] sysfs_create_dir_ns+0x7e/0x90 [<ffffffff8134faaa>] kobject_add_internal+0xaa/0x320 [<ffffffff81358d4e>] ? vsnprintf+0x34e/0x4d0 [<ffffffff8134ff55>] kobject_add+0x75/0xd0 [<ffffffff816e66b2>] ? mutex_lock+0x12/0x2f [<ffffffff8148b0a5>] device_add+0x125/0x610 [<ffffffff8148b788>] device_create_groups_vargs+0xd8/0x100 [<ffffffff8148b7cc>] device_create_vargs+0x1c/0x20 [<ffffffff811b775c>] bdi_register+0x8c/0x180 [<ffffffff811b7877>] bdi_register_dev+0x27/0x30 [<ffffffff813317f5>] add_disk+0x175/0x4a0 Reported-by: Yi Zhang <yizhan@redhat.com> Tested-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixed up missing 0 return in bdi_register_owner(). Signed-off-by: Jens Axboe <axboe@fb.com>
* mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progressTetsuo Handa2016-02-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 564e81a57f9788b1475127012e0fd44e9049e342 upstream. Jan Stancek has reported that system occasionally hanging after "oom01" testcase from LTP triggers OOM. Guessing from a result that there is a kworker thread doing memory allocation and the values between "Node 0 Normal free:" and "Node 0 Normal:" differs when hanging, vmstat is not up-to-date for some reason. According to commit 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress"), it meant to force the kworker thread to take a short sleep, but it by error used schedule_timeout(1). We missed that schedule_timeout() in state TASK_RUNNING doesn't do anything. Fix it by using schedule_timeout_uninterruptible(1) which forces the kworker thread to take a short sleep in order to make sure that vmstat is up-to-date. Fixes: 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Jan Stancek <jstancek@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Cristopher Lameter <clameter@sgi.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any ↵Michal Hocko2015-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | progress Tetsuo Handa has reported that the system might basically livelock in OOM condition without triggering the OOM killer. The issue is caused by internal dependency of the direct reclaim on vmstat counter updates (via zone_reclaimable) which are performed from the workqueue context. If all the current workers get assigned to an allocation request, though, they will be looping inside the allocator trying to reclaim memory but zone_reclaimable can see stalled numbers so it will consider a zone reclaimable even though it has been scanned way too much. WQ concurrency logic will not consider this situation as a congested workqueue because it relies that worker would have to sleep in such a situation. This also means that it doesn't try to spawn new workers or invoke the rescuer thread if the one is assigned to the queue. In order to fix this issue we need to do two things. First we have to let wq concurrency code know that we are in trouble so we have to do a short sleep. In order to prevent from issues handled by 0e093d99763e ("writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone") we limit the sleep only to worker threads which are the ones of the interest anyway. The second thing to do is to create a dedicated workqueue for vmstat and mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to have a spare worker thread for it. Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Tejun Heo <tj@kernel.org> Cc: Cristopher Lameter <clameter@sgi.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Arkadiusz Miskiewicz <arekm@maven.pl> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, page_alloc: distinguish between being unable to sleep, unwilling to ↵Mel Gorman2015-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* writeback: remove broken rbtree_postorder_for_each_entry_safe() usage in ↵Tejun Heo2015-10-21
| | | | | | | | | | | | | | | | cgwb_bdi_destroy() a20135ffbc44 ("writeback: don't drain bdi_writeback_congested on bdi destruction") added rbtree_postorder_for_each_entry_safe() which is used to remove all entries; however, according to Cody, the iterator isn't safe against operations which may rebalance the tree. Fix it by switching to repeatedly removing rb_first() until empty. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Cody P Schafer <dev@codyps.com> Fixes: a20135ffbc44 ("writeback: don't drain bdi_writeback_congested on bdi destruction") Link: http://lkml.kernel.org/g/1443997973-1700-1-git-send-email-dev@codyps.com Signed-off-by: Jens Axboe <axboe@fb.com>
* block: don't release bdi while request_queue has live referencesTejun Heo2015-10-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bdi's are initialized in two steps, bdi_init() and bdi_register(), but destroyed in a single step by bdi_destroy() which, for a bdi embedded in a request_queue, is called during blk_cleanup_queue() which makes the queue invisible and starts the draining of remaining usages. A request_queue's user can access the congestion state of the embedded bdi as long as it holds a reference to the queue. As such, it may access the congested state of a queue which finished blk_cleanup_queue() but hasn't reached blk_release_queue() yet. Because the congested state was embedded in backing_dev_info which in turn is embedded in request_queue, accessing the congested state after bdi_destroy() was called was fine. The bdi was destroyed but the memory region for the congested state remained accessible till the queue got released. a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback") changed the situation. Now, the root congested state which is expected to be pinned while request_queue remains accessible is separately reference counted and the base ref is put during bdi_destroy(). This means that the root congested state may go away prematurely while the queue is between bdi_dstroy() and blk_cleanup_queue(), which was detected by Andrey's KASAN tests. The root cause of this problem is that bdi doesn't distinguish the two steps of destruction, unregistration and release, and now the root congested state actually requires a separate release step. To fix the issue, this patch separates out bdi_unregister() and bdi_exit() from bdi_destroy(). bdi_unregister() is called from blk_cleanup_queue() and bdi_exit() from blk_release_queue(). bdi_destroy() is now just a simple wrapper calling the two steps back-to-back. While at it, the prototype of bdi_destroy() is moved right below bdi_setup_and_register() so that the counterpart operations are located together. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback") Cc: stable@vger.kernel.org # v4.2+ Reported-and-tested-by: Andrey Konovalov <andreyknvl@google.com> Link: http://lkml.kernel.org/g/CAAeHK+zUJ74Zn17=rOyxacHU18SgCfC6bsYW=6kCY5GXJBwGfQ@mail.gmail.com Reviewed-by: Jan Kara <jack@suse.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* writeback: bdi_writeback iteration must not skip dying onesTejun Heo2015-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bdi_for_each_wb() is used in several places to wake up or issue writeback work items to all wb's (bdi_writeback's) on a given bdi. The iteration is performed by walking bdi->cgwb_tree; however, the tree only indexes wb's which are currently active. For example, when a memcg gets associated with a different blkcg, the old wb is removed from the tree so that the new one can be indexed. The old wb starts dying from then on but will linger till all its inodes are drained. As these dying wb's may still host dirty inodes, writeback operations which affect all wb's must include them. bdi_for_each_wb() skipping dying wb's led to sync(2) missing and failing to sync the inodes belonging to those wb's. This patch adds a RCU protected @bdi->wb_list which lists all wb's beloinging to that bdi. wb's are added on creation and removed on release rather than on the start of destruction. bdi_for_each_wb() usages are replaced with list_for_each[_continue]_rcu() iterations over @bdi->wb_list and bdi_for_each_wb() and its helpers are removed. v2: Updated as per Jan. last_wb ref leak in bdi_split_work_to_wbs() fixed and unnecessary list head severing in cgwb_bdi_destroy() removed. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Artem Bityutskiy <dedekind1@gmail.com> Fixes: ebe41ab0c79d ("writeback: implement bdi_for_each_wb()") Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
* Merge branch 'for-4.3/blkcg' of git://git.kernel.dk/linux-blockLinus Torvalds2015-09-10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull blk-cg updates from Jens Axboe: "A bit later in the cycle, but this has been in the block tree for a a while. This is basically four patchsets from Tejun, that improve our buffered cgroup writeback. It was dependent on the other cgroup changes, but they went in earlier in this cycle. Series 1 is set of 5 patches that has cgroup writeback updates: - bdi_writeback iteration fix which could lead to some wb's being skipped or repeated during e.g. sync under memory pressure. - Simplification of wb work wait mechanism. - Writeback tracepoints updated to report cgroup. Series 2 is is a set of updates for the CFQ cgroup writeback handling: cfq has always charged all async IOs to the root cgroup. It didn't have much choice as writeback didn't know about cgroups and there was no way to tell who to blame for a given writeback IO. writeback finally grew support for cgroups and now tags each writeback IO with the appropriate cgroup to charge it against. This patchset updates cfq so that it follows the blkcg each bio is tagged with. Async cfq_queues are now shared across cfq_group, which is per-cgroup, instead of per-request_queue cfq_data. This makes all IOs follow the weight based IO resource distribution implemented by cfq. - Switched from GFP_ATOMIC to GFP_NOWAIT as suggested by Jeff. - Other misc review points addressed, acks added and rebased. Series 3 is the blkcg policy cleanup patches: This patchset contains assorted cleanups for blkcg_policy methods and blk[c]g_policy_data handling. - alloc/free added for blkg_policy_data. exit dropped. - alloc/free added for blkcg_policy_data. - blk-throttle's async percpu allocation is replaced with direct allocation. - all methods now take blk[c]g_policy_data instead of blkcg_gq or blkcg. And finally, series 4 is a set of patches cleaning up the blkcg stats handling: blkcg's stats have always been somwhat of a mess. This patchset tries to improve the situation a bit. - The following patches added to consolidate blkcg entry point and blkg creation. This is in itself is an improvement and helps colllecting common stats on bio issue. - per-blkg stats now accounted on bio issue rather than request completion so that bio based and request based drivers can behave the same way. The issue was spotted by Vivek. - cfq-iosched implements custom recursive stats and blk-throttle implements custom per-cpu stats. This patchset make blkcg core support both by default. - cfq-iosched and blk-throttle keep track of the same stats multiple times. Unify them" * 'for-4.3/blkcg' of git://git.kernel.dk/linux-block: (45 commits) blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy blkcg: s/CFQ_WEIGHT_*/CFQ_WEIGHT_LEGACY_*/ blkcg: implement interface for the unified hierarchy blkcg: misc preparations for unified hierarchy interface blkcg: separate out tg_conf_updated() from tg_set_conf() blkcg: move body parsing from blkg_conf_prep() to its callers blkcg: mark existing cftypes as legacy blkcg: rename subsystem name from blkio to io blkcg: refine error codes returned during blkcg configuration blkcg: remove unnecessary NULL checks from __cfqg_set_weight_device() blkcg: reduce stack usage of blkg_rwstat_recursive_sum() blkcg: remove cfqg_stats->sectors blkcg: move io_service_bytes and io_serviced stats into blkcg_gq blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq blkcg: make blkcg_[rw]stat per-cpu blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it blkcg: consolidate blkg creation in blkcg_bio_issue_check() blk-throttle: improve queue bypass handling blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup() blkcg: inline [__]blkg_lookup() ...
| * blkcg: rename subsystem name from blkio to ioTejun Heo2015-08-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blkio interface has become messy over time and is currently the largest. In addition to the inconsistent naming scheme, it has multiple stat files which report more or less the same thing, a number of debug stat files which expose internal details which shouldn't have been part of the public interface in the first place, recursive and non-recursive stats and leaf and non-leaf knobs. Both recursive vs. non-recursive and leaf vs. non-leaf distinctions don't make any sense on the unified hierarchy as only leaf cgroups can contain processes. cgroups is going through a major interface revision with the unified hierarchy involving significant fundamental usage changes and given that a significant portion of the interface doesn't make sense anymore, it's a good time to reorganize the interface. As the first step, this patch renames the external visible subsystem name from "blkio" to "io". This is more concise, matches the other two major subsystem names, "cpu" and "memory", and better suited as blkcg will be involved in anything writeback related too whether an actual block device is involved or not. As the subsystem legacy_name is set to "blkio", the only userland visible change outside the unified hierarchy is that blkcg is reported as "io" instead of "blkio" in the subsystem initialized message during boot. On the unified hierarchy, blkcg now appears as "io". Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: cgroups@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
* | inode: rename i_wb_list to i_io_listDave Chinner2015-08-17
|/ | | | | | | | | | | | | | | | | | | There's a small consistency problem between the inode and writeback naming. Writeback calls the "for IO" inode queues b_io and b_more_io, but the inode calls these the "writeback list" or i_wb_list. This makes it hard to an new "under writeback" list to the inode, or call it an "under IO" list on the bdi because either way we'll have writeback on IO and IO on writeback and it'll just be confusing. I'm getting confused just writing this! So, rename the inode "for IO" list variable to i_io_list so we can add a new "writeback list" in a subsequent patch. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com>
* writeback: don't drain bdi_writeback_congested on bdi destructionTejun Heo2015-07-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks") made bdi (backing_dev_info) host per-cgroup wb's (bdi_writeback's). As the congested state needs to be per-wb and referenced from blkcg side and multiple wbs, the patch made all non-root cong's (bdi_writeback_congested's) reference counted and indexed on bdi. When a bdi is destroyed, cgwb_bdi_destroy() tries to drain all non-root cong's; however, this can hang indefinitely because wb's can also be referenced from blkcg_gq's which are destroyed after bdi destruction is complete. This patch fixes the bug by updating bdi destruction to not wait for cong's to drain. A cong is unlinked from bdi->cgwb_congested_tree on bdi destuction regardless of its reference count as the bdi may go away any point after destruction. wb_congested_put() checks whether the cong is already unlinked on release. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jon Christopherson <jon@jons.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=100681 Fixes: 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks") Tested-by: Jon Christopherson <jon@jons.org> Signed-off-by: Jens Axboe <axboe@fb.com>
* writeback: don't embed root bdi_writeback_congested in bdi_writebackTejun Heo2015-07-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks") made bdi (backing_dev_info) host per-cgroup wb's (bdi_writeback's). As the congested state needs to be per-wb and referenced from blkcg side and multiple wbs, the patch made all non-root cong's (bdi_writeback_congested's) reference counted and indexed on bdi. When a bdi is destroyed, cgwb_bdi_destroy() tries to drain all non-root cong's; however, this can hang indefinitely because wb's can also be referenced from blkcg_gq's which are destroyed after bdi destruction is complete. To fix the bug, bdi destruction will be updated to not wait for cong's to drain, which naturally means that cong's may outlive the associated bdi. This is fine for non-root cong's but is problematic for the root cong's which are embedded in their bdi's as they may end up getting dereferenced after the containing bdi's are freed. This patch makes root cong's behave the same as non-root cong's. They are no longer embedded in their bdi's but allocated separately during bdi initialization, indexed and reference counted the same way. * As cong handling is the same for all wb's, wb->congested initialization is moved into wb_init(). * When !CONFIG_CGROUP_WRITEBACK, there was no indexing or refcnting. bdi->wb_congested is now a pointer pointing to the root cong allocated during bdi init and minimal refcnting operations are implemented. * The above makes root wb init paths diverge depending on CONFIG_CGROUP_WRITEBACK. root wb init is moved to cgwb_bdi_init(). This patch in itself shouldn't cause any consequential behavior differences but prepares for the actual fix. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jon Christopherson <jon@jons.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=100681 Tested-by: Jon Christopherson <jon@jons.org> Added <linux/slab.h> include to backing-dev.h for kfree() definition. Signed-off-by: Jens Axboe <axboe@fb.com>
* Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-blockLinus Torvalds2015-06-25
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cgroup writeback support from Jens Axboe: "This is the big pull request for adding cgroup writeback support. This code has been in development for a long time, and it has been simmering in for-next for a good chunk of this cycle too. This is one of those problems that has been talked about for at least half a decade, finally there's a solution and code to go with it. Also see last weeks writeup on LWN: http://lwn.net/Articles/648292/" * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits) writeback, blkio: add documentation for cgroup writeback support vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB writeback: do foreign inode detection iff cgroup writeback is enabled v9fs: fix error handling in v9fs_session_init() bdi: fix wrong error return value in cgwb_create() buffer: remove unusued 'ret' variable writeback: disassociate inodes from dying bdi_writebacks writeback: implement foreign cgroup inode bdi_writeback switching writeback: add lockdep annotation to inode_to_wb() writeback: use unlocked_inode_to_wb transaction in inode_congested() writeback: implement unlocked_inode_to_wb transaction and use it for stat updates writeback: implement [locked_]inode_to_wb_and_lock_list() writeback: implement foreign cgroup inode detection writeback: make writeback_control track the inode being written back writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb() mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use writeback: implement memcg writeback domain based throttling writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes writeback: implement memcg wb_domain writeback: update wb_over_bg_thresh() to use wb_domain aware operations ...
| * bdi: fix wrong error return value in cgwb_create()Tejun Heo2015-06-04
| | | | | | | | | | | | | | | | | | On wb_congested_get_create() failure, cgwb_create() forgot to set @ret to -ENOMEM ending up returning 0. Fix it so that it returns -ENOMEM. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()Tejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, majority of cgroup writeback support including all the above functions are implemented in include/linux/backing-dev.h and mm/backing-dev.c; however, the portion closely related to writeback logic implemented in include/linux/writeback.h and mm/page-writeback.c will expand to support foreign writeback detection and correction. This patch moves wb[_try]_get() and wb_put() to include/linux/backing-dev-defs.h so that they can be used from writeback.h and inode_{attach|detach}_wb() to writeback.h and page-writeback.c. This is pure reorganization and doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: implement memcg wb_domainTejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dirtyable memory is distributed to a wb (bdi_writeback) according to the relative bandwidth the wb is writing out in the whole system. This distribution is global - each wb is measured against all other wb's and gets the proportinately sized portion of the memory in the whole system. For cgroup writeback, the amount of dirtyable memory is scoped by memcg and thus each wb would need to be measured and controlled in its memcg. IOW, a wb will belong to two writeback domains - the global and memcg domains. The previous patches laid the groundwork to support the two wb_domains and this patch implements memcg wb_domain. memcg->cgwb_domain is initialized on css online and destroyed on css release, wb->memcg_completions is added, and __wb_writeout_inc() is updated to increment completions against both global and memcg wb_domains. The following patches will update balance_dirty_pages() and its subroutines to actually consider memcg wb_domain for throttling. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: clean up wb_dirty_limit()Tejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function name wb_dirty_limit(), its argument @dirty and the local variable @wb_dirty are mortally confusing given that the function calculates per-wb threshold value not dirty pages, especially given that @dirty and @wb_dirty are used elsewhere for dirty pages. Let's rename the function to wb_calc_thresh() and wb_dirty to wb_thresh. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: implement bdi_wait_for_completion()Tejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the completion of a wb_writeback_work can be waited upon by setting its ->done to a struct completion and waiting on it; however, for cgroup writeback support, it's necessary to issue multiple work items to multiple bdi_writebacks and wait for the completion of all. This patch implements wb_completion which can wait for multiple work items and replaces the struct completion with it. It can be defined using DEFINE_WB_COMPLETION_ONSTACK(), used for multiple work items and waited for by wb_wait_for_completion(). Nobody currently issues multiple work items and this patch doesn't introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: make bdi_has_dirty_io() take multiple bdi_writeback's into accountTejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bdi_has_dirty_io() used to only reflect whether the root wb (bdi_writeback) has dirty inodes. For cgroup writeback support, it needs to take all active wb's into account. If any wb on the bdi has dirty inodes, bdi_has_dirty_io() should return true. To achieve that, as inode_wb_list_{move|del}_locked() now keep track of the dirty state transition of each wb, the number of dirty wbs can be counted in the bdi; however, bdi is already aggregating wb->avg_write_bandwidth which can easily be guaranteed to be > 0 when there are any dirty inodes by ensuring wb->avg_write_bandwidth can't dip below 1. bdi_has_dirty_io() can simply test whether bdi->tot_write_bandwidth is zero or not. While this bumps the value of wb->avg_write_bandwidth to one when it used to be zero, this shouldn't cause any meaningful behavior difference. bdi_has_dirty_io() is made an inline function which tests whether ->tot_write_bandwidth is non-zero. Also, WARN_ON_ONCE()'s on its value are added to inode_wb_list_{move|del}_locked(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: implement WB_has_dirty_io wb_state flagTejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, wb_has_dirty_io() determines whether a wb (bdi_writeback) has any dirty inode by testing all three IO lists on each invocation without actively keeping track. For cgroup writeback support, a single bdi will host multiple wb's each of which will host dirty inodes separately and we'll need to make bdi_has_dirty_io(), which currently only represents the root wb, aggregate has_dirty_io from all member wb's, which requires tracking transitions in has_dirty_io state on each wb. This patch introduces inode_wb_list_{move|del}_locked() to consolidate IO list operations leaving queue_io() the only other function which directly manipulates IO lists (via move_expired_inodes()). All three functions are updated to call wb_io_lists_[de]populated() which keep track of whether the wb has dirty inodes or not and record it using the new WB_has_dirty_io flag. inode_wb_list_moved_locked()'s return value indicates whether the wb had no dirty inodes before. mark_inode_dirty() is restructured so that the return value of inode_wb_list_move_locked() can be used for deciding whether to wake up the wb. While at it, change {bdi|wb}_has_dirty_io()'s return values to bool. These functions were returning 0 and 1 before. Also, add a comment explaining the synchronization of wb_state flags. v2: Updated to accommodate b_dirty_time. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: make congestion functions per bdi_writebackTejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, all congestion functions take bdi (backing_dev_info) and always operate on the root wb (bdi->wb) and the congestion state from the block layer is propagated only for the root blkcg. This patch introduces {set|clear}_wb_congested() and wb_congested() which take a bdi_writeback_congested and bdi_writeback respectively. The bdi counteparts are now wrappers invoking the wb based functions on @bdi->wb. While converting clear_bdi_congested() to clear_wb_congested(), the local variable declaration order between @wqh and @bit is swapped for cosmetic reason. This patch just adds the new wb based functions. The following patches will apply them. v2: Updated for bdi_writeback_congested. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: make backing_dev_info host cgroup-specific bdi_writebacksTejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the planned cgroup writeback support, on each bdi (backing_dev_info), each memcg will be served by a separate wb (bdi_writeback). This patch updates bdi so that a bdi can host multiple wbs (bdi_writebacks). On the default hierarchy, blkcg implicitly enables memcg. This allows using memcg's page ownership for attributing writeback IOs, and every memcg - blkcg combination can be served by its own wb by assigning a dedicated wb to each memcg. This means that there may be multiple wb's of a bdi mapped to the same blkcg. As congested state is per blkcg - bdi combination, those wb's should share the same congested state. This is achieved by tracking congested state via bdi_writeback_congested structs which are keyed by blkcg. bdi->wb remains unchanged and will keep serving the root cgroup. cgwb's (cgroup wb's) for non-root cgroups are created on-demand or looked up while dirtying an inode according to the memcg of the page being dirtied or current task. Each cgwb is indexed on bdi->cgwb_tree by its memcg id. Once an inode is associated with its wb, it can be retrieved using inode_to_wb(). Currently, none of the filesystems has FS_CGROUP_WRITEBACK and all pages will keep being associated with bdi->wb. v3: inode_attach_wb() in account_page_dirtied() moved inside mapping_cap_account_dirty() block where it's known to be !NULL. Also, an unnecessary NULL check before kfree() removed. Both detected by the kbuild bot. v2: Updated so that wb association is per inode and wb is per memcg rather than blkcg. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: kbuild test robot <fengguang.wu@intel.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| * bdi: separate out congested state into a separate structTejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, a wb's (bdi_writeback) congestion state is carried in its ->state field; however, cgroup writeback support will require multiple wb's sharing the same congestion state. This patch separates out congestion state into its own struct - struct bdi_writeback_congested. A new field wb field, wb_congested, points to its associated congested struct. The default wb, bdi->wb, always points to bdi->wb_congested. While this patch adds a layer of indirection, it doesn't introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: add @gfp to wb_init()Tejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | wb_init() currently always uses GFP_KERNEL but the planned cgroup writeback support needs using other allocation masks. Add @gfp to wb_init(). This patch doesn't introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@fb.com>
| * bdi: make inode_to_bdi() inlineTejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that bdi definitions are moved to backing-dev-defs.h, backing-dev.h can include blkdev.h and inline inode_to_bdi() without worrying about introducing circular include dependency. The function gets called from hot paths and fairly trivial. This patch makes inode_to_bdi() and sb_is_blkdev_sb() that the function calls inline. blockdev_superblock and noop_backing_dev_info are EXPORT_GPL'd to allow the inline functions to be used from modules. While at it, make sb_is_blkdev_sb() return bool instead of int. v2: Fixed typo in description as suggested by Jan. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: reorganize mm/backing-dev.cTejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move wb_shutdown(), bdi_register(), bdi_register_dev(), bdi_prune_sb(), bdi_remove_from_list() and bdi_unregister() so that init / exit functions are grouped together. This will make updating init / exit paths for cgroup writeback support easier. This is pure source file reorganization. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: move backing_dev_info->wb_lock and ->worklist into bdi_writebackTejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently. This patch moves bdi->wb_lock and ->worklist into wb. * The lock protects bdi->worklist and bdi->wb.dwork scheduling. While moving, rename it to wb->work_lock as wb->wb_lock is confusing. Also, move wb->dwork downwards so that it's colocated with the new ->work_lock and ->work_list fields. * bdi_writeback_workfn() -> wb_workfn() bdi_wakeup_thread_delayed(bdi) -> wb_wakeup_delayed(wb) bdi_wakeup_thread(bdi) -> wb_wakeup(wb) bdi_queue_work(bdi, ...) -> wb_queue_work(wb, ...) __bdi_start_writeback(bdi, ...) -> __wb_start_writeback(wb, ...) get_next_work_item(bdi) -> get_next_work_item(wb) * bdi_wb_shutdown() is renamed to wb_shutdown() and now takes @wb. The function contained parts which belong to the containing bdi rather than the wb itself - testing cap_writeback_dirty and bdi_remove_from_list() invocation. Those are moved to bdi_unregister(). * bdi_wb_{init|exit}() are renamed to wb_{init|exit}(). Initializations of the moved bdi->wb_lock and ->work_list are relocated from bdi_init() to wb_init(). * As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->state are mechanically replaced with bdi->wb.state introducing no behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: move bandwidth related fields from backing_dev_info into ↵Tejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bdi_writeback Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently. This patch moves bandwidth related fields from backing_dev_info into bdi_writeback. * The moved fields are: bw_time_stamp, dirtied_stamp, written_stamp, write_bandwidth, avg_write_bandwidth, dirty_ratelimit, balanced_dirty_ratelimit, completions and dirty_exceeded. * writeback_chunk_size() and over_bground_thresh() now take @wb instead of @bdi. * bdi_writeout_fraction(bdi, ...) -> wb_writeout_fraction(wb, ...) bdi_dirty_limit(bdi, ...) -> wb_dirty_limit(wb, ...) bdi_position_ration(bdi, ...) -> wb_position_ratio(wb, ...) bdi_update_writebandwidth(bdi, ...) -> wb_update_write_bandwidth(wb, ...) [__]bdi_update_bandwidth(bdi, ...) -> [__]wb_update_bandwidth(wb, ...) bdi_{max|min}_pause(bdi, ...) -> wb_{max|min}_pause(wb, ...) bdi_dirty_limits(bdi, ...) -> wb_dirty_limits(wb, ...) * Init/exits of the relocated fields are moved to bdi_wb_init/exit() respectively. Note that explicit zeroing is dropped in the process as wb's are cleared in entirety anyway. * As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[] introducing no behavior changes. v2: Typo in description fixed as suggested by Jan. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: move backing_dev_info->bdi_stat[] into bdi_writebackTejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently. This patch moves bdi->bdi_stat[] into wb. * enum bdi_stat_item is renamed to wb_stat_item and the prefix of all enums is changed from BDI_ to WB_. * BDI_STAT_BATCH() -> WB_STAT_BATCH() * [__]{add|inc|dec|sum}_wb_stat(bdi, ...) -> [__]{add|inc}_wb_stat(wb, ...) * bdi_stat[_error]() -> wb_stat[_error]() * bdi_writeout_inc() -> wb_writeout_inc() * stat init is moved to bdi_wb_init() and bdi_wb_exit() is added and frees stat. * As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[] introducing no behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * writeback: move backing_dev_info->state into bdi_writebackTejun Heo2015-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently. This patch moves bdi->state into wb. * enum bdi_state is renamed to wb_state and the prefix of all enums is changed from BDI_ to WB_. * Explicit zeroing of bdi->state is removed without adding zeoring of wb->state as the whole data structure is zeroed on init anyway. * As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->state are mechanically replaced with bdi->wb.state introducing no behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: drbd-dev@lists.linbit.com Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | block: discard bdi_unregister() in favour of bdi_destroy()NeilBrown2015-05-28
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bdi_unregister() now contains very little functionality. It contains a "WARN_ON" if bdi->dev is NULL. This warning is of no real consequence as bdi->dev isn't needed by anything else in the function, and it triggers if blk_cleanup_queue() -> bdi_destroy() is called before bdi_unregister, which happens since Commit: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.") So this isn't wanted. It also calls bdi_set_min_ratio(). This needs to be called after writes through the bdi have all been flushed, and before the bdi is destroyed. Calling it early is better than calling it late as it frees up a global resource. Calling it immediately after bdi_wb_shutdown() in bdi_destroy() perfectly fits these requirements. So bdi_unregister() can be discarded with the important content moved to bdi_destroy(), as can the writeback_bdi_unregister event which is already not used. Reported-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org (v4.0) Fixes: c4db59d31e39 ("fs: don't reassign dirty inodes to default_backing_dev_info") Fixes: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.") Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* Merge branch 'lazytime' of ↵Linus Torvalds2015-02-17
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull lazytime mount option support from Al Viro: "Lazytime stuff from tytso" * 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ext4: add optimization for the lazytime mount option vfs: add find_inode_nowait() function vfs: add support for a lazytime mount option
| * vfs: add support for a lazytime mount optionTheodore Ts'o2015-02-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new mount option which enables a new "lazytime" mode. This mode causes atime, mtime, and ctime updates to only be made to the in-memory version of the inode. The on-disk times will only get updated when (a) if the inode needs to be updated for some non-time related change, (b) if userspace calls fsync(), syncfs() or sync(), or (c) just before an undeleted inode is evicted from memory. This is OK according to POSIX because there are no guarantees after a crash unless userspace explicitly requests via a fsync(2) call. For workloads which feature a large number of random write to a preallocated file, the lazytime mount option significantly reduces writes to the inode table. The repeated 4k writes to a single block will result in undesirable stress on flash devices and SMR disk drives. Even on conventional HDD's, the repeated writes to the inode table block will trigger Adjacent Track Interference (ATI) remediation latencies, which very negatively impact long tail latencies --- which is a very big deal for web serving tiers (for example). Google-Bug-Id: 18297052 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fs: remove default_backing_dev_infoChristoph Hellwig2015-01-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that default_backing_dev_info is not used for writeback purposes we can git rid of it easily: - instead of using it's name for tracing unregistered bdi we just use "unknown" - btrfs and ceph can just assign the default read ahead window themselves like several other filesystems already do. - we can assign noop_backing_dev_info as the default one in alloc_super. All filesystems already either assigned their own or noop_backing_dev_info. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
* | fs: don't reassign dirty inodes to default_backing_dev_infoChristoph Hellwig2015-01-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we have dirty inodes we need to call the filesystem for it, even if the device has been removed and the filesystem will error out early. The current code does that by reassining all dirty inodes to the default backing_dev_info when a bdi is unlinked, but that's pretty pointless given that the bdi must always outlive the super block. Instead of stopping writeback at unregister time and moving inodes to the default bdi just keep the current bdi alive until it is destroyed. The containing objects of the bdi ensure this doesn't happen until all writeback has finished by erroring out. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Killed the redundant WARN_ON(), as noticed by Jan. Signed-off-by: Jens Axboe <axboe@fb.com>
* | fs: remove mapping->backing_dev_infoChristoph Hellwig2015-01-20
| | | | | | | | | | | | | | | | | | | | | | Now that we never use the backing_dev_info pointer in struct address_space we can simply remove it and save 4 to 8 bytes in every inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
* | fs: introduce f_op->mmap_capabilities for nommu mmap supportChristoph Hellwig2015-01-20
|/ | | | | | | | | | | | | | | | | | | Since "BDI: Provide backing device capability information [try #3]" the backing_dev_info structure also provides flags for the kind of mmap operation available in a nommu environment, which is entirely unrelated to it's original purpose. Introduce a new nommu-only file operation to provide this information to the nommu mmap code instead. Splitting this from the backing_dev_info structure allows to remove lots of backing_dev_info instance that aren't otherwise needed, and entirely gets rid of the concept of providing a backing_dev_info for a character device. It also removes the need for the mtd_inodefs filesystem. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tejun Heo <tj@kernel.org> Acked-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-blockLinus Torvalds2014-10-18
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull core block layer changes from Jens Axboe: "This is the core block IO pull request for 3.18. Apart from the new and improved flush machinery for blk-mq, this is all mostly bug fixes and cleanups. - blk-mq timeout updates and fixes from Christoph. - Removal of REQ_END, also from Christoph. We pass it through the ->queue_rq() hook for blk-mq instead, freeing up one of the request bits. The space was overly tight on 32-bit, so Martin also killed REQ_KERNEL since it's no longer used. - blk integrity updates and fixes from Martin and Gu Zheng. - Update to the flush machinery for blk-mq from Ming Lei. Now we have a per hardware context flush request, which both cleans up the code should scale better for flush intensive workloads on blk-mq. - Improve the error printing, from Rob Elliott. - Backing device improvements and cleanups from Tejun. - Fixup of a misplaced rq_complete() tracepoint from Hannes. - Make blk_get_request() return error pointers, fixing up issues where we NULL deref when a device goes bad or missing. From Joe Lawrence. - Prep work for drastically reducing the memory consumption of dm devices from Junichi Nomura. This allows creating clone bio sets without preallocating a lot of memory. - Fix a blk-mq hang on certain combinations of queue depths and hardware queues from me. - Limit memory consumption for blk-mq devices for crash dump scenarios and drivers that use crazy high depths (certain SCSI shared tag setups). We now just use a single queue and limited depth for that" * 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits) block: Remove REQ_KERNEL blk-mq: allocate cpumask on the home node bio-integrity: remove the needless fail handle of bip_slab creating block: include func name in __get_request prints block: make blk_update_request print prefix match ratelimited prefix blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio block: fix alignment_offset math that assumes io_min is a power-of-2 blk-mq: Make bt_clear_tag() easier to read blk-mq: fix potential hang if rolling wakeup depth is too high block: add bioset_create_nobvec() block: use bio_clone_fast() in blk_rq_prep_clone() block: misplaced rq_complete tracepoint sd: Honor block layer integrity handling flags block: Replace strnicmp with strncasecmp block: Add T10 Protection Information functions block: Don't merge requests if integrity flags differ block: Integrity checksum flag block: Relocate bio integrity flags block: Add a disk flag to block integrity profile block: Add prefix to block integrity profile flags ...
| * bdi: reimplement bdev_inode_switch_bdi()Tejun Heo2014-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A block_device may be attached to different gendisks and thus different bdis over time. bdev_inode_switch_bdi() is used to switch the associated bdi. The function assumes that the inode could be dirty and transfers it between bdis if so. This is a bit nasty in that it reaches into bdi internals. This patch reimplements the function so that it writes out the inode if dirty. This is a lot simpler and can be implemented without exposing bdi internals. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jens Axboe <axboe@fb.com>
| * bdi: explain the dirty list transferring in bdi_destroy()Tejun Heo2014-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bdi_destroy() has code to transfer the remaining dirty inodes to the default_backing_dev_info; however, given the shutdown sequence, it isn't clear how such condition would happen. Also, it isn't a full solution as the transferred inodes stlil point to the bdi which is being destroyed. Operations on those inodes can end up accessing already released fields such as the percpu stat fields. Digging through the history, it seems that the code was added as a quick workaround for a bug report without fully root-causing the issue. We probably want to remove the code in time but for now let's add a comment noting that it is a quick workaround. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>