| Commit message (Collapse) | Author | Age |
lineage-18.1-caf-msm8998
* common/android-4.4-p:
Linux 4.4.296
xen/netback: don't queue unlimited number of packages
xen/console: harden hvc_xen against event channel storms
xen/netfront: harden netfront against event channel storms
xen/blkfront: harden blkfront against event channel storms
Input: touchscreen - avoid bitwise vs logical OR warning
ARM: 8805/2: remove unneeded naked function usage
net: lan78xx: Avoid unnecessary self assignment
net: systemport: Add global locking for descriptor lifecycle
timekeeping: Really make sure wall_to_monotonic isn't positive
USB: serial: option: add Telit FN990 compositions
PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
USB: gadget: bRequestType is a bitfield, not a enum
igbvf: fix double free in `igbvf_probe`
soc/tegra: fuse: Fix bitwise vs. logical OR warning
nfsd: fix use-after-free due to delegation race
dm btree remove: fix use after free in rebalance_children()
recordmcount.pl: look for jgnop instruction as well as bcrl on s390
mac80211: send ADDBA requests using the tid/queue of the aggregation session
hwmon: (dell-smm) Fix warning on /proc/i8k creation error
net: netlink: af_netlink: Prevent empty skb by adding a check on len.
i2c: rk3x: Handle a spurious start completion interrupt flag
parisc/agp: Annotate parisc agp init functions with __init
nfc: fix segfault in nfc_genl_dump_devices_done
FROMGIT: USB: gadget: bRequestType is a bitfield, not a enum
Linux 4.4.295
irqchip: nvic: Fix offset for Interrupt Priority Offsets
irqchip/irq-gic-v3-its.c: Force synchronisation when issuing INVALL
iio: accel: kxcjk-1013: Fix possible memory leak in probe and remove
iio: itg3200: Call iio_trigger_notify_done() on error
iio: ltr501: Don't return error code in trigger handler
iio: mma8452: Fix trigger reference couting
iio: stk3310: Don't return error code in interrupt handler
usb: core: config: fix validation of wMaxPacketValue entries
USB: gadget: zero allocate endpoint 0 buffers
USB: gadget: detect too-big endpoint 0 requests
net/qla3xxx: fix an error code in ql_adapter_up()
net, neigh: clear whole pneigh_entry at alloc time
net: fec: only clear interrupt of handling queue in fec_enet_rx_queue()
net: altera: set a couple error code in probe()
net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero
block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2)
tracefs: Set all files to the same group ownership as the mount option
signalfd: use wake_up_pollfree()
binder: use wake_up_pollfree()
wait: add wake_up_pollfree()
libata: add horkage for ASMedia 1092
can: pch_can: pch_can_rx_normal: fix use after free
tracefs: Have new files inherit the ownership of their parent
ALSA: pcm: oss: Handle missing errors in snd_pcm_oss_change_params*()
ALSA: pcm: oss: Limit the period size to 16MB
ALSA: pcm: oss: Fix negative period/buffer sizes
ALSA: ctl: Fix copy of updated id with element read/write
mm: bdi: initialize bdi_min_ratio when bdi is unregistered
nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done
can: sja1000: fix use after free in ems_pcmcia_add_card()
HID: check for valid USB device for many HID drivers
HID: wacom: fix problems when device is not a valid USB device
HID: add USB_HID dependancy on some USB HID drivers
HID: add USB_HID dependancy to hid-chicony
HID: add USB_HID dependancy to hid-prodikeys
HID: add hid_is_usb() function to make it simpler for USB detection
HID: introduce hid_is_using_ll_driver
UPSTREAM: USB: gadget: zero allocate endpoint 0 buffers
UPSTREAM: USB: gadget: detect too-big endpoint 0 requests
Linux 4.4.294
serial: pl011: Add ACPI SBSA UART match id
tty: serial: msm_serial: Deactivate RX DMA for polling support
vgacon: Propagate console boot parameters before calling `vc_resize'
parisc: Fix "make install" on newer debian releases
siphash: use _unaligned version by default
net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings()
natsemi: xtensa: fix section mismatch warnings
fget: check that the fd still exists after getting a ref to it
fs: add fget_many() and fput_many()
sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl
sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl
kprobes: Limit max data_size of the kretprobe instances
net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
scsi: iscsi: Unblock session then wake up error handler
s390/setup: avoid using memblock_enforce_memory_limit
platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep
net: return correct error code
hugetlb: take PMD sharing into account when flushing tlb/caches
tty: hvc: replace BUG_ON() with negative return value
xen/netfront: don't trust the backend response data blindly
xen/netfront: disentangle tx_skb_freelist
xen/netfront: don't read data from request on the ring page
xen/netfront: read response from backend only once
xen/blkfront: don't trust the backend response data blindly
xen/blkfront: don't take local copy of a request from the ring page
xen/blkfront: read response from backend only once
xen: sync include/xen/interface/io/ring.h with Xen's newest version
shm: extend forced shm destroy to support objects from several IPC nses
fuse: release pipe buf after last use
fuse: fix page stealing
NFC: add NCI_UNREG flag to eliminate the race
proc/vmcore: fix clearing user buffer by properly using clear_user()
hugetlbfs: flush TLBs correctly after huge_pmd_unshare
tracing: Check pid filtering when creating events
tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows
scsi: mpt3sas: Fix kernel panic during drive powercycle test
ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE
NFSv42: Don't fail clone() unless the OP_CLONE operation failed
net: ieee802154: handle iftypes as u32
ASoC: topology: Add missing rwsem around snd_ctl_remove() calls
ARM: dts: BCM5301X: Add interrupt properties to GPIO node
xen: detect uninitialized xenbus in xenbus_init
xen: don't continue xenstore initialization in case of errors
staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect()
ALSA: ctxfi: Fix out-of-range access
binder: fix test regression due to sender_euid change
usb: hub: Fix locking issues with address0_mutex
usb: hub: Fix usb enumeration issue due to address0 race
USB: serial: option: add Fibocom FM101-GL variants
USB: serial: option: add Telit LE910S1 0x9200 composition
staging: ion: Prevent incorrect reference counting behavour
Change-Id: Iadf9f213915d2a02b27ceb3b2144eac827ade329
|
commit 3c376dfafbf7a8ea0dea212d095ddd83e93280bb upstream.
Initialize min_ratio if it is set during bdi unregistration. This can
prevent problems that may occur when a bdi is removed without resetting
min_ratio.
For example:
1) insert external sdcard
2) set external sdcard's min_ratio to 70
3) remove external sdcard without setting min_ratio to 0
4) insert external sdcard
5) set external sdcard's min_ratio to 70 << error occurs (can't set)
This happens because when an sdcard is removed, the present bdi_min_ratio
value will remain. Currently, the only way to reset bdi_min_ratio is to reboot.
[akpm@linux-foundation.org: tweak comment and coding style]
Link: https://lkml.kernel.org/r/20211021161942.5983-1-mj0123.lee@samsung.com
Signed-off-by: Manjong Lee <mj0123.lee@samsung.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Changheun Lee <nanich.lee@samsung.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <seunghwan.hyun@samsung.com>
Cc: <sookwan7.kim@samsung.com>
Cc: <yt0928.kim@samsung.com>
Cc: <junho89.kim@samsung.com>
Cc: <jisoo2146.oh@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
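The sdcard scenario in the commit message above can be reproduced with a small userspace model. This is only a sketch, not the kernel code: `struct bdi_model`, `total_min_ratio`, `model_set_min_ratio()` and `model_unregister()` are hypothetical names invented for illustration, and the rule that the summed reservations must stay below 100 percent is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace stand-in for the kernel's backing_dev_info. */
struct bdi_model {
    unsigned int min_ratio; /* share reserved for this device, in percent */
};

/* Models the global sum of all reserved shares (bdi_min_ratio in the text). */
static unsigned int total_min_ratio;

/* Set a device's reserved share. Assumption: the global sum must stay below 100. */
int model_set_min_ratio(struct bdi_model *bdi, unsigned int min_ratio)
{
    unsigned int others;

    /* The global sum always includes this device's current share. */
    assert(total_min_ratio >= bdi->min_ratio);
    others = total_min_ratio - bdi->min_ratio;

    if (min_ratio > 100 || others + min_ratio >= 100)
        return -EINVAL;

    total_min_ratio = others + min_ratio;
    bdi->min_ratio = min_ratio;
    return 0;
}

/* Unregister a device; apply_fix selects the behaviour the commit describes. */
void model_unregister(struct bdi_model *bdi, int apply_fix)
{
    if (apply_fix && bdi->min_ratio)
        (void)model_set_min_ratio(bdi, 0); /* give the share back to the global sum */
}
```

In this model, setting 70 on a first device, unregistering it with `apply_fix` set to 0, and then setting 70 on a fresh device fails with `-EINVAL`, because the stale 70 is still counted in the global sum. With `apply_fix` set to 1 the second call succeeds.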
|
https://android.googlesource.com/kernel/common into lineage-18.1-caf-msm8998
This brings LA.UM.9.2.r1-02500-SDMxx0.0 up to date with
https://android.googlesource.com/kernel/common/ android-4.4-p at commit:
4fd124d1546d8 Merge 4.4.258 into android-4.4-p
Change-Id: Idbae7489bc1d831a378dd60993f46139e5e28c4c
|
[ Upstream commit 68f23b89067fdf187763e75a56087550624fdbee ]
Without memcg, there is a one-to-one mapping between the bdi and
bdi_writeback structures. In this world, things are fairly
straightforward; the first thing bdi_unregister() does is to shut down
the bdi_writeback structure (or wb), and part of that shutdown ensures
that no other work is queued against the wb, and that the wb is fully
drained.
With memcg, however, there is a one-to-many relationship between the bdi
and bdi_writeback structures; that is, there are multiple wb objects
which can all point to a single bdi. There is a refcount which prevents
the bdi object from being released (and hence, unregistered). So in
theory, the bdi_unregister() *should* only get called once its refcount
goes to zero (bdi_put will drop the refcount, and when it is zero,
release_bdi gets called, which calls bdi_unregister).
Unfortunately, del_gendisk() in block/genhd.c never got the memo about
the Brave New memcg World, and calls bdi_unregister() directly. It does
this without informing the file system, or the memcg code, or anything
else. This causes the root wb associated with the bdi to be
unregistered, but none of the memcg-specific wb's are shut down. So when
one of these wb's is woken up to do delayed work, it tries to
dereference its wb->bdi->dev to fetch the device name, but
unfortunately bdi->dev is now NULL, thanks to the bdi_unregister()
called by del_gendisk(). As a result, *boom*.
Fortunately, it looks like the rest of the writeback path is perfectly
happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to
create a bdi_dev_name() function which can handle bdi->dev being NULL.
This also allows us to bulletproof the writeback tracepoints to prevent
them from dereferencing a NULL pointer and crashing the kernel if one is
tracing with memcg enabled, and an iSCSI device dies or a USB storage
stick is pulled.
The most common way of triggering this will be hotremoval of a device
while writeback with memcg enabled is going on. It was triggering
several times a day in a heavily loaded production environment.
Google Bug Id: 145475544
Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu
Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
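The NULL-tolerant name lookup described in the commit message above can be sketched in userspace as follows. This is an illustration, not the kernel implementation: `struct device_model`, `struct bdi_model`, `model_bdi_dev_name()` and the `"(unknown)"` placeholder string are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct device and backing_dev_info. */
struct device_model {
    const char *name;
};

struct bdi_model {
    struct device_model *dev; /* becomes NULL once the bdi is unregistered */
};

/* Placeholder returned when no device is attached (assumed value). */
static const char *const unknown_name = "(unknown)";

/* Return a printable device name without ever dereferencing a NULL pointer. */
const char *model_bdi_dev_name(const struct bdi_model *bdi)
{
    if (bdi == NULL || bdi->dev == NULL)
        return unknown_name;
    return bdi->dev->name;
}
```

A caller such as a tracepoint or a delayed writeback worker would use `model_bdi_dev_name(wb_bdi)` in place of reaching through `bdi->dev` directly, so a device that has already been unregistered yields the placeholder instead of a crash.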
|
* refs/heads/tmp-bd858d7
Linux 4.4.181
ethtool: check the return value of get_regs_len
ipv4: Define __ipv4_neigh_lookup_noref when CONFIG_INET is disabled
fuse: Add FOPEN_STREAM to use stream_open()
fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
drm/gma500/cdv: Check vbt config bits when detecting lvds panels
genwqe: Prevent an integer overflow in the ioctl
MIPS: pistachio: Build uImage.gz by default
fuse: fallocate: fix return with locked inode
parisc: Use implicit space register selection for loading the coherence index of I/O pdirs
rcu: locking and unlocking need to always be at least barriers
pktgen: do not sleep with the thread lock held.
net: rds: fix memory leak in rds_ib_flush_mr_pool
net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query
neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit
ethtool: fix potential userspace buffer overflow
media: uvcvideo: Fix uvc_alloc_entity() allocation alignment
usb: gadget: fix request length error for isoc transfer
net: cdc_ncm: GetNtbFormat endian fix
Revert "x86/build: Move _etext to actual end of .text"
userfaultfd: don't pin the user memory in userfaultfd_file_create()
brcmfmac: add subtype check for event handling in data path
brcmfmac: add length checks in scheduled scan result handler
brcmfmac: fix incorrect event channel deduction
brcmfmac: revise handling events in receive path
brcmfmac: screening firmware event packet
brcmfmac: Add length checks on firmware events
bnx2x: disable GSO where gso_size is too big for hardware
net: create skb_gso_validate_mac_len()
binder: replace "%p" with "%pK"
binder: Replace "%p" with "%pK" for stable
CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEM
kernel/signal.c: trace_signal_deliver when signal_group_exit
memcg: make it work on sparse non-0-node systems
tty: max310x: Fix external crystal register setup
tty: serial: msm_serial: Fix XON/XOFF
drm/nouveau/i2c: Disable i2c bus access after ->fini()
ALSA: hda/realtek - Set default power save node to 0
Btrfs: fix race updating log root item during fsync
scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs)
scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove
media: smsusb: better handle optional alignment
media: usb: siano: Fix false-positive "uninitialized variable" warning
media: usb: siano: Fix general protection fault in smsusb
USB: rio500: fix memory leak in close after disconnect
USB: rio500: refuse more than one device at a time
USB: Add LPM quirk for Surface Dock GigE adapter
USB: sisusbvga: fix oops in error path of sisusb_probe
USB: Fix slab-out-of-bounds write in usb_get_bos_descriptor
usb: xhci: avoid null pointer deref when bos field is NULL
xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic()
include/linux/bitops.h: sanitize rotate primitives
sparc64: Fix regression in non-hypervisor TLB flush xcall
tipc: fix modprobe tipc failed after switch order of device registration -v2
Revert "tipc: fix modprobe tipc failed after switch order of device registration"
xen/pciback: Don't disable PCI_COMMAND on PCI device reset.
crypto: vmx - ghash: do nosimd fallback manually
net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value
bnxt_en: Fix aggregation buffer leak under OOM condition.
tipc: Avoid copying bytes beyond the supplied data
usbnet: fix kernel crash after disconnect
net: stmmac: fix reset gpio free missing
net-gro: fix use-after-free read in napi_gro_frags()
llc: fix skb leak in llc_build_and_send_ui_pkt()
ipv6: Consider sk_bound_dev_if when binding a raw socket to an address
ASoC: davinci-mcasp: Fix clang warning without CONFIG_PM
spi: Fix zero length xfer bug
spi: rspi: Fix sequencer reset during initialization
spi : spi-topcliff-pch: Fix to handle empty DMA buffers
scsi: lpfc: Fix SLI3 commands being issued on SLI4 devices
media: saa7146: avoid high stack usage with clang
media: go7007: avoid clang frame overflow warning with KASAN
media: m88ds3103: serialize reset messages in m88ds3103_set_frontend
scsi: qla4xxx: avoid freeing unallocated dma memory
usb: core: Add PM runtime calls to usb_hcd_platform_shutdown
rcutorture: Fix cleanup path for invalid torture_type strings
tty: ipwireless: fix missing checks for ioremap
virtio_console: initialize vtermno value for ports
media: wl128x: prevent two potential buffer overflows
spi: tegra114: reset controller on probe
cxgb3/l2t: Fix undefined behaviour
ASoC: fsl_utils: fix a leaked reference by adding missing of_node_put
ASoC: eukrea-tlv320: fix a leaked reference by adding missing of_node_put
HID: core: move Usage Page concatenation to Main item
chardev: add additional check for minor range overlap
x86/ia32: Fix ia32_restore_sigcontext() AC leak
arm64: cpu_ops: fix a leaked reference by adding missing of_node_put
scsi: ufs: Avoid configuring regulator with undefined voltage range
scsi: ufs: Fix regulator load and icc-level configuration
brcmfmac: fix race during disconnect when USB completion is in progress
brcmfmac: convert dev_init_lock mutex to completion
b43: shut up clang -Wuninitialized variable warning
brcmfmac: fix missing checks for kmemdup
rtlwifi: fix a potential NULL pointer dereference
iio: common: ssp_sensors: Initialize calculated_time in ssp_common_process_data
iio: hmc5843: fix potential NULL pointer dereferences
iio: ad_sigma_delta: Properly handle SPI bus locking vs CS assertion
x86/build: Keep local relocations with ld.lld
cpufreq: pmac32: fix possible object reference leak
cpufreq/pasemi: fix possible object reference leak
cpufreq: ppc_cbe: fix possible object reference leak
s390: cio: fix cio_irb declaration
extcon: arizona: Disable mic detect if running when driver is removed
PM / core: Propagate dev->power.wakeup_path when no callbacks
mmc: sdhci-of-esdhc: add erratum eSDHC-A001 and A-008358 support
mmc: sdhci-of-esdhc: add erratum eSDHC5 support
mmc_spi: add a status check for spi_sync_locked
scsi: libsas: Do discovery on empty PHY to update PHY info
hwmon: (f71805f) Use request_muxed_region for Super-IO accesses
hwmon: (pc87427) Use request_muxed_region for Super-IO accesses
hwmon: (smsc47b397) Use request_muxed_region for Super-IO accesses
hwmon: (smsc47m1) Use request_muxed_region for Super-IO accesses
hwmon: (vt1211) Use request_muxed_region for Super-IO accesses
RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure
i40e: don't allow changes to HW VLAN stripping on active port VLANs
x86/irq/64: Limit IST stack overflow check to #DB stack
USB: core: Don't unbind interfaces following device reset failure
sched/core: Handle overflow in cpu_shares_write_u64
sched/core: Check quota and period overflow at usec to nsec conversion
powerpc/numa: improve control of topology updates
media: pvrusb2: Prevent a buffer overflow
media: au0828: Fix NULL pointer dereference in au0828_analog_stream_enable()
audit: fix a memory leak bug
media: ov2659: make S_FMT succeed even if requested format doesn't match
media: au0828: stop video streaming only when last user stops
media: ov6650: Move v4l2_clk_get() to ov6650_video_probe() helper
media: coda: clear error return value before picture run
dmaengine: at_xdmac: remove BUG_ON macro in tasklet
pinctrl: pistachio: fix leaked of_node references
HID: logitech-hidpp: use RAP instead of FAP to get the protocol version
mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions
x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault()
smpboot: Place the __percpu annotation correctly
x86/build: Move _etext to actual end of .text
bcache: avoid clang -Wunintialized warning
bcache: add failure check to run_cache_set() for journal replay
bcache: fix failure in journal relplay
bcache: return error immediately in bch_journal_replay()
net: cw1200: fix a NULL pointer dereference
mwifiex: prevent an array overflow
ASoC: fsl_sai: Update is_slave_mode with correct value
mac80211/cfg80211: update bss channel on channel switch
dmaengine: pl330: _stop: clear interrupt status
w1: fix the resume command API
rtc: 88pm860x: prevent use-after-free on device remove
brcm80211: potential NULL dereference in brcmf_cfg80211_vndr_cmds_dcmd_handler()
spi: pxa2xx: fix SCR (divisor) calculation
ASoC: imx: fix fiq dependencies
powerpc/boot: Fix missing check of lseek() return value
mmc: core: Verify SD bus width
cxgb4: Fix error path in cxgb4_init_module
gfs2: Fix lru_count going negative
tools include: Adopt linux/bits.h
perf tools: No need to include bitops.h in util.h
at76c50x-usb: Don't register led_trigger if usb_register_driver failed
ssb: Fix possible NULL pointer dereference in ssb_host_pcmcia_exit
media: vivid: use vfree() instead of kfree() for dev->bitmap_cap
media: cpia2: Fix use-after-free in cpia2_exit
fbdev: fix WARNING in __alloc_pages_nodemask bug
hugetlb: use same fault hash key for shared and private mappings
fbdev: fix divide error in fb_var_to_videomode
btrfs: sysfs: don't leak memory when failing add fsid
Btrfs: fix race between ranged fsync and writeback of adjacent ranges
gfs2: Fix sign extension bug in gfs2_update_stats
crypto: vmx - CTR: always increment IV as quadword
Revert "scsi: sd: Keep disk read-only when re-reading partition"
bio: fix improper use of smp_mb__before_atomic()
KVM: x86: fix return value for reserved EFER
ext4: do not delete unlinked inode from orphan list on failed truncate
fbdev: sm712fb: fix memory frequency by avoiding a switch/case fallthrough
btrfs: Honour FITRIM range constraints during free space trim
md/raid: raid5 preserve the writeback action after the parity check
Revert "Don't jump to compute_result state from check_result state"
perf bench numa: Add define for RUSAGE_THREAD if not present
ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour
power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG
KVM: arm/arm64: Ensure vcpu target is unset on reset failure
xfrm4: Fix uninitialized memory read in _decode_session4
vti4: ipip tunnel deregistration fixes.
xfrm6_tunnel: Fix potential panic when unloading xfrm6_tunnel module
xfrm: policy: Fix out-of-bound array accesses in __xfrm_policy_unlink
dm delay: fix a crash when invalid device is specified
PCI: Mark Atheros AR9462 to avoid bus reset
fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting
fbdev: sm712fb: use 1024x768 by default on non-MIPS, fix garbled display
fbdev: sm712fb: fix support for 1024x768-16 mode
fbdev: sm712fb: fix crashes during framebuffer writes by correctly mapping VRAM
fbdev: sm712fb: fix boot screen glitch when sm712fb replaces VGA
fbdev: sm712fb: fix white screen of death on reboot, don't set CR3B-CR3F
fbdev: sm712fb: fix VRAM detection, don't set SR70/71/74/75
fbdev: sm712fb: fix brightness control on reboot, don't set SR30
perf intel-pt: Fix sample timestamp wrt non-taken branches
perf intel-pt: Fix improved sample timestamp
perf intel-pt: Fix instructions sampling rate
memory: tegra: Fix integer overflow on tick value calculation
tracing: Fix partial reading of trace event's id file
ceph: flush dirty inodes before proceeding with remount
iommu/tegra-smmu: Fix invalid ASID bits on Tegra30/114
fuse: honor RLIMIT_FSIZE in fuse_file_fallocate
fuse: fix writepages on 32bit
clk: tegra: Fix PLLM programming on Tegra124+ when PMC overrides divider
NFS4: Fix v4.0 client state corruption when mount
media: ov6650: Fix sensor possibly not detected on probe
cifs: fix strcat buffer overflow and reduce raciness in smb21_set_oplock_level()
of: fix clang -Wunsequenced for be32_to_cpu()
intel_th: msu: Fix single mode with IOMMU
md: add mddev->pers to avoid potential NULL pointer dereference
stm class: Fix channel free in stm output free path
tipc: fix modprobe tipc failed after switch order of device registration
tipc: switch order of device registration to fix a crash
ppp: deflate: Fix possible crash in deflate_init
net/mlx4_core: Change the error print to info print
net: avoid weird emergency message
KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes
ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
ext4: zero out the unused memory region in the extent tree block
fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
writeback: synchronize sync(2) against cgroup writeback membership switches
crypto: arm/aes-neonbs - don't access already-freed walk.iv
crypto: salsa20 - don't access already-freed walk.iv
crypto: chacha20poly1305 - set cra_name correctly
crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
crypto: gcm - Fix error return code in crypto_gcm_create_common()
ipmi:ssif: compare block number correctly for multi-part return messages
bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
bcache: fix a race between cache register and cacheset unregister
Btrfs: do not start a transaction at iterate_extent_inodes()
ext4: fix ext4_show_options for file systems w/o journal
ext4: actually request zeroing of inode table after grow
tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L
ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
mm/mincore.c: make mincore() more conservative
ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
ASoC: max98090: Fix restore of DAPM Muxes
ALSA: hda/realtek - EAPD turn on later
ALSA: hda/hdmi - Consider eld_valid when reporting jack event
ALSA: usb-audio: Fix a memory leak bug
crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
crypto: crct10dif-generic - fix use via crypto_shash_digest()
crypto: vmx - fix copy-paste error in CTR mode
ARM: exynos: Fix a leaked reference by adding missing of_node_put
x86/speculation/mds: Improve CPU buffer clear documentation
x86/speculation/mds: Revert CPU buffer clear on double fault exit
f2fs: link f2fs quota ops for sysfile
fs: sdcardfs: Add missing option to show_options
Conflicts:
drivers/scsi/sd.c
drivers/scsi/ufs/ufshcd.c
Change-Id: If6679c7cc8c3fee323c749ac359353fbebfd12d9
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
|
commit 7fc5854f8c6efae9e7624970ab49a1eac2faefb1 upstream.
sync_inodes_sb() can race against cgwb (cgroup writeback) membership
switches and fail to write back some inodes. For example, if an inode
switches to another wb while sync_inodes_sb() is in progress, the new
wb might not be visible to bdi_split_work_to_wbs() at all or the inode
might jump from a wb which hasn't issued writebacks yet to one which
already has.
This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb
switch path against sync_inodes_sb() so that sync_inodes_sb() is
guaranteed to see all the target wbs and inodes can't jump wbs to
escape syncing.
v2: Fixed misplaced rwsem init. Spotted by Jiufei.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jiufei Xue <xuejiufei@gmail.com>
Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
* refs/heads/tmp-d6bbe8b
Linux 4.4.127
Revert "ip6_vti: adjust vti mtu according to mtu of lower device"
net: cavium: liquidio: fix up "Avoid dma_unmap_single on uninitialized ndata"
spi: davinci: fix up dma_mapping_error() incorrect patch
Revert "mtip32xx: use runtime tag to initialize command header"
Revert "cpufreq: Fix governor module removal race"
Revert "ARM: dts: omap3-n900: Fix the audio CODEC's reset pin"
Revert "ARM: dts: am335x-pepper: Fix the audio CODEC's reset pin"
Revert "PCI/MSI: Stop disabling MSI/MSI-X in pci_device_shutdown()"
nospec: Kill array_index_nospec_mask_check()
nospec: Move array_index_nospec() parameter checking into separate macro
net: hns: Fix ethtool private flags
md/raid10: reset the 'first' at the end of loop
ARM: dts: am57xx-beagle-x15-common: Add overide powerhold property
ARM: dts: dra7: Add power hold and power controller properties to palmas
Documentation: pinctrl: palmas: Add ti,palmas-powerhold-override property definition
vt: change SGR 21 to follow the standards
Input: i8042 - enable MUX on Sony VAIO VGN-CS series to fix touchpad
Input: i8042 - add Lenovo ThinkPad L460 to i8042 reset list
staging: comedi: ni_mio_common: ack ai fifo error interrupts.
fs/proc: Stop trying to report thread stacks
crypto: x86/cast5-avx - fix ECB encryption when long sg follows short one
crypto: ahash - Fix early termination in hash walk
parport_pc: Add support for WCH CH382L PCI-E single parallel port card.
media: usbtv: prevent double free in error case
mei: remove dev_err message on an unsupported ioctl
USB: serial: cp210x: add ELDAT Easywave RX09 id
USB: serial: ftdi_sio: add support for Harman FirmwareHubEmulator
USB: serial: ftdi_sio: add RT Systems VX-8 cable
usb: dwc2: Improve gadget state disconnection handling
scsi: virtio_scsi: always read VPD pages for multiqueue too
llist: clang: introduce member_address_is_nonnull()
Bluetooth: Fix missing encryption refresh on Security Request
netfilter: x_tables: add and use xt_check_proc_name
netfilter: bridge: ebt_among: add more missing match size checks
xfrm: Refuse to insert 32 bit userspace socket policies on 64 bit systems
net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()
RDMA/ucma: Introduce safer rdma_addr_size() variants
RDMA/ucma: Don't allow join attempts for unsupported AF family
RDMA/ucma: Check that device exists prior to accessing it
RDMA/ucma: Check that device is connected prior to access it
RDMA/ucma: Ensure that CM_ID exists prior to access it
RDMA/ucma: Fix use-after-free access in ucma_close
RDMA/ucma: Check AF family prior resolving address
xfrm_user: uncoditionally validate esn replay attribute struct
arm64: avoid overflow in VA_START and PAGE_OFFSET
selinux: Remove redundant check for unknown labeling behavior
netfilter: ctnetlink: Make some parameters integer to avoid enum mismatch
tty: provide tty_name() even without CONFIG_TTY
audit: add tty field to LOGIN event
frv: declare jiffies to be located in the .data section
jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp
fs: compat: Remove warning from COMPATIBLE_IOCTL
selinux: Remove unnecessary check of array base in selinux_set_mapping()
cpumask: Add helper cpumask_available()
genirq: Use cpumask_available() for check of cpumask variable
netfilter: nf_nat_h323: fix logical-not-parentheses warning
Input: mousedev - fix implicit conversion warning
dm ioctl: remove double parentheses
PCI: Make PCI_ROM_ADDRESS_MASK a 32-bit constant
writeback: fix the wrong congested state variable definition
ACPI, PCI, irq: remove redundant check for null string pointer
kprobes/x86: Fix to set RWX bits correctly before releasing trampoline
usb: gadget: f_hid: fix: Prevent accessing released memory
usb: gadget: align buffer size when allocating for OUT endpoint
usb: gadget: fix usb_ep_align_maybe endianness and new usb_ep_align
usb: gadget: change len to size_t on alloc_ep_req()
usb: gadget: define free_ep_req as universal function
partitions/msdos: Unable to mount UFS 44bsd partitions
perf/hwbp: Simplify the perf-hwbp code, fix documentation
ALSA: pcm: potential uninitialized return values
ALSA: pcm: Use dma_bytes as size parameter in dma_mmap_coherent()
mtd: jedec_probe: Fix crash in jedec_read_mfr()
Replace #define with enum for better compilation errors.
Add missing include to drivers/tty/goldfish.c
Fix whitespace in drivers/tty/goldfish.c
ANDROID: fuse: Add null terminator to path in canonical path to avoid issue
ANDROID: sdcardfs: Fix sdcardfs to stop creating cases-sensitive duplicate entries.
ANDROID: add missing include to pdev_bus
ANDROID: pdev_bus: replace writel with gf_write_ptr
ANDROID: Cleanup type casting in goldfish.h
ANDROID: Include missing headers in goldfish.h
ANDROID: cpufreq: times: skip printing invalid frequencies
ANDROID: xt_qtaguid: Remove unnecessary null checks to device's name
ANDROID: xt_qtaguid: Remove unnecessary null checks to ifa_label
ANDROID: cpufreq: times: allocate enough space for a uid_entry
Linux 4.4.126
net: systemport: Rewrite __bcm_sysport_tx_reclaim()
net: fec: Fix unbalanced PM runtime calls
ieee802154: 6lowpan: fix possible NULL deref in lowpan_device_event()
s390/qeth: on channel error, reject further cmd requests
s390/qeth: lock read device while queueing next buffer
s390/qeth: when thread completes, wake up all waiters
s390/qeth: free netdevice when removing a card
team: Fix double free in error path
skbuff: Fix not waking applications when errors are enqueued
net: Only honor ifindex in IP_PKTINFO if non-0
netlink: avoid a double skb free in genlmsg_mcast()
net/iucv: Free memory obtained by kzalloc
net: ethernet: ti: cpsw: add check for in-band mode setting with RGMII PHY interface
net: ethernet: arc: Fix a potential memory leak if an optional regulator is deferred
l2tp: do not accept arbitrary sockets
ipv6: fix access to non-linear packet in ndisc_fill_redirect_hdr_option()
dccp: check sk for closed state in dccp_sendmsg()
net: Fix hlist corruptions in inet_evict_bucket()
Revert "genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs"
scsi: sg: don't return bogus Sg_requests
Revert "genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs"
UPSTREAM: drm: virtio-gpu: set atomic flag
UPSTREAM: drm: virtio-gpu: transfer dumb buffers to host on plane update
UPSTREAM: drm: virtio-gpu: ensure plane is flushed to host on atomic update
UPSTREAM: drm: virtio-gpu: get the fb from the plane state for atomic updates
Linux 4.4.125
bpf, x64: increase number of passes
bpf: skip unnecessary capability check
kbuild: disable clang's default use of -fmerge-all-constants
staging: lustre: ptlrpc: kfree used instead of kvfree
perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period()
x86/entry/64: Don't use IST entry for #BP stack
x86/boot/64: Verify alignment of the LOAD segment
x86/build/64: Force the linker to use 2MB page size
kvm/x86: fix icebp instruction handling
tty: vt: fix up tabstops properly
can: cc770: Fix use after free in cc770_tx_interrupt()
can: cc770: Fix queue stall & dropped RTR reply
can: cc770: Fix stalls on rt-linux, remove redundant IRQ ack
staging: ncpfs: memory corruption in ncp_read_kernel()
mtd: nand: fsl_ifc: Fix nand waitfunc return value
tracing: probeevent: Fix to support minus offset from symbol
rtlwifi: rtl8723be: Fix loss of signal
brcmfmac: fix P2P_DEVICE ethernet address generation
acpi, numa: fix pxm to online numa node associations
drm: udl: Properly check framebuffer mmap offsets
drm/radeon: Don't turn off DP sink when disconnected
drm/vmwgfx: Fix a destoy-while-held mutex problem.
x86/mm: implement free pmd/pte page interfaces
mm/vmalloc: add interfaces to free unmapped page table
libata: Modify quirks for MX100 to limit NCQ_TRIM quirk to MU01 version
libata: Make Crucial BX100 500GB LPM quirk apply to all firmware versions
libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDs
libata: Enable queued TRIM for Samsung SSD 860
libata: disable LPM for Crucial BX100 SSD 500GB drive
libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs
libata: remove WARN() for DMA or PIO command without data
libata: fix length validation of ATAPI-relayed SCSI commands
Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174
clk: bcm2835: Protect sections updating shared registers
ahci: Add PCI-id for the Highpoint Rocketraid 644L card
PCI: Add function 1 DMA alias quirk for Highpoint RocketRAID 644L
mmc: dw_mmc: fix falling from idmac to PIO mode when dw_mci_reset occurs
ALSA: hda/realtek - Always immediately update mute LED with pin VREF
ALSA: aloop: Fix access to not-yet-ready substream via cable
ALSA: aloop: Sync stale timer before release
ALSA: usb-audio: Fix parsing descriptor of UAC2 processing unit
iio: st_pressure: st_accel: pass correct platform data to init
MIPS: ralink: Remove ralink_halt()
ANDROID: cpufreq: times: fix proc_time_in_state_show
dtc: turn off dtc unit address warnings by default
Linux 4.4.124
RDMA/ucma: Fix access to non-initialized CM_ID object
dmaengine: ti-dma-crossbar: Fix event mapping for TPCC_EVT_MUX_60_63
clk: si5351: Rename internal plls to avoid name collisions
nfsd4: permit layoutget of executable-only files
RDMA/ocrdma: Fix permissions for OCRDMA_RESET_STATS
ip6_vti: adjust vti mtu according to mtu of lower device
iommu/vt-d: clean up pr_irq if request_threaded_irq fails
pinctrl: Really force states during suspend/resume
coresight: Fix disabling of CoreSight TPIU
pty: cancel pty slave port buf's work in tty_release
drm/omap: DMM: Check for DMM readiness after successful transaction commit
vgacon: Set VGA struct resource types
IB/umem: Fix use of npages/nmap fields
RDMA/cma: Use correct size when writing netlink stats
IB/ipoib: Avoid memory leak if the SA returns a different DGID
mmc: avoid removing non-removable hosts during suspend
platform/chrome: Use proper protocol transfer function
cros_ec: fix nul-termination for firmware build info
media: [RESEND] media: dvb-frontends: Add delay to Si2168 restart
media: bt8xx: Fix err 'bt878_probe()'
rtlwifi: rtl_pci: Fix the bug when inactiveps is enabled.
RDMA/iwpm: Fix uninitialized error code in iwpm_send_mapinfo()
drm/msm: fix leak in failed get_pages
media: c8sectpfe: fix potential NULL pointer dereference in c8sectpfe_timer_interrupt
Bluetooth: hci_qca: Avoid setup failure on missing rampatch
perf tests kmod-path: Don't fail if compressed modules aren't supported
rtc: ds1374: wdt: Fix stop/start ioctl always returning -EINVAL
rtc: ds1374: wdt: Fix issue with timeout scaling from secs to wdt ticks
cifs: small underflow in cnvrtDosUnixTm()
net: hns: fix ethtool_get_strings overflow in hns driver
sm501fb: don't return zero on failure path in sm501fb_start()
video: fbdev: udlfb: Fix buffer on stack
tcm_fileio: Prevent information leak for short reads
ia64: fix module loading for gcc-5.4
md/raid10: skip spare disk as 'first' disk
Input: twl4030-pwrbutton - use correct device for irq request
power: supply: pda_power: move from timer to delayed_work
bnx2x: Align RX buffers
drm/nouveau/kms: Increase max retries in scanout position queries.
ACPI / PMIC: xpower: Fix power_table addresses
ipmi/watchdog: fix wdog hang on panic waiting for ipmi response
ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
mmc: sdhci-of-esdhc: limit SD clock for ls1012a/ls1046a
staging: wilc1000: fix unchecked return value
staging: unisys: visorhba: fix s-Par to boot with option CONFIG_VMAP_STACK set to y
mtip32xx: use runtime tag to initialize command header
mfd: palmas: Reset the POWERHOLD mux during power off
mac80211: don't parse encrypted management frames in ieee80211_frame_acked
Btrfs: send, fix file hole not being preserved due to inline extent
rndis_wlan: add return value validation
mt7601u: check return value of alloc_skb
iio: st_pressure: st_accel: Initialise sensor platform data properly
NFS: don't try to cross a mountpount when there isn't one there.
infiniband/uverbs: Fix integer overflows
scsi: mac_esp: Replace bogus memory barrier with spinlock
qlcnic: fix unchecked return value
wan: pc300too: abort path on failure
mmc: host: omap_hsmmc: checking for NULL instead of IS_ERR()
openvswitch: Delete conntrack entry clashing with an expectation.
netfilter: xt_CT: fix refcnt leak on error path
Fix driver usage of 128B WQEs when WQ_CREATE is V1.
ASoC: Intel: Skylake: Uninitialized variable in probe_codec()
IB/mlx4: Change vma from shared to private
IB/mlx4: Take write semaphore when changing the vma struct
HSI: ssi_protocol: double free in ssip_pn_xmit()
IB/ipoib: Update broadcast object if PKey value was changed in index 0
IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow
ALSA: hda - Fix headset microphone detection for ASUS N551 and N751
e1000e: fix timing for 82579 Gigabit Ethernet controller
tcp: remove poll() flakes with FastOpen
NFS: Fix missing pg_cleanup after nfs_pageio_cond_complete()
md/raid10: wait up frozen array in handle_write_completed
iommu/omap: Register driver before setting IOMMU ops
ARM: 8668/1: ftrace: Fix dynamic ftrace with DEBUG_RODATA and !FRAME_POINTER
KVM: PPC: Book3S PR: Exit KVM on failed mapping
scsi: virtio_scsi: Always try to read VPD pages
clk: ns2: Correct SDIO bits
ath: Fix updating radar flags for coutry code India
spi: dw: Disable clock after unregistering the host
media/dvb-core: Race condition when writing to CAM
net: ipv6: send unsolicited NA on admin up
i2c: i2c-scmi: add a MS HID
genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs
cpufreq/sh: Replace racy task affinity logic
ACPI/processor: Replace racy task affinity logic
ACPI/processor: Fix error handling in __acpi_processor_start()
time: Change posix clocks ops interfaces to use timespec64
Input: ar1021_i2c - fix too long name in driver's device table
rtc: cmos: Do not assume irq 8 for rtc when there are no legacy irqs
x86: i8259: export legacy_pic symbol
regulator: anatop: set default voltage selector for pcie
platform/x86: asus-nb-wmi: Add wapf4 quirk for the X302UA
staging: android: ashmem: Fix possible deadlock in ashmem_ioctl
CIFS: Enable encryption during session setup phase
SMB3: Validate negotiate request must always be signed
tpm_tis: fix potential buffer overruns caused by bit glitches on the bus
tpm: fix potential buffer overruns caused by bit glitches on the bus
BACKPORT, FROMLIST: crypto: arm64/speck - add NEON-accelerated implementation of Speck-XTS
Linux 4.4.123
bpf: fix incorrect sign extension in check_alu_op()
usb: gadget: bdc: 64-bit pointer capability check
USB: gadget: udc: Add missing platform_device_put() on error in bdc_pci_probe()
btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
btrfs: alloc_chunk: fix DUP stripe size handling
ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux
scsi: sg: only check for dxfer_len greater than 256M
scsi: sg: fix static checker warning in sg_is_valid_dxfer
scsi: sg: fix SG_DXFER_FROM_DEV transfers
irqchip/gic-v3-its: Ensure nr_ites >= nr_lpis
fs/aio: Use RCU accessors for kioctx_table->table[]
fs/aio: Add explicit RCU grace period when freeing kioctx
lock_parent() needs to recheck if dentry got __dentry_kill'ed under it
fs: Teach path_connected to handle nfs filesystems with multiple roots.
drm/amdgpu/dce: Don't turn off DP sink when disconnected
ALSA: seq: Clear client entry before deleting else at closing
ALSA: seq: Fix possible UAF in snd_seq_check_queue()
ALSA: hda - Revert power_save option default value
ALSA: pcm: Fix UAF in snd_pcm_oss_get_formats()
x86/mm: Fix vmalloc_fault to use pXd_large
x86/vm86/32: Fix POPF emulation
selftests/x86/entry_from_vm86: Add test cases for POPF
selftests/x86: Add tests for the STR and SLDT instructions
selftests/x86: Add tests for User-Mode Instruction Prevention
selftests/x86/entry_from_vm86: Exit with 1 if we fail
ima: relax requiring a file signature for new files with zero length
rcutorture/configinit: Fix build directory error message
ipvlan: add L2 check for packets arriving via virtual devices
ASoC: nuc900: Fix a loop timeout test
mac80211: remove BUG() when interface type is invalid
mac80211_hwsim: enforce PS_MANUAL_POLL to be set after PS_ENABLED
agp/intel: Flush all chipset writes after updating the GGTT
drm/amdkfd: Fix memory leaks in kfd topology
veth: set peer GSO values
media: cpia2: Fix a couple off by one bugs
scsi: dh: add new rdac devices
scsi: devinfo: apply to HP XP the same flags as Hitachi VSP
scsi: core: scsi_get_device_flags_keyed(): Always return device flags
spi: sun6i: disable/unprepare clocks on remove
tools/usbip: fixes build with musl libc toolchain
ath10k: fix invalid STS_CAP_OFFSET_MASK
clk: qcom: msm8916: fix mnd_width for codec_digcodec
cpufreq: Fix governor module removal race
ath10k: update tdls teardown state to target
ARM: dts: omap3-n900: Fix the audio CODEC's reset pin
ARM: dts: am335x-pepper: Fix the audio CODEC's reset pin
mtd: nand: fix interpretation of NAND_CMD_NONE in nand_command[_lp]()
net: xfrm: allow clearing socket xfrm policies.
test_firmware: fix setting old custom fw path back on exit
sched: Stop resched_cpu() from sending IPIs to offline CPUs
sched: Stop switched_to_rt() from sending IPIs to offline CPUs
ARM: dts: exynos: Correct Trats2 panel reset line
HID: elo: clear BTN_LEFT mapping
video/hdmi: Allow "empty" HDMI infoframes
drm/edid: set ELD connector type in drm_edid_to_eld()
wil6210: fix memory access violation in wil_memcpy_from/toio_32
pwm: tegra: Increase precision in PWM rate calculation
kprobes/x86: Set kprobes pages read-only
kprobes/x86: Fix kprobe-booster not to boost far call instructions
scsi: sg: close race condition in sg_remove_sfp_usercontext()
scsi: sg: check for valid direction before starting the request
perf session: Don't rely on evlist in pipe mode
perf inject: Copy events when reordering events in pipe mode
drivers/perf: arm_pmu: handle no platform_device
usb: gadget: dummy_hcd: Fix wrong power status bit clear/reset in dummy_hub_control()
usb: dwc2: Make sure we disconnect the gadget state
md/raid6: Fix anomily when recovering a single device in RAID6.
regulator: isl9305: fix array size
MIPS: r2-on-r6-emu: Clear BLTZALL and BGEZALL debugfs counters
MIPS: r2-on-r6-emu: Fix BLEZL and BGTZL identification
MIPS: BPF: Fix multiple problems in JIT skb access helpers.
MIPS: BPF: Quit clobbering callee saved registers in JIT code.
coresight: Fixes coresight DT parse to get correct output port ID.
drm/amdgpu: Fail fb creation from imported dma-bufs. (v2)
drm/radeon: Fail fb creation from imported dma-bufs.
video: ARM CLCD: fix dma allocation size
iommu/iova: Fix underflow bug in __alloc_and_insert_iova_range
apparmor: Make path_max parameter readonly
scsi: ses: don't get power status of SES device slot on probe
fm10k: correctly check if interface is removed
ALSA: firewire-digi00x: handle all MIDI messages on streaming packets
reiserfs: Make cancel_old_flush() reliable
ARM: dts: koelsch: Correct clock frequency of X2 DU clock input
net/faraday: Add missing include of of.h
powerpc: Avoid taking a data miss on every userspace instruction miss
ARM: dts: r8a7791: Correct parent of SSI[0-9] clocks
ARM: dts: r8a7790: Correct parent of SSI[0-9] clocks
NFC: nfcmrvl: double free on error path
NFC: nfcmrvl: Include unaligned.h instead of access_ok.h
vxlan: vxlan dev should inherit lowerdev's gso_max_size
drm/vmwgfx: Fixes to vmwgfx_fb
braille-console: Fix value returned by _braille_console_setup
bonding: refine bond_fold_stats() wrap detection
f2fs: relax node version check for victim data in gc
blk-throttle: make sure expire time isn't too big
mm: Fix false-positive VM_BUG_ON() in page_cache_{get,add}_speculative()
driver: (adm1275) set the m,b and R coefficients correctly for power
dmaengine: imx-sdma: add 1ms delay to ensure SDMA channel is stopped
tcp: sysctl: Fix a race to avoid unexpected 0 window from space
spi: omap2-mcspi: poll OMAP2_MCSPI_CHSTAT_RXS for PIO transfer
ASoC: rcar: ssi: don't set SSICR.CKDV = 000 with SSIWSR.CONT
sched: act_csum: don't mangle TCP and UDP GSO packets
Input: qt1070 - add OF device ID table
sysrq: Reset the watchdog timers while displaying high-resolution timers
timers, sched_clock: Update timeout for clock wrap
media: i2c/soc_camera: fix ov6650 sensor getting wrong clock
scsi: ipr: Fix missed EH wakeup
solo6x10: release vb2 buffers in solo_stop_streaming()
of: fix of_device_get_modalias returned length when truncating buffers
batman-adv: handle race condition for claims between gateways
ARM: dts: Adjust moxart IRQ controller and flags
net/8021q: create device with all possible features in wanted_features
HID: clamp input to logical range if no null state
perf probe: Return errno when not hitting any event
ath10k: disallow DFS simulation if DFS channel is not enabled
drm: Defer disabling the vblank IRQ until the next interrupt (for instant-off)
drivers: net: xgene: Fix hardware checksum setting
perf tools: Make perf_event__synthesize_mmap_events() scale
i40e: fix ethtool to get EEPROM data from X722 interface
i40e: Acquire NVM lock before reads on all devices
perf sort: Fix segfault with basic block 'cycles' sort dimension
selinux: check for address length in selinux_socket_bind()
PCI/MSI: Stop disabling MSI/MSI-X in pci_device_shutdown()
ath10k: fix a warning during channel switch with multiple vaps
drm: qxl: Don't alloc fbdev if emulation is not supported
HID: reject input outside logical range only if null state is set
staging: wilc1000: add check for kmalloc allocation failure.
staging: speakup: Replace BUG_ON() with WARN_ON().
Input: tsc2007 - check for presence and power down tsc2007 during probe
blkcg: fix double free of new_blkg in blkcg_init_queue
ANDROID: cpufreq: times: avoid prematurely freeing uid_entry
ANDROID: Use standard logging functions in goldfish_pipe
ANDROID: Fix whitespace in goldfish
staging: android: ashmem: Fix possible deadlock in ashmem_ioctl
llist: clang: introduce member_address_is_nonnull()
Linux 4.4.122
fixup: sctp: verify size of a new chunk in _sctp_make_chunk()
serial: 8250_pci: Add Brainboxes UC-260 4 port serial device
usb: gadget: f_fs: Fix use-after-free in ffs_fs_kill_sb()
usb: usbmon: Read text within supplied buffer size
USB: usbmon: remove assignment from IS_ERR argument
usb: quirks: add control message delay for 1b1c:1b20
USB: storage: Add JMicron bridge 152d:2567 to unusual_devs.h
staging: android: ashmem: Fix lockdep issue during llseek
staging: comedi: fix comedi_nsamples_left.
uas: fix comparison for error code
tty/serial: atmel: add new version check for usart
serial: sh-sci: prevent lockup on full TTY buffers
x86: Treat R_X86_64_PLT32 as R_X86_64_PC32
x86/module: Detect and skip invalid relocations
Revert "ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux"
NFS: Fix an incorrect type in struct nfs_direct_req
scsi: qla2xxx: Replace fcport alloc with qla2x00_alloc_fcport
ubi: Fix race condition between ubi volume creation and udev
ext4: inplace xattr block update fails to deduplicate blocks
netfilter: x_tables: pack percpu counter allocations
netfilter: x_tables: pass xt_counters struct to counter allocator
netfilter: x_tables: pass xt_counters struct instead of packet counter
netfilter: use skb_to_full_sk in ip_route_me_harder
netfilter: ipv6: fix use-after-free Write in nf_nat_ipv6_manip_pkt
netfilter: bridge: ebt_among: add missing match size checks
netfilter: ebtables: CONFIG_COMPAT: don't trust userland offsets
netfilter: IDLETIMER: be syzkaller friendly
netfilter: nat: cope with negative port range
netfilter: x_tables: fix missing timer initialization in xt_LED
netfilter: add back stackpointer size checks
tc358743: fix register i2c_rd/wr function fix
Input: tca8418_keypad - remove double read of key event register
ARM: omap2: hide omap3_save_secure_ram on non-OMAP3 builds
netfilter: nfnetlink_queue: fix timestamp attribute
watchdog: hpwdt: fix unused variable warning
watchdog: hpwdt: Check source of NMI
watchdog: hpwdt: SMBIOS check
nospec: Include <asm/barrier.h> dependency
ALSA: hda: add dock and led support for HP ProBook 640 G2
ALSA: hda: add dock and led support for HP EliteBook 820 G3
ALSA: seq: More protection for concurrent write and ioctl races
ALSA: seq: Don't allow resizing pool in use
ALSA: hda/realtek - Fix dock line-out volume on Dell Precision 7520
x86/MCE: Serialize sysfs changes
bcache: don't attach backing with duplicate UUID
kbuild: Handle builtin dtb file names containing hyphens
loop: Fix lost writes caused by missing flag
Input: matrix_keypad - fix race when disabling interrupts
MIPS: OCTEON: irq: Check for null return on kzalloc allocation
MIPS: ath25: Check for kzalloc allocation failure
MIPS: BMIPS: Do not mask IPIs during suspend
drm/amdgpu: fix KV harvesting
drm/radeon: fix KV harvesting
drm/amdgpu: Notify sbios device ready before send request
drm/amdgpu: Fix deadlock on runtime suspend
drm/radeon: Fix deadlock on runtime suspend
drm/nouveau: Fix deadlock on runtime suspend
drm: Allow determining if current task is output poll worker
workqueue: Allow retrieval of current task's work struct
scsi: qla2xxx: Fix NULL pointer crash due to active timer for ABTS
RDMA/mlx5: Fix integer overflow while resizing CQ
RDMA/ucma: Check that user doesn't overflow QP state
RDMA/ucma: Limit possible option size
ANDROID: ranchu: 32 bit framebuffer support
ANDROID: Address checkpatch warnings in goldfishfb
ANDROID: Address checkpatch.pl warnings in goldfish_pipe
ANDROID: sdcardfs: fix lock issue on 32 bit/SMP architectures
ANDROID: goldfish: Fix typo in goldfish_cmd_locked() call
ANDROID: Address checkpatch.pl warnings in goldfish_pipe_v2
FROMLIST: f2fs: don't put dentry page in pagecache into highmem
Linux 4.4.121
btrfs: preserve i_mode if __btrfs_set_acl() fails
bpf, x64: implement retpoline for tail call
dm io: fix duplicate bio completion due to missing ref count
mpls, nospec: Sanitize array index in mpls_label_ok()
net: mpls: Pull common label check into helper
sctp: verify size of a new chunk in _sctp_make_chunk()
s390/qeth: fix IPA command submission race
s390/qeth: fix SETIP command handling
sctp: fix dst refcnt leak in sctp_v6_get_dst()
sctp: fix dst refcnt leak in sctp_v4_get_dst
udplite: fix partial checksum initialization
ppp: prevent unregistered channels from connecting to PPP units
netlink: ensure to loop over all netns in genlmsg_multicast_allns()
net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68
net: fix race on decreasing number of TX queues
ipv6 sit: work around bogus gcc-8 -Wrestrict warning
hdlc_ppp: carrier detect ok, don't turn off negotiation
fib_semantics: Don't match route with mismatching tclassid
bridge: check brport attr show in brport_show
Revert "led: core: Fix brightness setting when setting delay_off=0"
x86/spectre: Fix an error message
leds: do not overflow sysfs buffer in led_trigger_show
x86/apic/vector: Handle legacy irq data correctly
ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux
btrfs: Don't clear SGID when inheriting ACLs
x86/syscall: Sanitize syscall table de-references under speculation fix
KVM: mmu: Fix overlap between public and private memslots
ARM: mvebu: Fix broken PL310_ERRATA_753970 selects
nospec: Allow index argument to have const-qualified type
media: m88ds3103: don't call a non-initalized function
cpufreq: s3c24xx: Fix broken s3c_cpufreq_init()
ALSA: hda: Add a power_save blacklist
ALSA: usb-audio: Add a quirck for B&W PX headphones
tpm_i2c_nuvoton: fix potential buffer overruns caused by bit glitches on the bus
tpm_i2c_infineon: fix potential buffer overruns caused by bit glitches on the bus
tpm: st33zp24: fix potential buffer overruns caused by bit glitches on the bus
ANDROID: Delete the goldfish_nand driver.
ANDROID: Add input support for Android Wear.
ANDROID: proc: fix config & includes for /proc/uid
FROMLIST: ARM: amba: Don't read past the end of sysfs "driver_override" buffer
UPSTREAM: ANDROID: binder: remove WARN() for redundant txn error
ANDROID: cpufreq: times: Add missing includes
ANDROID: cpufreq: Add time_in_state to /proc/uid directories
ANDROID: proc: Add /proc/uid directory
ANDROID: cpufreq: times: track per-uid time in state
ANDROID: cpufreq: track per-task time in state
Conflicts:
drivers/gpu/drm/msm/msm_gem.c
drivers/net/wireless/ath/regd.c
kernel/sched/core.c
Change-Id: I9bb7b5a062415da6925a5a56a34e6eb066a53320
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
| |
commit c877ef8ae7b8edaedccad0fc8c23d4d1de7e2480 upstream.
The variable should be defined as wb_congested_state, which includes
WB_async_congested and WB_sync_congested. Fix the definition accordingly.
Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| |
Instead of storing backing_dev_info inside struct request_queue,
allocate it dynamically, reference count it, and free it when the last
reference is dropped. Currently only request_queue holds the reference
but in the following patch we add other users referencing
backing_dev_info.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Change-Id: Ibcee7b4c014018f9243cd3edbfd9c4a8877c3862
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
Git-commit: d03f6cdc1fc422accb734c7c07a661a0018d8631
[riteshh@codeaurora.org: resolved merge conflicts]
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
| |
commit 5f478e4ea5c5560b4e40eb136991a09f9389f331 upstream.
When !CONFIG_CGROUP_WRITEBACK, bdi has single bdi_writeback_congested
at bdi->wb_congested. cgwb_bdi_init() allocates it with kzalloc() and
doesn't do further initialization. This usually works fine as the
reference count gets bumped to 1 by wb_init() and the put from
wb_exit() releases it.
However, when wb_init() fails, it puts the wb base ref, automatically
freeing the wb, and the explicit kfree() in the cgwb_bdi_init() error
path then tries to free the same pointer a second time, causing a
double-free.
Fix it by explicitly initializing the refcnt to 1 and putting the base
ref from cgwb_bdi_destroy().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback")
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
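The double free described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the `Congested` class and the `init_buggy()` / `init_fixed()` helpers are hypothetical stand-ins, not kernel code, and the fixed error path is assumed to drop the base reference with a put rather than an explicit kfree().

```python
class Congested:
    """Toy stand-in for a refcounted bdi_writeback_congested object."""

    def __init__(self, refcnt):
        self.refcnt = refcnt
        self.free_count = 0  # how many times the object was "freed"

    def get(self):
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.free_count += 1  # stands in for kfree()


def init_buggy(wb_init_ok):
    """Model of the old behaviour: refcount starts at 0 after allocation."""
    c = Congested(refcnt=0)   # kzalloc() leaves the count at zero
    c.get()                   # wb_init() takes its reference (count is 1)
    if not wb_init_ok:
        c.put()               # wb_init() failure drops it, object is freed
        c.free_count += 1     # explicit kfree() in the error path: second free
    return c


def init_fixed(wb_init_ok):
    """Model of the fix: the bdi owns a base reference from the start."""
    c = Congested(refcnt=1)   # refcnt explicitly initialised to 1
    c.get()                   # wb_init() takes its reference (count is 2)
    if not wb_init_ok:
        c.put()               # wb_init() failure drops only its own reference
        c.put()               # base reference is put (assumed), freeing once
    return c


if __name__ == "__main__":
    print("buggy frees:", init_buggy(False).free_count)
    print("fixed frees:", init_fixed(False).free_count)
```

In the model, the failing path of the old scheme frees the object twice, while the fixed scheme frees it exactly once because every release goes through the same put.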
| |
commit df08c32ce3be5be138c1dbfcba203314a3a7cd6f upstream.
The name for a bdi of a gendisk is derived from the gendisk's devt.
However, since the gendisk is destroyed before the bdi it leaves a
window where a new gendisk could dynamically reuse the same devt while a
bdi with the same name is still live. Arrange for the bdi to hold a
reference against its "owner" disk device while it is registered.
Otherwise we can hit sysfs duplicate name collisions like the following:
WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80
sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1'
Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015
0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec
ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351
0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8
Call Trace:
[<ffffffff8134caec>] dump_stack+0x63/0x87
[<ffffffff8108c351>] __warn+0xd1/0xf0
[<ffffffff8108c3cf>] warn_slowpath_fmt+0x5f/0x80
[<ffffffff812a0d34>] sysfs_warn_dup+0x64/0x80
[<ffffffff812a0e1e>] sysfs_create_dir_ns+0x7e/0x90
[<ffffffff8134faaa>] kobject_add_internal+0xaa/0x320
[<ffffffff81358d4e>] ? vsnprintf+0x34e/0x4d0
[<ffffffff8134ff55>] kobject_add+0x75/0xd0
[<ffffffff816e66b2>] ? mutex_lock+0x12/0x2f
[<ffffffff8148b0a5>] device_add+0x125/0x610
[<ffffffff8148b788>] device_create_groups_vargs+0xd8/0x100
[<ffffffff8148b7cc>] device_create_vargs+0x1c/0x20
[<ffffffff811b775c>] bdi_register+0x8c/0x180
[<ffffffff811b7877>] bdi_register_dev+0x27/0x30
[<ffffffff813317f5>] add_disk+0x175/0x4a0
Reported-by: Yi Zhang <yizhan@redhat.com>
Tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixed up missing 0 return in bdi_register_owner().
Signed-off-by: Jens Axboe <axboe@fb.com>
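The name-reuse window described above can be sketched with a toy model. Everything here is hypothetical and heavily simplified: `Registry` stands in for the sysfs directory, `DevtAllocator` for dynamic devt allocation, and `simulate()` for the teardown ordering. The assumption is that holding a reference on the owner device keeps its devt from being handed out again.

```python
class Registry:
    """Toy sysfs directory: entry names must be unique."""

    def __init__(self):
        self.names = set()

    def add(self, name):
        if name in self.names:
            raise ValueError(f"cannot create duplicate filename {name!r}")
        self.names.add(name)


class DevtAllocator:
    """Toy allocator that hands out the lowest free minor number."""

    def __init__(self):
        self.in_use = set()

    def alloc(self):
        n = 0
        while n in self.in_use:
            n += 1
        self.in_use.add(n)
        return n

    def free(self, n):
        self.in_use.discard(n)


def simulate(bdi_pins_owner):
    """Tear down disk A while its bdi is still registered, then add disk B."""
    sysfs, devts = Registry(), DevtAllocator()
    a = devts.alloc()
    sysfs.add(f"259:{a}")      # bdi for disk A is registered under its devt
    if not bdi_pins_owner:
        devts.free(a)          # disk A is gone, its devt is reusable at once
    # With the pin, disk A's devt stays allocated until the bdi unregisters.
    b = devts.alloc()          # a new disk appears
    sysfs.add(f"259:{b}")      # raises if the old bdi name is still live
    return f"259:{b}"


if __name__ == "__main__":
    print(simulate(True))
    try:
        simulate(False)
    except ValueError as err:
        print("collision:", err)
```

Without the pin, the second disk is given the same number and the registration fails with a duplicate-name error, which corresponds to the warning in the trace above.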
| |
commit 564e81a57f9788b1475127012e0fd44e9049e342 upstream.
Jan Stancek has reported that the system occasionally hangs after the
"oom01" testcase from LTP triggers OOM. Judging from the fact that there
is a kworker thread doing memory allocation and that the values of
"Node 0 Normal free:" and "Node 0 Normal:" differ when hanging, vmstat is
not up-to-date for some reason.
Commit 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover
memory reclaim doesn't make any progress") meant to force the kworker
thread to take a short sleep, but it mistakenly used
schedule_timeout(1). We missed that schedule_timeout() in state
TASK_RUNNING doesn't do anything.
Fix it by using schedule_timeout_uninterruptible(1) which forces the
kworker thread to take a short sleep in order to make sure that vmstat
is up-to-date.
Fixes: 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Cristopher Lameter <clameter@sgi.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
Tetsuo Handa has reported that the system might basically livelock in
OOM condition without triggering the OOM killer.
The issue is caused by internal dependency of the direct reclaim on
vmstat counter updates (via zone_reclaimable) which are performed from
the workqueue context. If all the current workers get assigned to an
allocation request, though, they will be looping inside the allocator
trying to reclaim memory but zone_reclaimable can see stale numbers so
it will consider a zone reclaimable even though it has been scanned way
too much. WQ concurrency logic will not consider this situation as a
congested workqueue because it relies on the worker having to sleep in
such a situation. This also means that it doesn't try to spawn new
workers or invoke the rescuer thread if one is assigned to the
queue.
In order to fix this issue we need to do two things. First we have to
let wq concurrency code know that we are in trouble so we have to do a
short sleep. In order to avoid the issues handled by 0e093d99763e
("writeback: do not sleep on the congestion queue if there are no
congested BDIs or if significant congestion is not being encountered in
the current zone") we limit the sleep only to worker threads which are
the ones of interest anyway.
The second thing to do is to create a dedicated workqueue for vmstat and
mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
have a spare worker thread for it.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Tejun Heo <tj@kernel.org>
Cc: Cristopher Lameter <clameter@sgi.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access to one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimistic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites:
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and were depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
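The flag split described above can be sketched in plain C. The bit
values below are illustrative only and do not match the kernel's, and the
composite masks (GFP_ATOMIC, GFP_NOWAIT, GFP_KERNEL) are written the way
this description implies rather than copied from the headers, so treat
them as assumptions. The helper simply tests __GFP_DIRECT_RECLAIM.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Bit positions are illustrative only; the kernel's values differ. */
#define __GFP_HIGH            0x01u  /* high priority reserve */
#define __GFP_ATOMIC          0x02u  /* truly atomic, no fallback */
#define __GFP_DIRECT_RECLAIM  0x04u  /* caller may sleep in direct reclaim */
#define __GFP_KSWAPD_RECLAIM  0x08u  /* caller wants kswapd woken */
#define __GFP_IO              0x10u
#define __GFP_FS              0x20u

/* __GFP_WAIT redefined: direct reclaim plus waking kswapd. */
#define __GFP_RECLAIM (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define __GFP_WAIT    __GFP_RECLAIM

/* Assumed compositions, following the description above. */
#define GFP_ATOMIC (__GFP_HIGH | __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM)
#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM)
#define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)

/* A caller may block only if it is willing to enter direct reclaim. */
static inline bool gfpflags_allow_blocking(gfp_t gfp_flags)
{
    return (gfp_flags & __GFP_DIRECT_RECLAIM) != 0;
}

/* Mempool-backed caller: never block, still wake kswapd, no reserves. */
static inline gfp_t mempool_gfp(gfp_t gfp_flags)
{
    return gfp_flags & ~__GFP_DIRECT_RECLAIM;
}
```

With these definitions a mask derived via mempool_gfp(GFP_KERNEL) cannot
block and does not carry __GFP_ATOMIC, yet still wakes kswapd, which is
the bio allocation case mentioned in the list.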
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
cgwb_bdi_destroy()
a20135ffbc44 ("writeback: don't drain bdi_writeback_congested on bdi
destruction") added rbtree_postorder_for_each_entry_safe() which is
used to remove all entries; however, according to Cody, the iterator
isn't safe against operations which may rebalance the tree. Fix it by
switching to repeatedly removing rb_first() until empty.
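The "remove the first node until empty" pattern can be sketched in
userspace as follows. A plain unbalanced binary search tree stands in for
the kernel rbtree here, so the sketch shows only the shape of the loop and
not the rebalancing hazard itself; all names (struct node, tree_insert,
tree_pop_first, tree_drain) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Plain unbalanced BST insert; stands in for the kernel rbtree. */
static struct node *tree_insert(struct node *root, int key)
{
    if (!root) {
        struct node *n = calloc(1, sizeof(*n));
        assert(n);
        n->key = key;
        return n;
    }
    if (key < root->key)
        root->left = tree_insert(root->left, key);
    else
        root->right = tree_insert(root->right, key);
    return root;
}

/* Unlink and return the leftmost node, like rb_first() plus rb_erase(). */
static struct node *tree_pop_first(struct node **root)
{
    struct node **link = root;
    struct node *n;

    if (!*link)
        return NULL;
    while ((*link)->left)
        link = &(*link)->left;
    n = *link;
    *link = n->right;            /* the leftmost node has no left child */
    n->left = n->right = NULL;
    return n;
}

/* Drain the tree; every pass restarts from the root, so nothing goes stale. */
static int tree_drain(struct node **root)
{
    struct node *n;
    int freed = 0;

    while ((n = tree_pop_first(root))) {
        free(n);
        freed++;
    }
    return freed;
}
```

Because each iteration looks up the first node afresh, the loop does not
care how the tree was reshaped by the previous removal.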
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Cody P Schafer <dev@codyps.com>
Fixes: a20135ffbc44 ("writeback: don't drain bdi_writeback_congested on bdi destruction")
Link: http://lkml.kernel.org/g/1443997973-1700-1-git-send-email-dev@codyps.com
Signed-off-by: Jens Axboe <axboe@fb.com>
|
bdi's are initialized in two steps, bdi_init() and bdi_register(), but
destroyed in a single step by bdi_destroy() which, for a bdi embedded
in a request_queue, is called during blk_cleanup_queue() which makes
the queue invisible and starts the draining of remaining usages.
A request_queue's user can access the congestion state of the embedded
bdi as long as it holds a reference to the queue. As such, it may
access the congested state of a queue which finished
blk_cleanup_queue() but hasn't reached blk_release_queue() yet.
Because the congested state was embedded in backing_dev_info which in
turn is embedded in request_queue, accessing the congested state after
bdi_destroy() was called was fine. The bdi was destroyed but the
memory region for the congested state remained accessible till the
queue got released.
a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in
bdi_writeback") changed the situation. Now, the root congested state
which is expected to be pinned while request_queue remains accessible
is separately reference counted and the base ref is put during
bdi_destroy(). This means that the root congested state may go away
prematurely while the queue is between bdi_destroy() and
blk_release_queue(), which was detected by Andrey's KASAN tests.
The root cause of this problem is that bdi doesn't distinguish the two
steps of destruction, unregistration and release, and now the root
congested state actually requires a separate release step. To fix the
issue, this patch separates out bdi_unregister() and bdi_exit() from
bdi_destroy(). bdi_unregister() is called from blk_cleanup_queue()
and bdi_exit() from blk_release_queue(). bdi_destroy() is now just a
simple wrapper calling the two steps back-to-back.
While at it, the prototype of bdi_destroy() is moved right below
bdi_setup_and_register() so that the counterpart operations are
located together.
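The two-step teardown can be modelled in userspace roughly like this.
It is a sketch under stated assumptions: the struct layouts are invented,
the function names merely mirror the ones named above, and the reference
count is a plain int rather than whatever the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct congested { int refcnt; };   /* separately refcounted root state */

struct bdi {
    bool registered;
    struct congested *root_cong;
};

static int bdi_init(struct bdi *bdi)
{
    bdi->registered = false;
    bdi->root_cong = calloc(1, sizeof(*bdi->root_cong));
    if (!bdi->root_cong)
        return -1;
    bdi->root_cong->refcnt = 1;     /* base reference */
    return 0;
}

static void bdi_register(struct bdi *bdi)
{
    bdi->registered = true;
}

/* Step 1: make the bdi invisible; the congested state stays valid. */
static void bdi_unregister(struct bdi *bdi)
{
    bdi->registered = false;
}

/* Step 2: drop the base reference once no user can reach the bdi. */
static void bdi_exit(struct bdi *bdi)
{
    assert(!bdi->registered);
    if (--bdi->root_cong->refcnt == 0)
        free(bdi->root_cong);
    bdi->root_cong = NULL;
}

/* Wrapper for users that do not need the two steps separated. */
static void bdi_destroy(struct bdi *bdi)
{
    bdi_unregister(bdi);
    bdi_exit(bdi);
}
```

Between bdi_unregister() and bdi_exit() the root congested state is still
allocated, which is the window a request_queue user relies on.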
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback")
Cc: stable@vger.kernel.org # v4.2+
Reported-and-tested-by: Andrey Konovalov <andreyknvl@google.com>
Link: http://lkml.kernel.org/g/CAAeHK+zUJ74Zn17=rOyxacHU18SgCfC6bsYW=6kCY5GXJBwGfQ@mail.gmail.com
Reviewed-by: Jan Kara <jack@suse.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
bdi_for_each_wb() is used in several places to wake up or issue
writeback work items to all wb's (bdi_writeback's) on a given bdi.
The iteration is performed by walking bdi->cgwb_tree; however, the
tree only indexes wb's which are currently active.
For example, when a memcg gets associated with a different blkcg, the
old wb is removed from the tree so that the new one can be indexed.
The old wb starts dying from then on but will linger till all its
inodes are drained. As these dying wb's may still host dirty inodes,
writeback operations which affect all wb's must include them.
bdi_for_each_wb() skipping dying wb's led to sync(2) missing and
failing to sync the inodes belonging to those wb's.
This patch adds an RCU-protected @bdi->wb_list which lists all wb's
belonging to that bdi. wb's are added on creation and removed on
release rather than on the start of destruction. bdi_for_each_wb()
usages are replaced with list_for_each[_continue]_rcu() iterations
over @bdi->wb_list and bdi_for_each_wb() and its helpers are removed.
v2: Updated as per Jan. last_wb ref leak in bdi_split_work_to_wbs()
fixed and unnecessary list head severing in cgwb_bdi_destroy()
removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Artem Bityutskiy <dedekind1@gmail.com>
Fixes: ebe41ab0c79d ("writeback: implement bdi_for_each_wb()")
Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
Pull blk-cg updates from Jens Axboe:
"A bit later in the cycle, but this has been in the block tree for a
while. This is basically four patchsets from Tejun, that improve our
buffered cgroup writeback. It was dependent on the other cgroup
changes, but they went in earlier in this cycle.
Series 1 is a set of 5 patches with cgroup writeback updates:
- bdi_writeback iteration fix which could lead to some wb's being
skipped or repeated during e.g. sync under memory pressure.
- Simplification of wb work wait mechanism.
- Writeback tracepoints updated to report cgroup.
Series 2 is a set of updates for the CFQ cgroup writeback handling:
cfq has always charged all async IOs to the root cgroup. It didn't
have much choice as writeback didn't know about cgroups and there
was no way to tell who to blame for a given writeback IO.
writeback finally grew support for cgroups and now tags each
writeback IO with the appropriate cgroup to charge it against.
This patchset updates cfq so that it follows the blkcg each bio is
tagged with. Async cfq_queues are now shared across cfq_group,
which is per-cgroup, instead of per-request_queue cfq_data. This
makes all IOs follow the weight based IO resource distribution
implemented by cfq.
- Switched from GFP_ATOMIC to GFP_NOWAIT as suggested by Jeff.
- Other misc review points addressed, acks added and rebased.
Series 3 is the blkcg policy cleanup patches:
This patchset contains assorted cleanups for blkcg_policy methods
and blk[c]g_policy_data handling.
- alloc/free added for blkg_policy_data. exit dropped.
- alloc/free added for blkcg_policy_data.
- blk-throttle's async percpu allocation is replaced with direct
allocation.
- all methods now take blk[c]g_policy_data instead of blkcg_gq or
blkcg.
And finally, series 4 is a set of patches cleaning up the blkcg stats
handling:
blkcg's stats have always been somewhat of a mess. This patchset
tries to improve the situation a bit.
- The following patches added to consolidate blkcg entry point and
blkg creation. This in itself is an improvement and helps
collecting common stats on bio issue.
- per-blkg stats now accounted on bio issue rather than request
completion so that bio based and request based drivers can behave
the same way. The issue was spotted by Vivek.
- cfq-iosched implements custom recursive stats and blk-throttle
implements custom per-cpu stats. This patchset makes blkcg core
support both by default.
- cfq-iosched and blk-throttle keep track of the same stats
multiple times. Unify them"
* 'for-4.3/blkcg' of git://git.kernel.dk/linux-block: (45 commits)
blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy
blkcg: s/CFQ_WEIGHT_*/CFQ_WEIGHT_LEGACY_*/
blkcg: implement interface for the unified hierarchy
blkcg: misc preparations for unified hierarchy interface
blkcg: separate out tg_conf_updated() from tg_set_conf()
blkcg: move body parsing from blkg_conf_prep() to its callers
blkcg: mark existing cftypes as legacy
blkcg: rename subsystem name from blkio to io
blkcg: refine error codes returned during blkcg configuration
blkcg: remove unnecessary NULL checks from __cfqg_set_weight_device()
blkcg: reduce stack usage of blkg_rwstat_recursive_sum()
blkcg: remove cfqg_stats->sectors
blkcg: move io_service_bytes and io_serviced stats into blkcg_gq
blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq
blkcg: make blkcg_[rw]stat per-cpu
blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it
blkcg: consolidate blkg creation in blkcg_bio_issue_check()
blk-throttle: improve queue bypass handling
blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup()
blkcg: inline [__]blkg_lookup()
...
|
blkio interface has become messy over time and is currently the
largest. In addition to the inconsistent naming scheme, it has
multiple stat files which report more or less the same thing, a number
of debug stat files which expose internal details which shouldn't have
been part of the public interface in the first place, recursive and
non-recursive stats and leaf and non-leaf knobs.
Both recursive vs. non-recursive and leaf vs. non-leaf distinctions
don't make any sense on the unified hierarchy as only leaf cgroups can
contain processes. cgroups is going through a major interface
revision with the unified hierarchy involving significant fundamental
usage changes and given that a significant portion of the interface
doesn't make sense anymore, it's a good time to reorganize the
interface.
As the first step, this patch renames the externally visible subsystem
name from "blkio" to "io". This is more concise, matches the other
two major subsystem names, "cpu" and "memory", and is better suited as
blkcg will be involved in anything writeback related too whether an
actual block device is involved or not.
As the subsystem legacy_name is set to "blkio", the only userland
visible change outside the unified hierarchy is that blkcg is reported
as "io" instead of "blkio" in the subsystem initialized message during
boot. On the unified hierarchy, blkcg now appears as "io".
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: cgroups@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
|
There's a small consistency problem between the inode and writeback
naming. Writeback calls the "for IO" inode queues b_io and
b_more_io, but the inode calls these the "writeback list" or
i_wb_list. This makes it hard to add a new "under writeback" list to
the inode, or call it an "under IO" list on the bdi because either
way we'll have writeback on IO and IO on writeback and it'll just be
confusing. I'm getting confused just writing this!
So, rename the inode "for IO" list variable to i_io_list so we can
add a new "writeback list" in a subsequent patch.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Dave Chinner <dchinner@redhat.com>
|
52ebea749aae ("writeback: make backing_dev_info host cgroup-specific
bdi_writebacks") made bdi (backing_dev_info) host per-cgroup wb's
(bdi_writeback's). As the congested state needs to be per-wb and
referenced from blkcg side and multiple wbs, the patch made all
non-root cong's (bdi_writeback_congested's) reference counted and
indexed on bdi.
When a bdi is destroyed, cgwb_bdi_destroy() tries to drain all
non-root cong's; however, this can hang indefinitely because wb's can
also be referenced from blkcg_gq's which are destroyed after bdi
destruction is complete.
This patch fixes the bug by updating bdi destruction to not wait for
cong's to drain. A cong is unlinked from bdi->cgwb_congested_tree on
bdi destruction regardless of its reference count as the bdi may go
away at any point after destruction. wb_congested_put() checks whether
the cong is already unlinked on release.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jon Christopherson <jon@jons.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=100681
Fixes: 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
Tested-by: Jon Christopherson <jon@jons.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
52ebea749aae ("writeback: make backing_dev_info host cgroup-specific
bdi_writebacks") made bdi (backing_dev_info) host per-cgroup wb's
(bdi_writeback's). As the congested state needs to be per-wb and
referenced from blkcg side and multiple wbs, the patch made all
non-root cong's (bdi_writeback_congested's) reference counted and
indexed on bdi.
When a bdi is destroyed, cgwb_bdi_destroy() tries to drain all
non-root cong's; however, this can hang indefinitely because wb's can
also be referenced from blkcg_gq's which are destroyed after bdi
destruction is complete.
To fix the bug, bdi destruction will be updated to not wait for cong's
to drain, which naturally means that cong's may outlive the associated
bdi. This is fine for non-root cong's but is problematic for the root
cong's which are embedded in their bdi's as they may end up getting
dereferenced after the containing bdi's are freed.
This patch makes root cong's behave the same as non-root cong's. They
are no longer embedded in their bdi's but allocated separately during
bdi initialization, indexed and reference counted the same way.
* As cong handling is the same for all wb's, wb->congested
initialization is moved into wb_init().
* When !CONFIG_CGROUP_WRITEBACK, there was no indexing or refcnting.
bdi->wb_congested is now a pointer pointing to the root cong
allocated during bdi init and minimal refcnting operations are
implemented.
* The above makes root wb init paths diverge depending on
CONFIG_CGROUP_WRITEBACK. root wb init is moved to cgwb_bdi_init().
This patch in itself shouldn't cause any consequential behavior
differences but prepares for the actual fix.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jon Christopherson <jon@jons.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=100681
Tested-by: Jon Christopherson <jon@jons.org>
Added <linux/slab.h> include to backing-dev.h for kfree() definition.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
Pull cgroup writeback support from Jens Axboe:
"This is the big pull request for adding cgroup writeback support.
This code has been in development for a long time, and it has been
simmering in for-next for a good chunk of this cycle too. This is one
of those problems that has been talked about for at least half a
decade, finally there's a solution and code to go with it.
Also see last week's writeup on LWN:
http://lwn.net/Articles/648292/"
* 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
writeback, blkio: add documentation for cgroup writeback support
vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
writeback: do foreign inode detection iff cgroup writeback is enabled
v9fs: fix error handling in v9fs_session_init()
bdi: fix wrong error return value in cgwb_create()
buffer: remove unusued 'ret' variable
writeback: disassociate inodes from dying bdi_writebacks
writeback: implement foreign cgroup inode bdi_writeback switching
writeback: add lockdep annotation to inode_to_wb()
writeback: use unlocked_inode_to_wb transaction in inode_congested()
writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
writeback: implement [locked_]inode_to_wb_and_lock_list()
writeback: implement foreign cgroup inode detection
writeback: make writeback_control track the inode being written back
writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
writeback: implement memcg writeback domain based throttling
writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
writeback: implement memcg wb_domain
writeback: update wb_over_bg_thresh() to use wb_domain aware operations
...
|
On wb_congested_get_create() failure, cgwb_create() forgot to set @ret
to -ENOMEM ending up returning 0. Fix it so that it returns -ENOMEM.
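The bug class is easy to reproduce in a few lines of userspace C. In
this sketch cong_get_create() is a hypothetical stand-in whose "fail"
argument forces the allocation-failure path; create_buggy() and
create_fixed() are likewise invented to contrast the two error paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct cong { int dummy; };

/* Hypothetical allocator stand-in; 'fail' forces the error path. */
static struct cong *cong_get_create(int fail)
{
    return fail ? NULL : calloc(1, sizeof(struct cong));
}

/* Before the fix: the failure branch leaves ret at its initial 0. */
static int create_buggy(int fail)
{
    int ret = 0;
    struct cong *c = cong_get_create(fail);

    if (!c)
        goto out;               /* bug: caller sees success */
    free(c);
out:
    return ret;
}

/* After the fix: the failure branch records -ENOMEM before bailing out. */
static int create_fixed(int fail)
{
    int ret = 0;
    struct cong *c = cong_get_create(fail);

    if (!c) {
        ret = -ENOMEM;
        goto out;
    }
    free(c);
out:
    return ret;
}
```

A caller of create_buggy() cannot tell a failed allocation from success,
which is why the missing assignment matters even though it is one line.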
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
Currently, the majority of cgroup writeback support including all the
above functions is implemented in include/linux/backing-dev.h and
mm/backing-dev.c; however, the portion closely related to writeback
logic implemented in include/linux/writeback.h and mm/page-writeback.c
will expand to support foreign writeback detection and correction.
This patch moves wb[_try]_get() and wb_put() to
include/linux/backing-dev-defs.h so that they can be used from
writeback.h and inode_{attach|detach}_wb() to writeback.h and
page-writeback.c.
This is pure reorganization and doesn't introduce any functional
changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
Dirtyable memory is distributed to a wb (bdi_writeback) according to
the relative bandwidth the wb is writing out in the whole system.
This distribution is global - each wb is measured against all other
wb's and gets the proportionately sized portion of the memory in the
whole system.
For cgroup writeback, the amount of dirtyable memory is scoped by
memcg and thus each wb would need to be measured and controlled in its
memcg. IOW, a wb will belong to two writeback domains - the global
and memcg domains.
The previous patches laid the groundwork to support the two wb_domains
and this patch implements memcg wb_domain. memcg->cgwb_domain is
initialized on css online and destroyed on css release,
wb->memcg_completions is added, and __wb_writeout_inc() is updated to
increment completions against both global and memcg wb_domains.
The following patches will update balance_dirty_pages() and its
subroutines to actually consider memcg wb_domain for throttling.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
The function name wb_dirty_limit(), its argument @dirty and the local
variable @wb_dirty are mortally confusing given that the function
calculates per-wb threshold value not dirty pages, especially given
that @dirty and @wb_dirty are used elsewhere for dirty pages.
Let's rename the function to wb_calc_thresh() and wb_dirty to
wb_thresh.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
The completion of a wb_writeback_work can be waited upon by setting
its ->done to a struct completion and waiting on it; however, for
cgroup writeback support, it's necessary to issue multiple work items
to multiple bdi_writebacks and wait for the completion of all.
This patch implements wb_completion which can wait for multiple work
items and replaces the struct completion with it. It can be defined
using DEFINE_WB_COMPLETION_ONSTACK(), used for multiple work items and
waited for by wb_wait_for_completion().
Nobody currently issues multiple work items and this patch doesn't
introduce any behavior changes.
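One way such a multi-item completion could work is a shared counter that
starts at one for the issuer and is bumped once per queued work item.
The sketch below is an assumption about the design, not a copy of the
patch: the struct and function names are hypothetical, C11 atomics stand
in for kernel atomics, and the actual sleeping is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct wb_completion_sketch {
    atomic_int cnt;   /* outstanding work items + 1 for the issuer */
};

static void completion_init(struct wb_completion_sketch *c)
{
    atomic_init(&c->cnt, 1);   /* the issuer's own reference */
}

/* Called once per work item queued against this completion. */
static void completion_add_work(struct wb_completion_sketch *c)
{
    atomic_fetch_add(&c->cnt, 1);
}

/* Called by each work item when it finishes; true if it was the last. */
static bool completion_work_done(struct wb_completion_sketch *c)
{
    return atomic_fetch_sub(&c->cnt, 1) == 1;
}

/* Issuer drops its reference; real code would then sleep until cnt == 0. */
static bool completion_issuer_done(struct wb_completion_sketch *c)
{
    atomic_fetch_sub(&c->cnt, 1);
    return atomic_load(&c->cnt) == 0;
}
```

Holding one reference for the issuer keeps the counter from reaching zero
while work items are still being queued.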
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
bdi_has_dirty_io() used to only reflect whether the root wb
(bdi_writeback) has dirty inodes. For cgroup writeback support, it
needs to take all active wb's into account. If any wb on the bdi has
dirty inodes, bdi_has_dirty_io() should return true.
To achieve that, as inode_wb_list_{move|del}_locked() now keep track
of the dirty state transition of each wb, the number of dirty wbs can
be counted in the bdi; however, bdi is already aggregating
wb->avg_write_bandwidth which can easily be guaranteed to be > 0 when
there are any dirty inodes by ensuring wb->avg_write_bandwidth can't
dip below 1. bdi_has_dirty_io() can simply test whether
bdi->tot_write_bandwidth is zero or not.
While this bumps the value of wb->avg_write_bandwidth to one when it
used to be zero, this shouldn't cause any meaningful behavior
difference.
bdi_has_dirty_io() is made an inline function which tests whether
->tot_write_bandwidth is non-zero. Also, WARN_ON_ONCE()'s on its
value are added to inode_wb_list_{move|del}_locked().
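The aggregation trick can be sketched as follows. The *_sketch structs
are invented for illustration, and the assumption that a wb's bandwidth is
added to the bdi total when its IO lists become populated and subtracted
when they empty is my reading of the description above.

```c
#include <assert.h>
#include <stdbool.h>

struct bdi_sketch { long tot_write_bandwidth; };

struct wb_sketch {
    struct bdi_sketch *bdi;
    long avg_write_bandwidth;   /* kept >= 1 by wb_set_bandwidth() */
    bool has_dirty_io;
};

/* Update a wb's bandwidth, never letting it dip below 1. */
static void wb_set_bandwidth(struct wb_sketch *wb, long bw)
{
    if (bw < 1)
        bw = 1;
    if (wb->has_dirty_io)
        wb->bdi->tot_write_bandwidth += bw - wb->avg_write_bandwidth;
    wb->avg_write_bandwidth = bw;
}

/* First dirty inode arrived: contribute to the bdi total. */
static void wb_io_lists_populated(struct wb_sketch *wb)
{
    if (wb->has_dirty_io)
        return;
    wb->has_dirty_io = true;
    wb->bdi->tot_write_bandwidth += wb->avg_write_bandwidth;
}

/* Last dirty inode left: withdraw the contribution. */
static void wb_io_lists_depopulated(struct wb_sketch *wb)
{
    if (!wb->has_dirty_io)
        return;
    wb->has_dirty_io = false;
    wb->bdi->tot_write_bandwidth -= wb->avg_write_bandwidth;
}

/* Non-zero total means at least one wb has dirty inodes. */
static bool bdi_has_dirty_io(const struct bdi_sketch *bdi)
{
    return bdi->tot_write_bandwidth != 0;
}
```

The floor of 1 is what makes the total a reliable indicator: a dirty wb
can never contribute zero.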
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
Currently, wb_has_dirty_io() determines whether a wb (bdi_writeback)
has any dirty inode by testing all three IO lists on each invocation
without actively keeping track. For cgroup writeback support, a
single bdi will host multiple wb's each of which will host dirty
inodes separately and we'll need to make bdi_has_dirty_io(), which
currently only represents the root wb, aggregate has_dirty_io from all
member wb's, which requires tracking transitions in has_dirty_io state
on each wb.
This patch introduces inode_wb_list_{move|del}_locked() to consolidate
IO list operations leaving queue_io() the only other function which
directly manipulates IO lists (via move_expired_inodes()). All three
functions are updated to call wb_io_lists_[de]populated() which keep
track of whether the wb has dirty inodes or not and record it using
the new WB_has_dirty_io flag. inode_wb_list_move_locked()'s return
value indicates whether the wb had no dirty inodes before.
mark_inode_dirty() is restructured so that the return value of
inode_wb_list_move_locked() can be used for deciding whether to wake
up the wb.
While at it, change {bdi|wb}_has_dirty_io()'s return values to bool.
These functions were returning 0 and 1 before. Also, add a comment
explaining the synchronization of wb_state flags.
v2: Updated to accommodate b_dirty_time.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
Currently, all congestion functions take bdi (backing_dev_info) and
always operate on the root wb (bdi->wb) and the congestion state from
the block layer is propagated only for the root blkcg. This patch
introduces {set|clear}_wb_congested() and wb_congested() which take a
bdi_writeback_congested and bdi_writeback respectively. The bdi
counterparts are now wrappers invoking the wb based functions on
@bdi->wb.
While converting clear_bdi_congested() to clear_wb_congested(), the
local variable declaration order between @wqh and @bit is swapped for
cosmetic reasons.
This patch just adds the new wb based functions. The following
patches will apply them.
v2: Updated for bdi_writeback_congested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
For the planned cgroup writeback support, on each bdi
(backing_dev_info), each memcg will be served by a separate wb
(bdi_writeback). This patch updates bdi so that a bdi can host
multiple wbs (bdi_writebacks).
On the default hierarchy, blkcg implicitly enables memcg. This allows
using memcg's page ownership for attributing writeback IOs, and every
memcg - blkcg combination can be served by its own wb by assigning a
dedicated wb to each memcg. This means that there may be multiple
wb's of a bdi mapped to the same blkcg. As congested state is per
blkcg - bdi combination, those wb's should share the same congested
state. This is achieved by tracking congested state via
bdi_writeback_congested structs which are keyed by blkcg.
bdi->wb remains unchanged and will keep serving the root cgroup.
cgwb's (cgroup wb's) for non-root cgroups are created on-demand or
looked up while dirtying an inode according to the memcg of the page
being dirtied or current task. Each cgwb is indexed on bdi->cgwb_tree
by its memcg id. Once an inode is associated with its wb, it can be
retrieved using inode_to_wb().
Currently, none of the filesystems has FS_CGROUP_WRITEBACK and all
pages will keep being associated with bdi->wb.
v3: inode_attach_wb() in account_page_dirtied() moved inside
mapping_cap_account_dirty() block where it's known to be !NULL.
Also, an unnecessary NULL check before kfree() removed. Both
detected by the kbuild bot.
v2: Updated so that wb association is per inode and wb is per memcg
rather than blkcg.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
Currently, a wb's (bdi_writeback) congestion state is carried in its
->state field; however, cgroup writeback support will require multiple
wb's sharing the same congestion state. This patch separates out
congestion state into its own struct - struct bdi_writeback_congested.
A new wb field, wb_congested, points to its associated congested
struct. The default wb, bdi->wb, always points to bdi->wb_congested.
While this patch adds a layer of indirection, it doesn't introduce any
behavior changes.
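The indirection can be sketched like this. The struct names follow the
description, while the field name "congested", the bit value and the
helper signatures are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define WB_sync_congested (1UL << 0)   /* illustrative bit value */

/* Congestion state split out so several wb's can point at one copy. */
struct bdi_writeback_congested {
    unsigned long state;
};

struct bdi_writeback {
    struct bdi_writeback_congested *congested;
};

static void set_wb_congested(struct bdi_writeback_congested *cong,
                             unsigned long bit)
{
    cong->state |= bit;
}

static void clear_wb_congested(struct bdi_writeback_congested *cong,
                               unsigned long bit)
{
    cong->state &= ~bit;
}

/* Every wb sharing the struct observes the same congestion bits. */
static bool wb_congested(const struct bdi_writeback *wb, unsigned long bits)
{
    return (wb->congested->state & bits) != 0;
}
```

Two bdi_writeback instances pointing at the same bdi_writeback_congested
see a congestion change at the same time, at the cost of one extra
pointer dereference per check.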
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
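As an illustration of the indirection this message describes, here is a simplified userspace model. It is a sketch, not the kernel code: the struct layouts are reduced to the one relevant member each, the wb member is called "congested" here purely for the model, and the helper names (bdi_init_model, wb_set_congested, wb_is_congested) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy congestion bits; the names are assumptions for this model. */
enum wb_congested_bit { WB_ASYNC_CONGESTED = 0, WB_SYNC_CONGESTED = 1 };

struct bdi_writeback_congested {
    unsigned long state;    /* shared by every wb that points here */
};

struct bdi_writeback {
    struct bdi_writeback_congested *congested;  /* the added indirection */
};

struct backing_dev_info {
    struct bdi_writeback wb;                     /* default wb */
    struct bdi_writeback_congested wb_congested; /* its congested state */
};

/* The default wb always points at the congested struct embedded in the bdi. */
void bdi_init_model(struct backing_dev_info *bdi)
{
    assert(bdi != NULL);
    bdi->wb_congested.state = 0;
    bdi->wb.congested = &bdi->wb_congested;
}

void wb_set_congested(struct bdi_writeback *wb, enum wb_congested_bit bit)
{
    wb->congested->state |= 1UL << bit;
}

bool wb_is_congested(const struct bdi_writeback *wb, enum wb_congested_bit bit)
{
    return (wb->congested->state >> bit) & 1UL;
}
```

With a single wb per bdi the behaviour is the same as keeping the bits in the wb itself. The pointer only starts to matter once a second wb is pointed at the same congested struct.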
|
wb_init() currently always uses GFP_KERNEL but the planned cgroup
writeback support needs using other allocation masks. Add @gfp to
wb_init().
This patch doesn't introduce any behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
Now that bdi definitions are moved to backing-dev-defs.h,
backing-dev.h can include blkdev.h and inline inode_to_bdi() without
worrying about introducing circular include dependency. The function
gets called from hot paths and is fairly trivial.
This patch makes inode_to_bdi(), and the sb_is_blkdev_sb() helper it
calls, inline. blockdev_superblock and noop_backing_dev_info
are EXPORT_GPL'd to allow the inline functions to be used from
modules.
While at it, make sb_is_blkdev_sb() return bool instead of int.
v2: Fixed typo in description as suggested by Jan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
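For readers unfamiliar with the two helpers, the shape of the inlined pair can be modelled in userspace as below. This is a sketch under assumptions: the structs are cut down to the members the lookup needs, and the bdev_bdi member is a hypothetical stand-in for however the block device inode reaches its own bdi in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct backing_dev_info { const char *name; };
struct super_block { struct backing_dev_info *s_bdi; };
struct inode {
    struct super_block *i_sb;
    struct backing_dev_info *bdev_bdi;  /* hypothetical: bdi of a block device inode */
};

/* In the kernel these two symbols are what the commit exports for modules. */
struct backing_dev_info noop_backing_dev_info = { "noop" };
struct super_block *blockdev_superblock;

/* Returns bool rather than int, as the commit message notes. */
static inline bool sb_is_blkdev_sb(const struct super_block *sb)
{
    return sb == blockdev_superblock;
}

static inline struct backing_dev_info *inode_to_bdi(struct inode *inode)
{
    if (!inode)
        return &noop_backing_dev_info;
    assert(inode->i_sb != NULL);
    if (sb_is_blkdev_sb(inode->i_sb))
        return inode->bdev_bdi;     /* block device inode: use the device's bdi */
    return inode->i_sb->s_bdi;      /* regular inode: use the superblock's bdi */
}
```

Because both functions are a pointer comparison and a couple of loads, inlining them removes a call from the hot paths at the cost of exposing the two globals to callers.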
|
Move wb_shutdown(), bdi_register(), bdi_register_dev(),
bdi_prune_sb(), bdi_remove_from_list() and bdi_unregister() so that
init / exit functions are grouped together. This will make updating
init / exit paths for cgroup writeback support easier.
This is pure source file reorganization.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
Currently, a bdi (backing_dev_info) embeds a single wb (bdi_writeback)
and the role of the separation is unclear. For cgroup support for
writeback IOs, a bdi will be updated to host multiple wb's where each
wb serves writeback IOs of a different cgroup on the bdi. To achieve
that, a wb should carry all states necessary for servicing writeback
IOs for a cgroup independently.
This patch moves bdi->wb_lock and ->work_list into wb.
* The lock protects bdi->work_list and bdi->wb.dwork scheduling. While
moving, rename it to wb->work_lock as wb->wb_lock is confusing.
Also, move wb->dwork downwards so that it's colocated with the new
->work_lock and ->work_list fields.
* bdi_writeback_workfn() -> wb_workfn()
bdi_wakeup_thread_delayed(bdi) -> wb_wakeup_delayed(wb)
bdi_wakeup_thread(bdi) -> wb_wakeup(wb)
bdi_queue_work(bdi, ...) -> wb_queue_work(wb, ...)
__bdi_start_writeback(bdi, ...) -> __wb_start_writeback(wb, ...)
get_next_work_item(bdi) -> get_next_work_item(wb)
* bdi_wb_shutdown() is renamed to wb_shutdown() and now takes @wb.
The function contained parts which belong to the containing bdi
rather than the wb itself - testing cap_writeback_dirty and
bdi_remove_from_list() invocation. Those are moved to
bdi_unregister().
* bdi_wb_{init|exit}() are renamed to wb_{init|exit}().
Initializations of the moved bdi->wb_lock and ->work_list are
relocated from bdi_init() to wb_init().
* As there's still only one bdi_writeback per backing_dev_info, all
uses of bdi->state are mechanically replaced with bdi->wb.state
introducing no behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
bdi_writeback
Currently, a bdi (backing_dev_info) embeds a single wb (bdi_writeback)
and the role of the separation is unclear. For cgroup support for
writeback IOs, a bdi will be updated to host multiple wb's where each
wb serves writeback IOs of a different cgroup on the bdi. To achieve
that, a wb should carry all states necessary for servicing writeback
IOs for a cgroup independently.
This patch moves bandwidth related fields from backing_dev_info into
bdi_writeback.
* The moved fields are: bw_time_stamp, dirtied_stamp, written_stamp,
write_bandwidth, avg_write_bandwidth, dirty_ratelimit,
balanced_dirty_ratelimit, completions and dirty_exceeded.
* writeback_chunk_size() and over_bground_thresh() now take @wb
instead of @bdi.
* bdi_writeout_fraction(bdi, ...) -> wb_writeout_fraction(wb, ...)
bdi_dirty_limit(bdi, ...) -> wb_dirty_limit(wb, ...)
bdi_position_ratio(bdi, ...) -> wb_position_ratio(wb, ...)
bdi_update_write_bandwidth(bdi, ...) -> wb_update_write_bandwidth(wb, ...)
[__]bdi_update_bandwidth(bdi, ...) -> [__]wb_update_bandwidth(wb, ...)
bdi_{max|min}_pause(bdi, ...) -> wb_{max|min}_pause(wb, ...)
bdi_dirty_limits(bdi, ...) -> wb_dirty_limits(wb, ...)
* Init/exits of the relocated fields are moved to bdi_wb_init/exit()
respectively. Note that explicit zeroing is dropped in the process
as wb's are cleared in entirety anyway.
* As there's still only one bdi_writeback per backing_dev_info, all
uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[]
introducing no behavior changes.
v2: Typo in description fixed as suggested by Jan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
Currently, a bdi (backing_dev_info) embeds a single wb (bdi_writeback)
and the role of the separation is unclear. For cgroup support for
writeback IOs, a bdi will be updated to host multiple wb's where each
wb serves writeback IOs of a different cgroup on the bdi. To achieve
that, a wb should carry all states necessary for servicing writeback
IOs for a cgroup independently.
This patch moves bdi->bdi_stat[] into wb.
* enum bdi_stat_item is renamed to wb_stat_item and the prefix of all
enums is changed from BDI_ to WB_.
* BDI_STAT_BATCH() -> WB_STAT_BATCH()
* [__]{add|inc|dec|sum}_bdi_stat(bdi, ...) -> [__]{add|inc}_wb_stat(wb, ...)
* bdi_stat[_error]() -> wb_stat[_error]()
* bdi_writeout_inc() -> wb_writeout_inc()
* stat init is moved to bdi_wb_init() and bdi_wb_exit() is added and
frees stat.
* As there's still only one bdi_writeback per backing_dev_info, all
uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[]
introducing no behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
Currently, a bdi (backing_dev_info) embeds a single wb (bdi_writeback)
and the role of the separation is unclear. For cgroup support for
writeback IOs, a bdi will be updated to host multiple wb's where each
wb serves writeback IOs of a different cgroup on the bdi. To achieve
that, a wb should carry all states necessary for servicing writeback
IOs for a cgroup independently.
This patch moves bdi->state into wb.
* enum bdi_state is renamed to wb_state and the prefix of all enums is
changed from BDI_ to WB_.
* Explicit zeroing of bdi->state is removed without adding zeroing of
wb->state as the whole data structure is zeroed on init anyway.
* As there's still only one bdi_writeback per backing_dev_info, all
uses of bdi->state are mechanically replaced with bdi->wb.state
introducing no behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: drbd-dev@lists.linbit.com
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|/
bdi_unregister() now contains very little functionality.
It contains a "WARN_ON" if bdi->dev is NULL. This warning is of no
real consequence as bdi->dev isn't needed by anything else in the function,
and it triggers if
blk_cleanup_queue() -> bdi_destroy()
is called before bdi_unregister, which happens since
Commit: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.")
So this isn't wanted.
It also calls bdi_set_min_ratio(). This needs to be called after
writes through the bdi have all been flushed, and before the bdi is destroyed.
Calling it early is better than calling it late as it frees up a global
resource.
Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
perfectly fits these requirements.
So bdi_unregister() can be discarded with the important content moved to
bdi_destroy(), as can the
writeback_bdi_unregister
event which is already not used.
Reported-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org (v4.0)
Fixes: c4db59d31e39 ("fs: don't reassign dirty inodes to default_backing_dev_info")
Fixes: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|\
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull lazytime mount option support from Al Viro:
"Lazytime stuff from tytso"
* 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ext4: add optimization for the lazytime mount option
vfs: add find_inode_nowait() function
vfs: add support for a lazytime mount option
|
Add a new mount option which enables a new "lazytime" mode. This mode
causes atime, mtime, and ctime updates to only be made to the
in-memory version of the inode. The on-disk times will only get
updated (a) when the inode needs to be updated for some non-time
related change, (b) when userspace calls fsync(), syncfs() or sync(), or
(c) just before an undeleted inode is evicted from memory.
This is OK according to POSIX because there are no guarantees after a
crash unless userspace explicitly requests them via an fsync(2) call.
For workloads which feature a large number of random writes to a
preallocated file, the lazytime mount option significantly reduces
writes to the inode table. The repeated 4k writes to a single block
will result in undesirable stress on flash devices and SMR disk
drives. Even on conventional HDD's, the repeated writes to the inode
table block will trigger Adjacent Track Interference (ATI) remediation
latencies, which very negatively impact long tail latencies --- which
is a very big deal for web serving tiers (for example).
Google-Bug-Id: 18297052
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
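The three flush conditions in the message above can be modelled with a small userspace sketch. Everything in it is an assumption made for illustration: lazy_inode and the lazy_* helpers are hypothetical names, a single mtime stands in for atime/mtime/ctime, and the non-lazytime path is collapsed into an immediate write so that the difference in inode-table writes is easy to count.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

struct lazy_inode {
    time_t mem_mtime;       /* in-memory timestamp, always current */
    time_t disk_mtime;      /* what the on-disk inode table holds */
    bool time_dirty;        /* only the timestamps differ from disk */
    unsigned disk_writes;   /* inode-table writes issued so far */
};

/* Write the in-memory inode, timestamps included, to the inode table. */
void lazy_write_inode(struct lazy_inode *inode)
{
    assert(inode != NULL);
    inode->disk_mtime = inode->mem_mtime;
    inode->time_dirty = false;
    inode->disk_writes++;
}

/* Timestamp update: with lazytime only the in-memory copy changes. */
void lazy_update_time(struct lazy_inode *inode, time_t now, bool lazytime)
{
    inode->mem_mtime = now;
    if (lazytime)
        inode->time_dirty = true;
    else
        lazy_write_inode(inode);    /* simplification of "mark dirty, write back later" */
}

/* (a) a non-time change forces the whole inode out, times included */
void lazy_change_metadata(struct lazy_inode *inode)
{
    lazy_write_inode(inode);
}

/* (b) fsync()/syncfs()/sync() flush pending timestamps */
void lazy_fsync(struct lazy_inode *inode)
{
    if (inode->time_dirty)
        lazy_write_inode(inode);
}

/* (c) eviction of an undeleted inode flushes pending timestamps */
void lazy_evict(struct lazy_inode *inode, bool deleted)
{
    if (!deleted && inode->time_dirty)
        lazy_write_inode(inode);
}
```

In this model a hundred timestamp updates followed by one fsync produce a single inode-table write, which is the reduction the message is after for random-write workloads.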
|
Now that default_backing_dev_info is not used for writeback purposes we can
get rid of it easily:
- instead of using its name for tracing unregistered bdi we just use
"unknown"
- btrfs and ceph can just assign the default read ahead window themselves
like several other filesystems already do.
- we can assign noop_backing_dev_info as the default one in alloc_super.
All filesystems already either assigned their own or
noop_backing_dev_info.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
If we have dirty inodes we need to call the filesystem for it, even if the
device has been removed and the filesystem will error out early. The
current code does that by reassigning all dirty inodes to the default
backing_dev_info when a bdi is unlinked, but that's pretty pointless given
that the bdi must always outlive the super block.
Instead of stopping writeback at unregister time and moving inodes to the
default bdi just keep the current bdi alive until it is destroyed. The
containing objects of the bdi ensure this doesn't happen until all
writeback has finished by erroring out.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Killed the redundant WARN_ON(), as noticed by Jan.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
Now that we never use the backing_dev_info pointer in struct address_space
we can simply remove it and save 4 to 8 bytes in every inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|/
Since "BDI: Provide backing device capability information [try #3]" the
backing_dev_info structure also provides flags for the kind of mmap
operation available in a nommu environment, which is entirely unrelated
to its original purpose.
Introduce a new nommu-only file operation to provide this information to
the nommu mmap code instead. Splitting this from the backing_dev_info
structure allows removing lots of backing_dev_info instances that aren't
otherwise needed, and entirely gets rid of the concept of providing a
backing_dev_info for a character device. It also removes the need for
the mtd_inodefs filesystem.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|\
Pull core block layer changes from Jens Axboe:
"This is the core block IO pull request for 3.18. Apart from the new
and improved flush machinery for blk-mq, this is all mostly bug fixes
and cleanups.
- blk-mq timeout updates and fixes from Christoph.
- Removal of REQ_END, also from Christoph. We pass it through the
->queue_rq() hook for blk-mq instead, freeing up one of the request
bits. The space was overly tight on 32-bit, so Martin also killed
REQ_KERNEL since it's no longer used.
- blk integrity updates and fixes from Martin and Gu Zheng.
- Update to the flush machinery for blk-mq from Ming Lei. Now we
have a per hardware context flush request, which both cleans up the
code and should scale better for flush intensive workloads on blk-mq.
- Improve the error printing, from Rob Elliott.
- Backing device improvements and cleanups from Tejun.
- Fixup of a misplaced rq_complete() tracepoint from Hannes.
- Make blk_get_request() return error pointers, fixing up issues
where we NULL deref when a device goes bad or missing. From Joe
Lawrence.
- Prep work for drastically reducing the memory consumption of dm
devices from Junichi Nomura. This allows creating clone bio sets
without preallocating a lot of memory.
- Fix a blk-mq hang on certain combinations of queue depths and
hardware queues from me.
- Limit memory consumption for blk-mq devices for crash dump
scenarios and drivers that use crazy high depths (certain SCSI
shared tag setups). We now just use a single queue and limited
depth for that"
* 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
block: Remove REQ_KERNEL
blk-mq: allocate cpumask on the home node
bio-integrity: remove the needless fail handle of bip_slab creating
block: include func name in __get_request prints
block: make blk_update_request print prefix match ratelimited prefix
blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
block: fix alignment_offset math that assumes io_min is a power-of-2
blk-mq: Make bt_clear_tag() easier to read
blk-mq: fix potential hang if rolling wakeup depth is too high
block: add bioset_create_nobvec()
block: use bio_clone_fast() in blk_rq_prep_clone()
block: misplaced rq_complete tracepoint
sd: Honor block layer integrity handling flags
block: Replace strnicmp with strncasecmp
block: Add T10 Protection Information functions
block: Don't merge requests if integrity flags differ
block: Integrity checksum flag
block: Relocate bio integrity flags
block: Add a disk flag to block integrity profile
block: Add prefix to block integrity profile flags
...
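The blk_get_request() item in the pull summary above refers to the kernel's error-pointer convention, in which a failing function returns an errno encoded in the pointer value instead of NULL. A userspace sketch of that convention follows. ERR_PTR(), PTR_ERR() and IS_ERR() are re-implemented here after the kernel helpers of the same names; toy_queue, toy_request, toy_get_request() and the choice of -ENODEV are hypothetical and only illustrate the calling pattern.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* the top 4095 addresses are reserved for error codes */

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct toy_request { int tag; };
struct toy_queue { bool dying; struct toy_request rq; };

/* Hypothetical allocator: reports why it failed instead of returning NULL. */
struct toy_request *toy_get_request(struct toy_queue *q)
{
    assert(q != NULL);
    if (q->dying)
        return ERR_PTR(-ENODEV);
    return &q->rq;
}
```

A caller then checks IS_ERR(rq) and propagates PTR_ERR(rq), rather than testing for NULL and guessing at the cause. Note that NULL is not an error pointer under this scheme, so callers converted to it must stop relying on NULL checks.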
|
A block_device may be attached to different gendisks and thus
different bdis over time. bdev_inode_switch_bdi() is used to switch
the associated bdi. The function assumes that the inode could be
dirty and transfers it between bdis if so. This is a bit nasty in
that it reaches into bdi internals.
This patch reimplements the function so that it writes out the inode
if dirty. This is a lot simpler and can be implemented without
exposing bdi internals.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@fb.com>
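The reimplementation described above, write out first and then switch, can be sketched in userspace like this. It is a model under assumptions, not the patch itself: toy_bdi, toy_bdev_inode and toy_switch_bdi() are hypothetical names, the write-out is reduced to a counter, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_bdi {
    const char *name;
    unsigned writes;        /* inode write-outs performed through this bdi */
};

struct toy_bdev_inode {
    struct toy_bdi *bdi;    /* currently associated bdi */
    bool dirty;
};

/* Switch the inode to dst. A dirty inode is written out through its
 * current bdi first, so nothing has to be moved between bdi-internal lists. */
void toy_switch_bdi(struct toy_bdev_inode *inode, struct toy_bdi *dst)
{
    assert(inode != NULL && dst != NULL);
    if (inode->bdi == dst)
        return;
    if (inode->dirty) {
        inode->bdi->writes++;   /* stands in for writing the inode out */
        inode->dirty = false;
    }
    inode->bdi = dst;           /* clean inode: a plain pointer update suffices */
}
```

The point of the design is that the switch never touches the destination bdi's internals; by the time the pointer changes the inode is clean.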
|
bdi_destroy() has code to transfer the remaining dirty inodes to the
default_backing_dev_info; however, given the shutdown sequence, it
isn't clear how such condition would happen. Also, it isn't a full
solution as the transferred inodes still point to the bdi which is
being destroyed. Operations on those inodes can end up accessing
already released fields such as the percpu stat fields.
Digging through the history, it seems that the code was added as a
quick workaround for a bug report without fully root-causing the
issue. We probably want to remove the code in time but for now let's
add a comment noting that it is a quick workaround.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|