summaryrefslogtreecommitdiff
path: root/net/ipv4/tcp_output.c (follow)
Commit message (Collapse)AuthorAge
* Merge 4.4.189 into android-4.4Greg Kroah-Hartman2019-08-11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.189 arm64: cpufeature: Fix CTR_EL0 field definitions arm64: cpufeature: Fix feature comparison for CTR_EL0.{CWG,ERG} netfilter: nfnetlink_acct: validate NFACCT_QUOTA parameter HID: Add quirk for HP X1200 PIXART OEM mouse tcp: be more careful in tcp_fragment() atm: iphase: Fix Spectre v1 vulnerability net: bridge: delete local fdb on device init failure net: fix ifindex collision during namespace removal tipc: compat: allow tipc commands without arguments net: sched: Fix a possible null-pointer dereference in dequeue_func() net/mlx5: Use reversed order when unregister devices bnx2x: Disable multi-cos feature. compat_ioctl: pppoe: fix PPPOEIOCSFWD handling block: blk_init_allocated_queue() set q->fq as NULL in the fail case spi: bcm2835: Fix 3-wire mode if DMA is enabled x86: cpufeatures: Sort feature word 7 x86/entry/64: Fix context tracking state warning when load_gs_index fails x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations x86/speculation: Enable Spectre v1 swapgs mitigations x86/entry/64: Use JMP instead of JMPQ x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS Linux 4.4.189 Change-Id: I3d4e7965c8f5547ab025236686ea0d60e0b6e1f4 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tcp: be more careful in tcp_fragment()Eric Dumazet2019-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b617158dc096709d8600c53b6052144d12b89fab ] Some applications set tiny SO_SNDBUF values and expect TCP to just work. Recent patches to address CVE-2019-11478 broke them in case of losses, since retransmits might be prevented. We should allow these flows to make progress. This patch allows the first and last skb in retransmit queue to be split even if memory limits are hit. It also adds the some room due to the fact that tcp_sendmsg() and tcp_sendpage() might overshoot sk_wmem_queued by about one full TSO skb (64KB size). Note this allowance was already present in stable backports for kernels < 4.15 Note for < 4.15 backports : tcp_rtx_queue_tail() will probably look like : static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk) { struct sk_buff *skb = tcp_send_head(sk); return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk); } Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrew Prout <aprout@ll.mit.edu> Tested-by: Andrew Prout <aprout@ll.mit.edu> Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com> Tested-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Christoph Paasch <cpaasch@apple.com> Cc: Jonathan Looney <jtl@netflix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
* | Merge 4.4.184 into android-4.4Greg Kroah-Hartman2019-06-28
|\| | | | | | | | | | | | | | | | | Changes in 4.4.184 tcp: refine memory limit test in tcp_fragment() Linux 4.4.184 Change-Id: I7119c826708041464de37eaec1d6c5a344be8124 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tcp: refine memory limit test in tcp_fragment()Eric Dumazet2019-06-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b6653b3629e5b88202be3c9abc44713973f5c4b4 upstream. tcp_fragment() might be called for skbs in the write queue. Memory limits might have been exceeded because tcp_sendmsg() only checks limits at full skb (64KB) boundaries. Therefore, we need to make sure tcp_fragment() wont punish applications that might have setup very low SO_SNDBUF values. Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Christoph Paasch <cpaasch@apple.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.182 into android-4.4Greg Kroah-Hartman2019-06-17
|\| | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.182 tcp: limit payload size of sacked skbs tcp: tcp_fragment() should apply sane memory limits tcp: add tcp_min_snd_mss sysctl tcp: enforce tcp_min_snd_mss in tcp_mtu_probing() Linux 4.4.182 Change-Id: I2698fa2ad3884a20d936e6bee638304f52df19bd Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tcp: add tcp_min_snd_mss sysctlEric Dumazet2019-06-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5f3e2bf008c2221478101ee72f5cb4654b9fc363 upstream. Some TCP peers announce a very small MSS option in their SYN and/or SYN/ACK messages. This forces the stack to send packets with a very high network/cpu overhead. Linux has enforced a minimal value of 48. Since this value includes the size of TCP options, and that the options can consume up to 40 bytes, this means that each segment can include only 8 bytes of payload. In some cases, it can be useful to increase the minimal value to a saner value. We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility reasons. Note that TCP_MAXSEG socket option enforces a minimal value of (TCP_MIN_MSS). David Miller increased this minimal value in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.") from 64 to 88. We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS. CVE-2019-11479 -- tcp mss hardcoded to 48 Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Jonathan Looney <jtl@netflix.com> Acked-by: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Bruce Curtis <brucec@netflix.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * tcp: tcp_fragment() should apply sane memory limitsEric Dumazet2019-06-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f070ef2ac66716357066b683fb0baf55f8191a2e upstream. Jonathan Looney reported that a malicious peer can force a sender to fragment its retransmit queue into tiny skbs, inflating memory usage and/or overflow 32bit counters. TCP allows an application to queue up to sk_sndbuf bytes, so we need to give some allowance for non malicious splitting of retransmit queue. A new SNMP counter is added to monitor how many times TCP did not allow to split an skb if the allowance was exceeded. Note that this counter might increase in the case applications use SO_SNDBUF socket option to lower sk_sndbuf. CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the socket is already using more than half the allowed space Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jonathan Looney <jtl@netflix.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Cc: Bruce Curtis <brucec@netflix.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * tcp: limit payload size of sacked skbsEric Dumazet2019-06-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3b4929f65b0d8249f19a50245cd88ed1a2f78cff upstream. Jonathan Looney reported that TCP can trigger the following crash in tcp_shifted_skb() : BUG_ON(tcp_skb_pcount(skb) < pcount); This can happen if the remote peer has advertized the smallest MSS that linux TCP accepts : 48 An skb can hold 17 fragments, and each fragment can hold 32KB on x86, or 64KB on PowerPC. This means that the 16bit witdh of TCP_SKB_CB(skb)->tcp_gso_segs can overflow. Note that tcp_sendmsg() builds skbs with less than 64KB of payload, so this problem needs SACK to be enabled. SACK blocks allow TCP to coalesce multiple skbs in the retransmit queue, thus filling the 17 fragments to maximal capacity. CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs Backport notes, provided by Joao Martins <joao.m.martins@oracle.com> v4.15 or since commit 737ff314563 ("tcp: use sequence distance to detect reordering") had switched from the packet-based FACK tracking and switched to sequence-based. v4.14 and older still have the old logic and hence on tcp_skb_shift_data() needs to retain its original logic and have @fack_count in sync. In other words, we keep the increment of pcount with tcp_skb_pcount(skb) to later used that to update fack_count. To make it more explicit we track the new skb that gets incremented to pcount in @next_pcount, and we get to avoid the constant invocation of tcp_skb_pcount(skb) all together. Fixes: 832d11c5cd07 ("tcp: Try to restore large SKBs while SACK processing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jonathan Looney <jtl@netflix.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Bruce Curtis <brucec@netflix.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.168 into android-4.4Greg Kroah-Hartman2018-12-19
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.168 ipv6: Check available headroom in ip6_xmit() even without options net: 8139cp: fix a BUG triggered by changing mtu with network traffic net: phy: don't allow __set_phy_supported to add unsupported modes net: Prevent invalid access to skb->prev in __qdisc_drop_all rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices tcp: fix NULL ref in tail loss probe tun: forbid iface creation with rtnl ops neighbour: Avoid writing before skb->head in neigh_hh_output() ARM: OMAP2+: prm44xx: Fix section annotation on omap44xx_prm_enable_io_wakeup ARM: OMAP1: ams-delta: Fix possible use of uninitialized field sysv: return 'err' instead of 0 in __sysv_write_inode s390/cpum_cf: Reject request for sampling in event initialization hwmon: (ina2xx) Fix current value calculation ASoC: dapm: Recalculate audio map forcely when card instantiated hwmon: (w83795) temp4_type has writable permission Btrfs: send, fix infinite loop due to directory rename dependencies ASoC: omap-mcpdm: Add pm_qos handling to avoid under/overruns with CPU_IDLE ASoC: omap-dmic: Add pm_qos handling to avoid overruns with CPU_IDLE exportfs: do not read dentry after free bpf: fix check of allowed specifiers in bpf_trace_printk USB: omap_udc: use devm_request_irq() USB: omap_udc: fix crashes on probe error and module removal USB: omap_udc: fix omap_udc_start() on 15xx machines USB: omap_udc: fix USB gadget functionality on Palm Tungsten E KVM: x86: fix empty-body warnings net: thunderx: fix NULL pointer dereference in nic_remove ixgbe: recognize 1000BaseLX SFP modules as 1Gbps net: hisilicon: remove unexpected free_netdev drm/ast: fixed reading monitor EDID not stable issue xen: xlate_mmu: add missing header to fix 'W=1' warning fscache: fix race between enablement and dropping of object fscache, cachefiles: remove redundant variable 'cache' ocfs2: fix deadlock caused by ocfs2_defrag_extent() hfs: do not free node before using hfsplus: do not free node before using debugobjects: avoid recursive calls with kmemleak ocfs2: fix potential use after free pstore: Convert console write to use ->write_buf ALSA: pcm: remove SNDRV_PCM_IOCTL1_INFO internal command KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC KVM: nVMX: mark vmcs12 pages dirty on L2 exit KVM: nVMX: Eliminate vmcs02 pool KVM: VMX: introduce alloc_loaded_vmcs KVM: VMX: make MSR bitmaps per-VCPU KVM/x86: Add IBPB support KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL KVM/x86: Remove indirect MSR op calls from SPEC_CTRL x86: reorganize SMAP handling in user space accesses x86: fix SMAP in 32-bit environments x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end} x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec x86/bugs, KVM: Support the combination of guest and host IBRS x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest KVM: SVM: Move spec control call after restore of GS x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD bpf: support 8-byte metafield access bpf/verifier: Add spi variable to check_stack_write() bpf/verifier: Pass instruction index to check_mem_access() and check_xadd() bpf: Prevent memory disambiguation attack wil6210: missing length check in wmi_set_ie posix-timers: Sanitize overrun handling mm/hugetlb.c: don't call region_abort if region_chg fails hugetlbfs: fix offset overflow in hugetlbfs mmap hugetlbfs: check for pgoff value overflow hugetlbfs: fix bug in pgoff overflow checking swiotlb: clean up reporting sr: pass down correctly sized SCSI sense buffer mm: remove write/force parameters from __get_user_pages_locked() mm: remove write/force parameters from __get_user_pages_unlocked() mm/nommu.c: Switch __get_user_pages_unlocked() to use __get_user_pages() mm: replace get_user_pages_unlocked() write/force parameters with gup_flags mm: replace get_user_pages_locked() write/force parameters with gup_flags mm: replace get_vaddr_frames() write/force parameters with gup_flags mm: replace get_user_pages() write/force parameters with gup_flags mm: replace __access_remote_vm() write parameter with gup_flags mm: replace access_remote_vm() write parameter with gup_flags proc: don't use FOLL_FORCE for reading cmdline and environment proc: do not access cmdline nor environ from file-backed areas media: dvb-frontends: fix i2c access helpers for KASAN matroxfb: fix size of memcpy staging: speakup: Replace strncpy with memcpy rocker: fix rocker_tlv_put_* functions for KASAN selftests: Move networking/timestamping from Documentation Linux 4.4.168 Change-Id: I71a633f645fada4b473abcff660a9ada3103592b Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tcp: fix NULL ref in tail loss probeYuchung Cheng2018-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b2b7af861122a0c0f6260155c29a1b2e594cd5b5 ] TCP loss probe timer may fire when the retranmission queue is empty but has a non-zero tp->packets_out counter. tcp_send_loss_probe will call tcp_rearm_rto which triggers NULL pointer reference by fetching the retranmission queue head in its sub-routines. Add a more detailed warning to help catch the root cause of the inflight accounting inconsistency. Reported-by: Rafael Tinoco <rafael.tinoco@linaro.org> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.152 into android-4.4Greg Kroah-Hartman2018-08-24
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.152 ARC: Explicitly add -mmedium-calls to CFLAGS netfilter: ipv6: nf_defrag: reduce struct net memory waste selftests: pstore: return Kselftest Skip code for skipped tests selftests: static_keys: return Kselftest Skip code for skipped tests selftests: user: return Kselftest Skip code for skipped tests selftests: zram: return Kselftest Skip code for skipped tests selftests: sync: add config fragment for testing sync framework ARM: dts: Cygnus: Fix I2C controller interrupt type usb: dwc2: fix isoc split in transfer with no data usb: gadget: composite: fix delayed_status race condition when set_interface usb: gadget: dwc2: fix memory leak in gadget_init() scsi: xen-scsifront: add error handling for xenbus_printf arm64: make secondary_start_kernel() notrace qed: Add sanity check for SIMD fastpath handler. enic: initialize enic->rfs_h.lock in enic_probe net: hamradio: use eth_broadcast_addr net: propagate dev_get_valid_name return code ARC: Enable machine_desc->init_per_cpu for !CONFIG_SMP net: davinci_emac: match the mdio device against its compatible if possible locking/lockdep: Do not record IRQ state within lockdep code ipv6: mcast: fix unsolicited report interval after receiving querys Smack: Mark inode instant in smack_task_to_inode cxgb4: when disabling dcb set txq dcb priority to 0 brcmfmac: stop watchdog before detach and free everything ARM: dts: am437x: make edt-ft5x06 a wakeup source usb: xhci: increase CRS timeout value perf test session topology: Fix test on s390 perf report powerpc: Fix crash if callchain is empty selftests/x86/sigreturn/64: Fix spurious failures on AMD CPUs ARM: dts: da850: Fix interrups property for gpio dmaengine: k3dma: Off by one in k3_of_dma_simple_xlate() md/raid10: fix that replacement cannot complete recovery after reassemble drm/exynos: gsc: Fix support for NV16/61, YUV420/YVU420 and YUV422 modes drm/exynos: decon5433: Fix per-plane global alpha for XRGB modes drm/exynos: decon5433: Fix WINCONx reset value bnx2x: Fix receiving tx-timeout in error or recovery state. m68k: fix "bad page state" oops on ColdFire boot HID: wacom: Correct touch maximum XY of 2nd-gen Intuos ARM: imx_v6_v7_defconfig: Select ULPI support ARM: imx_v4_v5_defconfig: Select ULPI support tracing: Use __printf markup to silence compiler kasan: fix shadow_size calculation error in kasan_module_alloc smsc75xx: Add workaround for gigabit link up hardware errata. netfilter: x_tables: set module owner for icmp(6) matches ARM: pxa: irq: fix handling of ICMR registers in suspend/resume ieee802154: at86rf230: switch from BUG_ON() to WARN_ON() on problem ieee802154: at86rf230: use __func__ macro for debug messages ieee802154: fakelb: switch from BUG_ON() to WARN_ON() on problem drm/armada: fix colorkey mode property bnxt_en: Fix for system hang if request_irq fails perf llvm-utils: Remove bashism from kernel include fetch script ARM: 8780/1: ftrace: Only set kernel memory back to read-only after boot ARM: dts: am3517.dtsi: Disable reference to OMAP3 OTG controller ixgbe: Be more careful when modifying MAC filters packet: reset network header if packet shorter than ll reserved space qlogic: check kstrtoul() for errors tcp: remove DELAYED ACK events in DCTCP drm/nouveau/gem: off by one bugs in nouveau_gem_pushbuf_reloc_apply() net/ethernet/freescale/fman: fix cross-build error net: usb: rtl8150: demote allmulti message to dev_dbg() net: qca_spi: Avoid packet drop during initial sync net: qca_spi: Make sure the QCA7000 reset is triggered net: qca_spi: Fix log level if probe fails tcp: identify cryptic messages as TCP seq # bugs staging: android: ion: check for kref overflow KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer ext4: fix spectre gadget in ext4_mb_regular_allocator() parisc: Remove ordered stores from syscall.S xfrm_user: prevent leaking 2 bytes of kernel memory netfilter: conntrack: dccp: treat SYNC/SYNCACK as invalid if no prior state packet: refine ring v3 block size test to hold one frame bridge: Propagate vlan add failure to user parisc: Remove unnecessary barriers from spinlock.h PCI: hotplug: Don't leak pci_slot on registration failure PCI: Skip MPS logic for Virtual Functions (VFs) PCI: pciehp: Fix use-after-free on unplug i2c: imx: Fix race condition in dma read reiserfs: fix broken xattr handling (heap corruption, bad retval) Linux 4.4.152 Change-Id: I1058813031709d20abd0bc45e9ac5fc68ab3a1d7 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tcp: remove DELAYED ACK events in DCTCPYuchung Cheng2018-08-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a69258f7aa2623e0930212f09c586fd06674ad79 ] After fixing the way DCTCP tracking delayed ACKs, the delayed-ACK related callbacks are no longer needed Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.145 into android-4.4Greg Kroah-Hartman2018-07-31
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.145 MIPS: ath79: fix register address in ath79_ddr_wb_flush() ip: hash fragments consistently net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper rtnetlink: add rtnl_link_state check in rtnl_configure_link tcp: fix dctcp delayed ACK schedule tcp: helpers to send special DCTCP ack tcp: do not cancel delay-AcK on DCTCP special ACK tcp: do not delay ACK in DCTCP upon CE status change tcp: avoid collapses in tcp_prune_queue() if possible tcp: detect malicious patterns in tcp_collapse_ofo_queue() ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull usb: cdc_acm: Add quirk for Castles VEGA3000 usb: core: handle hub C_PORT_OVER_CURRENT condition usb: gadget: f_fs: Only return delayed status when len is 0 driver core: Partially revert "driver core: correct device's shutdown order" can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK can: xilinx_can: fix recovery from error states not being propagated can: xilinx_can: fix device dropping off bus on RX overrun can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting can: xilinx_can: fix incorrect clear of non-processed interrupts can: xilinx_can: fix RX overflow interrupt not being enabled turn off -Wattribute-alias ARM: fix put_user() for gcc-8 Linux 4.4.145 Change-Id: I449c110f7f186f2c72c9cc45e00a8deda0d54e40 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tcp: do not cancel delay-AcK on DCTCP special ACKYuchung Cheng2018-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 27cde44a259c380a3c09066fc4b42de7dde9b1ad ] Currently when a DCTCP receiver delays an ACK and receive a data packet with a different CE mark from the previous one's, it sends two immediate ACKs acking previous and latest sequences respectly (for ECN accounting). Previously sending the first ACK may mark off the delayed ACK timer (tcp_event_ack_sent). This may subsequently prevent sending the second ACK to acknowledge the latest sequence (tcp_ack_snd_check). The culprit is that tcp_send_ack() assumes it always acknowleges the latest sequence, which is not true for the first special ACK. The fix is to not make the assumption in tcp_send_ack and check the actual ack sequence before cancelling the delayed ACK. Further it's safer to pass the ack sequence number as a local variable into tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid future bugs like this. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * tcp: helpers to send special DCTCP ackYuchung Cheng2018-07-28
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2987babb6982306509380fc11b450227a844493b ] Refactor and create helpers to send the special ACK in DCTCP. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.133 into android-4.4Greg Kroah-Hartman2018-05-26
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.133 8139too: Use disable_irq_nosync() in rtl8139_poll_controller() bridge: check iface upper dev when setting master via ioctl dccp: fix tasklet usage ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg llc: better deal with too small mtu net: ethernet: sun: niu set correct packet size in skb net/mlx4_en: Verify coalescing parameters are in range net_sched: fq: take care of throttled flows before reuse net: support compat 64-bit time in {s,g}etsockopt openvswitch: Don't swap table in nlattr_set() after OVS_ATTR_NESTED is found qmi_wwan: do not steal interfaces from class drivers r8169: fix powering up RTL8168h sctp: handle two v4 addrs comparison in sctp_inet6_cmp_addr sctp: use the old asoc when making the cookie-ack chunk in dupcook_d tg3: Fix vunmap() BUG_ON() triggered from tg3_free_consistent(). bonding: do not allow rlb updates to invalid mac tcp: ignore Fast Open on repair mode sctp: fix the issue that the cookie-ack with auth can't get processed sctp: delay the authentication for the duplicated cookie-echo chunk ALSA: timer: Call notifier in the same spinlock audit: move calcs after alloc and check when logging set loginuid arm64: introduce mov_q macro to move a constant into a 64-bit register arm64: Add work around for Arm Cortex-A55 Erratum 1024718 futex: Remove unnecessary warning from get_futex_key futex: Remove duplicated code and fix undefined behaviour xfrm: fix xfrm_do_migrate() with AEAD e.g(AES-GCM) lockd: lost rollback of set_grace_period() in lockd_down_net() Revert "ARM: dts: imx6qdl-wandboard: Fix audio channel swap" l2tp: revert "l2tp: fix missing print session offset info" pipe: cap initial pipe capacity according to pipe-max-size limit futex: futex_wake_op, fix sign_extend32 sign bits kernel/exit.c: avoid undefined behaviour when calling wait4() usbip: usbip_host: refine probe and disconnect debug msgs to be useful usbip: usbip_host: delete device from busid_table after rebind usbip: usbip_host: run rebind from exit when module is removed usbip: usbip_host: fix NULL-ptr deref and use-after-free errors usbip: usbip_host: fix bad unlock balance during stub_probe() ALSA: usb: mixer: volume quirk for CM102-A+/102S+ ALSA: hda: Add Lenovo C50 All in one to the power_save blacklist ALSA: control: fix a redundant-copy issue spi: pxa2xx: Allow 64-bit DMA powerpc/powernv: panic() on OPAL < V3 powerpc/powernv: Remove OPALv2 firmware define and references powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL cpuidle: coupled: remove unused define cpuidle_coupled_lock powerpc: Don't preempt_disable() in show_cpuinfo() vmscan: do not force-scan file lru if its absolute size is small proc: meminfo: estimate available memory more conservatively mm: filemap: remove redundant code in do_read_cache_page mm: filemap: avoid unnecessary calls to lock_page when waiting for IO to complete during a read signals: avoid unnecessary taking of sighand->siglock cpufreq: intel_pstate: Enable HWP by default tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all} proc read mm's {arg,env}_{start,end} with mmap semaphore taken. procfs: fix pthread cross-thread naming if !PR_DUMPABLE powerpc/powernv: Fix NVRAM sleep in invalid context when crashing mm: don't allow deferred pages with NEED_PER_CPU_KM s390/qdio: fix access to uninitialized qdio_q fields s390/cpum_sf: ensure sample frequency of perf event attributes is non-zero s390/qdio: don't release memory in qdio_setup_irq() s390: remove indirect branch from do_softirq_own_stack efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode ARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstr tick/broadcast: Use for_each_cpu() specially on UP kernels ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed ARM: 8770/1: kprobes: Prohibit probing on optimized_callback ARM: 8772/1: kprobes: Prohibit kprobes on get_user functions Btrfs: fix xattr loss after power failure btrfs: fix crash when trying to resume balance without the resume flag btrfs: fix reading stale metadata blocks after degraded raid1 mounts net: test tailroom before appending to linear skb packet: in packet_snd start writing at link layer allocation sock_diag: fix use-after-free read in __sk_free tcp: purge write queue in tcp_connect_init() ext2: fix a block leak s390: add assembler macros for CPU alternatives s390: move expoline assembler macros to a header s390/lib: use expoline for indirect branches s390/kernel: use expoline for indirect branches s390: move spectre sysfs attribute code s390: extend expoline to BC instructions s390: use expoline thunks in the BPF JIT scsi: libsas: defer ata device eh commands to libata scsi: sg: allocate with __GFP_ZERO in sg_build_indirect() scsi: zfcp: fix infinite iteration on ERP ready list dmaengine: ensure dmaengine helpers check valid callback time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting gpio: rcar: Add Runtime PM handling for interrupts cfg80211: limit wiphy names to 128 bytes hfsplus: stop workqueue when fill_super() failed x86/kexec: Avoid double free_page() upon do_kexec_load() failure Linux 4.4.133 Change-Id: I0554b12889bc91add2a444da95f18d59c6fb9cdb Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tcp: purge write queue in tcp_connect_init()Eric Dumazet2018-05-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 7f582b248d0a86bae5788c548d7bb5bca6f7691a ] syzkaller found a reliable way to crash the host, hitting a BUG() in __tcp_retransmit_skb() Malicous MSG_FASTOPEN is the root cause. We need to purge write queue in tcp_connect_init() at the point we init snd_una/write_seq. This patch also replaces the BUG() by a less intrusive WARN_ON_ONCE() kernel BUG at net/ipv4/tcp_output.c:2837! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 5276 Comm: syz-executor0 Not tainted 4.17.0-rc3+ #51 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__tcp_retransmit_skb+0x2992/0x2eb0 net/ipv4/tcp_output.c:2837 RSP: 0000:ffff8801dae06ff8 EFLAGS: 00010206 RAX: ffff8801b9fe61c0 RBX: 00000000ffc18a16 RCX: ffffffff864e1a49 RDX: 0000000000000100 RSI: ffffffff864e2e12 RDI: 0000000000000005 RBP: ffff8801dae073a0 R08: ffff8801b9fe61c0 R09: ffffed0039c40dd2 R10: ffffed0039c40dd2 R11: ffff8801ce206e93 R12: 00000000421eeaad R13: ffff8801ce206d4e R14: ffff8801ce206cc0 R15: ffff8801cd4f4a80 FS: 0000000000000000(0000) GS:ffff8801dae00000(0063) knlGS:00000000096bc900 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000020000000 CR3: 00000001c47b6000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> tcp_retransmit_skb+0x2e/0x250 net/ipv4/tcp_output.c:2923 tcp_retransmit_timer+0xc50/0x3060 net/ipv4/tcp_timer.c:488 tcp_write_timer_handler+0x339/0x960 net/ipv4/tcp_timer.c:573 tcp_write_timer+0x111/0x1d0 net/ipv4/tcp_timer.c:593 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x79e/0xc50 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1d1/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:525 [inline] smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863 Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.101 into android-4.4Greg Kroah-Hartman2017-11-24
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.101 tcp: do not mangle skb->cb[] in tcp_make_synack() netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed bonding: discard lowest hash bit for 802.3ad layer3+4 vlan: fix a use-after-free in vlan_device_event() af_netlink: ensure that NLMSG_DONE never fails in dumps sctp: do not peel off an assoc from one netns to another one fealnx: Fix building error on MIPS net/sctp: Always set scope_id in sctp_inet6_skb_msgname ima: do not update security.ima if appraisal status is not INTEGRITY_PASS serial: omap: Fix EFR write on RTS deassertion arm64: fix dump_instr when PAN and UAO are in use nvme: Fix memory order on async queue deletion ocfs2: should wait dio before inode lock in ocfs2_setattr() ipmi: fix unsigned long underflow mm/page_alloc.c: broken deferred calculation coda: fix 'kernel memory exposure attempt' in fsync mm: check the return value of lookup_page_ext for all call sites mm/page_ext.c: check if page_ext is not prepared mm/pagewalk.c: report holes in hugetlb ranges Linux 4.4.101 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tcp: do not mangle skb->cb[] in tcp_make_synack()Eric Dumazet2017-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 3b11775033dc87c3d161996c54507b15ba26414a ] Christoph Paasch sent a patch to address the following issue : tcp_make_synack() is leaving some TCP private info in skb->cb[], then send the packet by other means than tcp_transmit_skb() tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse IPv4/IPV6 stacks, but we have no such cleanup for SYNACK. tcp_make_synack() should not use tcp_init_nondata_skb() : tcp_init_nondata_skb() really should be limited to skbs put in write/rtx queues (the ones that are only sent via tcp_transmit_skb()) This patch fixes the issue and should even save few cpu cycles ;) Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.99 into android-4.4Greg Kroah-Hartman2017-11-18
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.99 mac80211: accept key reinstall without changing anything mac80211: use constant time comparison with keys mac80211: don't compare TKIP TX MIC key in reinstall prevention usb: usbtest: fix NULL pointer dereference Input: ims-psu - check if CDC union descriptor is sane ALSA: seq: Cancel pending autoload work at unbinding device tun/tap: sanitize TUNSETSNDBUF input tcp: fix tcp_mtu_probe() vs highest_sack l2tp: check ps->sock before running pppol2tp_session_ioctl() tun: call dev_get_valid_name() before register_netdevice() sctp: add the missing sock_owned_by_user check in sctp_icmp_redirect packet: avoid panic in packet_getsockopt() ipv6: flowlabel: do not leave opt->tot_len with garbage net/unix: don't show information about sockets from other namespaces ip6_gre: only increase err_count for some certain type icmpv6 in ip6gre_err tun: allow positive return values on dev_get_valid_name() call sctp: reset owner sk for data chunks on out queues when migrating a sock ppp: fix race in ppp device destruction ipip: only increase err_count for some certain type icmp in ipip_err tcp/dccp: fix ireq->opt races tcp/dccp: fix lockdep splat in inet_csk_route_req() tcp/dccp: fix other lockdep splats accessing ireq_opt security/keys: add CONFIG_KEYS_COMPAT to Kconfig tipc: fix link attribute propagation bug brcmfmac: remove setting IBSS mode when stopping AP target/iscsi: Fix iSCSI task reassignment handling target: Fix node_acl demo-mode + uncached dynamic shutdown regression misc: panel: properly restore atomic counter on error path Linux 4.4.99 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tcp: fix tcp_mtu_probe() vs highest_sackEric Dumazet2017-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2b7cda9c35d3b940eb9ce74b30bbd5eb30db493d ] Based on SNMP values provided by Roman, Yuchung made the observation that some crashes in tcp_sacktag_walk() might be caused by MTU probing. Looking at tcp_mtu_probe(), I found that when a new skb was placed in front of the write queue, we were not updating tcp highest sack. If one skb is freed because all its content was copied to the new skb (for MTU probing), then tp->highest_sack could point to a now freed skb. Bad things would then happen, including infinite loops. This patch renames tcp_highest_sack_combine() and uses it from tcp_mtu_probe() to fix the bug. Note that I also removed one test against tp->sacked_out, since we want to replace tp->highest_sack regardless of whatever condition, since keeping a stale pointer to freed skb is a recipe for disaster. Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Reported-by: Roman Gushchin <guro@fb.com> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.82 into android-4.4Greg Kroah-Hartman2017-08-14
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.82 tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states net: fix keepalive code vs TCP_FASTOPEN_CONNECT bpf, s390: fix jit branch offset related to ldimm64 net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target tcp: fastopen: tcp_connect() must refresh the route net: avoid skb_warn_bad_offload false positives on UFO packet: fix tp_reserve race in packet_set_ring revert "net: account for current skb length when deciding about UFO" revert "ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output" udp: consistently apply ufo or fragmentation sparc64: Prevent perf from running during super critical sections KVM: arm/arm64: Handle hva aging while destroying the vm mm/mempool: avoid KASAN marking mempool poison checks as use-after-free ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output net: account for current skb length when deciding about UFO Linux 4.4.82 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tcp: fastopen: tcp_connect() must refresh the routeEric Dumazet2017-08-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 8ba60924710cde564a3905588b6219741d6356d0 ] With new TCP_FASTOPEN_CONNECT socket option, there is a possibility to call tcp_connect() while socket sk_dst_cache is either NULL or invalid. +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4 +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0 +0 connect(4, ..., ...) = 0 << sk->sk_dst_cache becomes obsolete, or even set to NULL >> +1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000 We need to refresh the route otherwise bad things can happen, especially when syzkaller is running on the host :/ Fixes: 19f6d3f3c8422 ("net/tcp-fastopen: Add new API support") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge 4.4.68 into android-4.4Greg Kroah-Hartman2017-05-15
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes in 4.4.68 9p: fix a potential acl leak ARM: 8452/3: PJ4: make coprocessor access sequences buildable in Thumb2 mode cpupower: Fix turbo frequency reporting for pre-Sandy Bridge cores powerpc/powernv: Fix opal_exit tracepoint opcode power: supply: bq24190_charger: Fix irq trigger to IRQF_TRIGGER_FALLING power: supply: bq24190_charger: Call set_mode_host() on pm_resume() power: supply: bq24190_charger: Install irq_handler_thread() at end of probe() power: supply: bq24190_charger: Call power_supply_changed() for relevant component power: supply: bq24190_charger: Don't read fault register outside irq_handle_thread() power: supply: bq24190_charger: Handle fault before status on interrupt leds: ktd2692: avoid harmless maybe-uninitialized warning ARM: OMAP5 / DRA7: Fix HYP mode boot for thumb2 build mwifiex: debugfs: Fix (sometimes) off-by-1 SSID print mwifiex: remove redundant dma padding in AMSDU mwifiex: Avoid skipping WEP key deletion for AP x86/ioapic: Restore IO-APIC irq_chip retrigger callback x86/pci-calgary: Fix iommu_free() comparison of unsigned expression >= 0 clk: Make x86/ conditional on CONFIG_COMMON_CLK kprobes/x86: Fix kernel panic when certain exception-handling addresses are probed x86/platform/intel-mid: Correct MSI IRQ line for watchdog device Revert "KVM: nested VMX: disable perf cpuid reporting" KVM: nVMX: initialize PML fields in vmcs02 KVM: nVMX: do not leak PML full vmexit to L1 usb: host: ehci-exynos: Decrese node refcount on exynos_ehci_get_phy() error paths usb: host: ohci-exynos: Decrese node refcount on exynos_ehci_get_phy() error paths usb: chipidea: Only read/write OTGSC from one place usb: chipidea: Handle extcon events properly USB: serial: keyspan_pda: fix receive sanity checks USB: serial: digi_acceleport: fix incomplete rx sanity check USB: serial: ssu100: fix control-message error handling USB: serial: io_edgeport: fix epic-descriptor handling USB: serial: ti_usb_3410_5052: fix control-message error handling USB: serial: ark3116: fix open error handling USB: serial: ftdi_sio: fix latency-timer error handling USB: serial: quatech2: fix control-message error handling USB: serial: mct_u232: fix modem-status error handling USB: serial: io_edgeport: fix descriptor error handling phy: qcom-usb-hs: Add depends on EXTCON serial: 8250_omap: Fix probe and remove for PM runtime scsi: mac_scsi: Fix MAC_SCSI=m option when SCSI=m MIPS: R2-on-R6 MULTU/MADDU/MSUBU emulation bugfix brcmfmac: Ensure pointer correctly set if skb data location changes brcmfmac: Make skb header writable before use staging: wlan-ng: add missing byte order conversion staging: emxx_udc: remove incorrect __init annotations ALSA: hda - Fix deadlock of controller device lock at unbinding tcp: do not underestimate skb->truesize in tcp_trim_head() bpf, arm64: fix jit branch offset related to ldimm64 tcp: fix wraparound issue in tcp_lp tcp: do not inherit fastopen_req from parent ipv4, ipv6: ensure raw socket message is big enough to hold an IP header rtnetlink: NUL-terminate IFLA_PHYS_PORT_NAME string ipv6: initialize route null entry in addrconf_init() ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf bnxt_en: allocate enough space for ->ntp_fltr_bmap f2fs: sanity check segment count drm/ttm: fix use-after-free races in vm fault handling block: get rid of blk_integrity_revalidate() Linux 4.4.68 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
| * tcp: do not underestimate skb->truesize in tcp_trim_head()Eric Dumazet2017-05-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 7162fb242cb8322beb558828fd26b33c3e9fc805 ] Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in skb_try_coalesce() using syzkaller and a filter attached to a TCP socket over loopback interface. I believe one issue with looped skbs is that tcp_trim_head() can end up producing skb with under estimated truesize. It hardly matters for normal conditions, since packets sent over loopback are never truncated. Bytes trimmed from skb->head should not change skb truesize, since skb->head is not reallocated. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge branch 'upstream-linux-4.4.y' into android-4.4Todd Kjos2017-03-02
|\|
| * tcp: fix 0 divide in __tcp_select_window()Eric Dumazet2017-02-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 06425c308b92eaf60767bc71d359f4cbc7a561f8 ] syszkaller fuzzer was able to trigger a divide by zero, when TCP window scaling is not enabled. SO_RCVBUF can be used not only to increase sk_rcvbuf, also to decrease it below current receive buffers utilization. If mss is negative or 0, just return a zero TCP window. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge tag 'v4.4.32' into android-4.4.yDmitry Shmidt2016-11-15
|\| | | | | | | | | | | This is the 4.4.32 stable release Change-Id: I5028402eadfcf055ac44a5e67abc6da75b2068b3
| * tcp: fix wrong checksum calculation on MTU probingDouglas Caetano dos Santos2016-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2fe664f1fcf7c4da6891f95708a7a56d3c024354 ] With TCP MTU probing enabled and offload TX checksumming disabled, tcp_mtu_probe() calculated the wrong checksum when a fragment being copied into the probe's SKB had an odd length. This was caused by the direct use of skb_copy_and_csum_bits() to calculate the checksum, as it pads the fragment being copied, if needed. When this fragment was not the last, a subsequent call used the previous checksum without considering this padding. The effect was a stale connection in one way, as even retransmissions wouldn't solve the problem, because the checksum was never recalculated for the full SKB length. Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * tcp: fix overflow in __tcp_retransmit_skb()Eric Dumazet2016-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ffb4d6c8508657824bcef68a36b2a0f9d8c09d10 ] If a TCP socket gets a large write queue, an overflow can happen in a test in __tcp_retransmit_skb() preventing all retransmits. The flow then stalls and resets after timeouts. Tested: sysctl -w net.core.wmem_max=1000000000 netperf -H dest -- -s 1000000000 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge tag 'v4.4.19' into android-4.4.yDmitry Shmidt2016-08-22
|\| | | | | | | This is the 4.4.19 stable release
| * tcp: consider recv buf for the initial window scaleSoheil Hassas Yeganeh2016-08-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f626300a3e776ccc9671b0dd94698fb3aa315966 ] tcp_select_initial_window() intends to advertise a window scaling for the maximum possible window size. To do so, it considers the maximum of net.ipv4.tcp_rmem[2] and net.core.rmem_max as the only possible upper-bounds. However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE to set the socket's receive buffer size to values larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max. Thus, SO_RCVBUFFORCE is effectively ignored by tcp_select_initial_window(). To fix this, consider the maximum of net.ipv4.tcp_rmem[2], net.core.rmem_max and socket's initial buffer space. Fixes: b0573dea1fb3 ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge tag 'v4.4.16' into android-4.4.yDmitry Shmidt2016-08-01
|\| | | | | | | | | | | This is the 4.4.16 stable release Change-Id: Ibaf7b7e03695e1acebc654a2ca1a4bfcc48fcea4
| * tcp: refresh skb timestamp at retransmit timeEric Dumazet2016-05-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 10a81980fc47e64ffac26a073139813d3f697b64 ] In the very unlikely case __tcp_retransmit_skb() can not use the cloning done in tcp_transmit_skb(), we need to refresh skb_mstamp before doing the copy and transmit, otherwise TCP TS val will be an exact copy of original transmit. Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | tcp: add a sysctl to config the tcp_default_init_rwndJP Abgrall2016-02-16
|/ | | | | | | | | | | | | | | | | | | | | The default initial rwnd is hardcoded to 10. Now we allow it to be controlled via /proc/sys/net/ipv4/tcp_default_init_rwnd which limits the values from 3 to 100 This is somewhat needed because ipv6 routes are autoconfigured by the kernel. See "An Argument for Increasing TCP's Initial Congestion Window" in https://developers.google.com/speed/articles/tcp_initcwnd_paper.pdf Change-Id: I386b2a9d62de0ebe05c1ebe1b4bd91b314af5c54 Signed-off-by: JP Abgrall <jpa@google.com> Conflicts: net/ipv4/sysctl_net_ipv4.c net/ipv4/tcp_input.c
* tcp: restore fastopen with no data in SYN packetEric Dumazet2015-12-17
| | | | | | | | | | | | | | Yuchung tracked a regression caused by commit 57be5bdad759 ("ip: convert tcp_sendmsg() to iov_iter primitives") for TCP Fast Open. Some Fast Open users do not actually add any data in the SYN packet. Fixes: 57be5bdad759 ("ip: convert tcp_sendmsg() to iov_iter primitives") Reported-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: make skb_set_owner_w() more robustEric Dumazet2015-11-02
| | | | | | | | | | | | | | | | | | skb_set_owner_w() is called from various places that assume skb->sk always point to a full blown socket (as it changes sk->sk_wmem_alloc) We'd like to attach skb to request sockets, and in the future to timewait sockets as well. For these kind of pseudo sockets, we need to take a traditional refcount and use sock_edemux() as the destructor. It is now time to un-inline skb_set_owner_w(), being too big. Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Bisected-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-10-24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: net/ipv6/xfrm6_output.c net/openvswitch/flow_netlink.c net/openvswitch/vport-gre.c net/openvswitch/vport-vxlan.c net/openvswitch/vport.c net/openvswitch/vport.h The openvswitch conflicts were overlapping changes. One was the egress tunnel info fix in 'net' and the other was the vport ->send() op simplification in 'net-next'. The xfrm6_output.c conflicts was also a simplification overlapping a bug fix. Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp: remove improper preemption check in tcp_xmit_probe_skb()Renato Westphal2015-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e520af48c7e5a introduced the following bug when setting the TCP_REPAIR sockoption: [ 2860.657036] BUG: using __this_cpu_add() in preemptible [00000000] code: daemon/12164 [ 2860.657045] caller is __this_cpu_preempt_check+0x13/0x20 [ 2860.657049] CPU: 1 PID: 12164 Comm: daemon Not tainted 4.2.3 #1 [ 2860.657051] Hardware name: Dell Inc. PowerEdge R210 II/0JP7TR, BIOS 2.0.5 03/13/2012 [ 2860.657054] ffffffff81c7f071 ffff880231e9fdf8 ffffffff8185d765 0000000000000002 [ 2860.657058] 0000000000000001 ffff880231e9fe28 ffffffff8146ed91 ffff880231e9fe18 [ 2860.657062] ffffffff81cd1a5d ffff88023534f200 ffff8800b9811000 ffff880231e9fe38 [ 2860.657065] Call Trace: [ 2860.657072] [<ffffffff8185d765>] dump_stack+0x4f/0x7b [ 2860.657075] [<ffffffff8146ed91>] check_preemption_disabled+0xe1/0xf0 [ 2860.657078] [<ffffffff8146edd3>] __this_cpu_preempt_check+0x13/0x20 [ 2860.657082] [<ffffffff817e0bc7>] tcp_xmit_probe_skb+0xc7/0x100 [ 2860.657085] [<ffffffff817e1e2d>] tcp_send_window_probe+0x2d/0x30 [ 2860.657089] [<ffffffff817d1d8c>] do_tcp_setsockopt.isra.29+0x74c/0x830 [ 2860.657093] [<ffffffff817d1e9c>] tcp_setsockopt+0x2c/0x30 [ 2860.657097] [<ffffffff81767b74>] sock_common_setsockopt+0x14/0x20 [ 2860.657100] [<ffffffff817669e1>] SyS_setsockopt+0x71/0xc0 [ 2860.657104] [<ffffffff81865172>] entry_SYSCALL_64_fastpath+0x16/0x75 Since tcp_xmit_probe_skb() can be called from process context, use NET_INC_STATS() instead of NET_INC_STATS_BH(). Fixes: e520af48c7e5 ("tcp: add TCPWinProbe and TCPKeepAlive SNMP counters") Signed-off-by: Renato Westphal <renatow@taghos.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: remove tcp_mark_lost_retrans()Yuchung Cheng2015-10-21
| | | | | | | | | | | | | | | | | | | | | | Remove the existing lost retransmit detection because RACK subsumes it completely. This also stops the overloading the ack_seq field of the skb control block. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: do not set queue_mapping on SYNACKEric Dumazet2015-10-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the time of commit fff326990789 ("tcp: reflect SYN queue_mapping into SYNACK packets") we had little ways to cope with SYN floods. We no longer need to reflect incoming skb queue mappings, and instead can pick a TX queue based on cpu cooking the SYNACK, with normal XPS affinities. Note that all SYNACK retransmits were picking TX queue 0, this no longer is a win given that SYNACK rtx are now distributed on all cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: shrink struct sock and request_sock by 8 bytesEric Dumazet2015-10-12
| | | | | | | | | | | | | | | | One 32bit hole is following skc_refcnt, use it. skc_incoming_cpu can also be an union for request_sock rcv_wnd. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: attach SYNACK messages to request sockets instead of listenerEric Dumazet2015-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a listen backlog is very big (to avoid syncookies), then the listener sk->sk_wmem_alloc is the main source of false sharing, as we need to touch it twice per SYNACK re-transmit and TX completion. (One SYN packet takes listener lock once, but up to 6 SYNACK are generated) By attaching the skb to the request socket, we remove this source of contention. Tested: listen(fd, 10485760); // single listener (no SO_REUSEPORT) 16 RX/TX queue NIC Sustain a SYNFLOOD attack of ~320,000 SYN per second, Sending ~1,400,000 SYNACK per second. Perf profiles now show listener spinlock being next bottleneck. 20.29% [kernel] [k] queued_spin_lock_slowpath 10.06% [kernel] [k] __inet_lookup_established 5.12% [kernel] [k] reqsk_timer_handler 3.22% [kernel] [k] get_next_timer_interrupt 3.00% [kernel] [k] tcp_make_synack 2.77% [kernel] [k] ipt_do_table 2.70% [kernel] [k] run_timer_softirq 2.50% [kernel] [k] ip_finish_output 2.04% [kernel] [k] cascade Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: Fix CWV being too strict on thin streamsBendik Rønning Opstad2015-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Application limited streams such as thin streams, that transmit small amounts of payload in relatively few packets per RTT, can be prevented from growing the CWND when in congestion avoidance. This leads to increased sojourn times for data segments in streams that often transmit time-dependent data. Currently, a connection is considered CWND limited only after having successfully transmitted at least one packet with new data, while at the same time failing to transmit some unsent data from the output queue because the CWND is full. Applications that produce small amounts of data may be left in a state where it is never considered to be CWND limited, because all unsent data is successfully transmitted each time an incoming ACK opens up for more data to be transmitted in the send window. Fix by always testing whether the CWND is fully used after successful packet transmissions, such that a connection is considered CWND limited whenever the CWND has been filled. This is the correct behavior as specified in RFC2861 (section 3.1). Cc: Andreas Petlund <apetlund@simula.no> Cc: Carsten Griwodz <griff@simula.no> Cc: Jonas Markussen <jonassm@ifi.uio.no> Cc: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> Cc: Mads Johannessen <madsjoh@ifi.uio.no> Signed-off-by: Bendik Rønning Opstad <bro.devel+kernel@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Tested-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-09-26
|\| | | | | | | | | | | | | | | | | | | Conflicts: net/ipv4/arp.c The net/ipv4/arp.c conflict was one commit adding a new local variable while another commit was deleting one. Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp: add proper TS val into RST packetsEric Dumazet2015-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RST packets sent on behalf of TCP connections with TS option (RFC 7323 TCP timestamps) have incorrect TS val (set to 0), but correct TS ecr. A > B: Flags [S], seq 0, win 65535, options [mss 1000,nop,nop,TS val 100 ecr 0], length 0 B > A: Flags [S.], seq 2444755794, ack 1, win 28960, options [mss 1460,nop,nop,TS val 7264344 ecr 100], length 0 A > B: Flags [.], ack 1, win 65535, options [nop,nop,TS val 110 ecr 7264344], length 0 B > A: Flags [R.], seq 1, ack 1, win 28960, options [nop,nop,TS val 0 ecr 110], length 0 We need to call skb_mstamp_get() to get proper TS val, derived from skb->skb_mstamp Note that RFC 1323 was advocating to not send TS option in RST segment, but RFC 7323 recommends the opposite : Once TSopt has been successfully negotiated, that is both <SYN> and <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST> segment for the duration of the connection, and SHOULD be sent in an <RST> segment (see Section 5.2 for details) Note this RFC recommends to send TS val = 0, but we believe it is premature : We do not know if all TCP stacks are properly handling the receive side : When an <RST> segment is received, it MUST NOT be subjected to the PAWS check by verifying an acceptable value in SEG.TSval, and information from the Timestamps option MUST NOT be used to update connection state information. SEG.TSecr MAY be used to provide stricter <RST> acceptance checks. In 5 years, if/when all TCP stack are RFC 7323 ready, we might consider to decide to send TS val = 0, if it buys something. Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp/dccp: constify rtx_synack() and friendsEric Dumazet2015-09-25
| | | | | | | | | | | | | | | | This is done to make sure we do not change listener socket while sending SYNACK packets while socket lock is not held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: constify tcp_make_synack() socket argumentEric Dumazet2015-09-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | listener socket is not locked when tcp_make_synack() is called. We better make sure no field is written. There is one exception : Since SYNACK packets are attached to the listener at this moment (or SYN_RECV child in case of Fast Open), sock_wmalloc() needs to update sk->sk_wmem_alloc, but this is done using atomic operations so this is safe. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: remove tcp_ecn_make_synack() socket argumentEric Dumazet2015-09-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | SYNACK packets might be sent without holding socket lock. For DCTCP/ECN sake, we should call INET_ECN_xmit() while socket lock is owned, and only when we init/change congestion control. This also fixies a bug if congestion module is changed from dctcp to another one on a listener : we now clear ECN bits properly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: remove tcp_synack_options() socket argumentEric Dumazet2015-09-25
| | | | | | | | | | | | | | We do not use the socket in this function. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>