
Ubuntu/Debian server loses network sporadically when wired and running Docker

Unix & Linux Asked on January 15, 2021

I have spent months on this problem, and I’m just about at my wits’ end. I have a home media server that runs its services in Docker containers, all defined in a single docker-compose file. The router (an eero in this case) assigns the box a static IP. I run docker-compose up -d and leave it to host everything.

After anywhere from a day to a week (it’s inconsistent), the machine simply drops off the network. The current network setup is modem -> eero -> network switch -> server. The only way for me to reconnect to the server is to reboot it; only then does the network come back online. I originally had this problem on Debian (both 9 and 10), so I changed my OS, since a friend of mine runs Ubuntu without issue. I switched to Ubuntu Server 20.04 but have the same issue. I briefly looked at https://github.com/moby/moby/issues/36153 as a possible root cause, but adding the files suggested there didn’t seem to make a difference.

The next consideration was that maybe it was a hardware issue, so I switched from using my onboard ethernet to using a USB-C ethernet adapter. That seemed to work for 3 days, but then I had the same problem.

At this point, I’m lost as to what I can do to narrow down the problem. I’ve looked through syslog, but nothing stands out to me there. I’ve checked the container logs, and all the containers look fine. On Debian I was using NetworkManager, and on Ubuntu I’m using systemd-networkd. Both experience this issue.

My Ubuntu version is Ubuntu 20.04 LTS (GNU/Linux 5.4.0-37-generic x86_64)

My hardware info is below, in case that helps:

H/W path              Device           Class          Description
=================================================================
                                       system         System Product Name (SKU)
/0                                     bus            PRIME X370-PRO
/0/0                                   memory         64KiB BIOS
/0/2c                                  memory         16GiB System Memory
/0/2c/0                                memory         8GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
/0/2c/1                                memory         [empty]
/0/2c/2                                memory         8GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
/0/2c/3                                memory         [empty]
/0/2e                                  memory         576KiB L1 cache
/0/2f                                  memory         3MiB L2 cache
/0/30                                  memory         16MiB L3 cache
/0/31                                  processor      AMD Ryzen 5 1600 Six-Core Processor
/0/100                                 bridge         Family 17h (Models 00h-0fh) Root Complex
/0/100/0.2                             generic        Family 17h (Models 00h-0fh) I/O Memory Management Unit
/0/100/1.3                             bridge         Family 17h (Models 00h-0fh) PCIe GPP Bridge
/0/100/1.3/0                           bus            X370 Series Chipset USB 3.1 xHCI Controller
/0/100/1.3/0/0        usb1             bus            xHCI Host Controller
/0/100/1.3/0/0/7                       generic        Belkin USB-C LAN
/0/100/1.3/0/1        usb2             bus            xHCI Host Controller
/0/100/1.3/0.1        scsi0            storage        X370 Series Chipset SATA Controller
/0/100/1.3/0.1/0      /dev/sda         disk           120GB SanDisk SDSSDA12
/0/100/1.3/0.1/0/1    /dev/sda1        volume         511MiB Windows FAT volume
/0/100/1.3/0.1/0/2    /dev/sda2        volume         111GiB EXT4 volume
/0/100/1.3/0.1/1      /dev/sdb         disk           3TB Hitachi HUS72403
/0/100/1.3/0.1/2      /dev/sdc         disk           3TB Hitachi HUS72403
/0/100/1.3/0.1/3      /dev/sdd         disk           3TB Hitachi HUS72403
/0/100/1.3/0.1/4      /dev/sde         disk           3TB Hitachi HUS72403
/0/100/1.3/0.1/5      /dev/sdf         disk           3TB Hitachi HUS72403
/0/100/1.3/0.2                         bridge         X370 Series Chipset PCIe Upstream Port
/0/100/1.3/0.2/0                       bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/2                       bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/3                       bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/4                       bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/4/0                     bus            ASM1142 USB 3.1 Host Controller
/0/100/1.3/0.2/4/0/0  usb3             bus            xHCI Host Controller
/0/100/1.3/0.2/4/0/1  usb4             bus            xHCI Host Controller
/0/100/1.3/0.2/6                       bridge         300 Series Chipset PCIe Port
/0/100/1.3/0.2/6/0    enp7s0           network        I211 Gigabit Network Connection
/0/100/1.3/0.2/7                       bridge         300 Series Chipset PCIe Port
/0/100/3.2                             bridge         Family 17h (Models 00h-0fh) PCIe GPP Bridge
/0/100/3.2/0                           display        GP107 [GeForce GTX 1050]
/0/100/3.2/0.1                         multimedia     GP107GL High Definition Audio Controller
/0/100/7.1                             bridge         Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
/0/100/7.1/0                           generic        Zeppelin/Raven/Raven2 PCIe Dummy Function
/0/100/7.1/0.2                         generic        Family 17h (Models 00h-0fh) Platform Security Processor
/0/100/7.1/0.3                         bus            Family 17h (Models 00h-0fh) USB 3.0 Host Controller
/0/100/7.1/0.3/0      usb5             bus            xHCI Host Controller
/0/100/7.1/0.3/1      usb6             bus            xHCI Host Controller
/0/100/8.1                             bridge         Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
/0/100/8.1/0                           generic        Zeppelin/Renoir PCIe Dummy Function
/0/100/8.1/0.2                         storage        FCH SATA Controller [AHCI mode]
/0/100/8.1/0.3                         multimedia     Family 17h (Models 00h-0fh) HD Audio Controller
/0/100/14                              bus            FCH SMBus Controller
/0/100/14.3                            bridge         FCH LPC Bridge
/0/101                                 bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/102                                 bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/103                                 bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/104                                 bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/105                                 bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/106                                 bridge         Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
/0/107                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0
/0/108                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1
/0/109                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2
/0/10a                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3
/0/10b                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4
/0/10c                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5
/0/10d                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6
/0/10e                                 bridge         Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7
/0/1                                   system         PnP device PNP0c01
/0/2                                   system         PnP device PNP0b00
/0/3                                   system         PnP device PNP0c02
/0/4                                   communication  PnP device PNP0501
/0/5                                   system         PnP device PNP0c02
/1                    br-10d6cc4b0f64  network        Ethernet interface
/2                    veth80c7cea      network        Ethernet interface
/3                    enx302303052de3  network        Ethernet interface
/4                    vethf4fd33e      network        Ethernet interface
/5                    vethab1d028      network        Ethernet interface
/6                    vethb9ac1e0      network        Ethernet interface
/7                    veth00d454b      network        Ethernet interface
/8                    docker0          network        Ethernet interface

Here is my docker-compose file as well. I’m running Docker 19.03.11 (build dd360c7) and docker-compose 1.26.0 (build d4451659).

version: "3.7"

services:
  plex:
    image: plexinc/pms-docker
    container_name: plex
    volumes:
      - /mnt/plex/config:/config
      - /mnt/plex/Movies:/data/movies
      - /mnt/plex/Shows:/data/tvshows
      - /mnt/plex/transcode:/data/transcode
    ports:
      - 32400:32400/tcp
      - 3005:3005/tcp
      - 8324:8324/tcp
      - 32469:32469/tcp
      - 1900:1900/udp
      - 32410:32410/udp
      - 32412:32412/udp
      - 32413:32413/udp
      - 32414:32414/udp
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - VERSION=latest
      - TZ=America/Los_Angeles
  homebridge:
    image: oznu/homebridge:latest
    container_name: homebridge
    restart: unless-stopped
    network_mode: host
    environment:
      - TZ=America/Los_Angeles
      - PGID=1000
      - PUID=1000
      - HOMEBRIDGE_CONFIG_UI=1
      - HOMEBRIDGE_CONFIG_UI_PORT=8008
    volumes:
      - /mnt/homebridge:/homebridge
  nzbget:
    image: linuxserver/nzbget:latest
    container_name: nzbget
    volumes:
      - /mnt/nzbget/config:/config
      - /mnt/nzbget/downloads:/downloads
    restart: unless-stopped
    environment:
      - TZ=America/Los_Angeles
      - PUID=1000
      - PGID=1000
    ports:
      - 6789:6789
  sonarr:
    image: linuxserver/sonarr:latest
    container_name: sonarr
    restart: unless-stopped
    depends_on:
      - nzbget
    volumes:
      - /mnt/sonarr/config:/config
      - /mnt/nzbget/downloads:/downloads
      - /mnt/plex/Shows:/tv
    environment:
      - TZ=America/Los_Angeles
      - PUID=1000
      - PGID=1000
    ports:
      - 8989:8989
  radarr:
    image: linuxserver/radarr:latest
    container_name: radarr
    restart: unless-stopped
    depends_on:
      - nzbget
    volumes:
      - /mnt/radarr/config:/config
      - /mnt/nzbget/downloads:/downloads
      - /mnt/plex/Movies:/movies
    environment:
      - TZ=America/Los_Angeles
      - PUID=1000
      - PGID=1000
    ports:
      - 7878:7878
  tautulli:
    image: linuxserver/tautulli:latest
    container_name: tautulli
    depends_on:
      - plex
    restart: unless-stopped
    environment:
      - TZ=America/Los_Angeles
      - PUID=1000
      - PGID=1000
    volumes:
      - /mnt/tautulli/config:/config
      - /mnt/tautulli/logs:/logs:ro
    ports:
      - 8181:8181

If I’ve missed anything, please let me know and I’m happy to provide more information.

EDIT:

Last night I also updated the Realtek driver to the latest version to see whether it was the cause of the issue, because I found the following in journalctl:

Jun 14 01:17:25 phoenix kernel: xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
Jun 14 01:17:25 phoenix kernel: xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx status -108
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx status -108
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx status -108
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx status -108
Jun 14 01:17:25 phoenix kernel: xhci_hcd 0000:01:00.0: HC died; cleaning up
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Tx timeout
Jun 14 01:17:25 phoenix kernel: usb 1-7: USB disconnect, device number 2
Jun 14 01:17:25 phoenix kernel: r8152 1-7:1.0 enx302303052de3: Get ether addr fail
Jun 14 01:17:25 phoenix systemd-networkd[933]: enx302303052de3: Link DOWN

I did so following https://www.pcsuggest.com/install-rtl8153-driver-linux/. However, it seems the machine disconnected again this morning, so I can’t say for sure whether this helped.
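
For anyone trying to narrow down a similar failure, the kernel messages quoted in this edit can be searched for mechanically. The following is only a minimal sketch: the patterns are copied from the log lines above, other kernels or drivers may word these messages differently, and the names PATTERNS, classify and summarize are made up for illustration.

```python
import re

# Patterns based on the journal lines quoted above (assumption: your
# kernel and driver log the same wording).
PATTERNS = {
    "xhci_dead": re.compile(r"xhci_hcd .*(HC died|assume dead)"),
    "usb_disconnect": re.compile(r"usb \S+: USB disconnect"),
    "link_down": re.compile(r"systemd-networkd\[\d+\]: \S+: Link DOWN"),
}


def classify(line):
    """Return the names of all patterns that match one journal line."""
    return [name for name, rx in PATTERNS.items() if rx.search(line)]


def summarize(lines):
    """Count how often each event type appears in an iterable of lines."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name in classify(line):
            counts[name] += 1
    return counts
```

One way to use it would be to save the journal of the boot that lost connectivity to a file (for example with journalctl -b -1, which assumes the journal is persistent) and pass the open file to summarize(). A non-zero xhci_dead count next to a link_down count would suggest the USB host controller went away before the link did.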

EDIT 2:

It seems docker may be failing or restarting due to snap?

Jun 24 05:01:47 phoenix docker.dockerd[998]: failed to start containerd: timeout waiting for containerd to start
Jun 24 05:01:47 phoenix systemd[1]: snap.docker.dockerd.service: Main process exited, code=exited, status=1/FAILURE
Jun 24 05:01:47 phoenix systemd[1]: snap.docker.dockerd.service: Failed with result 'exit-code'.
Jun 24 05:01:47 phoenix systemd[1]: snap.docker.dockerd.service: Scheduled restart job, restart counter is at 1.
Jun 24 05:01:47 phoenix systemd[1]: Stopped Service for snap application docker.dockerd.
Jun 24 05:01:47 phoenix systemd[1]: Started Service for snap application docker.dockerd.

After this, I can clearly see an IP reassignment being triggered, which then caused my box to go offline.

EDIT 3:

Here is a snippet from the iplog

[2020-07-05T00:24:28.507613] Deleted dev vetha537571 lladdr 02:42:ac:13:00:02 STALE
[2020-07-05T00:24:29.019491] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed PROBE
[2020-07-05T00:24:29.019674] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed REACHABLE
[2020-07-05T00:24:32.603688] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 STALE
[2020-07-05T00:24:59.227481] 172.19.0.6 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:06 STALE
[2020-07-05T00:25:01.275258] 172.19.0.7 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:07 STALE
[2020-07-05T00:25:30.715499] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 PROBE
[2020-07-05T00:25:30.715641] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 REACHABLE
[2020-07-05T00:25:34.299181] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed STALE
[2020-07-05T00:25:38.139499] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 STALE
[2020-07-05T00:25:38.139586] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 STALE
[2020-07-05T00:25:39.931537] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed PROBE
[2020-07-05T00:25:39.931823] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed REACHABLE
[2020-07-05T00:25:47.099314] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 PROBE
[2020-07-05T00:25:47.099401] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 PROBE
[2020-07-05T00:25:47.101034] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 REACHABLE
[2020-07-05T00:25:47.102485] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 REACHABLE
[2020-07-05T00:25:57.595220] 172.19.0.6 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:06 PROBE
[2020-07-05T00:25:57.595308] 172.19.0.6 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:06 REACHABLE
[2020-07-05T00:25:58.363503] 172.19.0.7 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:07 PROBE
[2020-07-05T00:25:58.363730] 172.19.0.7 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:07 REACHABLE
[2020-07-05T00:26:00.667505] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 STALE
[2020-07-05T00:26:12.955465] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed STALE
[2020-07-05T00:26:19.099249] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed PROBE
[2020-07-05T00:26:19.099393] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed REACHABLE
[2020-07-05T00:26:29.339502] 172.19.0.6 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:06 STALE
[2020-07-05T00:26:29.339583] 172.19.0.7 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:07 STALE
[2020-07-05T00:26:37.531222] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 STALE
[2020-07-05T00:26:37.531304] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 STALE
[2020-07-05T00:26:47.003597] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 PROBE
[2020-07-05T00:26:47.003678] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 PROBE
[2020-07-05T00:26:47.005742] 192.168.7.55 dev enp7s0 lladdr 30:23:03:01:33:c5 REACHABLE
[2020-07-05T00:26:47.007351] 192.168.7.50 dev enp7s0 lladdr 24:f5:a2:94:74:e9 REACHABLE
[2020-07-05T00:27:00.827525] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 PROBE
[2020-07-05T00:27:00.827816] 172.19.0.3 dev br-c5ca2723d156 lladdr 02:42:ac:13:00:03 REACHABLE
[2020-07-05T00:27:12.859480] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed STALE
[2020-07-05T00:27:19.003172] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed PROBE
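
A log like this is easier to read when reduced to "when was each neighbor last confirmed reachable". The sketch below assumes every entry has exactly the format shown above (bracketed ISO timestamp, IP, dev, lladdr, state); LINE_RE and last_reachable are illustrative names, not part of any tool.

```python
import re
from datetime import datetime

# Matches entries such as:
# [2020-07-05T00:24:29.019674] 192.168.7.1 dev enp7s0 lladdr 14:22:db:9c:4d:ed REACHABLE
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<ip>\d+\.\d+\.\d+\.\d+) dev (?P<dev>\S+)"
    r" lladdr (?P<mac>\S+) (?P<state>[A-Z]+)$"
)


def last_reachable(lines, dev):
    """Map each neighbor IP seen on `dev` to its latest REACHABLE timestamp."""
    latest = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Skip lines in another format (e.g. "Deleted dev ..."), other
        # interfaces, and states other than REACHABLE.
        if not m or m["dev"] != dev or m["state"] != "REACHABLE":
            continue
        ts = datetime.fromisoformat(m["ts"])
        if m["ip"] not in latest or ts > latest[m["ip"]]:
            latest[m["ip"]] = ts
    return latest
```

Calling last_reachable(lines, "enp7s0") on the full log gives, for the gateway 192.168.7.1, the last moment the host still had a confirmed neighbor entry for it. That timestamp can then be compared against the time the box went offline.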

One Answer

This took some more digging to figure out. Eventually I left the computer running while plugged into my monitor, and the next time I lost network connectivity I saw a CPU lockup.

Some quick searching suggests it may be a power-state problem with Ryzen CPUs: https://askubuntu.com/a/1259021

Prompted by that answer, I used this guide to disable the C6 power state: https://forum.manjaro.org/t/fix-ryzen-lockups-related-to-low-system-usage/39723
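
Before and after changing idle-state settings, it can help to see which idle states the kernel exposes. The sketch below does not reproduce the linked guide. It only reads the standard Linux cpuidle sysfs attributes (name and disable), and the function name read_idle_states is made up for illustration. Note that the state names listed there (often ACPI-style names such as C1 or C2) do not necessarily map one-to-one onto the hardware C6 state the guide talks about.

```python
from pathlib import Path

# Standard location of per-CPU cpuidle information on Linux.
CPU_ROOT = Path("/sys/devices/system/cpu")


def read_idle_states(cpu=0, root=CPU_ROOT):
    """Return (directory, name, disabled) for each idle state of one CPU."""
    base = Path(root) / f"cpu{cpu}" / "cpuidle"
    state_dirs = sorted(
        base.glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),  # numeric order, not lexical
    )
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        # The kernel reports "0" while the state is enabled.
        disabled = (state_dir / "disable").read_text().strip() != "0"
        states.append((state_dir.name, name, disabled))
    return states
```

Running read_idle_states(0) on the server returns an empty list if the kernel exposes no cpuidle directory for that CPU. Otherwise it lists each state with its current enabled or disabled status, which is a read-only check and changes nothing.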

I'm approaching 3 days of uptime without any issue. The machine is currently on Wi-Fi, but I intend to switch it back to wired. I will post an update in a month on how the uptime has held up. Hopefully this helps the next person who experiences a similar issue.

Answered by David on January 15, 2021
