TransWikia.com

My /var/log/ is mysteriously filling up GBs in minutes! Any cure before I re-install Debian 7?

Unix & Linux Asked on December 12, 2021

Good morning, fellow *nix enthusiasts!

I have been using Debian 7 for a while now, and after a recent upgrade I noticed I constantly kept running out of space on my root partition. I mean to the point where I had ‘0’ bytes left on disk! So, after a lot of searching, I was able to zero in on the /var/log folder. I used ls -s -S to arrange the files in this folder by size and noticed that three files were GBs in size (around 13-15 GB each):

  • syslog
  • messages
  • kern.log

And yes, logrotate is working fine. It is rotating the logs. For example, I see kern.log.1 etc in /var/log. The problem is the logs are filling up so extremely fast that there’s nothing logrotate can do.

Apparently, some logging process in the OS is writing a lot of data which could be because of constant errors or something(??). I don’t know. All I know is my laptop is over-heating simply because there’s so much processing going on all the time due to this constant write process. So, I’m losing CPU power, AND disk space.

My question is: how can I determine what process/daemon is creating this issue? How do I get to the root-cause of the problem so I could correct it? Reading these HUGE log files is not an option. Please. If I try to pull up a 15 GB log file in a text editor like leafpad or notepad on an already busy laptop, it just takes ages and ages to open. That is not practical.

I realize that this question is broad because there could be any process/daemon causing this, but I want to know if anyone has experienced this before, and if there are any usual suspects I could look at.

UPDATE:

Following Eric’s advice, I arranged the files in /var/log by modification time, and ‘syslog’ was the last one. So, I tail‘ed it. The result:

Apr 10 00:53:37 MyMachine kernel: [11608.690733]  [<ffffffffa08e4005>] ? ath9k_reg_rmw+0x35/0x70 [ath9k_htc]
Apr 10 00:53:37 MyMachine kernel: [11608.690742]  [<ffffffff81084f57>] ? process_one_work+0x147/0x3b0
Apr 10 00:53:37 MyMachine kernel: [11608.690750]  [<ffffffff81085764>] ? worker_thread+0x114/0x480
Apr 10 00:53:37 MyMachine kernel: [11608.690756]  [<ffffffff81556065>] ? __schedule+0x2e5/0x790
Apr 10 00:53:37 MyMachine kernel: [11608.690765]  [<ffffffff81085650>] ? create_worker+0x1c0/0x1c0
Apr 10 00:53:37 MyMachine kernel: [11608.690772]  [<ffffffff8108ae91>] ? kthread+0xc1/0xe0
Apr 10 00:53:37 MyMachine kernel: [11608.690780]  [<ffffffff8108add0>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 10 00:53:37 MyMachine kernel: [11608.690788]  [<ffffffff8155a23c>] ? ret_from_fork+0x7c/0xb0
Apr 10 00:53:37 MyMachine kernel: [11608.690795]  [<ffffffff8108add0>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 10 00:53:37 MyMachine kernel: [11608.690800] ---[ end trace 12dc8d8439345c1d ]

Unfortunately, it doesn’t give me much of a hint.

3 Answers

I got the problem solved already by using the instructions here.

Method 1

Add pci=noaer to your kernel command line:

  1. Edit /etc/default/grub and add pci=noaer to the line starting with GRUB_CMDLINE_LINUX_DEFAULT. It will look like this:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=noaer" 
    
  2. run
    sudo update-grub
    
  3. reboot

After that, the log files stopped growing so hugely in size.
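After rebooting, it may be worth confirming that the parameter actually reached the kernel. The following is a minimal sketch, assuming the running kernel's command line is readable at /proc/cmdline; has_kernel_param is a hypothetical helper name, not part of any package.

```shell
# Hypothetical helper: succeed if the given parameter is on the kernel command line.
# $1: parameter to look for, e.g. pci=noaer
# $2: file to read (defaults to /proc/cmdline; overridable for testing)
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each parameter on its own line, then look for an exact, literal match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Usage after rebooting:
#   has_kernel_param pci=noaer && echo "pci=noaer is active"
```

If the helper reports nothing, the edit to /etc/default/grub probably did not take effect, for example because update-grub was not run.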

Method 2

If that doesn't help either, you can edit the same line to add pci=nomsi as well:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=noaer pci=nomsi" 

Reference

However, I don't know if this fixed the root cause of the error-messages...

Answered by Kapil on December 12, 2021

You don't need to open the log files in an editor to see what's flooding them. Just look at the last few lines:

tail -n 999 /var/log/syslog | less

Log entries written by a process normally include the process name and ID:

Apr 10 00:00:01 harfang /USR/SBIN/CRON[345]: (root) CMD ( /usr/local/bin/midnight-stuff )
Apr 10 00:00:01 darkstar wibbled[1234]: I'm bored
Apr 10 00:00:01 darkstar wibbled[1234]: I'm still bored
Apr 10 00:00:01 darkstar wibbled[1234]: I'm bored
Apr 10 00:00:02 darkstar wibbled[1234]: I'm still bored
Apr 10 00:00:02 darkstar wibbled[1234]: I'm bored

This tells you that process 1234, which is an instance of the wibbled daemon, is producing a lot of log messages. You may want to kill it and check its configuration.
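When the flood is less obvious than in this example, counting the program tags in the tail of the log can point at the culprit. This is a rough sketch, assuming the traditional syslog line format where the fifth whitespace-separated field is the tag (with or without a PID); top_loggers is a hypothetical helper name.

```shell
# Hypothetical helper: count which program tags appear most often in
# syslog-format lines read from stdin. Optional $1 = how many to show (default 5).
top_loggers() {
    awk '{
            tag = $5
            sub(/\[[0-9]+\]:?$/, "", tag)   # strip a trailing "[pid]:" if present
            sub(/:$/, "", tag)              # strip a bare trailing ":" otherwise
            count[tag]++
         }
         END { for (t in count) print count[t], t }' |
        sort -rn | head -n "${1:-5}"
}

# Usage (path as in the question):
#   tail -n 10000 /var/log/syslog | top_loggers
```

A tag of kernel at the top of the list means the messages come from the kernel rather than from a daemon.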

If kern.log is growing a lot, your logs aren't coming from a process but from the kernel. Flooding in the kernel logs is rarer and can be harder to pin down. It can be due to a process that's being respawned in a tight loop and is crashing immediately (perhaps due to low memory on the system). It can also be due to a buggy driver. You need to look at the messages to understand the cause.
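To look at the kernel messages without opening the whole file, one option is to strip the timestamps from the last few thousand lines and count the repeats. This is only a sketch, assuming lines of the form "... kernel: [seconds.micros] message" as in the question; top_kernel_messages is a hypothetical helper name.

```shell
# Hypothetical helper: tally repeated kernel messages from syslog-format lines
# on stdin, ignoring the date prefix and the "[seconds.micros]" timestamp.
# Optional $1 = how many distinct messages to show (default 10).
top_kernel_messages() {
    sed -n 's/^.* kernel: \[ *[0-9]*\.[0-9]*\] //p' |
        sort | uniq -c | sort -rn |
        sed 's/^ *//' |          # drop the padding that uniq -c adds
        head -n "${1:-10}"
}

# Usage (path as in the question):
#   tail -n 5000 /var/log/kern.log | top_kernel_messages
```

Lines that repeat hundreds of times in a short tail are the ones worth reading in full.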

In your case, you're seeing a backtrace from a driver. The driver is encountering a non-fatal error incessantly. Try unloading it:

rmmod ath9k_htc

(Why ath9k_htc? Because the kernel prints the name of the module that provides each function in square brackets at the end of the backtrace line, here after ath9k_reg_rmw; the module name would also be mentioned a few lines further up from the bit you included in your question.) If the driver isn't in a module or cannot be unloaded, look for another way to disable it or stop triggering its bug; how to do that depends on what driver it is and what's wrong with it.
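The kernel appends the owning module's name in square brackets to backtrace lines for functions that live in a module, so the names can be pulled out mechanically. This is a sketch under that assumption, using the line format shown in the question; modules_in_trace is a hypothetical helper name.

```shell
# Hypothetical helper: list kernel module names found in square brackets at the
# end of backtrace lines on stdin, e.g. "... ath9k_reg_rmw+0x35/0x70 [ath9k_htc]".
# Output: "<module> <count>", most frequent first.
modules_in_trace() {
    sed -n 's|.*[+]0x[0-9a-f]*/0x[0-9a-f]* \[\([A-Za-z0-9_]*\)\][[:space:]]*$|\1|p' |
        sort | uniq -c | sort -rn |
        awk '{ print $2, $1 }'
}

# Usage (path as in the question):
#   tail -n 5000 /var/log/syslog | modules_in_trace
```

Backtrace lines for functions built into the kernel itself carry no bracketed name and are skipped.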

Answered by Gilles 'SO- stop being evil' on December 12, 2021

There is actually a strong hint in the syslog snippet you posted. The end of the line

Apr 10 00:53:37 MyMachine kernel: [11608.690733]  [<ffffffffa08e4005>] ? ath9k_reg_rmw+0x35/0x70 [ath9k_htc]

shows the stack trace is due to an unexpected error in a device driver named ath9k_htc. Fortunately, the kernel didn't panic, but the continuous repetition of errors is filling your file system.

I would then blacklist the ath9k_htc wifi driver with this command and then reboot:

echo "blacklist ath9k_htc" | sudo tee -a /etc/modprobe.d/blacklist.conf

Beware, though, that doing so might prevent your wifi from working if the ath9k_htc driver was in fact in use and functional despite the errors.

You can check whether a wifi device handled by the ath9k_htc driver is present in your machine by running lsusb and seeing whether a device matches one in the list available here: https://wiki.debian.org/ath9k_htc
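As an alternative to comparing against the wiki list by eye, the IDs that the installed driver claims can be compared with the lsusb output. This is a sketch only: it assumes modinfo -F alias prints USB aliases in the usual usb:vVVVVpPPPP... form, and match_usb_ids and the /tmp file names are hypothetical.

```shell
# Hypothetical helper: print lsusb lines whose vendor:product ID matches one of
# a driver's USB aliases.
# $1: file holding `lsusb` output
# $2: file holding `modinfo -F alias <module>` output
match_usb_ids() {
    # Turn aliases such as "usb:v0CF3p9271d*..." into lowercase "0cf3:9271",
    # then look each ID up in the lsusb output.
    sed -n 's/^usb:v\([0-9A-Fa-f]\{4\}\)p\([0-9A-Fa-f]\{4\}\).*/\1:\2/p' "$2" |
        tr 'A-F' 'a-f' |
        while read -r id; do
            grep -i "ID $id" "$1"
        done
}

# Usage:
#   lsusb > /tmp/lsusb.txt
#   /sbin/modinfo -F alias ath9k_htc > /tmp/aliases.txt
#   match_usb_ids /tmp/lsusb.txt /tmp/aliases.txt
```

No output suggests that no attached USB device is claimed by the driver, in which case blacklisting it should not cost you a working wifi adapter.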

Answered by jlliagre on December 12, 2021
