Bug 12563 - On-Access Scanning "ERROR: ClamCom: TIMEOUT while waiting on socket (recv)
On-Access Scanning "ERROR: ClamCom: TIMEOUT while waiting on socket (recv)
Status: RESOLVED FIXED
Product: ClamAV
Classification: ClamAV
Component: other
stable
x86_64 GNU/Linux
: P3 blocker
: 0.101.0
Assigned To: ClamAV team
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2020-05-21 15:39 EDT by Aaron Brighton
Modified: 2021-03-24 13:33 EDT (History)
5 users (show)

See Also:
QA Contact:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Aaron Brighton 2020-05-21 15:39:39 EDT
Overview
--------

Enabling On-Acccess Scanning cripples file operations and fails to detect malicious files.  Logs seem to indicate an issue with clamonacc's ability to communicate with clamd service.  clamdscan works without issue.


Steps to Reproduce
------------------

With AppArmor:

sudo -i
apt-get install clamav-daemon
systemctl enable clamav-daemon
printf "ScanArchive true\nDetectPUA true\nOnAccessPrevention true\nOnAccessExcludeUname clamav\nOnAccessIncludePath /opt" >> /etc/clamav/clamd.conf
sed -i 's/LogVerbose false/LogVerbose true/g'
sysemctl start clamav-daemon
clamonacc --verbose --log=/var/log/clamav/clamonacc.log --fdpass
mkdir /opt/testfolder
chown ubuntu /opt/testfolder
su ubuntu
cd /opt/testfolder
wget http://www.eicar.org/download/eicar.com
echo "test" > testfile.com
clamdscan --fdpass --verbose .

Without AppArmor:

sudo -i
systemctl stop apparmor
systemctl disable apparmor
sed -i 's/^GRUB_CMDLINE_LINUX=.*/GRUB_CMDLINE_LINUX="apparmor=0 security=\\"\\""/' /etc/default/grub
update-grub
apt-get remove apparmor
reboot
(rerun commands from the "With AppArmor" from above)

Actual Results
--------------

The wget and echo commands above result in a multi-second pause/delay and eventually complete successfully.

Logs show the following:

/var/log/clamav/clamav.log:
...
Thu May 21 18:12:50 2020 -> Client disconnected (FD 9)
Thu May 21 18:13:50 2020 -> Client disconnected (FD 9)

/var/log/clamav/clamonacc.log:
...
ClamFanotif: attempting to feed consumer queue
ClamWorker: performing scanning on file '/opt/testfolder/eicar.com'
ERROR: ClamCom: TIMEOUT while waiting on socket (recv)
ClamClient: connection could not be established ... return code 12
ClamFanotif: attempting to feed consumer queue
ClamWorker: performing scanning on file '/opt/testfolder/testfile.com'
ERROR: ClamCom: TIMEOUT while waiting on socket (recv)
ClamClient: connection could not be established ... return code 12

$ clamdscan --fdpass --verbose .
/opt/testfolder/./eicar.com: Win.Test.EICAR_HDB-1 FOUND

----------- SCAN SUMMARY -----------
Infected files: 1
Time: 10.013 sec (0 m 10 s)
Start Date: 2020:05:21 19:33:13
End Date:   2020:05:21 19:33:23


Expected Results
----------------

Minimal I/O latency writing the file testfile.com and blocking access to writing eicar.com.


Build Date & Hardware
---------------------

Hardware:
AWS EC2
Instance Type: t3.medium
AMI: ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20200408 (ami-085925f297f89fce1)

OS/Kernel:
Ubuntu 18.04.4 LTS / 4.15.0-1065-aws

ClamAV Versions Tested:
0.102.3+dfsg-0ubuntu0.18.04.1
0.102.2+dfsg-0ubuntu0.18.04.1
ClamAV 0.103.0-devel-20200521/25819/Thu May 21 12:20:55 2020


Additional Information:

Others seem to have this problem as well: https://forum.openmediavault.org/index.php?thread/31574-clamav-connection-timeout-error/#post240398

Set severity to "blocker" as this is a blocker for us implementing ClamAV due to the compliance requirements of having real-time scanning.
Comment 1 Mickey Sola 2020-05-26 13:59:59 EDT
Hey Aaron,

Thanks for the bug report here.

It would be helpful when reproducing this if you could provide the full config file that results from:

printf "ScanArchive true\nDetectPUA true\nOnAccessPrevention true\nOnAccessExcludeUname clamav\nOnAccessIncludePath /opt" >> /etc/clamav/clamd.conf

Cheers,
Mickey
Comment 2 Aaron Brighton 2020-05-26 15:28:18 EDT
(In reply to Mickey Sola from comment #1)
> Hey Aaron,
> 
> Thanks for the bug report here.
> 
> It would be helpful when reproducing this if you could provide the full
> config file that results from:
> 
> printf "ScanArchive true\nDetectPUA true\nOnAccessPrevention
> true\nOnAccessExcludeUname clamav\nOnAccessIncludePath /opt" >>
> /etc/clamav/clamd.conf
> 
> Cheers,
> Mickey

Hi Mickey,

Thanks for picking this up, it's pretty much defaults with exceptions to the last 5 lines added by the command you referenced.

For the latest development source installation:

ClamAV 0.103.0-devel-20200521/25819/Thu May 21 12:20:55 2020

root@ip-172-31-90-164:~# grep -v "^#\|^$" /etc/clamd.conf
LogTime yes
LogClean yes
LogRotate yes
LocalSocket /tmp/clamd.socket
User clamav
ScanArchive true
DetectPUA true
OnAccessPrevention true
OnAccessExcludeUname clamav
OnAccessIncludePath /opt


For the Ubuntu packages:

0.102.3+dfsg-0ubuntu0.18.04.1
0.102.2+dfsg-0ubuntu0.18.04.1

root@ip-172-31-87-202:~# grep -v "^#\|^$" /etc/clamav/clamd.conf
LocalSocket /var/run/clamav/clamd.ctl
FixStaleSocket true
LocalSocketGroup clamav
LocalSocketMode 666
User clamav
ScanMail true
ScanArchive true
ArchiveBlockEncrypted false
MaxDirectoryRecursion 15
FollowDirectorySymlinks false
FollowFileSymlinks false
ReadTimeout 180
MaxThreads 12
MaxConnectionQueueLength 15
LogSyslog false
LogRotate true
LogFacility LOG_LOCAL6
LogClean false
LogVerbose false
PreludeEnable no
PreludeAnalyzerName ClamAV
DatabaseDirectory /var/lib/clamav
OfficialDatabaseOnly false
SelfCheck 3600
Foreground false
Debug false
ScanPE true
MaxEmbeddedPE 10M
ScanOLE2 true
ScanPDF true
ScanHTML true
MaxHTMLNormalize 10M
MaxHTMLNoTags 2M
MaxScriptNormalize 5M
MaxZipTypeRcg 1M
ScanSWF true
ExitOnOOM false
LeaveTemporaryFiles false
AlgorithmicDetection true
ScanELF true
IdleTimeout 30
CrossFilesystems true
PhishingSignatures true
PhishingScanURLs true
PhishingAlwaysBlockSSLMismatch false
PhishingAlwaysBlockCloak false
PartitionIntersection false
DetectPUA false
ScanPartialMessages false
HeuristicScanPrecedence false
StructuredDataDetection false
CommandReadTimeout 30
SendBufTimeout 200
MaxQueue 100
ExtendedDetectionInfo true
OLE2BlockMacros false
AllowAllMatchScan true
ForceToDisk false
DisableCertCheck false
DisableCache false
MaxScanTime 120000
MaxScanSize 100M
MaxFileSize 25M
MaxRecursion 16
MaxFiles 10000
MaxPartitions 50
MaxIconsPE 100
PCREMatchLimit 10000
PCRERecMatchLimit 5000
PCREMaxFileSize 25M
ScanXMLDOCS true
ScanHWP3 true
MaxRecHWP3 16
StreamMaxLength 25M
LogFile /var/log/clamav/clamav.log
LogTime true
LogFileUnlock false
LogFileMaxSize 0
Bytecode true
BytecodeSecurity TrustSigned
BytecodeTimeout 60000
OnAccessMaxFileSize 5M
ScanArchive true
DetectPUA true
OnAccessPrevention true
OnAccessExcludeUname clamav
OnAccessIncludePath /opt
Comment 3 Aaron Brighton 2020-06-03 11:37:24 EDT
Hi Mickey,

Wanted to provide an update.  I tested this on Amazon Linux 2 (similar to CentOS/Red Hat).  I can reproduce the issue there as well.

In my testing though, I did find a work-around.  This also might help identify specifically where the issue is.

If I change the parameters passed to clamonacc when launching it, specifically if I drop --fdpass, On-Access Scanning works fine.

Instead of: 

clamonacc --verbose --log=/var/log/clamav/clamonacc.log --fdpass

I use:

clamonacc --verbose --log=/var/log/clamav/clamonacc.log

That said, clamdscan works fine with --fdpass.  I would assume clamonacc should be doing more or less the same thing when using --fdpass as clamdscan, however, this clearly isn't the case.  So it's possible comparing the --fdpass implementation between these two utilities will reveal the issue.

Unfortunately, having had to drop --fdpass, clamd now needs to run as a privileged user to be able to access the files we want to scan, so not ideal. We'd like to see the --fdpass functionality restored so that we can drop clamd from running as root.

Hope this update helps, and looking forward to any further investigation/fixes that can be provided as time permits.  Assuming there isn't a user error component to this, it might be worth updating the documentation to reflect the need to run clamd as root when using clamonacc with --fdpass functionality?

Thanks,
Aaron
Comment 4 Aaron Brighton 2020-06-28 18:55:14 EDT
Some additional information:

Previously, I made an incorrect assumption about when the file descriptor is being passed to clamd.

Looks like by default, if you don't specify "--fdpass", the file descriptor is indeed passed.  As per the notes in this blog post from last year.

"Generally, if you are running clamd on the same system as clamonacc, you will be using a local unix socket and file descriptor passing is enabled by default." - https://blog.clamav.net/2019/09/understanding-and-transitioning-to.html

What I continue to find odd, is that if you DO specify "--fdpass" when launching clamonacc (as I was doing initially based on how clamdscan works), the bug detailed in this ticket occurs and clamonacc doesn't work.  Based on the description in that blog post, shouldn't functionality be the same seeing as file descriptor passing is turned on by default?

Is the way clamonacc passes file descriptors different by default, than with "--fdpass" turned on?  Likewise, is the "--fdpass" implementation in clamonacc different than that in clamdscan?

It's all a bit confusing.
Comment 5 jblaine 2020-07-01 16:53:38 EDT
Hi all. I don't know if I should be opening another bug report or glomming onto this one, but we're getting the same thing on CentOS 7.8 running 0.102.3-1 from EPEL. Very basic configuration with clamd + clamonacc (no --fdpass option specified in systemd unit file that came with the package).

OnAccessMaxFileSize 100M
OnAccessIncludePath /home
OnAccessPrevention yes

Aside, is "NEEDINFO" the proper status for this ticket?

Below I make a random 20MB file and try to read it (sending the stream to /dev/null). I hit Ctrl-C and give up at the 3m31s mark. This is on an otherwise unloaded 2CPU 8GB VM running CentOS 7.8. As soon as I start reading it, clamd eats 100% on both CPUs then eventually tapers back to 100% of 1 CPU and ~30% of the other CPU for the clamonacc process.

[jblaine@myhost ~]$ dd if=/dev/zero of=foo bs=1024 count=20000
20000+0 records in
20000+0 records out
20480000 bytes (20 MB) copied, 0.18055 s, 113 MB/s
[jblaine@myhost ~]$ time cat foo > /dev/null
^C
[jblaine@myhost ~]$

These all correspond to that "cat":

Jul  1 16:46:56 myhost.our.org clamonacc: ERROR: ClamCom: TIMEOUT while waiting on socket (recv)
Jul  1 16:47:02 myhost.our.org clamonacc: ERROR: ClamCom: TIMEOUT while waiting on socket (recv)
Jul  1 16:47:08 myhost.our.org clamonacc: ERROR: ClamCom: TIMEOUT while waiting on socket (recv)
Jul  1 16:47:14 myhost.our.org clamonacc: ERROR: ClamCom: TIMEOUT while waiting on socket (recv)
Jul  1 16:47:20 myhost.our.org clamonacc: ERROR: ClamCom: TIMEOUT while waiting on socket (recv)
Jul  1 16:47:27 myhost.our.org clamonacc: ERROR: ClamCom: TIMEOUT while waiting on socket (recv)
Comment 6 Micah Snyder 2020-07-07 17:38:58 EDT
(In reply to Aaron Brighton from comment #4)
> Some additional information:
> 
> Previously, I made an incorrect assumption about when the file descriptor is
> being passed to clamd.
> 
> Looks like by default, if you don't specify "--fdpass", the file descriptor
> is indeed passed.  As per the notes in this blog post from last year.
> 
> "Generally, if you are running clamd on the same system as clamonacc, you
> will be using a local unix socket and file descriptor passing is enabled by
> default." -
> https://blog.clamav.net/2019/09/understanding-and-transitioning-to.html
> 
> What I continue to find odd, is that if you DO specify "--fdpass" when
> launching clamonacc (as I was doing initially based on how clamdscan works),
> the bug detailed in this ticket occurs and clamonacc doesn't work.  Based on
> the description in that blog post, shouldn't functionality be the same
> seeing as file descriptor passing is turned on by default?
> 
> Is the way clamonacc passes file descriptors different by default, than with
> "--fdpass" turned on?  Likewise, is the "--fdpass" implementation in
> clamonacc different than that in clamdscan?
> 
> It's all a bit confusing.

Hi Aaron,

I saw your blog post https://medium.com/@aaronbrighton/installation-configuration-of-clamav-antivirus-on-ubuntu-18-04-a6416bab3b41

Really great write-up. You had some very good ideas and good advice (for us and for others).

I'm less familiar with the clamonacc utility than Mickey is, though unfortunately she's had less time to focus on it as of late. I believe that the `--fdpass` option is supposed to be the same as default behavior (effectively a no-op), but it sounds like it's actually broken.  Omitting `--fdpass` is the way to go.

-Micah
Comment 7 Micah Snyder 2020-07-07 18:24:09 EDT
(In reply to jblaine from comment #5)
> Hi all. I don't know if I should be opening another bug report or glomming
> onto this one, but we're getting the same thing on CentOS 7.8 running
> 0.102.3-1 from EPEL. Very basic configuration with clamd + clamonacc (no
> --fdpass option specified in systemd unit file that came with the package).
> 
> OnAccessMaxFileSize 100M
> OnAccessIncludePath /home
> OnAccessPrevention yes
> 
> Aside, is "NEEDINFO" the proper status for this ticket?
> 
> Below I make a random 20MB file and try to read it (sending the stream to
> /dev/null). I hit Ctrl-C and give up at the 3m31s mark. This is on an
> otherwise unloaded 2CPU 8GB VM running CentOS 7.8. As soon as I start
> reading it, clamd eats 100% on both CPUs then eventually tapers back to 100%
> of 1 CPU and ~30% of the other CPU for the clamonacc process.
> 
> [jblaine@myhost ~]$ dd if=/dev/zero of=foo bs=1024 count=20000
> 20000+0 records in
> 20000+0 records out
> 20480000 bytes (20 MB) copied, 0.18055 s, 113 MB/s
> [jblaine@myhost ~]$ time cat foo > /dev/null
> ^C
> [jblaine@myhost ~]$
> 
> These all correspond to that "cat":
> 
> Jul  1 16:46:56 myhost.our.org clamonacc: ERROR: ClamCom: TIMEOUT while
> waiting on socket (recv)
> Jul  1 16:47:02 myhost.our.org clamonacc: ERROR: ClamCom: TIMEOUT while
> waiting on socket (recv)
> Jul  1 16:47:08 myhost.our.org clamonacc: ERROR: ClamCom: TIMEOUT while
> waiting on socket (recv)
> Jul  1 16:47:14 myhost.our.org clamonacc: ERROR: ClamCom: TIMEOUT while
> waiting on socket (recv)
> Jul  1 16:47:20 myhost.our.org clamonacc: ERROR: ClamCom: TIMEOUT while
> waiting on socket (recv)
> Jul  1 16:47:27 myhost.our.org clamonacc: ERROR: ClamCom: TIMEOUT while
> waiting on socket (recv)

Hi jblaine,

Using your instructions, I was able to reproduce maxing out a CPU while running `time cat foo > /dev/null`, though I didn't observe the "ERROR: ClamCom: TIMEOUT while waiting on socket (recv)" message.  

Do you have one of OnAccessExcludeUID, OnAccessExcludeUname, or OnAccessExcludeRootUID set so that clamonacc doesn't block clamd?

-Micah
Comment 8 jblaine 2020-07-08 19:02:01 EDT
(In reply to Micah Snyder from comment #7)
> Using your instructions, I was able to reproduce maxing out a CPU while
> running `time cat foo > /dev/null`, though I didn't observe the "ERROR:
> ClamCom: TIMEOUT while waiting on socket (recv)" message.  
> 
> Do you have one of OnAccessExcludeUID, OnAccessExcludeUname, or
> OnAccessExcludeRootUID set so that clamonacc doesn't block clamd?

We have OnAccessExcludeRootUID set.
Comment 9 jblaine 2020-07-08 19:06:42 EDT
Sorry, hit save too soon. The whole config deviating from any defaults is:

ExcludePath /var/lib/clamav/quarantined
OnAccessMaxFileSize 200M
OnAccessIncludePath /home
OnAccessExcludeRootUID yes
OnAccessPrevention yes
Comment 10 jblaine 2020-07-21 11:31:27 EDT
Any thoughts, anyone? This issue effectively makes typical OnAccess scanning impossible to use.
Comment 11 Mickey Sola 2020-07-21 13:05:39 EDT
To confirm, you are running clamd as root?
Comment 12 jblaine 2020-07-21 16:17:11 EDT
Correct. Running clamd as root.
Comment 13 jblaine 2020-08-13 10:47:08 EDT
Any thoughts Mickey, Micah, or other developers?
Comment 14 Mickey Sola 2020-08-14 13:03:48 EDT
Looking at this issue in more depth is at the top of my todo list for today. Should have more thoughts on what's happening in clamonacc for you late, hopefully with an idea of how to fix it (or an actual fix if it turns out to be something simple).
Comment 15 jblaine 2020-08-21 17:33:19 EDT
Find anything?
Comment 16 Chris 2020-09-03 07:43:02 EDT
I was facing the same issue and resolved it by making the following changes to my config file:

OnAccessExcludeRootUID yes
OnAccessExcludeUname root 
OnAccessRetryAttempts 3

Then restarted both clamd and clamonacc services and it worked.

Centos 8 SELinux enabled.

Apologies for the brevity. Spent almost a full day trying to figure this out. If I have time I will come back with more details. Just wanted to share to hopefully help someone else out.
Comment 17 Mickey Sola 2020-09-03 10:35:45 EDT
Long story short, fdpassing is broken. Default behaviour doesn't use fdpassing, but instead continuous streaming to a local socket. My mistake. This should have been caught ages ago, tbh.

Working on a fix.

In the interim, don't use the fdpass option and things should work just fine, albeit slower than they would if fdpassing weren't broken.
Comment 18 Chris 2020-09-03 10:58:37 EDT
(In reply to Mickey Sola from comment #17)
> Long story short, fdpassing is broken. Default behaviour doesn't use
> fdpassing, but instead continuous streaming to a local socket. My mistake.
> This should have been caught ages ago, tbh.
> 
> Working on a fix.
> 
> In the interim, don't use the fdpass option and things should work just
> fine, albeit slower than they would if fdpassing weren't broken.

Yes. forgot to mention in my earlier comment that removing the --fdpass option was also something I did to get everything working.

Thanks Mickey.
Comment 19 jblaine 2020-09-10 17:45:02 EDT
Thanks for the update. I'll have to figure out why I didn't get email about them 7 days ago.

We're already not using --fdpass though (unless I'm misunderstanding something). Our clamonacc unit file, which comes from the EPEL RPM 'clamav' reads:

[Unit]
Description=Clam AntiVirus userspace daemon for OnAccess Scanning
Documentation=man:clamd(8) man:clamd.conf(5) https://www.clamav.net/documents/
ConditionPathExists=/etc/clamd.d/scan.conf
After=clamd@scan.service

[Service]
ExecStart=/usr/bin/clamonacc --foreground

[Install]
WantedBy=multi-user.target
Comment 20 jblaine 2020-10-08 11:33:22 EDT
We are still being affected by this and are not using --fdpass.
Comment 21 Mickey Sola 2020-10-08 13:30:22 EDT
Hi J,

Our focus right now is on fixing the fdpass issue. Good progress has been made there. I know what's exactly causing it (spoiler, it's related to curl limitations and the kernel interface for passing file descriptors from one process to another) and have successfully built out a (hardcoded) prototype which fixes the issue. Next steps require a bit of refactoring, and then testing and cleanup work, but at least there's progress.

As for your specific issue, like Micah, I had difficult reproducing your problem. I would recommend giving the Retry option a go to see if that resolves things for you at all.

Best,
Mickey
Comment 23 jblaine 2021-03-24 13:33:29 EDT
This issue persists with 0.103.1 on RHEL/CentOS 7.9 for us (package from EPEL).

From the systemd clamonacc.service file:
    ExecStart=/usr/sbin/clamonacc -F --config-file=/etc/clamd.d/scan.conf

Our test:

foo is a 20MB file

$ time cat foo > /dev/null

real    0m43.890s
user    0m0.003s
sys     0m0.018s
$

And in the log data:

clamonacc: ERROR: ClamCom: TIMEOUT while waiting on socket (recv)
clamonacc: ERROR: ClamCom: TIMEOUT while waiting on socket (recv)

The same operation with 0.103.1 on RHEL 8 (from EPEL) takes 0m0.006s real time.