Bugzilla – Bug 12251
Filesize >2GB ignores --alert-exceeds-max --max-filesize --max-scansize settings
Last modified: 2019-03-11 17:27:51 EDT
ClamAV should be using the large-file extension stat64() so that the code can successfully obtain the size of any too-large file, and then respond with the proper result, as documented:

Heuristics.Limits.Exceeded FOUND

rather than incorrectly reporting a misleading error:

Can't get file status ERROR

This is particularly misleading when much smaller limits are specified: --alert-exceeds-max=yes --max-filesize=1024M --max-scansize=1024M

SUSPECTED BUG CAUSE

It appears that your code is relying on the old stat() structure, where the file size value is limited to 2GB by the signed 32-bit "off_t" type (at least when compiled with the POSIX-compatible sys/types.h of Visual C++). Referencing any larger file, as supported by any contemporary file system, will thus cause stat() to fail instead of returning the actual file size, which could then be compared to --max-filesize.

PS: Even more confusing, trying to specify a value >4GB (e.g., --max-filesize=8096M --max-scansize=8096M) will report:

WARNING: Numerical value for option max-filesize too high, resetting to 4G
WARNING: Numerical value for option max-scansize too high, resetting to 4G

which is of course also bogus, because the "hard-coded" true maximum is 2 GB.
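The behaviour requested above can be sketched roughly as follows. This is a minimal illustration and not ClamAV's actual code: the function name check_file_size and the size_check result codes are hypothetical, and the limit is passed in directly rather than parsed from --max-filesize. The point is only that a 64-bit stat call lets the size comparison happen instead of failing outright.

```c
/* Request a 64-bit off_t on POSIX systems; must precede system headers. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical result codes for this sketch (not ClamAV's real ones). */
enum size_check {
    SIZE_STAT_ERROR    = -1, /* the file status could not be obtained */
    SIZE_OK            = 0,  /* the size is within the limit */
    SIZE_EXCEEDS_LIMIT = 1   /* the size is known and above the limit */
};

/* Hypothetical helper: read the size with a 64-bit stat, then compare. */
enum size_check check_file_size(const char *path, uint64_t max_filesize)
{
#ifdef _WIN32
    struct __stat64 st;          /* 64-bit st_size on Windows */
    if (_stat64(path, &st) != 0)
        return SIZE_STAT_ERROR;
#else
    struct stat st;              /* 64-bit st_size via _FILE_OFFSET_BITS */
    if (stat(path, &st) != 0)
        return SIZE_STAT_ERROR;
#endif
    if (st.st_size < 0)
        return SIZE_STAT_ERROR;
    return ((uint64_t)st.st_size > max_filesize) ? SIZE_EXCEEDS_LIMIT
                                                 : SIZE_OK;
}
```

With a helper of this shape, a 3GB file checked against a 1024M limit would yield SIZE_EXCEEDS_LIMIT, which a caller could map to the documented Heuristics.Limits.Exceeded result, while SIZE_STAT_ERROR would be reserved for genuine failures such as a missing file.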
Hi Andy,

Thanks for raising this issue. On Linux systems, compiling with -D_FILE_OFFSET_BITS=64 automatically makes off_t 64 bits wide, so stat() can report large file sizes. I'm not certain the same is true on all UNIX systems, and I don't believe it to be true on Windows systems. I suspect the code was written with Linux in mind and didn't account for the need to use the _stat64() function and the struct __stat64 type explicitly on Windows.

We had some file-size-related issues reported last year by Windows users who were trying to set the max filesize to 4GB. The Bugzilla bugs are still open, though I haven't tracked them down. I suspect it's the same issue at play.

I wonder if the best approach would be to write a compatibility library for file operations that uses the correct APIs for each system, and then replace all calls throughout clamav-devel to use the new API.
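As a quick way to see the effect of the macro described above, the following sketch reports how wide off_t is in the current translation unit. The helper name off_t_width_bits is hypothetical. The assumption is a glibc-style POSIX system where _FILE_OFFSET_BITS is honoured; on a toolchain that ignores the macro, the function would simply report the native width.

```c
/* Must be defined before any system header is included to take effect. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif
#include <limits.h>
#include <sys/types.h>

/* Hypothetical helper: width of off_t, in bits, as this file sees it. */
unsigned off_t_width_bits(void)
{
    return (unsigned)(sizeof(off_t) * CHAR_BIT);
}
```

A build-time check along these lines could flag platforms where the macro has no effect and an explicit 64-bit API is needed instead.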
(In reply to Micah Snyder from comment #1)

Hi Micah - I admit that I'm out of my league here. From what I understand, on Linux you'd normally activate the three POSIX macros

/D "_FILE_OFFSET_BITS=64" /D "_LARGEFILE_SOURCE" /D "_LARGEFILE64_SOURCE"

The last two make the large-file versions of the API available under their own function names. The first transparently maps the old function names to the large-file functions, and also changes the off_t type to be 64 bits.

As you said, apparently the header files of the POSIX compatibility layer for Visual C++ do not implement that _FILE_OFFSET_BITS macro. I wonder if operating-system-dependent preprocessing would be feasible, e.g.:

    #if /* linux */
    #define _FILE_OFFSET_BITS 64
    #endif

    #include <stdio.h>

    #if /* windows */
    #define fseeko _fseeki64
    #define ftello _ftelli64
    #define stat _stat64
    … etc …
    typedef __int64 off_t;
    #endif

    #if /* other */
    #define fseeko fseek
    #define ftello ftell
    #define stat stat
    … etc …
    typedef long off_t;
    #endif

(Again, please do take this with a grain of salt, because this is not my field of expertise...)
Andy,

No worries, it's usually a learning process for me as well. Writing 100% cross-platform compatible code is always a bear.

For the preprocessor stat64 definitions, ClamAV sets them on POSIX systems using an autoconf check, if the feature is available: https://github.com/Cisco-Talos/clamav-devel/blob/dev/0.102/m4/reorganization/code_checks/stat64.m4#L27

RARLab's UnRAR library takes this approach when attempting to seek in a file: https://github.com/Cisco-Talos/clamav-devel/blob/dev/0.102/libclamunrar/file.cpp#L477

Using the native Win32 API in a compatibility layer is one way to do it. Is _fseeki64 available on x86 Windows systems?

If we redefined any of the standard functions and types, it could introduce difficult-to-spot bugs wherever the header was included out of order or wasn't included at all. We'd have to use an original name. Traditionally, ClamAV uses the `cli_` prefix (e.g. `cli_malloc`) for such things.
(In reply to Micah Snyder from comment #3)

Hi Micah - here is the link to the function list of Microsoft's *universal* C runtime library: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/crt-alphabetical-function-reference?view=vs-2017#f

_fseeki64() is indeed included.

Windows x86 vs. x64 just determines the addressability of memory, which is related to the address bit-length of the processor. That is completely separate from the number of bits that integer values (such as the file offset) may hold.

Yes, both Windows x86 and x64 do support large file systems, including the related C functions that implement 64-bit offset values.
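The distinction between address width and integer width can be illustrated with a small sketch. All function names here are hypothetical. It relies only on standard C: int64_t holds 64-bit values whether pointers are 32 or 64 bits wide.

```c
#include <limits.h>
#include <stdint.h>

/* Width of a pointer in bits: typically 32 on x86 builds, 64 on x64. */
unsigned pointer_width_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* A 5 GiB file offset, representable in int64_t on either build. */
int64_t example_large_offset(void)
{
    return (int64_t)5 * 1024 * 1024 * 1024;
}

/* Nonzero when the offset cannot fit in a signed 32-bit integer. */
int offset_exceeds_32_bits(int64_t offset)
{
    return offset > INT32_MAX;
}
```

On a 32-bit build pointer_width_bits() would report 32, yet example_large_offset() still returns the full value, which is why 64-bit file offsets are usable on x86 Windows as well.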