Flexense Data Management Software

File Classification Performance Options

DiskSorter is optimized for modern multi-core and multi-CPU systems and is capable of classifying files stored on multiple disks, directories or network shares in parallel using all CPUs installed in the computer. DiskSorter provides a number of different performance optimization options allowing one to tune the file classification operations for user-specific hardware and storage configurations.

To customize the file classification performance options, open the file classification operation dialog, press the 'Options' button and select the 'Advanced' tab. The 'File Classification Threads' option controls how many parallel threads are used to classify files. The 'Directories Scanning Threads' option controls how many parallel threads are used to scan input disks, directories and network shares.

In the 'Fault-Tolerant' directory scanning mode, DiskSorter uses an individual processing thread for each input disk, directory or network share, but limits the number of parallel scanning threads to the specified value. In the 'High-Performance' directory scanning mode, DiskSorter always uses the specified number of parallel directory scanning threads, even when processing a single input disk, directory or network share.
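The difference between the two directory scanning modes can be sketched as a small calculation. The Python snippet below is only an illustration of the behavior described above, not DiskSorter code; the function name and the mode strings are hypothetical.

```python
def effective_scanning_threads(mode: str, num_inputs: int, max_threads: int) -> int:
    """Estimate how many directory scanning threads run in parallel.

    mode        -- "fault-tolerant" or "high-performance" (hypothetical labels)
    num_inputs  -- number of input disks, directories or network shares
    max_threads -- value of the 'Directories Scanning Threads' option
    """
    if num_inputs < 1 or max_threads < 1:
        raise ValueError("num_inputs and max_threads must be at least 1")
    if mode == "fault-tolerant":
        # One thread per input, capped at the configured value.
        return min(num_inputs, max_threads)
    if mode == "high-performance":
        # The configured number of threads is used even for a single input.
        return max_threads
    raise ValueError(f"unknown scanning mode: {mode!r}")


for mode in ("fault-tolerant", "high-performance"):
    for inputs in (1, 4, 12):
        print(mode, inputs, effective_scanning_threads(mode, inputs, 8))
```

With a single input and eight configured threads, the sketch yields one thread in the fault-tolerant mode and eight threads in the high-performance mode, which is why the latter is the one to consider when a single slow share has to be scanned.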

NVMe SSD Disk File Classification Performance

For example, when classifying files stored on a high-speed NVMe SSD disk, DiskSorter reaches up to 63,000 files/sec using a single file classification thread. With two parallel file classification threads, the performance scales up to 98,000 files/sec, and with four parallel file classification threads it increases up to 123,000 files/sec, showing good multi-threaded scalability. With six processing threads the file classification performance reaches up to 127,000 files/sec, and with eight processing threads it increases up to 129,000 files/sec, which makes it possible to classify and categorize 10 million files within two minutes.
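The scaling behind these figures can be checked with a few lines of arithmetic. The sketch below uses only the NVMe throughput values quoted above; the function names are illustrative, and the results are estimates that assume the peak rate is sustained for the whole operation.

```python
# Peak NVMe SSD throughput (files/sec) per number of classification threads,
# taken from the figures quoted in the text.
NVME_FILES_PER_SEC = {1: 63_000, 2: 98_000, 4: 123_000, 6: 127_000, 8: 129_000}


def scaling_table(rates: dict[int, int]) -> dict[int, tuple[float, float]]:
    """Return {threads: (speedup vs. one thread, efficiency per thread)}."""
    baseline = rates[1]
    table = {}
    for threads, rate in sorted(rates.items()):
        speedup = rate / baseline
        table[threads] = (speedup, speedup / threads)
    return table


def seconds_to_classify(num_files: int, files_per_sec: float) -> float:
    """Estimated duration, assuming the rate stays constant."""
    return num_files / files_per_sec


for threads, (speedup, efficiency) in scaling_table(NVME_FILES_PER_SEC).items():
    print(f"{threads} threads: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")

print(f"10 million files: {seconds_to_classify(10_000_000, 129_000):.0f} sec")
```

By this estimate, most of the gain arrives by four threads (about 1.95x over a single thread), while eight threads reach about 2.05x, and 10 million files take roughly 78 seconds at the eight-thread rate.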

When classifying files stored on regular SATA SSD drives, which are significantly slower than NVMe SSD drives, the performance of the file classification process reaches up to 24,000 files/sec using a single file classification thread and scales up to 69,000 files/sec with four parallel file classification threads. With eight parallel file classification threads, the performance reaches up to 98,000 files/sec, which makes it possible to classify and categorize huge numbers of files relatively fast.

SATA SSD Disk File Classification Performance

Classifying files stored on a NAS storage device via a network is more complicated because the user needs to take into account the speed and the latency of the network. If the computer on which DiskSorter is installed is connected to the NAS storage device via a high-speed, low-latency network, the performance of the file classification operations may reach up to 12,000 files/sec with one file classification thread, scale up to 49,000 files/sec with four parallel file classification threads and increase up to 82,000 files/sec with eight parallel file classification threads.

NAS Server File Classification Performance

On the other hand, if DiskSorter needs to access network shares via the Internet or via a long-distance, high-latency network, the performance of the file classification operations will be relatively slow. One way to increase the performance of the file classification operations in such configurations is to select the 'High-Performance' directory scanning mode and increase the number of parallel directory scanning threads to 16 or even 32, regardless of how many CPUs are actually installed in the computer.
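The reason a thread count above the number of CPUs can help is that, on a high-latency link, a scanning thread spends most of its time waiting for replies rather than using the CPU. The sketch below is a deliberately simplified model, not a description of DiskSorter internals: it assumes each thread issues one request at a time and that neither bandwidth nor the server is the bottleneck. The 50 ms round-trip time is a hypothetical value, not a measurement.

```python
def latency_bound_requests_per_sec(threads: int, round_trip_ms: float) -> float:
    """Upper bound on requests/sec when every request waits one round trip.

    Simplified model: each thread has at most one request in flight,
    so throughput grows linearly with the number of threads.
    """
    if threads < 1 or round_trip_ms <= 0:
        raise ValueError("threads must be >= 1 and round_trip_ms must be > 0")
    return threads * 1000.0 / round_trip_ms


ROUND_TRIP_MS = 50  # hypothetical long-distance link

for threads in (4, 16, 32):
    rate = latency_bound_requests_per_sec(threads, ROUND_TRIP_MS)
    print(f"{threads} scanning threads: at most {rate:.0f} requests/sec")
```

Under these assumptions the bound depends only on the thread count and the latency, not on the number of CPUs. Real throughput will flatten once bandwidth, the remote server or local processing becomes the limit.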

Classifying millions of files stored on one or more large NAS servers may be a very time-consuming operation, and one way to speed up the file classification process is to perform the operation via a 2.5 Gigabit Ethernet network. With 2.5 Gigabit Ethernet, the performance of the DiskSorter file classification operations continues to scale up to 80,000 files/sec with 8 parallel file classification threads.

2.5 Gigabit Ethernet NAS Server File Classification Performance

Due to the very wide adoption of laptops and NAS servers with built-in WiFi network interfaces, many users may consider classifying files stored on NAS servers via a wireless network. However, the latency of a wireless network is much higher, and therefore it takes much more time to complete the file classification operation. The question is how much longer the user will need to wait and whether performing the file classification operation via a wired network saves a significant amount of time.

WiFi NAS Server File Classification Performance

Based on our benchmarks, via a 5 GHz wireless network, DiskSorter reaches up to 1,500 files/sec with a single file classification thread and scales up to 12,000 files/sec with 8 parallel file classification threads, which is approximately 5 times slower than standard Gigabit Ethernet and 7 times slower than 2.5 Gigabit Ethernet. So, if the user needs to classify up to 1 million files, it is possible to use a wireless network. On the other hand, if the user needs to classify a large NAS server with many millions of files, a low-latency Gigabit Ethernet or 2.5 Gigabit Ethernet network is required.
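To put the wireless figures in perspective, the waiting time can be estimated from the eight-thread rates quoted in the text. The sketch below uses the 12,000 files/sec WiFi figure and the 80,000 files/sec 2.5 Gigabit Ethernet figure; the 50 million file workload is a hypothetical example, and the estimates assume the peak rate is sustained.

```python
# Eight-thread peak rates (files/sec) quoted in the text.
RATES = {"5 GHz WiFi": 12_000, "2.5 Gigabit Ethernet": 80_000}


def estimate_minutes(num_files: int, files_per_sec: float) -> float:
    """Estimated duration in minutes, assuming a constant rate."""
    return num_files / files_per_sec / 60


slowdown = RATES["2.5 Gigabit Ethernet"] / RATES["5 GHz WiFi"]
print(f"WiFi is about {slowdown:.1f}x slower than 2.5 Gigabit Ethernet")

for num_files in (1_000_000, 50_000_000):  # 50 million is a hypothetical workload
    for link, rate in RATES.items():
        minutes = estimate_minutes(num_files, rate)
        print(f"{num_files:>11,} files via {link}: about {minutes:.1f} min")
```

By this estimate, 1 million files take under a minute and a half over WiFi, while the hypothetical 50 million files take more than an hour over WiFi compared with roughly ten minutes over 2.5 Gigabit Ethernet.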

Modern USB flash drives provide plenty of storage space and are reasonably fast, allowing one to store vast amounts of data for backup purposes. Sometimes, it may be required to classify and categorize files on a USB flash drive in order to free up used disk space. When classifying files stored on a USB flash drive, DiskSorter can reach up to 15,000 files/sec with a single file classification thread. With two parallel file classification threads, the performance increases up to 18,000 files/sec, which is good, but using more than two file classification threads slightly degrades the performance of the file classification operations.
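Since the best thread count depends on the storage device, one practical approach is to benchmark a few settings and pick the smallest thread count that comes close to the best measured rate. The helper below is a minimal sketch of that idea; the function name and the 5% tolerance are arbitrary assumptions. It uses only figures quoted in the text, so the USB data contains just the two measured points.

```python
def smallest_sufficient_threads(rates: dict[int, int], tolerance: float = 0.05) -> int:
    """Smallest thread count whose rate is within `tolerance` of the best rate.

    rates     -- measured files/sec per number of classification threads
    tolerance -- accepted shortfall relative to the best rate (assumed 5%)
    """
    if not rates:
        raise ValueError("rates must not be empty")
    threshold = max(rates.values()) * (1.0 - tolerance)
    return min(threads for threads, rate in rates.items() if rate >= threshold)


USB_FILES_PER_SEC = {1: 15_000, 2: 18_000}
NVME_FILES_PER_SEC = {1: 63_000, 2: 98_000, 4: 123_000, 6: 127_000, 8: 129_000}

print("USB flash drive:", smallest_sufficient_threads(USB_FILES_PER_SEC), "threads")
print("NVMe SSD:", smallest_sufficient_threads(NVME_FILES_PER_SEC), "threads")
```

With these inputs the sketch suggests two threads for the USB flash drive and four threads for the NVMe SSD disk, since six and eight threads add less than 5% on top of the four-thread rate.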

USB Flash Disk File Classification Performance

Modern IT environments widely deploy virtual servers and virtual workstations. Most of the popular virtualization platforms provide a high level of performance, but some performance degradation is inevitable when a file classification operation is executed on a guest virtual machine compared to the same operation executed directly on the host computer.

Virtual Machine File Classification Performance

For example, when a virtual machine with 4 virtual CPUs is stored on an NVMe SSD disk and classifies files stored on a virtual local disk drive, which is physically stored on the same NVMe SSD disk, the performance of the file classification operations reaches up to 25,000 files/sec using a single file classification thread. With two parallel file classification threads, the performance scales up to 37,000 files/sec, and with four parallel file classification threads it increases up to 48,000 files/sec, allowing one to classify and categorize huge numbers of files relatively fast.
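A rough idea of the virtualization overhead can be obtained by dividing the virtual machine figures by the NVMe SSD figures quoted earlier for the same thread counts. This comparison is only indicative: it assumes that both sets of figures come from comparable hardware, which the text does not state, and it reads the four-thread virtual machine figure as 48,000 files/sec.

```python
# Peak rates (files/sec) per number of classification threads, from the text.
HOST_NVME = {1: 63_000, 2: 98_000, 4: 123_000}
VM_NVME = {1: 25_000, 2: 37_000, 4: 48_000}


def relative_throughput(vm: dict[int, int], host: dict[int, int]) -> dict[int, float]:
    """Return {threads: VM rate as a fraction of the host rate}."""
    return {threads: vm[threads] / host[threads] for threads in sorted(vm) if threads in host}


for threads, fraction in relative_throughput(VM_NVME, HOST_NVME).items():
    print(f"{threads} threads: VM reaches about {fraction:.0%} of the host rate")
```

Under these assumptions the guest reaches roughly 38% to 40% of the host rate at each of the three thread counts.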