Introduction to the world of Malware

Oct 19, 2020

Using the Internet exposes many users to malware: software designed to violate the security of computer systems, that is, the confidentiality, integrity, and availability of their data. Malware can add, modify, or remove programs on a system to intentionally interfere with its functions.

Many protection systems have been designed and implemented to block these attacks. Malware detectors, commonly known as antivirus or anti-malware software, are widely used protection systems in IT environments. While antivirus products are often successful at detecting previously seen malware, they frequently fail to detect new malware, which can cause serious harm to its victims.

The WannaCry ransomware attack of May 2017 is a good example. It infected around 230,000 machines in 150 countries and caused an estimated $4 billion in losses. In England, the National Health Service (NHS) lost around £92 million and canceled 19,000 scheduled appointments. This attack and others have forced cybersecurity experts and vendors to reconsider the quality and effectiveness of their cybersecurity solutions.

1. DEFINITIONS AND FUNDAMENTAL CONCEPTS

In an IT context, security covers both cybersecurity and physical security. The goal of cybersecurity is to mitigate risk and protect IT resources against malicious attacks. Cybersecurity solutions include network security and control, data loss prevention (DLP), intrusion detection and prevention (IDS, IPS), identity and access management (IAM), and malware detection (anti-malware, antivirus).

1.1. Definition of malware

The term malware is short for malicious software. Malware aims to harm computers by stealing information, corrupting files, or otherwise performing malicious activities. McGraw and Morrisett define malicious code as any code added, modified, or removed from a system or software in order to intentionally cause harm or to corrupt the intended function of the system.

1.2. Types of malware

Malware comes in many forms. Although the types differ, identifying the exact type of a malware sample in practice is not straightforward, for two reasons. First, a single sample can contain malicious code of several types. Second, malware creators apply camouflage to bypass anti-malware tools. The most common types are the following:

Viruses: Viruses infect computers and files by replicating themselves. A virus cannot exist independently of a host: it attaches itself to executable files and even to other types of files.

Worms: Worms can exist and reproduce independently of other files. They spread through storage devices or over the network, infecting as many users as possible.

Trojan horses: A Trojan horse is usually hidden inside an application that appears to be useful. It steals confidential information, monitors user activity, and modifies host system files.

Rootkit: A rootkit gives the attacker remote, privileged access to and control over the victim's machine while hiding its own presence. It opens a backdoor to install further malware or to use the system for other attacks.

Spyware: Malware that monitors the user, collects personal information, and sends it back to the attacker.

Ransomware: Malware that prevents users from accessing their systems or personal files, typically by encrypting them, and demands a ransom payment to restore access.

1.3. Malware detectors

A malware detector is an implementation of one or more malware detection techniques. It protects the system by identifying malicious elements. Today, malware detectors are integrated into more sophisticated antivirus solutions. Anti-malware tools are usually referred to by their commercial product names (Kaspersky, Avast, ESET…), and we do not distinguish here between the terms anti-malware and antivirus. These products usually do more than detection; they perform three fundamental tasks:

Prevention: prevent the spread of malware by controlling network traffic and data from the Internet.

Detection: analyze files that reside on the system and inspect running processes and even network traffic to identify possible malware using one or more detection techniques.

Cleaning: remove the malware and any traces it left behind; this may include repairing the damage it caused.

2. MALWARE ANALYSIS AND DETECTION TECHNIQUES

There are two main families of malware detection methods: signature-based approaches and behavioral anomaly (heuristic) approaches. The first generations of antivirus software were signature-based. A signature is a set of binary characteristics extracted from an executable to identify it. The major drawback of this approach is that it cannot detect new malware. In addition, signatures are extracted by manual analysis, which can take several days.

To overcome these limitations, later generations added a second detection method based on malicious behavior. These heuristics classify executable files using rules and specifications that describe how malware behaves. Because they are heuristics, they cannot detect behavior that differs from what was predefined (programmed). Specification-based heuristics are a sub-type that has performed well at minimizing false alarms, which is why many security experts consider them a third approach.
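To make the signature idea concrete, here is a minimal Python sketch of a signature scanner. It assumes two hypothetical signature sets: exact SHA-256 hashes of known files, and byte patterns that must appear somewhere in a file. The names `HASH_SIGNATURES`, `BYTE_SIGNATURES` and `scan` are illustrative; real products use far larger databases and more elaborate matching.

```python
import hashlib

# Hypothetical signature database (illustrative only).
# Exact-hash signatures identify one specific known file.
HASH_SIGNATURES = {
    hashlib.sha256(b"EVIL-PAYLOAD-v1").hexdigest(): "Demo.Hash.A",
}
# Byte-pattern signatures flag any file that contains the pattern.
BYTE_SIGNATURES = {
    b"\xde\xad\xbe\xef\x90\x90": "Demo.Pattern.B",
}


def scan(data: bytes) -> list[str]:
    """Return the names of all signatures that match `data`."""
    hits = []
    digest = hashlib.sha256(data).hexdigest()
    if digest in HASH_SIGNATURES:
        hits.append(HASH_SIGNATURES[digest])
    for pattern, name in BYTE_SIGNATURES.items():
        if pattern in data:
            hits.append(name)
    return hits


print(scan(b"EVIL-PAYLOAD-v1"))                 # ['Demo.Hash.A']
print(scan(b"hdr\xde\xad\xbe\xef\x90\x90tail"))  # ['Demo.Pattern.B']
print(scan(b"EVIL-PAYLOAD-v2"))                 # []: a new variant is missed
```

A one-byte change to the sample (`b"EVIL-PAYLOAD-v2"`) is already enough to defeat the hash signature, which is why signature-based detection misses new variants.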

Finally, regardless of the approach used, malware detection can be performed statically, dynamically, or in a hybrid way that combines both. These options are discussed below.

2.1. Static analysis

Static analysis examines the program without running it, treating it as a binary file with a known, predefined format. Reverse engineering is performed with a disassembler, a debugger, and assembly-code analysis tools in order to understand the structure of the malware. Although simple, static analysis is sufficient to detect a large class of malware and to generate their signatures. However, it remains limited and almost ineffective against camouflaged or protected programs.
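The very first step of static analysis, identifying the file format and pulling out readable strings, can be sketched in a few lines of Python, assuming the sample is already loaded as bytes. The magic numbers checked (`MZ` for DOS/Windows PE files, `\x7fELF` for ELF) are real; the set of "suspicious" Windows API names is only an illustrative assumption, and `static_triage` is a hypothetical helper, not a substitute for a disassembler.

```python
import re

# Illustrative only: API names often worth a closer look during triage.
SUSPICIOUS_APIS = {"VirtualAlloc", "CreateRemoteThread", "WriteProcessMemory",
                   "URLDownloadToFileA"}


def static_triage(data: bytes, min_len: int = 4) -> dict:
    """Inspect a sample's raw bytes without executing it."""
    # Identify the container format from its magic number.
    if data.startswith(b"MZ"):
        fmt = "PE/DOS"            # DOS header, used by Windows PE files
    elif data.startswith(b"\x7fELF"):
        fmt = "ELF"               # Linux/Unix executables
    else:
        fmt = "unknown"
    # Like the Unix `strings` tool: runs of printable ASCII of at least min_len.
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    strings = [s.decode("ascii") for s in re.findall(pattern, data)]
    suspicious = [s for s in strings if s in SUSPICIOUS_APIS]
    return {"format": fmt, "strings": strings, "suspicious": suspicious}


sample = b"MZ\x90\x00" + b"\x00" * 8 + b"kernel32.dll\x00CreateRemoteThread\x00hi\x00"
print(static_triage(sample))
```

Because nothing is executed, this kind of check is fast and safe, but a packed or encrypted sample would expose few or no meaningful strings.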

2.2. Dynamic analysis

The program is run in a controlled environment such as a virtual machine, a simulator, an emulator, or a sandbox, and analysts use inspection and access-control tools to understand its behavior. Dynamic analysis can detect new malware, but it is slower: it takes time to prepare the environment, run the malware, and examine the execution trace. Dynamic analysis is also used to understand the effect of malware on a system so that patches can be produced.
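Dynamic analysis produces a record of what the program did. The sketch below assumes a hypothetical sandbox that logs each action as an `(operation, target)` pair, and applies a few illustrative behavioral rules to that trace. The operation names, the threshold and the rules are assumptions chosen for the example, not any product's rule set.

```python
from collections import Counter

# Hypothetical trace format: (operation, target) pairs, as a sandbox
# monitor might log them while the sample runs inside a virtual machine.
Trace = list[tuple[str, str]]

MASS_FILE_THRESHOLD = 3  # arbitrary threshold chosen for the example


def analyze_trace(trace: Trace) -> list[str]:
    """Apply a few illustrative behavioral rules to a recorded trace."""
    findings = []
    ops = Counter(op for op, _ in trace)
    # Many files rewritten and originals deleted: ransomware-like behavior.
    if (ops["file_write"] >= MASS_FILE_THRESHOLD
            and ops["file_delete"] >= MASS_FILE_THRESHOLD):
        findings.append("mass file rewrite/delete")
    # Writing under a Run key makes a program start at every logon.
    if any(op == "registry_set" and "\\CurrentVersion\\Run" in target
           for op, target in trace):
        findings.append("autostart persistence")
    if ops["net_connect"] > 0:
        findings.append("network connection")
    return findings


trace = (
    [("file_write", f"/home/u/doc{i}.locked") for i in range(3)]
    + [("file_delete", f"/home/u/doc{i}.txt") for i in range(3)]
    + [("registry_set", r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run")]
    + [("net_connect", "203.0.113.5:443")]
)
print(analyze_trace(trace))
```

The rules judge what the sample did rather than what its bytes look like, which is why this approach can flag malware that has never been seen before.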

2.3. Static Analysis vs Dynamic Analysis

Because it is simple and fast, static analysis is generally used as the first scan in malware detection. However, most malware uses techniques to evade it: malware creators protect their programs with packing, compression, obfuscation, and encryption. Even so, static analysis produces few false positives.

Dynamic analysis, which tracks the actions of the software under inspection, overcomes some of these drawbacks. Packed or encrypted malware must revert to its original (unencrypted) binary form when it is loaded into memory, where it can be observed. Unfortunately, many malware samples include routines that detect dynamic analysis. Worse, some can identify the antivirus scanning them and even the virtualization technique in use, allowing them to exploit its vulnerabilities to pass malicious code to the host system. Other malicious programs change their behavior when they discover they are under analysis.
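Why dynamic analysis copes better with packed malware can be shown with a toy example. Here a single-byte XOR stands in for a real packer (an assumption; real packers are far more elaborate), and `SIGNATURE` is a hypothetical byte signature: it is invisible in the packed file on disk but reappears once the code is decrypted in memory.

```python
# Hypothetical byte signature of a known malicious payload.
SIGNATURE = b"\xde\xad\xbe\xef"


def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR: a toy stand-in for a packer (encrypts and decrypts)."""
    return bytes(b ^ key for b in data)


payload = b"\x90\x90" + SIGNATURE + b"\xc3"
packed_on_disk = xor_bytes(payload, 0x5A)             # what a static scan reads
unpacked_in_memory = xor_bytes(packed_on_disk, 0x5A)  # what actually runs

static_hit = SIGNATURE in packed_on_disk       # False: pattern is hidden
dynamic_hit = SIGNATURE in unpacked_in_memory  # True: pattern reappears
print(static_hit, dynamic_hit)  # False True
```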

3. EVOLUTION OF MALWARES

When malware developers realize that their malware is likely to be detected, they try to evade detection by applying various camouflage techniques. This section presents some of these strategies.

3.1. Obfuscation

Obfuscation alters the structure and appearance of the code, without changing its behavior, in order to skew the result of the scan. It can bypass signature-based detection. The best-known obfuscation techniques are dead-code insertion, function and macro reordering, code transposition, instruction substitution, code insertion, and register swapping.
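The effect of obfuscation on signatures can be illustrated with instructions written as plain strings (illustrative mnemonics, not a real disassembly). The hypothetical signature is a contiguous instruction sequence; inserting a `nop` (dead code) or replacing `mov eax, 0` with the equivalent `xor eax, eax` (instruction substitution) is enough to break the match. The `normalize` helper sketches one possible countermeasure that undoes exactly these two transformations before matching.

```python
import re

# Hypothetical signature: a contiguous sequence of instructions.
SIGNATURE = ["mov eax, 0", "push eax", "call decrypt"]


def contains_sequence(code: list[str], sig: list[str]) -> bool:
    """True if `sig` appears as a contiguous run inside `code`."""
    n = len(sig)
    return any(code[i:i + n] == sig for i in range(len(code) - n + 1))


def normalize(code: list[str]) -> list[str]:
    """Drop nops and rewrite `xor r, r` as `mov r, 0` (both zero the register)."""
    out = []
    for ins in code:
        if ins == "nop":
            continue  # dead code: no effect on the result
        m = re.fullmatch(r"xor (\w+), \1", ins)
        out.append(f"mov {m.group(1)}, 0" if m else ins)
    return out


original = ["push ebp", "mov eax, 0", "push eax", "call decrypt", "ret"]
dead_code = ["push ebp", "mov eax, 0", "nop", "push eax", "call decrypt", "ret"]
substituted = ["push ebp", "xor eax, eax", "push eax", "call decrypt", "ret"]

for name, code in [("original", original), ("dead_code", dead_code),
                   ("substituted", substituted)]:
    print(name, contains_sequence(code, SIGNATURE),
          contains_sequence(normalize(code), SIGNATURE))
```

Each normalization rule only undoes one known transformation, so attackers can always add new ones; this is the arms race described in this section.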

3.2. Encryption

Encrypted malware consists of a decryption routine, an encryption routine, an encryption key, and the encrypted malicious code. Each time it replicates, it generates a new key with a key-generation algorithm and re-encrypts its body, so every copy looks different. It can nevertheless be detected through its decryption routine, which remains the same in every copy.
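This structure can be sketched as follows, assuming a simplified layout of decryptor stub, one key byte, then XOR-encrypted body. `DECRYPTOR_STUB` and `BODY` are placeholder bytes, not real code, and a fixed list of keys replaces the key-generation algorithm to keep the example deterministic. Every copy differs, yet all share the unchanged stub, which is what a scanner can match.

```python
# Placeholder bytes, not real machine code (assumption for illustration).
DECRYPTOR_STUB = b"<decryptor-stub>"
BODY = b"malicious body (placeholder)"


def make_variant(key: int) -> bytes:
    """Assumed layout: stub | one key byte | XOR-encrypted body."""
    encrypted = bytes(b ^ key for b in BODY)
    return DECRYPTOR_STUB + bytes([key]) + encrypted


def decrypt_variant(sample: bytes) -> bytes:
    """What the stub does at run time: read the key, decrypt the body."""
    key = sample[len(DECRYPTOR_STUB)]
    return bytes(b ^ key for b in sample[len(DECRYPTOR_STUB) + 1:])


# Fixed keys instead of a key-generation algorithm, to stay deterministic.
variants = [make_variant(k) for k in (0x11, 0x42, 0x7F)]

print(len(set(variants)))                          # 3 distinct copies
print(any(BODY in v for v in variants))            # False: body is hidden
print(all(DECRYPTOR_STUB in v for v in variants))  # True: stub is constant
```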

3.3. Morphism

A morphic (often called oligomorphic) virus changes its decryptor with each reproduction, which makes it harder to detect. Its only weakness is that it can use only a limited number of decryptors. Two further morphism variants appeared that make static detection almost impossible:

Polymorphic viruses: They can generate a practically unlimited number of decryptors, using different obfuscation techniques to change their appearance.

Metamorphic viruses: Metamorphic malware is the most complex. Rather than only changing a decryptor, it rewrites its own code so that each new instance looks nothing like the original one.
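Polymorphic engines remove the constant decryptor that makes simply encrypted malware detectable. The sketch below assumes each step of a decryptor can be written in a few equivalent ways (illustrative mnemonics, not a working x86 decryptor) and enumerates every combination. Even this toy set of rewrite rules yields 24 distinct decryptors, so a signature taken from one copy matches only that copy; adding junk-instruction insertion would make the number of variants effectively unbounded.

```python
import itertools

# Each decryptor step, written in several equivalent ways
# (illustrative mnemonics, not a working x86 decryptor).
STEP_ALTERNATIVES = [
    [("mov ecx, 64",), ("push 64", "pop ecx")],                    # set counter
    [("xor byte [esi], 0x5a",), ("xor byte [esi], 0x5a", "nop")],  # decrypt (+ junk)
    [("inc esi",), ("add esi, 1",), ("lea esi, [esi+1]",)],        # advance pointer
    [("loop decrypt",), ("dec ecx", "jnz decrypt")],               # repeat
]

# Every combination of alternatives is a different-looking decryptor.
decryptors = [
    tuple(itertools.chain.from_iterable(choice))
    for choice in itertools.product(*STEP_ALTERNATIVES)
]

signature = decryptors[0]  # a signature taken from one observed copy
print(len(decryptors))                          # 24
print(sum(d == signature for d in decryptors))  # 1
```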

5. LIMITATIONS OF TRADITIONAL DETECTION TECHNIQUES

The detection techniques presented above depend largely on human expertise and effort in malware analysis. Generating signatures and formulating rules and specifications for heuristics are both time-consuming tasks, since they are often done manually. In the meantime, thousands of new malware samples appear and remain undetectable until the antivirus is updated. Dynamic analysis, for its part, suffers from a high false positive rate, analysis latency, and an inability to uncover evasive (intelligent) malware. Because anti-malware products rely so heavily on human effort, they evolve much more slowly than malware does. Today, malware creators have found techniques to automatically generate large numbers of malware samples that evade antivirus products and may replicate differently from one environment to another.

Because of these limitations, cybersecurity experts have started to switch to more powerful, more automated techniques that adapt quickly to the evolution of malware. These new techniques are generally based on machine learning, which has proven highly effective in several areas such as computer vision, image processing, and natural language processing. This success has led cybersecurity vendors to build detectors based on machine learning.
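As a rough illustration of the machine-learning idea, the following sketch trains a nearest-centroid classifier on byte-frequency histograms. Everything here is an assumption for illustration: the training samples are synthetic placeholders rather than real malware or benign files, and production detectors use much richer features (for example PE header fields, imported APIs or byte n-grams) and stronger models. The point is only that the decision rule is learned from labeled examples instead of being written by hand.

```python
import math


def byte_histogram(data: bytes) -> list[float]:
    """Feature vector: relative frequency of each of the 256 byte values."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1
    return [c / total for c in counts]


def train(samples: list[tuple[bytes, str]]) -> dict[str, list[float]]:
    """Nearest-centroid 'training': average the feature vectors per label."""
    by_label: dict[str, list[list[float]]] = {}
    for data, label in samples:
        by_label.setdefault(label, []).append(byte_histogram(data))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_label.items()
    }


def predict(model: dict[str, list[float]], data: bytes) -> str:
    """Assign the label whose centroid is closest (Euclidean distance)."""
    x = byte_histogram(data)
    return min(model, key=lambda label: math.dist(x, model[label]))


# Synthetic placeholder samples, NOT real malware or real benign files.
training = [
    (b"the quick brown fox jumps over the lazy dog", "benign"),
    (b"plain text document with ordinary words", "benign"),
    (bytes(range(256)), "malicious"),  # uniform bytes, as in packed data
    (bytes((i * 7) % 256 for i in range(256)), "malicious"),
]
model = train(training)
print(predict(model, b"the lazy dog jumps over the quick brown fox"))  # benign
print(predict(model, bytes(range(128, 256))))                         # malicious
```

Retraining on newly labeled samples updates the model without anyone writing a new signature or rule, which is the kind of automation the limitations above call for.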

6. SUMMARY AND CONCLUSION

Due to the limitations of conventional detection techniques, machine learning methods are being combined with existing detection methods to increase the effectiveness of anti-malware products. Signature-based methods are effective at detecting known malware but cannot detect unknown or polymorphic malware. Heuristic-based methods can detect new malware, but they have high rates of false positives and false negatives, which calls for more precise methods.