Threat Hunting Series: The Threat Hunting Process
In the previous posts of the series, I covered the basics of threat hunting and the core competencies a threat hunter should have. This post will show you the structured process I follow for threat hunting. Anyone who works solely as a threat hunter understands how chaotic the task can get when there is no structure. The threat hunting process doesn’t have to be complicated. The aim of having a process is to guide us through every step of the way, from building the initial hypothesis, to analyzing the data, to the final findings.
Something I haven’t mentioned in the previous posts, which I think is a good idea to do now, is that all the information I am trying to communicate is neither new nor revolutionary.
I used the resources I posted in “Threat Hunting Series: What Makes a Good Threat Hunter” to learn and then apply this knowledge to my day job. We all stand on the shoulders of giants, and Chris Sanders is one of them, from whom I learned a ton. Some of the concepts I will cover below are based on his methodology.
The threat hunting mental models
Before diving into the actual process, I want to cover the two different mental models that a threat hunter can apply to the threat hunting process. Based on the initial hypothesis, I use these two models to separate the different types of threat hunting.
Attack-based hunting
Attack-based hunting is applicable when hunting for a certain attack technique. In most cases, I find myself hunting for an attack that is known and documented. Attack-based hunting is faster, especially if Indicators of Attack (IOAs) are readily available through third-party research.
If the attack I am hunting for is not well researched, I have to spend more time emulating the technique in my lab environment. By doing so, I get a better understanding of the attack technique and can see the artifacts generated upon successful execution in a Windows/Linux/macOS environment.
Attack emulation in threat hunting is an important topic that deserves a post on its own. Keep an eye out for that post coming up.
Below are some examples of attack-based threat hunting. The first one is a good example: it takes a proactive approach, looks for a specific technique, and is not IOC-based.
The ones that follow are bad examples based on what we know so far about threat hunting and the attack-based threat hunting model. Hunting for specific IOCs, or hunting in response to a detection, is a reactive task.
- Hunting for suspicious process execution activity originating from Microsoft Word documents.
- MITRE ATT&CK ID: T1204.002 (Execution)
- Hunting for domains/IPs associated with the recent campaign of <insert fav malware here>.
- Checking for pre-defined malicious IOCs is not threat hunting. Check out my previous posts here and here to find out why.
- “Hunting” (not really) across all hosts in our environment for a malicious Word document that was detected on one of the hosts.
- A detection event cannot be the trigger of a threat hunting operation.
Data-based hunting
Unlike attack-based threat hunting, data-based hunting is a more advanced mental model since it does not follow a predetermined path. During data-based hunting, the threat hunter is not searching for specific evidence of an attack technique but is instead looking for abnormal activity in the dataset of interest.
When using data-based threat hunting, the threat hunter should be familiar with various attack techniques and how they can manifest within the available data sources. Once the data is collected, threat hunters could use certain data analysis techniques depending on how they would like to view and analyse the information at hand.
As with the attack-based mental model, I put together a couple of examples to show what a data-based hunting hypothesis should, and should not, look like.
- Search for suspicious process execution of unknown binaries launched from non-system directories.
- Search for suspicious process execution of PowerShell that downloads and executes the payload in memory.
- This could be a good example of attack-based threat hunting, but the hypothesis is too specific to be considered data-based threat hunting.
The threat hunting process
The steps involved in threat hunting are listed below. I’ll go through each one, explain how they work, and then give some examples.
1) Establish a hypothesis
The hypothesis drives the threat hunt. This is where threat hunters decide what they will hunt for in the environment. As was already established, the threat hunter assumes that this malicious activity has occurred within the network.
2) Establish evidence
Based on the hypothesis, the threat hunter should research the evidence of the expected malicious activity. Searching for existing writeups from other researchers could be enough to collect the IOAs needed to start hunting.
However, on some occasions, the attacks are not well-documented, and the reports that describe them don’t have enough information. This makes it challenging to understand the attack technique and create threat hunting queries.
In these cases, the threat hunter should be able to emulate the attack in a lab environment and establish the evidence based on the generated telemetry.
3) Identify Sources
Identify the data sources that should contain evidence of the malicious activity. Some examples of data sources are:
- Network traffic logs
- Process execution logs
- Authentication logs
4) Identify Fields
After establishing the type of attack or the specific IOAs we are hunting for, we can concentrate on the specific fields we should query. Whether the data source holds network or process execution logs, we can choose the individual fields that will help us spot the malicious activity.
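One way to keep steps 1 to 4 together before writing any query is a small record like the one below. This is only a sketch: the `HuntPlan` class and its method names are hypothetical and do not come from any hunting tool, and the `ParentImage` and `Image` field names are borrowed from the Sysmon-based queries later in this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HuntPlan:
    """Hypothetical record of steps 1-4; names are illustrative only."""

    hypothesis: str
    evidence: List[str] = field(default_factory=list)  # expected IOAs
    sources: List[str] = field(default_factory=list)   # data sources to search
    fields: List[str] = field(default_factory=list)    # fields to query

    def missing_steps(self) -> List[str]:
        """Return the steps that have no entries yet, in process order."""
        steps = [
            ("evidence", self.evidence),
            ("sources", self.sources),
            ("fields", self.fields),
        ]
        return [name for name, values in steps if not values]

    def ready_to_query(self) -> bool:
        """True once a hypothesis exists and steps 2-4 are filled in."""
        return bool(self.hypothesis.strip()) and not self.missing_steps()


plan = HuntPlan(
    hypothesis="Suspicious process execution originating from Microsoft Word documents",
    evidence=["Winword.exe creates child processes"],
    sources=["Process execution logs"],
)
print(plan.missing_steps())  # the fields step is still empty here
plan.fields += ["ParentImage", "Image"]
print(plan.ready_to_query())
```

Checking `ready_to_query()` before step 5 is just a reminder that a query written without evidence, sources, and fields tends to be guesswork.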
5) Query the data
We now have all of the information we need to build our queries. We could adjust a couple of core components when forming these queries. The first variable is the timeframe. This is how far back we choose to search in the available data.
The second one is the scope of the search. We can make the query more or less specific. For example, we could be specific and focus on the tools of the attack or be less specific and focus on the attack technique itself. In detection engineering, this is known as “capability-abstraction”. SpecterOps has a lot of resources that explain what capability-abstraction is and how one could use it to create well-informed detection rules. We can use capability-abstraction in threat hunting to structure our hypothesis based on the attack techniques. Hunting for the technique could help us uncover other tools that may have been used to compromise our network.
For example, a specific query would include the name of the DLL threat actors are using to load into memory. On the other hand, a less specific query would focus on the process execution method without including any command line details.
A picture is worth a thousand words, so the example below could help illustrate the difference between a specific and a less specific query. The first example at the top includes the targeted query with the specific command line arguments. The second image is an example of a less specific query that contains the execution flow of a Word document running commands on the host using cmd.exe.
As we can see from this example, the second, less specific query showed more activity linked to the attack technique we are hunting for.
The more flexible the query is, the more false-positive results we may have. In contrast, the more targeted the query is, the fewer false-positive results we will have. However, a targeted query may cause us to overlook instances where the attack technique we are hunting for manifests differently. A balance between the two is key when deciding on the final version of the query.
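The trade-off can be sketched in a few lines of Python. The events below are made up purely for illustration (they are not real telemetry), and the two predicate functions are hypothetical stand-ins for a targeted query and a technique-level query.

```python
import ntpath

# Made-up process-creation events, for illustration only.
WORD = r"C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE"
CMD = r"C:\Windows\System32\cmd.exe"
EVENTS = [
    {"parent": WORD, "image": CMD, "command_line": "cmd.exe /c whoami"},
    {"parent": WORD, "image": CMD, "command_line": "cmd.exe /c ping example.com"},
    {"parent": r"C:\Windows\explorer.exe", "image": CMD, "command_line": "cmd.exe"},
]


def exe_name(path: str) -> str:
    """Lower-cased file name of a Windows path (works on any OS)."""
    return ntpath.basename(path).lower()


def broad_match(event: dict) -> bool:
    """Less specific: Word spawning cmd.exe, no command line details."""
    return (
        exe_name(event["parent"]) == "winword.exe"
        and exe_name(event["image"]) == "cmd.exe"
    )


def specific_match(event: dict) -> bool:
    """Targeted: the same execution flow plus one known command line argument."""
    return broad_match(event) and "whoami" in event["command_line"].lower()


specific_hits = [e for e in EVENTS if specific_match(e)]
broad_hits = [e for e in EVENTS if broad_match(e)]
print(len(specific_hits), len(broad_hits))  # prints: 1 2
```

The broad predicate also returns the `ping` event, which the targeted one misses. In a real environment the extra hits would be a mix of related activity and false positives.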
I usually start with a broader query and a short timeframe (<1 day). Depending on the returned results, I will either make the query more specific or keep it the same and expand the timeframe. Applying this method makes analyzing the results easier (there is less data, and FPs become apparent), and we avoid potential impact on the back-end databases serving the requested data.
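As a rough sketch of this approach, the helper below builds a Splunk search string where the timeframe and the list of child processes are the two adjustable parts. The function name and its parameters are hypothetical; `earliest` is Splunk's standard time modifier, and the `source`, `EventCode`, `ParentImage`, and `Image` terms assume Sysmon process-creation logs like the ones used later in this post.

```python
from typing import Iterable


def build_splunk_query(parent: str, children: Iterable[str], earliest: str = "-24h") -> str:
    """Build a parent/child process search; `earliest` controls how far back it looks."""
    children = list(children)
    if not children:
        raise ValueError("at least one child process name is required")
    # "*\\\\name" in the source renders as *\\name in the final query string.
    child_clause = " OR ".join(f'Image="*\\\\{name}"' for name in children)
    return (
        f"source=*sysmon* EventCode=1 earliest={earliest} "
        f'ParentImage="*\\\\{parent}" ({child_clause})'
    )


# Start broad and short...
print(build_splunk_query("WINWORD.EXE", ["cmd.exe", "powershell.exe"]))
# ...then keep the same query and widen the timeframe if the results are manageable.
print(build_splunk_query("WINWORD.EXE", ["cmd.exe", "powershell.exe"], earliest="-7d"))
```

Keeping the timeframe as a parameter means the query logic stays identical between runs, so any change in the results comes from the wider window and not from an accidental edit.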
6) Analyze the data
Once we have the results from our queries, we can start manipulating the data to make it as easy as possible to analyze and spot anomalies. We can apply several analysis techniques depending on what we are hunting for and which mental model we follow. The CyborgSecurity article “Threat Hunting Tactics & Techniques” does a great job explaining the different threat hunting analysis techniques we can use to spot malicious activity. In short, some of my favourite data-based hunting analysis techniques are frequency analysis and stack counting.
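Stack counting is simple enough to sketch with the Python standard library. The `stack_count` helper and the sample rows below are hypothetical; in practice the rows would be exported query results.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def stack_count(events: Iterable[Dict[str, str]], field_name: str) -> List[Tuple[str, int]]:
    """Count how often each value of `field_name` occurs, rarest first."""
    counts = Counter(e[field_name] for e in events if field_name in e)
    # Sort by count, then by value, so the output order is deterministic.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Made-up rows, for illustration only.
rows = [
    {"Image": "cmd.exe"},
    {"Image": "cmd.exe"},
    {"Image": "cmd.exe"},
    {"Image": "mshta.exe"},
]
for value, count in stack_count(rows, "Image"):
    print(count, value)
```

Sorting rarest first puts the outliers at the top of the stack, which is usually where I want to start looking.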
In future posts, I want to cover some free tools and methods anyone can use to help with analysis.
The threat hunting process in action
In this section, I’ll use the two threat hunting examples I gave above, based on the two different mental models, to explain the different steps of the process.
Attack Based Example
Establish a hypothesis
Hunting for suspicious process execution activity originating from Microsoft Word documents.
- MITRE ATT&CK ID: T1204.002 (Execution)
Establish evidence
- Winword.exe creates child processes.
- Winword.exe injects malicious code into other processes.
- Winword.exe reaches out to unknown public servers to download binaries.
Identify Sources
- Process execution logs
- Evidence of macro execution via the “TrustRecords” entry in the registry
Identify Fields
ProcessName: (cmd.exe|powershell.exe|rundll32.exe|regsvr32.exe|wscript.exe|cscript.exe|mshta.exe etc.)
Query the data
Example ELK query:
(process.parent.executable:*\\WINWORD.EXE AND process.executable:(*\\cmd.exe OR *\\powershell.exe OR *\\rundll32.exe OR *\\regsvr32.exe OR *\\mshta.exe OR *\\certutil.exe OR *\\wscript.exe OR *\\cscript.exe))
Example Splunk query:
source=*sysmon* EventCode=1 ((ParentImage="*\\WINWORD.EXE") AND (Image="*\\cmd.exe" OR Image="*\\powershell.exe" OR Image="*\\rundll32.exe" OR Image="*\\regsvr32.exe" OR Image="*\\mshta.exe" OR Image="*\\certutil.exe" OR Image="*\\wscript.exe" OR Image="*\\cscript.exe"))
Analyze the data
The figure below is an example of the findings we would expect to see if suspicious behavior was observed in the network:
An example of a suspicious event from a real intrusion:
Data Based Example
Establish a hypothesis
Hunting for previously undetected malicious binaries executing from temp directories.
Establish evidence
Using clustering analysis to look for outliers based on the process name and the path of the process. We hope to identify binaries executed in isolation by a small number of hosts in the environment.
Identify Sources
- Process Execution Logs
Identify Fields
- Count of ProcessName occurrence
- Count of hosts associated with the same process
Query the data
In Elasticsearch, we would need to create a visualization to get the data above and their aggregate values. The query below is written for Splunk for demonstration purposes.
source=*sysmon* EventCode=1 (Image="C:\\$Recycle.bin\\*" OR Image="C:\\ProgramData\\*" OR Image="C:\\Users\\Public\\*" OR Image="C:\\Users\\*\\AppData\\Local\\Temp\\*" OR Image="C:\\Users\\*\\AppData\\Roaming\\Temp\\*") | stats count by host Image
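The query looks for all executables that are launched from the following directory paths:
- C:\$Recycle.bin\
- C:\ProgramData\
- C:\Users\Public\
- C:\Users\*\AppData\Local\Temp\
- C:\Users\*\AppData\Roaming\Temp\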
Analyze the data
Like attack-based hunting, analyzing the data might take some time, depending on the returned results. When we first execute a new threat hunt, it can take some time to investigate the results, and we might even fall into rabbit holes. However, with experience, and getting to know the environment we are hunting in, there will be less friction, and the analysis speed will improve.
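For the data-based example, the outlier analysis could look roughly like the sketch below, run over exported query results. The `rare_binaries` function, the `max_hosts` threshold, and the sample rows are all hypothetical; the directory patterns are lower-cased copies of the ones in the Splunk query above, and `host` and `Image` are the fields that query groups by.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Lower-cased copies of the directory patterns used in the Splunk query.
WATCHED_PATHS = [
    r"c:\$recycle.bin\*",
    r"c:\programdata\*",
    r"c:\users\public\*",
    r"c:\users\*\appdata\local\temp\*",
    r"c:\users\*\appdata\roaming\temp\*",
]


def rare_binaries(events, max_hosts=1):
    """Return (image, execution_count, host_count) for binaries seen on few hosts."""
    hosts_by_image = defaultdict(set)
    runs_by_image = defaultdict(int)
    for event in events:
        image = event["Image"].lower()
        # Keep only executions from the watched directories.
        if any(fnmatchcase(image, pattern) for pattern in WATCHED_PATHS):
            hosts_by_image[image].add(event["host"])
            runs_by_image[image] += 1
    rows = [
        (image, runs_by_image[image], len(hosts))
        for image, hosts in hosts_by_image.items()
        if len(hosts) <= max_hosts
    ]
    # Fewest hosts first, then fewest executions, then name.
    return sorted(rows, key=lambda row: (row[2], row[1], row[0]))


# Made-up rows, for illustration only.
sample = [
    {"host": "host1", "Image": r"C:\Users\Public\evil.exe"},
    {"host": "host1", "Image": r"C:\ProgramData\updater.exe"},
    {"host": "host2", "Image": r"C:\ProgramData\updater.exe"},
    {"host": "host3", "Image": r"C:\ProgramData\updater.exe"},
    {"host": "host2", "Image": r"C:\Windows\System32\cmd.exe"},
]
for image, runs, hosts in rare_binaries(sample):
    print(hosts, runs, image)
```

A binary that runs on a single host is not malicious by definition, but it is a reasonable place to start the analysis before working up to the more common ones.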
After understanding and practicing the threat hunting process for some time, each phase will become easier to step through. It is important to note that the threat hunting operation doesn’t end with the analysis.
When threat hunters identify malicious activity upon analyzing the data, they must document their findings and pass them on to the appropriate teams to begin remediation and further investigate the attack. Another critical step is to propose possible detections and report any potential telemetry gaps.
This is the threat hunting process I follow, and it is working for me. Threat hunting doesn’t have to be complicated, and, as I tried to communicate through this post, a structured approach using the two mental models can make things easier.
Follow me here and on Twitter for updates on the next posts for this series.