As of today, phishing emails are the most widely used infection vector. This means that the number of alerts related to emails to analyze is growing faster and faster. The problem is that analyzing an email is a complex and tedious process that can make an analyst waste the majority of its time on repetitive tasks. The solution is, obviously, the automation of this process. To do that, we developed an open-source automated phishing email analysis tool, which is ThePhish. It is based on three open-source platforms, namely TheHive, Cortex and MISP.

Why automation?
Everyone knows what an email is, but not everyone knows what an email really contains. We are used to see the body of the email, which is the content that is displayed by the mail client. Actually, if we stopped there, we would be missing maybe the most important part of the email when it comes to the analysis: the header. It is composed of dozens of header fields, and being able to understand where to look at among them is crucial. We are usually looking for observables like IP addresses, email addresses, domains, URLs and file attachments that can be found in the header and/or in the body of the email. It is important to identify the relevant ones and analyze them with many different tools and services. However, doing that can literally take hours! That’s why we developed ThePhish.
The underlying tools: TheHive, Cortex and MISP
TheHive
 TheHive is a scalable, open-source and free Security Incident Response Platform (SIRP) created by TheHive Project. It allows managing alerts related to security events coming from a multitude of sources. In addition, specialized programs called alert feeders can be built to consume such a security event, parse it and create an alert in TheHive. Alerts can be ignored, marked as read, previewed and imported. When an alert is imported, it becomes a case that needs to be investigated. The core construct of TheHive is infact the case, which is also the core construct of most security investigations. It can contain observables, such as IP addresses, email addresses, URLs, domains, files and many more. They can be tagged, analyzed and even flagged as Indicators of Compromise (IoCs). It is possible to access to all the functionalities provided by TheHive both through a web interface and a REST API. Moreover, a Python API client is available that is called TheHive4py.
TheHive is a scalable, open-source and free Security Incident Response Platform (SIRP) created by TheHive Project. It allows managing alerts related to security events coming from a multitude of sources. In addition, specialized programs called alert feeders can be built to consume such a security event, parse it and create an alert in TheHive. Alerts can be ignored, marked as read, previewed and imported. When an alert is imported, it becomes a case that needs to be investigated. The core construct of TheHive is infact the case, which is also the core construct of most security investigations. It can contain observables, such as IP addresses, email addresses, URLs, domains, files and many more. They can be tagged, analyzed and even flagged as Indicators of Compromise (IoCs). It is possible to access to all the functionalities provided by TheHive both through a web interface and a REST API. Moreover, a Python API client is available that is called TheHive4py.
Cortex
 Cortex is a powerful open-source observable analysis and active response engine created by TheHive Project that allows analyzing observables at scale by querying a single tool instead of several. It provides a web interface from which it is possible to analyze observables one by one or in bulk mode, but it can also be used to automate these operations and submit large sets of observables from TheHive or through a REST API. Moreover, a Python API client is available that is called Cortex4py. The usage of Cortex is based on neurons, which are autonomous applications managed by and run through the Cortex core engine. They can be of one of two types:
Cortex is a powerful open-source observable analysis and active response engine created by TheHive Project that allows analyzing observables at scale by querying a single tool instead of several. It provides a web interface from which it is possible to analyze observables one by one or in bulk mode, but it can also be used to automate these operations and submit large sets of observables from TheHive or through a REST API. Moreover, a Python API client is available that is called Cortex4py. The usage of Cortex is based on neurons, which are autonomous applications managed by and run through the Cortex core engine. They can be of one of two types:
- Analyzers: They allow analyzing different types of observables by automating the interaction with a service or a tool so as to speed up the analysis and make it possible to contain threats before it is too late. Cortex comes with more than a hundred analyzers for popular services such as VirusTotal, emailrep.io, urlscan.io, AbuseIPDB and PhishTank.
- Responders: They are installed along with the analyzers. However, they are only useful when Cortex is used in conjunction with TheHive, in fact they perform different actions and apply to various elements of TheHive.
MISP

Malware Information Sharing Platform (MISP) is a free and open-source software helping information sharing of threat intelligence including cyber security indicators. Its aim is to help improve the countermeasures used against targeted attacks. It makes it possible to store IoCs in a structured manner, and thus enjoy the correlation, automated exports and synchronize to other MISP servers. The basic building block of MISP is the event. Each event is made up of a list of attributes, which are atomic pieces of data that could be IoCs. It also makes it easier to share with, but also to receive from trusted partners and trust groups so as to enable fast and effective detection of attacks. MISP provides an intuitive web interface, but also a REST API that can be used for automation and feeding devices.
Putting it all together
The real advantage of using TheHive, Cortex and MISP is obtained when they are used together. The following picture summarizes the possible interactions, which are described below:

- Events coming from SIEMs, emails, IDSes and other platforms are consumed by alert feeders, which can use the TheHive4py library to create alerts in TheHive. Alerts can also be generated thanks to TheHive polling events from MISP.
- An analyst can create a case from that alert (or from scratch) and decide to analyze the observables that it contains through Cortex. It is also possible to interact with MISP through Cortex by using a special analyzer (the MISP Search analyzer). This analyzer makes it possible to search observables within a MISP instance. The observbles can also be analyzed with the invocation of MISP expansion modules. Moreover, it is possible to launch responders to automate certain actions, like sending a notification email.
- The analyst can then decide whether to export the case along with its observables marked as IoC from TheHive to MISP.
- It is also possible to enrich MISP events by calling Cortex analyzers from within MISP.
How ThePhish can help
ThePhish is a web application that automates the entire analysis process. It extracts the observables from the header and the body of the email and elaborates a verdict, which is final in most cases. In addition, the analyst can intervene in the analysis process and obtain further details on the email being analyzed if necessary. Here is shown how ThePhish works at high-level:

- An attacker starts a phishing campaign and sends a phishing email to a user.
- A user who receives such an email can send it as an attachment in EML format to the mailbox used by ThePhish.
- The analyst interacts with ThePhish and selects the email to analyze.
- ThePhish extracts all the observables from the email and creates a case on TheHive. The observables are analyzed thanks to Cortex and its analyzers.
- ThePhish calculates a verdict based on the verdicts of the analyzers.
- If the verdict is final, ThePhish closes the case and notifies the user. In addition, if it is a malicious email, it exports the case to MISP.
- If the verdict is not final, ThePhish requires the analyst’s intervention. He must review the case on TheHive along with the results given by the various analyzers to formulate a verdict. Then, he can send the notification to the user, optionally export the case to MISP, and close the case.
ThePhish relieves the analyst from manually extracting all the observables from the email and adding them one by one in a case on TheHive. Moreover, he doesn’t need to start the various analyzers on each observable, send notifications to users, nor interacting with MISP. Even in the case in which his intervention is required, the majority of the work will have already been performed. This means that he can focus only on things that matter to elaborate a final verdict.
ThePhish example usage
This example aims to demonstrate two aspects:
- How a user can send an email to ThePhish
- How an analyst can actually analyze that email using ThePhish
A user sends an email to ThePhish
A user can send an email to the email address used by ThePhish to fetch the emails to analyze. He must forward the email as an attachment in EML format. This is to prevent the contamination of the email header. In this case, the used mail client is Mozilla Thunderbird and the used email address is a Gmail address.

The analyst analyzes the email
The analyst clicks on the “List emails” button to obtain the list of emails to analyze.

When the analyst clicks on one of the “Analyze” buttons, the analysis starts. Its progress is shown on the web interface.

In the meantime, ThePhish extracts the observables from the email and interacts with TheHive to create the case.

ThePhish creates three tasks inside the case.

Then, ThePhish starts adding the extracted observables to the case.

At this point ThePhish notifies the user via email that the analysis has started thanks to the Mailer responder.

The description of the first task allows the Mailer responder to send the notification via email.

ThePhish starts the analyzers on the observables. The analysis progress is shown on the web interface while the analyzers are started.

The analysis progress can also be viewed on TheHive, thanks to its live stream.

Once all the analyzers have terminated their execution, ThePhish calculates the verdict. Since the verdict is “malicious”, ThePhish marks all the observables that it finds to be malicious as IoC.

The case is then exported to MISP as an event, with a single attribute represented by the observable mentioned above.


Then, ThePhish sends the verdict via email to the user thanks to the Mailer responder.

Finally, ThePhish closes the case. The description of the third task allows the Mailer responder to send the verdict via email. Moreover, the case has been closed after five minutes and resolved as “True Positive” with “No Impact”. This means that ThePhish detected the attack before it could do any damage.

Once the case is closed, the analyst can view the verdict on the web interface. He can also view the entire log of the analysis progress.

At this point, the analyst can go back and analyze another email. The above-depicted case was related to a phishing email. However, a similar workflow can be observed when the analyzed email is classified as “safe”. Indeed, ThePhish closes the case and sends the verdict to the user.

Then, the analyst can view the verdict on the web interface.

On the other hand, when an email is classified as “suspicious”, the verdict is only displayed to the analyst.

Now the analyst needs to use the buttons on the left to use TheHive, Cortex and MISP for further analysis. This is because the analysis has not been completed yet. Indeed, the last task and the case have been left open. They infact need to be closed by the analyst himself once he elaborates a final verdict.

The analyst can view the reports of all the analyzers on TheHive and Cortex. In case this was not enough, he could also download the EML file of the email and analyze it manually.

When the analyst terminates the analysis, he populates the description of the last task. It will infact contain the body of the email to send to the user. Then he starts the Mailer responder, exports the case to MISP if necessary and then closes the case.
Implementation
The following picture is worth a thousand words:

ThePhish is a web application written in Python 3. The web server is implemented using Flask, while the front-end part of the application is implemented using Bootstrap. Apart from the web server module, the back-end logic of the application is constituted by three Python modules that encapsulate the logic of the application itself and a Python class used to support the logging facility through the WebSocket protocol. Moreover, ThePhish uses several files for the configuration of the following aspects:
- The logging functionality used to show the analysis progress
- The connection with the underlying instances of TheHive, Cortex and MISP
- The fine tuning of the results given by the Cortex analyzers
- The observables to whitelist
When the analyst navigates to the base URL of the application, the browser establishes a bi-directional connection with the server. This is done by using the Socket.IO JavaScript library. For this to work, the server application uses the Flask-SocketIO Python library, which provides a Socket.IO integration for Flask applications. ThePhish uses this connection to display the progress of the analysis on the web interface.
Every time the analyst performs an action on the web interface, an asynchronous AJAX request is sent to the server. This allows the analyst both to visualize the list of emails to analyze and to make the analysis start.
ThePhish interacts with TheHive and Cortex thanks to TheHive4py and Cortex4py. Moreover, it interacts with an IMAP server to retrieve the emails to analyze.
Installation
You can install ThePhish in one of two possible ways:
- From scratch, if you have up-and-running instances of TheHive, Cortex and MISP
- By using Docker and Docker Compose
It’s also on GitHub!
ThePhish has been made available on GitHub as an open-source project under the AGPL license at this repository. It is possible to refer to that repository for a complete installation and configuration guide.
Conclusions
In this post we have presented a tool that allows automating the entire analysis process and obtain a verdict. It requires the analyst’s intervention when necessary, but with the most tedious and mechanical tasks already performed. The tool is available on GitHub so that anyone can contribute to improve it over time. The development of ThePhish will in fact not stop here, as there is great room for improvements. We will make further changes in the future in order to add new functionalities to ThePhish. Also, we will support any new feature introduced by TheHive and Cortex and support new analyzers. Furthermore, we will fix bugs that might be present in the current release or in any future release of ThePhish.


