File Extraction

Concept

When enabled, file extraction saves to disk the files observed on the wire by each Stamus Network Probe, based on Suricata signatures.

The files are extracted and stored locally on the Network Probe performing the extraction, and the REST-API on the manager, the Stamus Central Server (SCS), allows selected files to be retrieved centrally.

As of U38, the protocols supported for file extraction are:

  • HTTP

  • SMTP

  • FTP

  • NFS

  • SMB

File Extraction Activation

To activate file extraction, go to Probe Management and open the Appliances menu.

Edit the desired Probe, or the desired Template, and open the Settings tab.

Check the “Activate file extraction” checkbox and a few more options will appear, as illustrated in the screenshot below.

Activate File Extraction

All extracted files have their sha256 fingerprint computed; this option cannot be deactivated. The fingerprint is used to store each file only once, even when the same file is downloaded multiple times.

You can then choose to also compute common hashes such as md5 and/or sha1, which is especially useful if you need to integrate with third-party solutions to which you want to send those hashes. This information, as well as the original filename, is available in the fileinfo structure of the JSON metadata.
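As a minimal sketch, assuming local shell access to the probe and that EVE JSON logging writes to /var/log/suricata/eve.json (the exact path depends on your configuration), the hashes and original filename can be read from the fileinfo records like this:

import json

# Illustrative path: the EVE JSON log location depends on the probe configuration.
EVE_LOG = "/var/log/suricata/eve.json"

with open(EVE_LOG) as eve:
    for line in eve:
        event = json.loads(line)
        if event.get("event_type") != "fileinfo":
            continue
        info = event["fileinfo"]
        if not info.get("stored"):
            continue  # only look at files that were actually extracted to disk
        # sha256 is always computed for extracted files; md5/sha1 only if enabled in the settings.
        print(info.get("filename"), info.get("sha256"), info.get("md5"), info.get("sha1"))

The same fields are visible in the Hunting interface under the fileinfo section of an event's JSON view.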

Note

File extraction has a performance impact on the Stamus Probe it is enabled on.

Important

Extracted files are de-duplicated. In other words, if the same file is extracted 5 times, it is only saved to disk once on the particular Stamus Probe.

Finally, specify a size limit, in bytes, if you want to disregard large files such as ISO images, then Apply changes to make the changes take effect.
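For example, a limit of 52428800 bytes (50 × 1024 × 1024) skips any file larger than 50 MB.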

Note

When two Network Probes at two different capture locations see the same file, the file is extracted and stored on each of those two Network Probes. Deduplication is performed on a per-probe basis.

How File Extraction Works

File extraction is driven by Suricata rules: only rules using the filestore keyword perform file extraction.

For example, the following rule extracts (and de-duplicates) all executables observed on the wire that are downloaded over HTTP from the $INFRA_SERVERS and $DC_SERVERS networks (defined as configuration variables):

alert http [$INFRA_SERVERS,$DC_SERVERS] any -> $EXTERNAL_NET any (msg:"FILE Executable public download from critical infra detected"; flow:established,to_server; http.method; content:"GET"; file.mime_type; content:"executable"; filestore; sid:1; rev:1;)

Best practice is to always make rules as specific as possible, as illustrated by the previous example, in order to optimise performance and reduce false positives.

Note

We recommend using the new file.mime_type keyword instead of file.magic for better performance.

From the Hunting interface, the files are also accessible directly from the alerts tab and can be downloaded as illustrated below.

Downloading an extracted file

Important

Some of these files may be dangerous; proceed with caution when downloading them!

Note

When a file has been extracted, the stored boolean in the fileinfo structure (JSON View / Related Events) is set to true. Note that this only indicates that the file was stored at some point; it may have been purged by the time you look at the alert. While this is unlikely for recent alerts, it can happen for the oldest ones: a cleaning process removes the oldest extracted files when the disk space limit is reached (10% of the /var/log/suricata partition).

Hint

If you need to keep files for longer periods, set up a third-party server to which the files are regularly exported, for example using the REST-API. The partition on which the files are stored on the Network Probes can only be extended on virtual machines.

Getting Files from the REST-API

Retrieving a file through the REST-API is a two-step process. First, we instruct the manager (SCS) to retrieve the file from the Network Probe. Once the file has been retrieved on SCS, we can download it.

To query the filestore endpoints, we need two pieces of information:

  • The hash (sha256) of the file to retrieve

  • The probe name (string) on which the file is stored

First, we can check the status of the file using the following cURL command:

curl -X GET -k "https://<SCS_ADDRESS>/rest/rules/filestore/<HASH>/status/?host=<PROBE_NAME>" -H "Authorization: Token <TOKEN>" -H 'Content-Type: application/json'

If the file is still present (i.e. not deleted), we will receive the following response:

{"status":"available"}

Next, we instruct SCS to fetch the file from the probe. To do so, we use the retrieve endpoint:

curl -X GET -k "https://<SCS_ADDRESS>/rest/rules/filestore/<HASH>/retrieve/?host=<PROBE_NAME>" -H 'Authorization: Token <TOKEN>' -H 'Content-Type: application/json'

The file is then copied onto SCS and you will get this result:

{"retrieve":"done"}

Finally, to download the file, use the download endpoint (the -O -J options tell cURL to save the file locally, using the server-provided filename from the Content-Disposition header when available):

curl -X GET -O -J -k "https://<SCS_ADDRESS>/rest/rules/filestore/<HASH>/download/?host=<PROBE_NAME>" -H "Authorization: Token <TOKEN>" -H 'Content-Type: application/octet-stream'

Hint

To obtain the metadata required to build the complete REST-API call, the alerts REST-API endpoint can be used.

This can be achieved by filtering on unique key/value pairs such as the signature ID and the flow ID, for example:

https://<SCS_ADDRESS>/rest/rules/es/alerts_tail/?qfilter=alert.signature_id:2029743 AND flow_id:2077942411095108

The query above filters all event_type: alerts events on {“signature_id”: 2029743} and {“flow_id”: 2077942411095108}. Once that information is at hand, we can apply the same technique to the fileinfo events, like so:

https://<SCS_ADDRESS>/rest/rules/es/events_from_flow_id/?qfilter=flow_id:2077942411095108 AND fileinfo.stored:true

This returns the file hashes required to retrieve the actual file.

Note

There can be multiple fileinfo events for a single network flow: when several files are transferred within, or associated with, a flow, Suricata generates one fileinfo event per file.
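The two queries above can also be chained from Python. This is a minimal sketch in which <SCS_ADDRESS> and <TOKEN> are placeholders, the qfilter values are taken from the example above, and the assumption that matching events come back under a "results" key mirrors the events_tail endpoint used in the script below:

import requests

SCS = "https://<SCS_ADDRESS>"   # replace with your SCS address
HEADERS = {"Authorization": "Token <TOKEN>"}

# Step 1: confirm the alert of interest (signature ID and flow ID from the example above).
alerts = requests.get(
    SCS + "/rest/rules/es/alerts_tail/",
    params={"qfilter": "alert.signature_id:2029743 AND flow_id:2077942411095108"},
    headers=HEADERS,
    verify=False,  # set to True when SCS uses a trusted certificate
).json()
print("matching alerts:", len(alerts.get("results", [])))

# Step 2: list every stored file seen on that flow (there may be several fileinfo events).
events = requests.get(
    SCS + "/rest/rules/es/events_from_flow_id/",
    params={"qfilter": "flow_id:2077942411095108 AND fileinfo.stored:true"},
    headers=HEADERS,
    verify=False,
).json()
for event in events.get("results", []):
    info = event.get("fileinfo", {})
    print(event.get("host"), info.get("sha256"), info.get("filename"))

Each sha256 obtained this way, together with the probe name reported in host, can then be passed to the filestore status, retrieve, and download endpoints shown earlier.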

An example of a complete Python script implementation:

import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
import sys
import os

def print_helptext():
    if len(sys.argv) < 4:
        print(
            f"How to run the script: {sys.argv[0]} <hostname/ip of SCS> <token> <sha256 checksum of the file>"
        )
        print(
            f"\nExample: python3 {sys.argv[0]} 192.168.0.12 7408a4b978abdc03ee39e1fea419512e5734f51e 7f87640e9b74a059a10fb37ad083eb0843045493589d5b6c1f971ad2c13ee127"
        )
        sys.exit()
    else:
        global url
        url = "https://" + sys.argv[1] + "/rest"
        global scs_ip
        scs_ip = sys.argv[1]
        global token
        token = sys.argv[2]
        global file_hash
        file_hash = sys.argv[3]

# Optional helper (not called in __main__ below): check that a host answers to ping.
def check_host_is_up(hostname, waittime=1000):
    if (
        os.system(
            "ping -c 1 -W " + str(waittime) + " " + hostname + " > /dev/null 2>&1"
        )
        == 0
    ):
        HOST_UP = True
    else:
        HOST_UP = False
        raise Exception("Error. Host %s is not up..." % hostname)
    return HOST_UP


def check_url_is_reachable(url):
    try:
        get = requests.get(url, verify=False)
        if get.status_code == 200:
            return f"{url}: is reachable"
        else:
            return f"{url}: is Not reachable, status_code: {get.status_code}"
    except requests.exceptions.RequestException as e:
        raise Exception(f"{url}: is Not reachable \nErr: {e}")

def check_request(response):
    # Treat HTTP 200 and 201 as success.
    if response.status_code in (200, 201):
        print("Request is successful!")
        print("Response:")
        print(response.text)
    else:
        print(f"Request failed with status code: {response.status_code}")
        print(response.text)


class RestCall:
    def __init__(self, token, url):
        self.token = token
        self.url = url
        self.headers = {"Authorization": f"Token {self.token}"}


    requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
    verify = (
        True  # Change this to False if you use https with a self signed certificate
    )

    def get_filehashes_exist(self, file_hash):
        print("INFO: Get the file hashes from event type fileinfo")
        rest_point = f"/rules/es/events_tail/?qfilter=event_type:fileinfo AND fileinfo.stored%3Atrue AND fileinfo.sha256:{file_hash}"

        response = requests.get(
            self.url + rest_point, headers=self.headers, verify=self.verify
        )

        check_request(response)
        response = response.json()
        results = response["results"]

        if results:
            return results[0]["host"]
        else:
            raise Exception("ERROR: There's no file, in any fileinfo event type, matching the given sha256. Exiting ...")

    def get_extracted_file_status(self, file_hash, probe_name):
        print("INFO: Get the status of the extracted file")
        rest_point = f"/rules/filestore/{file_hash}/status/?host={probe_name}"
        response = requests.get(
            self.url + rest_point, headers=self.headers, verify=self.verify
        )
        check_request(response)
        response = response.json()
        return response

    def retrieve_extracted_file(self, file_hash, probe_name):
        print("ACTION: Retrieve the extracted file")
        rest_point = f"/rules/filestore/{file_hash}/retrieve/?host={probe_name}"
        response = requests.get(
            self.url + rest_point, headers=self.headers, verify=self.verify
        )
        check_request(response)
        response = response.json()
        return response


    def download_extracted_file(self, file_hash, probe_name):
        print("ACTION: Download extracted file")
        rest_point = f"/rules/filestore/{file_hash}/download/?host={probe_name}"
        response = requests.get(
            self.url + rest_point, headers=self.headers, verify=self.verify
        )
        check_request(response)
        extracted_file = file_hash + ".data"
        # Write the raw bytes to disk: extracted files are usually binary.
        with open(extracted_file, "wb") as file:
            file.write(response.content)
        return extracted_file


if __name__ == "__main__":
    print_helptext()
    SCS_Rest = RestCall(token, url)

    check_url_is_reachable(url)
    probe_name = SCS_Rest.get_filehashes_exist(file_hash)
    SCS_Rest.get_extracted_file_status(file_hash, probe_name)
    SCS_Rest.retrieve_extracted_file(file_hash, probe_name)
    SCS_Rest.download_extracted_file(file_hash, probe_name)