GopherCap¶
GopherCap is an open source project maintained by Stamus Networks for accurate, modular and scalable PCAP manipulation. The official repository can be found at https://github.com/StamusNetworks/gophercap
GopherCap is a lightweight tool for working with PCAP files. First and
foremost, it can map and replay large asynchronous datasets that were
written with concurrent writers. It was developed by Stamus Networks to
solve one such complex replay problem where regular tcpreplay
features were not sufficient.
GopherCap does not aim to be a generic traffic replay tool like
tcpreplay. The latter is much more mature and widely adopted, and is
therefore a better option in simple replay scenarios where the user
simply needs to generate live traffic from a single sequential PCAP
file.
By comparison, GopherCap was written from a research and data
engineering perspective, relying on the strong concurrency primitives and
pleasant IO interaction of the Go programming language to handle large PCAP
sets that span hundreds of files and several terabytes of disk usage.
Furthermore, it is written with modern CLI frameworks that encapsulate
functionality into smaller subcommands and allow the user to configure it by
passing a YAML file, using command line arguments, etc.
The following sections explain the currently implemented subcommands in more detail and provide some simple usage examples.
exampleConfig¶
As the name implies, this subcommand generates a YAML configuration file
skeleton. The configuration file path must be defined via the global
--config flag.
gopherCap --config /tmp/gopher.yml exampleConfig
The user should observe the following log message if all goes well.
INFO[0000] Writing config to /tmp/gopher.yml
The user can then inspect the default configuration parameters and modify
them as needed. GopherCap uses a single configuration dictionary for
all subcommands, with parameters organized by their respective command.
Parameters used by multiple (but not necessarily all) subcommands are
located in the global section.
cat /tmp/gopher.yml
global:
  dump:
    json: /tmp/dump.json
  file:
    regexp: ""
map:
  dir:
    src: ""
  file:
    suffix: pcap
    workers: 4
replay:
  disable_wait: false
  loop:
    count: 1
    infinite: false
  out:
    bpf: ""
    interface: eth0
  time:
    from: ""
    modifier: "1"
    scale:
      duration: 1h0m0s
      enabled: false
    to: ""
tarball:
  dryrun: false
  in:
    file: ""
  out:
    dir: ""
    gzip: false
Note that CLI flags override values in the configuration dictionary. This
means the user can define sensible defaults with --config but still override
specific options in some replay cases. For example, the user could always
use different --file-regexp values for different replay runs without
reconfiguring.
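The override behaviour can be sketched in Go as follows. This is a minimal illustration using only the standard library flag package, not GopherCap's actual implementation; the function name resolveFileRegexp and the emotet pattern are hypothetical. The idea is that the value loaded from the configuration file becomes the flag default, so an explicit flag on the command line wins.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// resolveFileRegexp returns the effective pattern. The config file value is
// used as the flag default, so an explicit --file-regexp argument overrides it.
// Hypothetical helper for illustration only.
func resolveFileRegexp(configValue string, args []string) (string, error) {
	fs := flag.NewFlagSet("replay", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	pattern := fs.String("file-regexp", configValue, "regexp for selecting files")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *pattern, nil
}

func main() {
	fromConfig := "maldocs.+hancitor"

	// No flag given: the configuration value is used.
	v, _ := resolveFileRegexp(fromConfig, nil)
	fmt.Println(v)

	// Flag given: it overrides the configuration value for this run only.
	v, _ = resolveFileRegexp(fromConfig, []string{"--file-regexp", "maldocs.+emotet"})
	fmt.Println(v)
}
```

The configuration file itself is never modified, which is what makes per-run overrides cheap.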
The user is not limited to YAML configuration files. Other structured formats like TOML or JSON are also supported. The format is automatically detected from the file suffix.
gopherCap --config /tmp/gopher.json
gopherCap --config /tmp/gopher.toml
Finally, exampleConfig uses no flags other than the global --config.
Not all configuration options are explained in the following sections.
Each subcommand supports the --help flag, which lists up-to-date
information on individual flags.
map¶
GopherCap handles large asynchronous PCAP sets by first doing a full
pass over the entire dataset. This is mostly needed to map out the first
and last timestamps of each PCAP file and to deduce the global start and
stop of the entire set. This information allows the replay command to
decide how long individual PCAP readers should wait before reading, thus
solving the async problem. As it already needs to parse each PCAP file in
full, it also collects other information during that pass, for example
the maximum packet size for MTU configuration, number of bytes, number of
packets, duration, etc.
Note that capinfos collects similar information and is likely a
better solution for inspecting small individual PCAPs. However,
GopherCap has many advantages:
It was much faster in our tests, likely because it performs fewer operations, and is thus better for handling large files;
Gzip compression detection. It does a basic file magic check for each PCAP and can thus handle .pcap and .pcap.gz files with no interaction needed from the user. This saves a lot of disk space when dealing with multi-terabyte datasets, albeit at the cost of mapping speed, as GopherCap still needs to decompress each file. However, this happens on the fly over a raw byte stream in memory, and thus uncompressed data never touches a disk;
Output is structured JSON with a global interval and duration derived from all files in the set;
GopherCap recursively searches the root directory defined in map.dir.src for all PCAP files that match map.file.suffix and global.file.regexp;
The worker pool size, map.file.workers, ensures that only N PCAP files are read at the same time, allowing users to balance CPU core utilization against IO limitations;
Mapping only needs to be done once, unless new files are added to the PCAP set!
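The worker pool idea from the list above can be sketched in Go as follows. This is a minimal illustration of bounded concurrency, not GopherCap's own code; mapFiles and the per-file callback are hypothetical names, and the callback here only measures the path length instead of parsing a PCAP.

```go
package main

import (
	"fmt"
	"sync"
)

// mapFiles calls fn for every path, with at most `workers` calls running
// concurrently, and returns the results in input order.
// Hypothetical helper for illustration only.
func mapFiles(paths []string, workers int, fn func(string) int) []int {
	if workers < 1 {
		workers = 1
	}
	results := make([]int, len(paths))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker handles one file at a time, so no more than
			// `workers` files are ever open simultaneously.
			for i := range jobs {
				results[i] = fn(paths[i]) // distinct index per job, no data race
			}
		}()
	}
	for i := range paths {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	paths := []string{"a.pcap", "bb.pcap", "ccc.pcap.gz"}
	// Stand-in for real PCAP parsing: report the length of each path.
	fmt.Println(mapFiles(paths, 2, func(p string) int { return len(p) }))
}
```

Raising the worker count uses more CPU cores but also puts more concurrent read load on the disk, which is the trade-off the setting exposes.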
Assuming that the user has a folder structure in
/home/snuser/malware-samples, the following command would find and map
all files with the pcap suffix that match the Hancitor and maldoc naming
pattern, while ensuring that at most 2 files are parsed concurrently.
gopherCap map \
--dir-src /home/snuser/malware-samples \
--file-suffix pcap \
--dump-json /tmp/malware-samples.json \
--file-regexp "maldocs.+hancitor" \
--file-workers 2
This CLI command is equivalent to the following configuration-based invocation.
gopherCap --config map.yml map
global:
  dump.json: /tmp/malware-samples.json
  file.regexp: 'maldocs.+hancitor'
map:
  dir.src: /home/snuser/malware-samples
  file:
    suffix: pcap
    workers: 2
Then observe the metadata.
cat /tmp/malware-samples.json | jq .
{
  "beginning": "2020-07-13T21:47:09.005658Z",
  "end": "2021-01-30T16:11:29.98529Z",
  "files": [
    {
      "path": "/home/snuser/malware-samples/maldocs/hancitor/2020/July/e57d44fd470e7364a235353ded942f0f.pcap",
      "root": "/home/snuser/malware-samples",
      "err": null,
      "snaplen": 262144,
      "packets": 303,
      "size": 86230,
      "max_packet_size": 1033,
      "beginning": "2020-07-13T21:47:09.005658Z",
      "end": "2020-07-13T22:02:10.247467Z",
      "pps": 0.33620277818247557,
      "duration": 901241809000,
      "duration_human": "15m1.241809s",
      "delay": 0,
      "delay_human": "0s"
    },
    {
      "path": "/home/snuser/malware-samples/maldocs/hancitor/2021/January/e688ebdab6916fc89610c89ccb94ce16.pcap",
      "root": "/home/snuser/malware-samples",
      "err": null,
      "snaplen": 262144,
      "packets": 358,
      "size": 25376,
      "max_packet_size": 418,
      "beginning": "2021-01-30T16:06:30.926979Z",
      "end": "2021-01-30T16:11:29.98529Z",
      "pps": 1.1970909579570252,
      "duration": 299058311000,
      "duration_human": "4m59.058311s",
      "delay": 17345961921321000,
      "delay_human": "4818h19m21.921321s"
    }
  ]
}
Note that the delay value is very large for the second PCAP file. That’s
because the example PCAP set does not have the async problem at all. Rather,
it’s simply a collection of different malware samples, and not a product
of the same capture process where PCAPs rotated at different times. GopherCap
can still replay this set without waiting 6 months for the second reader to
start, but that needs a special flag. More on that in the next section.
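The delay values in the mapping output are consistent with a simple offset from the global start. The Go sketch below reproduces them from the beginning and end timestamps shown above. It is an illustration under that assumption, not GopherCap's own code; the fileInfo type and assignDelays function are hypothetical, and the file paths are shortened.

```go
package main

import (
	"fmt"
	"time"
)

// fileInfo holds the subset of per-file mapping fields used in this sketch.
type fileInfo struct {
	Path      string
	Beginning time.Time
	End       time.Time
	Delay     time.Duration
}

// assignDelays finds the earliest beginning and latest end across all files,
// then sets each file's Delay to its offset from the global start.
func assignDelays(files []fileInfo) (start, end time.Time) {
	for i, f := range files {
		if i == 0 || f.Beginning.Before(start) {
			start = f.Beginning
		}
		if i == 0 || f.End.After(end) {
			end = f.End
		}
	}
	for i := range files {
		files[i].Delay = files[i].Beginning.Sub(start)
	}
	return start, end
}

// mustParse parses an RFC 3339 timestamp or panics; fine for a sketch.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339Nano, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	files := []fileInfo{
		{Path: "2020/July/sample.pcap", Beginning: mustParse("2020-07-13T21:47:09.005658Z"), End: mustParse("2020-07-13T22:02:10.247467Z")},
		{Path: "2021/January/sample.pcap", Beginning: mustParse("2021-01-30T16:06:30.926979Z"), End: mustParse("2021-01-30T16:11:29.98529Z")},
	}
	start, end := assignDelays(files)
	fmt.Println("set duration:", end.Sub(start))
	for _, f := range files {
		fmt.Println(f.Path, f.Delay) // 0s, then 4818h19m21.921321s
	}
}
```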
Replay¶
Suppose we have a multi-terabyte packet capture from a red-blue exercise
that was created with the Suricata PCAP writer in multi mode. In other
words, there is a separate PCAP file per worker, each rotating at different
times whenever the maximum file size is reached, with sessions properly
balanced between workers. Assuming that the user has already mapped this set
as instructed in the previous section, and that the mapping is located in
/home/snuser/exercise/gophercap.json, we can use the following
configuration to replay all discovered PCAP files at the same rate as
they were originally written. Furthermore, the configuration limits
replay to a specific file pattern, the writer will not write any packet
seen from the noisy or sensitive segment 10.0.10.0/24, and the replay
will loop infinitely.
gopherCap --config gopher.yml replay
global:
  dump.json: /home/snuser/exercise/gophercap.json
  file.regexp: 'meerkat-20012\d+-\d+\.pcap'
replay:
  disable_wait: false
  loop.infinite: true
  out:
    bpf: "not net 10.0.10.0/24"
    interface: dummy0
Like map, replay is agnostic to Gzip compression. Compressed
files are dynamically opened with a gzip reader while uncompressed files are
read as-is. No user interaction is needed.
Unlike tcpreplay, GopherCap does not currently support defining
specific rates, like --pps or --bps. Instead, it supports time
scaling. In other words, the user can define --time-modifier to speed
up or slow down the replay while packets still preserve the temporal
properties between them. Furthermore, the combination of
--time-scale-enabled and --time-scale-duration dynamically
calculates the appropriate modifier to achieve the desired result. For
example, consider the following configuration that is based on the
previous example:
global:
  dump.json: /home/snuser/exercise/gophercap.json
  file.regexp: 'meerkat-20012\d+-\d+\.pcap'
replay:
  disable_wait: false
  loop.infinite: true
  out:
    bpf: "not net 10.0.10.0/24"
    interface: dummy0
  time:
    from: ""
    modifier: 1
    scale:
      duration: 15m
      enabled: true
    to: ""
This will ensure that each replay iteration is completed in
approximately 15 minutes. The packet rate is calculated dynamically to
reach this goal, and minor drift cannot be avoided due to calculations
between each packet. Note that replay.time.scale.enabled will always
override whatever value the user defines via the replay.time.modifier
key or the --time-modifier flag. The user can also use replay.time.from
and replay.time.to to only replay files from a specific period, for
example daytime. However, this feature currently does not scan
individual packets and simply relies on the PCAP file beginning and end
values. Thus, a PCAP file is ignored even if the defined period begins
inside that file, and GopherCap simply starts from the next file.
But can the user replay multiple PCAPs that were not written by the same
capture process? For example, the user might want to use multiple PCAPs
written at different times to generate simulated real-time traffic.
Consider the example JSON mapping in the map section: waiting 6 months for
the second file replay to start would be very bad. This wait can be skipped
with the --wait-disable flag or the replay.disable_wait option. When set to
true, all files start replaying at the same time and are
forced into the same interval.
replay.disable_wait: true
Combined with time scaling, this feature is quite useful for traffic generation. The malware PCAP mapping example could easily be replayed with the following configuration, which can be useful when writing or validating Suricata signatures, testing post-processing tools, etc. The user can, for example, mix known malware C2 beacon PCAPs with normal traffic to set up lab and training environments.
global:
  dump.json: /tmp/malware-samples.json
  file.regexp: 'maldocs.+hancitor'
replay:
  disable_wait: true
  loop.infinite: true
  out.interface: dummy0
  time:
    scale:
      duration: 15m
      enabled: true
tarExtract¶
GopherCap can replay individual gzip-compressed PCAP files, but not when those files are in Tar archives. It likely never will, as the Tar format is sequential and would thus break the concurrency features. However, consider the following example scenario:
a 4 terabyte hard drive holds a 1 terabyte gzipped Tar archive of PCAP files;
uncompressed, those files sum to 4 terabytes of disk usage;
only 1/4 of those files are relevant for replay, everything else is noise;
compressed, that 1 terabyte PCAP set only requires 200 gigabytes;
the available disk only has 300 gigabytes of free space.
This problem motivated the creation of this subcommand. It scans a
tar.gz file and only extracts files that match a user-defined regular
expression pattern. Those files can be written directly to gzip-compressed
files. Thus, the total disk requirement when solving the problem
is only 200 gigabytes. No interim storage or temporary files are needed.
The following example would extract files that match a specific date
pattern from /mnt/big.tar.gz to separate gzipped output files in
/mnt/small.
global:
  dump.json: /tmp/malware-samples.json
  file.regexp: 'meerkat-20012\d+-\d+\.pcap'
tarball:
  dryrun: false
  in:
    file: /mnt/big.tar.gz
  out:
    dir: /mnt/small
    gzip: true
Note that while it was developed for extracting PCAPs, nothing really stops the user from using this subcommand in other contexts as well.