Monitoring¶
Monitoring the overall system state of the Clear NDR® Central Server and Stamus Probes involves continuously observing and collecting system metrics and logs to assess their health and performance. This process helps detect issues, troubleshoot problems, and optimize overall system performance.
Observe system health over Rest API¶
Clear NDR® Central Server provides Rest API health check calls to ensure all services and connections are running properly on both CNCS and Stamus Probes.
The list of available checks can be obtained with the Rest API call /rest/appliances/appliance/<probe_pk>/troubleshoot_steps/ for a Probe and /rest/appliances/troubleshoot/steps/ for the CNCS. Each call returns all the steps available for the device being monitored.
Once the steps are known, a check can be run with the Rest API call /rest/appliances/appliance/<probe_pk>/troubleshoot/?query=<param> for a Probe and /rest/appliances/troubleshoot/?query=<param> for the CNCS, where <param> is the param value of the check to perform.
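| URL | HTTP Method | Description |
|---|---|---|
| /rest/appliances/appliance/<probe_pk>/troubleshoot_steps/ | GET | Retrieve dictionary of key/value for each troubleshooting step for a Probe |
| /rest/appliances/troubleshoot/steps/ | GET | Retrieve dictionary of key/value for each troubleshooting step for a CNCS |
| /rest/appliances/appliance/<probe_pk>/troubleshoot/?query=<param> | GET | Execute troubleshooting step for a Probe |
| /rest/appliances/troubleshoot/?query=<param> | GET | Execute troubleshooting step for a CNCS |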
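The four endpoints differ only in the device prefix and in whether a query parameter is present. The following is a minimal sketch of a helper that builds them. The function name `troubleshoot_url` and the example host and step values are illustrative assumptions, not part of the product API.

```python
from urllib.parse import urlencode


def troubleshoot_url(host, probe_pk=None, param=None):
    """Build a troubleshooting URL for the CNCS (probe_pk=None) or a Probe.

    Without `param` the URL lists the available steps; with `param` it
    executes that single step.
    """
    base = f"https://{host}/rest/appliances"
    if probe_pk is None:
        # CNCS endpoints
        steps = f"{base}/troubleshoot/steps/"
        run = f"{base}/troubleshoot/"
    else:
        # Probe endpoints, addressed by the Probe's primary key
        steps = f"{base}/appliance/{probe_pk}/troubleshoot_steps/"
        run = f"{base}/appliance/{probe_pk}/troubleshoot/"
    if param is None:
        return steps
    # urlencode escapes any characters that are not URL-safe
    return f"{run}?{urlencode({'query': param})}"


if __name__ == "__main__":
    # Hypothetical host, Probe PK and step name, for illustration only
    print(troubleshoot_url("192.168.0.12"))
    print(troubleshoot_url("192.168.0.12", probe_pk=3, param="example_step"))
```

Keeping URL construction in one place means the same monitoring loop can target either device type by passing or omitting `probe_pk`.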
Hint
Build a script that loops over the troubleshooting steps, and run it on a regular basis to create a monitoring tool.
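As a rough illustration of such a loop, the sketch below runs every step and collects the ones that do not report a true status. It assumes the response layout used by the sample scripts on this page: a list of step dictionaries carrying a `param` key, and a reply whose first top-level key holds `title` and `results.status`. The names `failed_steps` and `run_step`, and the demo step names, are hypothetical; `run_step` stands in for whatever function performs the authenticated GET request.

```python
def failed_steps(steps, run_step):
    """Run every step and return the titles of those whose status is not True.

    steps    -- list of dicts from the "steps" endpoint; any key containing
                "param" is taken as the step identifier
    run_step -- callable(param) returning the decoded JSON reply of the
                "troubleshoot" endpoint for that step
    """
    failures = []
    for step in steps:
        for key, param in step.items():
            if "param" not in key:
                continue
            reply = run_step(param)
            # Assumed layout: the first top-level key holds the step result
            body = reply[next(iter(reply))]
            if body["results"]["status"] is not True:
                failures.append(body.get("title", param))
    return failures


if __name__ == "__main__":
    # Canned data standing in for real API replies (hypothetical steps)
    demo_steps = [{"param": "step_a"}, {"param": "step_b"}]
    demo_replies = {
        "step_a": {"step_a": {"title": "Step A", "results": {"status": True}}},
        "step_b": {"step_b": {"title": "Step B", "results": {"status": False}}},
    }
    print(failed_steps(demo_steps, demo_replies.__getitem__))
```

Returning the list of failures, rather than raising on the first one, lets a scheduler such as cron report every failing check in a single run.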
Automation code samples¶
Below are three example scripts that monitor the health of the system via the Rest API: a Python script that checks the CNCS and one Probe, followed by two Bash scripts, the first for the CNCS and the second for a Probe. The scripts run each troubleshooting step and report the status of critical services and application checks.
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
from time import sleep
import sys
import os
def print_helptext():
   if len(sys.argv) < 4:  # script name + CNCS address + Probe address + token
      print(
            f"How to run the script: {sys.argv[0]} <hostname/ip of CNCS> <hostname/ip of Probe> <token>"
      )
      print(
            f"\nExample: python3 {sys.argv[0]} 192.168.0.12 192.168.0.13 1308a4b978feab03ee39e1fea490512e5734f51e"
      )
      quit()
   else:
      global url
      url = "https://" + sys.argv[1] + "/rest"
      global scs_ip
      scs_ip = sys.argv[1]
      global probe_ip
      probe_ip = sys.argv[2]
      global token
      token = sys.argv[3]
def check_host_is_up(hostname, waittime=1000):
   if (
      os.system(
            "ping -c 1 -W " + str(waittime) + " " + hostname + " > /dev/null 2>&1"
      )
      == 0
   ):
      HOST_UP = True
   else:
      HOST_UP = False
      raise Exception("Error. Host %s is not up..." % hostname)
   return HOST_UP
def check_url_is_reachable(url):
   try:
      get = requests.get(url, verify=False)
      if get.status_code == 200:
            return f"{url}: is reachable"
      else:
            return f"{url}: is Not reachable, status_code: {get.status_code}"
   except requests.exceptions.RequestException as e:
      raise Exception(f"{url}: is Not reachable \nErr: {e}")
def check_request(response):
   if response.status_code in (200, 201):
      print("Request is successful!")
      print("Response:")
      print(response.text)
   else:
      print(f"Request failed with status code: {response.status_code}")
      print(response.text)
class RestCall:
   def __init__(self, token, url):
      self.token = token
      self.url = url
      self.headers = {"Authorization": f"Token {self.token}"}
   requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
   verify = (
      True  # Change this to False if you use https with a self signed certificate
   )
   def get_probe_pk_by_hostname(self, probe_hostname):
      print("ACTION: Get PK of Probe by given hostname")
      rest_point = "/appliances/appliance/"
      response = requests.get(
            self.url + rest_point, headers=self.headers, verify=self.verify
      )
      check_request(response)
      response = response.json()
      results = response["results"]
      for dict_item in results:
            if dict_item["address"] == probe_hostname:
               return dict_item["appliance_id"]
      # No appliance matched the given address
      raise Exception("Error. No Probe found with address %s" % probe_hostname)
   def troubleshoot_probe(self, probe_pk):
      print("ACTION on Probe: Troubleshoot")
      rest_point = f"/appliances/appliance/{probe_pk}/troubleshoot_steps/"
      response = requests.get(
            self.url + rest_point, headers=self.headers, verify=self.verify
      )
      check_request(response)
      response = response.json()
      params_list = []
      for d in response:
            for k, v in d.items():
               if "param" in k:
                  params_list.append(v)
      for step in params_list:
            rest_point = f"/appliances/appliance/{probe_pk}/troubleshoot/?query={step}"
            response = requests.get(
               self.url + rest_point, headers=self.headers, verify=self.verify
            )
            response = response.json()
            firstkey = next(iter(response))
            results = response[firstkey]["results"]["status"]
            print(
               "%s : %s"
               % (
                  response[firstkey]["title"],
                  str(response[firstkey]["results"]["status"]),
               )
            )
            if results is not True:
               raise Exception("Error. Probe troubleshoot is failing. Exiting...")
      return True
   def troubleshoot_scs(self):
      print("ACTION: Troubleshoot CNCS")
      rest_point = "/appliances/troubleshoot/steps/"
      response = requests.get(
            self.url + rest_point, headers=self.headers, verify=self.verify
      )
      check_request(response)
      response = response.json()
      params_list = []
      for d in response:
            for k, v in d.items():
               if "param" in k:
                  params_list.append(v)
      for step in params_list:
            rest_point = f"/appliances/troubleshoot/?query={step}"
            response = requests.get(
               self.url + rest_point, headers=self.headers, verify=self.verify
            )
            response = response.json()
            firstkey = next(iter(response))
            results = response[firstkey]["results"]["status"]
            print(
               "%s: %s"
               % (
                  response[firstkey]["title"],
                  str(response[firstkey]["results"]["status"]),
               )
            )
            if results is not True:
               raise Exception(
                  "Error. Clear NDR® troubleshoot is failing. Exiting..." + str(response)
               )
      return True
if __name__ == "__main__":
   print_helptext()
   CNCS_Rest = RestCall(token, url)
   check_url_is_reachable(url)
   check_host_is_up(probe_ip)
   CNCS_Rest.troubleshoot_scs()
   probe_pk = CNCS_Rest.get_probe_pk_by_hostname(probe_ip)
   CNCS_Rest.troubleshoot_probe(probe_pk=probe_pk)
#!/bin/bash
#Bash script to monitor the health of a Clear NDR® Central Server (CNCS)
#The following parameters should be passed on script execution
MANAGER_IP=$1
USER=$2
PASS=$3
VERBOSE=$4
SLEEP="sleep 3"
display_usage() {
echo -e "\nDescription:
      (1) Collect Troubleshooting Steps
      (2) Loop through all the troubleshoot steps
      (3) Generate and download troubleshoot file report
      (4) Check if downloaded file exists and is the correct size
      \nUsage:\n [manager_ip] [user] [password] (-v | --verbose)\n"
}
#define error codes
EXIT_WUSAGE="exit 10"
EXIT_WERROR_NOSSH="exit 40"
EXIT_WFAILCHECK="exit 55"
EXIT_WSUCCESS="exit 0"
# check whether user had supplied -h or --help . If yes display usage
if [ "$1" == "--help" ] || [ "$1" == "-h" ] || ([ $# -ne 3 ] && [ $# -ne 4 ]);then
   display_usage
   $EXIT_WUSAGE
fi
# check whether user had supplied -v or --verbose for verbose mode
if [ "$#" -eq 4 ] &&  ([ "$VERBOSE" == "--verbose" ] || [ "$VERBOSE" == "-v" ]);then
   set  -ex
   CURL=""
elif [ "$#" -eq 3 ];then
   CURL="-s"
else
   display_usage
   $EXIT_WUSAGE
fi
# check for ssh connectivity to the Clear NDR® Central Server
$(sshpass -p ${PASS} ssh -o "StrictHostKeyChecking no" -q -q ${USER}@${MANAGER_IP} exit)
if [ "$?" -ne 0 ]; then
   echo "SSH to CNCS failed. Please check if it is up."
   $EXIT_WERROR_NOSSH
fi
#Check if file report already exists and remove it if so
if [ -f scirius-enterprise* ]; then
   rm -rfv scirius-enterprise*
fi
#Generate Token
CR_TOKEN=$(sshpass -p ${PASS} ssh -o "StrictHostKeyChecking no" -t ${USER}@${MANAGER_IP} 'sudo lxc-attach -n scirius -- \
      sudo -u www-data /usr/share/python/scirius-pro/bin/manage.py drf_create_token scirius')
TOKEN=$(echo ${CR_TOKEN} | grep -o -P '(?<=token).*(?=for user)')
#Echo out generated Token
echo -e "\nToken Generated"
#Display Troubleshoot Steps List
TROUBLESHOOT_STEPS=$(curl ${CURL} -k https://${MANAGER_IP}/rest/appliances/troubleshoot/steps/ -H "Authorization: Token ${TOKEN}" -H 'Content-Type: application/json' -X GET)
echo -e "\nTroubleshooting Steps: "$TROUBLESHOOT_STEPS
${SLEEP}
##get param value from json , build array then loop through array
for row in $(echo "${TROUBLESHOOT_STEPS}" | jq -r '.[] | @base64'); do
   _jq() {
   echo ${row} | base64 --decode | jq -r ${1}
   }
   TROUBLESHOOT=$(curl ${CURL} -k https://${MANAGER_IP}/rest/appliances/troubleshoot/"?query=$(_jq '.param')"  -H "Authorization: Token ${TOKEN}" -H 'Content-Type: application/json' -X GET)
   echo -e "\nTroubleshooting: "$TROUBLESHOOT
   #check if the t-shoot steps is not failing
   if ! echo $TROUBLESHOOT | grep -q "\"status\":true" && echo $TROUBLESHOOT | grep -q "\"status\":false";then
      $EXIT_WFAILCHECK
   fi
done
#Generate and download troubleshoot file report
TROUBLESHOOT7=$(curl ${CURL} -k -O -J https://${MANAGER_IP}/rest/appliances/troubleshoot/report/  -H "Authorization: Token ${TOKEN}" -H 'Content-Type: application/json' -X GET)
echo -e "\nGenerating and downloading troubleshoot file report: "$TROUBLESHOOT7
#Check if file has been downloaded and size is the expected one
if [ -f *tar.gz* ] && [ $(stat -c%s *tar.gz*) -gt 20000 ]; then
   ls -lha *tar.gz*
else
   echo "Downloaded report file cannot be found or size less than 20 kbytes!"
   $EXIT_WFAILCHECK
fi
#!/bin/bash
#Bash script to monitor the health of a Clear NDR® Probe
#The following parameters should be passed on script execution
MANAGER_IP=$1
USER=$2
PASS=$3
PK_APPL=$4
VERBOSE=$5
display_usage() {
echo -e "\nDescription:
      (1) Collect Troubleshooting Steps
      (2) Execute the Troubleshoot steps of the selected probe
      (3) Generate and download troubleshoot file report
      (4) Check if downloaded file exists and is the correct size
      \nUsage:\n [manager_ip] [user] [pass] [probe-PK] (-v | --verbose)\n"
}
#define error codes
EXIT_WUSAGE="exit 10"
EXIT_WERROR_NOSSH="exit 40"
EXIT_WFAILCHECK="exit 55"
EXIT_WSUCCESS="exit 0"
# check whether user had supplied -h or --help . If yes display usage
if [ "$1" == "--help" ] || [ "$1" == "-h" ] || ([ $# -ne 4 ] && [ $# -ne 5 ]);then
   display_usage
   $EXIT_WUSAGE
fi
# check whether user had supplied -v or --verbose for verbose mode
if [ "$#" -eq 5 ] &&  ([ "$VERBOSE" == "--verbose" ] || [ "$VERBOSE" == "-v" ]);then
   set  -ex
   CURL=""
elif [ "$#" -eq 4 ];then
   CURL="-s"
else
   display_usage
   $EXIT_WUSAGE
fi
# check for ssh connectivity
$(sshpass -p ${PASS} ssh -o "StrictHostKeyChecking no" -q -q ${USER}@${MANAGER_IP} exit)
if [ "$?" -ne 0 ]; then
   echo "SSH to Manager failed. Please check if it is up."
   $EXIT_WERROR_NOSSH
fi
#Generate Token
CR_TOKEN=$(sshpass -p ${PASS} ssh -o "StrictHostKeyChecking no" -t ${USER}@${MANAGER_IP} 'sudo lxc-attach -n scirius -- \
      sudo -u www-data /usr/share/python/scirius-pro/bin/manage.py drf_create_token scirius')
TOKEN=$(echo ${CR_TOKEN} | grep -o -P '(?<=token).*(?=for user)')
#Echo out generated Token
echo "Token Generated"
#Display Troubleshoot Steps List
TROUBLESHOOT_STEPS=$(curl ${CURL} -k https://${MANAGER_IP}/rest/appliances/appliance/$PK_APPL/troubleshoot_steps/ -H "Authorization: Token ${TOKEN}" -H 'Content-Type: application/json' -X GET)
echo -e "\nTroubleshooting Steps: "$TROUBLESHOOT_STEPS
##get param value from json , build array then loop through array
for row in $(echo "${TROUBLESHOOT_STEPS}" | jq -r '.[] | @base64'); do
   _jq() {
   echo ${row} | base64 --decode | jq -r ${1}
   }
   TROUBLESHOOT=$(curl ${CURL} -k https://${MANAGER_IP}/rest/appliances/appliance/$PK_APPL/troubleshoot/"?query=$(_jq '.param')"  -H "Authorization: Token ${TOKEN}" -H 'Content-Type: application/json' -X GET)
   echo -e "\nTroubleshooting: "$TROUBLESHOOT
   if ! echo $TROUBLESHOOT | grep -q "\"status\":true" && echo $TROUBLESHOOT | grep -q "\"status\":false";then
      $EXIT_WFAILCHECK
   fi
done
#Generate and download troubleshoot file report
TROUBLESHOOT17=$(curl -k ${CURL} -O -J https://${MANAGER_IP}/rest/appliances/appliance/$PK_APPL/troubleshoot_report/  -H "Authorization: Token ${TOKEN}" -H 'Content-Type: application/json' -X GET)
echo "Generating and downloading troubleshoot file report: "$TROUBLESHOOT17
#Check if file has been downloaded and size is the expected one
if [ -f *tar.gz* ] && [ $(stat -c%s *tar.gz*) -gt 20000 ]; then
   ls -lha *tar.gz*
else
   echo "Downloaded report file cannot be found or size less than 20 kbytes!"
   $EXIT_WFAILCHECK
fi
Monitoring over SNMP¶
SNMP (Simple Network Management Protocol) is a widely used protocol for network management. The purpose of SNMP is to allow administrators to monitor and manage devices from a central location.
SNMP provides a standardized way for network devices to report their status, performance, and other information to a central monitoring system. This information can then be used to identify and troubleshoot issues on the devices, optimize network performance, and plan expansion.
SNMP works by using a client-server model, where SNMP agents run on the network devices and report information to an SNMP manager or monitoring system.
The SNMP daemon (snmpd) is available for the CNCS and the Probes and can be installed with:
sudo apt-get install snmpd
To configure the daemon, edit /etc/snmp/snmpd.conf, save the file, and restart the daemon:
sudo systemctl restart snmpd
Hint
Refer to SNMP’s user manual to properly configure the SNMP daemon according to the specification of your environment.
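Once the daemon is running, a monitoring host can poll it. The sketch below is one possible way to do so from Python, under several assumptions: the Net-SNMP `snmpget` command-line tool is installed on the monitoring host, SNMPv2c is enabled, and a read-only community string has been configured in snmpd.conf. The function names and the example host and community values are illustrative. It queries sysUpTime.0 (OID 1.3.6.1.2.1.1.3.0), which reports how long the SNMP agent has been running.

```python
import re
import subprocess

# sysUpTime.0 from the standard MIB-II system group
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def snmpget_command(host, community, oid=SYS_UPTIME_OID):
    """Build an SNMPv2c snmpget command line (Net-SNMP syntax)."""
    return ["snmpget", "-v2c", "-c", community, host, oid]


def parse_timeticks(output):
    """Extract the raw Timeticks value (hundredths of a second), or None."""
    match = re.search(r"Timeticks:\s*\((\d+)\)", output)
    return int(match.group(1)) if match else None


def agent_uptime_seconds(host, community, timeout=5):
    """Query the agent and return its uptime in seconds, or None on failure."""
    try:
        done = subprocess.run(
            snmpget_command(host, community),
            capture_output=True, text=True, timeout=timeout, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # snmpget missing, non-zero exit status, or timeout
        return None
    ticks = parse_timeticks(done.stdout)
    return None if ticks is None else ticks / 100


if __name__ == "__main__":
    # Hypothetical address and community string
    print(agent_uptime_seconds("192.168.0.13", "public"))
```

A `None` result means the agent did not answer or the reply could not be parsed, which a scheduled check can treat as an alert condition.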
