Automate Nagios Monitoring with Ansible

Posted on June 24, 2016 by Will Foster

Infrastructure monitoring is important and there are lots of tools for it. I’m fond of Nagios. It’s very mature and customizable, but it’s also a giant pain in the ass to set up. Here’s how to save time deploying and managing Nagios with Ansible.

What Does it Do?

Automated deployment of Nagios server on CentOS7 or RHEL7
Automated deployment of Nagios client on CentOS6/7/8, RHEL6/7/8, Fedora or FreeBSD
Generates hosts and service checks based on Ansible inventory
Generates configuration/services based on easy-to-manage jinja2 templates
Deploys comprehensive, configurable checks for all hosts/services
Sets up comprehensive checks for the Nagios server itself
Wraps Nagios interface with Apache/SSL
Sets up firewall rules for Nagios/HTTP
Currently monitors the following types of resources:
- Generic Linux server (check ping, ssh, uptime, load, users, processes, disk space, swap, zombie procs, mdadm raid)
- Webservers (same as Linux server plus TCP/80 for webservers)
- Elasticsearch (same as Linux server plus elasticsearch)
- FreeNAS Appliances (check ping, ssh, volume, disk and alert status)
- ELK server (same as Linux server plus elasticsearch and Kibana)
- Jenkins CI (same as Linux server plus TCP/8080 and optional reverse proxy with authentication)
- DNS servers (same as Linux server plus DNS service checks)
- Network switches (check ssh, ping)
- Out-of-band interfaces (check ssh, ping, http)
- Dell iDRAC server health monitoring (using SNMP, see example below)
  - (all configurable: CPU, DISK, RAID, TEMP, FANS, MEM, POWER)
- SuperMicro health monitoring (using IPMI).
  - Split out by SuperMicro server type and checks

Getting Started
First, clone the git repo locally. You’ll need Ansible installed prior to this.

git clone https://github.com/sadsfae/ansible-nagios

Setup your Hosts Inventory
Next you’ll want to edit a few variables in your hosts (inventory) file. Most importantly you’ll want to change host-01 to whatever your nagios server should be.

cd ansible-nagios
sed -i 's/host-01/yournagioshost/' hosts

Add Resources to be Monitored (optional)
Add the servers/resources you want monitored to your inventory file and they’ll be automatically added to monitoring in the Nagios configuration. Servers/resources simply need to be reachable via SSH, have proper SSH keys set up for the root user, and have Python installed.

[nagios]
host-01

[webservers]
webserver01

[switches]
switch01 ansible_host=192.168.0.101
switch02 ansible_host=192.168.0.102

[oobservers]
webserver01-idrac ansible_host=192.168.0.105

[servers]
server01

[servers_with_mdadm_raid]

[dns]

[dns_with_mdadm_raid]

[freenas]

[elasticsearch]

[elkservers]

[jenkins]

[idrac]
database01-idrac ansible_host=192.168.0.106

[supermicro-6048r]

Any hosts you add under the [servers], [dns], [jenkins], or [webservers] groups will inherit a full battery of common service checks.

Each inventory hostgroup is a unique category. You cannot list a host under more than one, so pick the one that fits each of your hosts best.
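
One way to catch a host that has slipped into two groups before running the playbook is a small awk pass over the inventory. This is only a sketch: the function name find_duplicate_hosts is made up for illustration, and it assumes a simple INI-style inventory like the one above (it ignores [group:vars] and [group:children] sections).

```shell
# find_duplicate_hosts INVENTORY  (hypothetical helper, not part of the playbook)
# Print every host alias that appears under more than one [group] header.
find_duplicate_hosts() {
    awk '
        /^[ \t]*([#;]|$)/ { next }          # skip blank lines and comments
        /^[ \t]*\[/ {                       # a [group] header
            group = $1
            gsub(/^\[|\].*$/, "", group)    # strip the surrounding brackets
            skip = (group ~ /:/)            # :vars and :children sections list no hosts
            next
        }
        skip || group == "" { next }
        {
            host = $1                       # first field is the host alias
            if (!(host in seen)) {
                seen[host] = group
            } else if (index(" " seen[host] " ", " " group " ") == 0) {
                seen[host] = seen[host] " " group
                if (!(host in dup)) { dup[host] = 1; n++ }
            }
        }
        END {
            for (host in dup) print host ": " seen[host]
            exit (n > 0)                    # non-zero status when duplicates exist
        }
    ' "$1"
}
```

Running find_duplicate_hosts hosts prints one line per offending host, listing the groups it appears in, and returns a non-zero exit status if any were found.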

Note: For switches, iDRAC and out-of-band interfaces you must also use the ansible_host=ip.address variable as illustrated above. This is because most appliances will not have Python installed, so Ansible cannot gather facts from them. Nagios will still use the alias name listed.
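
A forgotten ansible_host= on an appliance is easy to miss, so a quick check of the inventory can help. The sketch below assumes the group names from the example inventory above (switches, oobservers, idrac); the function name check_ansible_host is hypothetical.

```shell
# check_ansible_host INVENTORY GROUP...  (hypothetical helper)
# List hosts in the named groups that do not set ansible_host=.
check_ansible_host() {
    inventory=$1
    shift
    awk -v wanted="$*" '
        BEGIN {
            count = split(wanted, names, " ")
            for (i = 1; i <= count; i++) want[names[i]] = 1
        }
        /^[ \t]*([#;]|$)/ { next }          # skip blank lines and comments
        /^[ \t]*\[/ {                       # remember the current [group]
            group = $1
            gsub(/^\[|\].*$/, "", group)
            next
        }
        (group in want) && $0 !~ /ansible_host=/ {
            print group ": " $1
            missing++
        }
        END { exit (missing > 0) }          # non-zero status if anything is missing
    ' "$inventory"
}
```

For example, check_ansible_host hosts switches oobservers idrac prints group: host for each entry that still needs an address.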

Change Nagios Password and Options (optional)
Part of the setup generates the Nagios admin user, and you’ll want to change its password from the default. Configurable variables are located in install/group_vars/all.yml. You can change this again at any time.

sed -i 's/changeme/yourpasswordhere/' install/group_vars/all.yml
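
The one-liner above breaks if the new password contains characters such as / or &, which are special to sed. A minimal sketch of a safer variant follows; the function name set_nagios_password is made up for illustration, and it only handles sed escaping (whether the value also needs YAML quoting is a separate question, and passwords containing newlines are not handled).

```shell
# set_nagios_password FILE NEW_PASSWORD  (hypothetical helper)
# Replace the "changeme" placeholder in FILE with NEW_PASSWORD.
set_nagios_password() {
    file=$1
    password=$2
    # Escape backslash, ampersand and the "|" delimiter used below, since
    # all three are special on the replacement side of the s command.
    escaped=$(printf '%s' "$password" | sed 's/[\\&|]/\\&/g')
    # Write to a scratch file first, then move it into place.
    sed "s|changeme|$escaped|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Usage would look like set_nagios_password install/group_vars/all.yml 'my/new&password', with single quotes so the shell leaves the password untouched.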

If you wish to receive external email alerts you might want to change the notification settings. The contacts.cfg.j2 file is templated for further modification.

admin_name_01: nagiosadmin
admin_email_01: nagios@localhost

Lastly, the playbook creates a read-only guest user for Nagios. You can turn this off by setting nagios_create_guest_user to false, or change the settings as you see fit:

nagios_create_guest_user: true
nagios_ro_username: guest
nagios_ro_password: guest

If you don’t want iptables or firewalld rules created automatically for either the Nagios server or the NRPE Nagios client, modify these:

manage_firewall: true
manage_firewall_client: true

Run the Playbook
Now you’re ready to deploy Nagios. Run the Ansible playbook.

ansible-playbook -i hosts install/nagios.yml

Access your Nagios Instance
Now you should be able to access your Nagios interface via https://yourserver/nagios. Below is an example I’ve set up locally on a few home servers.

If you haven’t set up any external hosts to monitor in your Ansible inventory you’ll just see the localhost where Nagios is running and a slew of standard checks.

Known Issues
SELinux occasionally breaks checks as policy files are updated. If you get the following error during your playbook run:

avc:  denied  { create } for  pid=8800 comm="nagios" name="nagios.qh

You need to do the following (or disable SELinux entirely):

cat /var/log/audit/audit.log | audit2allow -M mynagios
semodule -i mynagios.pp

Now restart Nagios:

systemctl restart nagios
systemctl restart httpd

I don’t want to make any assumptions about your environment, so the playbook doesn’t force setenforce 0 or the like.

Deployment Video
I’ve put together a simple video showing the deployment. It takes about 2 minutes to deploy a full Nagios stack and set of remote server/resource checks.

Dell iDRAC Server Health Example
I’ve recently added support for monitoring all server health values via SNMP against the iDRAC interface. This is built upon someone else’s already-existing idrac check. The big benefit is that you don’t need to put load on the actual server or install NRPE in cases where you need to optimize performance but still want a battery of health checks in case something breaks.

These checks are all configurable in install/group_vars/all.yml. Here’s what it looks like in the Nagios dashboard:

Here is what it looks like from the service details view (quite exhaustive).

Further Automation and Extending
Remember that all the configurations are generated via jinja2 templates. If you want to expand this to include different server types and checks, you can simply branch out your hostgrouptype.conf.j2 files as I have done in a minimal fashion and add the corresponding entries to your Ansible inventory.

We will be using this along with Ansible dynamic inventory and Foreman to auto-generate our list of monitored services for various tasks, one of which is to maintain an up-to-date Nagios monitoring system automatically as hosts change or get added.

Questions or suggestions are welcome. Feel free to add a comment below or file an issue on GitHub.

About Will Foster

hobo devop/sysadmin/SRE


35 Responses to Automate Nagios Monitoring with Ansible

Wayne Rousey says:

September 21, 2016 at 4:02 pm

Very helpful – cheers!

driveby says:

November 20, 2016 at 9:30 pm

Eager to give it a spin.
What component I would add is livestatus for running integration tests against it.

- Will Foster says:
  
  November 20, 2016 at 10:05 pm
  
  Hey @driveby, that would be useful. Right now I’m just focusing on expanding the templated checks (I’ll be adding IPMI / idrac environmental checks for servers soon). I also test everything extensively in VMs or against bare-metal (in case of the IPMI checks) before I push anything but there’s no replacement for proper integration testing for sure.
  
GL says:

July 15, 2017 at 11:43 am

Impressive. One thing I’m wondering, about how long did it take to setup this configuration? I’m working towards similar ends, but this is beyond. Very nice.

- Will Foster says:
  
  July 15, 2017 at 10:02 pm
  
  Hi GL, the entire setup takes about 2-3 minutes with a few dozen hosts. Edit your inventory file and off it goes. Note that you shouldn’t put the same host into more than one group (e.g. servers, webservers etc) due to how it’s templated. If you see any check failures or issues with the Nagios service you may need to run setenforce 0 and change SELinux to permissive in /etc/selinux/config until you can apply the proper policy or label(s) needed.
  
  - GL says:
    
    July 16, 2017 at 11:02 am
    
    ah, i meant time to write the ansible / nagios configuration :) weeks? months?
    
  - Will Foster says:
    
    July 17, 2017 at 8:29 am
    
    > ah, i meant time to write the ansible / nagios configuration :) weeks? months?
    
    I think it took a day or two, once I figured out how to iterate configuration loops in Jinja2 with Ansible facts it came together shortly afterwards.
    
GL says:

July 15, 2017 at 3:18 pm

ps – i like how the entire package is self contained… cool

jsingamsetty says:

August 4, 2017 at 5:44 am

Hi, I see the error below:

TASK [nagios-client : Setup NRPE client configuration] ***************************************************************************
fatal: [SErver02]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'ansible_default_ipv4'"}

- Will Foster says:
  
  August 4, 2017 at 7:38 pm
  
  Can you paste your ‘hosts’ file? Opening a Github issue is probably the best place to help with this.
  
- Jcob says:
  
  August 16, 2018 at 3:55 pm
  
  Did you ever find a solution for this?
  
  - Will Foster says:
    
    August 20, 2018 at 12:11 am
    
    Likely that this error happens when you try to add a host that doesn’t support Python (so no fact collection is possible) without using the ansible_host variable, e.g. switches, routers, out-of-band devices, etc.
    
    For those they have their own inventory group / format.
    
    Per the README example:
    
    [switches]
    switch01 ansible_host=192.168.0.100
    switch02 ansible_host=192.168.0.102
    
AK says:

August 31, 2017 at 11:10 am

Hi,
Nice work done, please let me know how to add other centos versions and other types of servers to be monitored?

- Will Foster says:
  
  August 31, 2017 at 11:40 am
  
  > Hi,
  > Nice work done, please let me know how to add other centos versions and other types of servers to be monitored?
  
  Hi AK,
  
  Adding other server types is easy: you just need to add them to the “hosts” file under the right category. For example, if you have three Linux servers and three Linux web servers you’d split them out underneath the proper category (note you cannot have the same server in more than one category).
  
  [servers]
  server01
  server02
  server03
  
  [webservers]
  server04
  server05
  server06
  
  If you wanted to extend the types of server checks you’ll just need to write jinja2 templates for them here and ensure they are generated in the main Nagios Ansible tasks here. You can model a new template on one of the existing ones.
  
Dheerendra says:

September 6, 2017 at 9:57 am

Excellent work! Any suggestions on how I can make it work for CentOS 6.6? I was thinking of editing the main.yml file.
Thanks in Advance!

- Will Foster says:
  
  September 6, 2017 at 7:29 pm
  
  Hi Dheerendra,
  
  I haven’t tested this on RHEL/CentOS6 and don’t really plan to support it but you’ll at least need to edit two areas where we check for this:
  
  Edit main.yml and remove the OS/distribution check here
  
  Edit main.yml and modify the EPEL RPM repository for CentOS6/RHEL6
  
  You may also want to substitute systemctl commands with service commands.
  
  Let me know how it goes.
  
Narpet says:

March 7, 2018 at 5:30 pm

We use nagiosdev, nagiosqa and nagiosprod user accounts instead of nagios. The group names would also be nagiosdev, nagiosqa and nagiosprod. How can we do this?

- Will Foster says:
  
  March 7, 2018 at 6:38 pm
  
  Hi Narpet, this is a one-line change here:
  
  https://github.com/sadsfae/ansible-nagios/blob/master/install/group_vars/all.yml#L17 in the setting nagios_username:
  
  Just change nagiosadmin to whatever you want the primary username to be per environment.
  
  After that just run the playbook once you’ve added each server underneath the proper Ansible inventory group that corresponds to what you want monitored (only put a server in one group).
  
  For groups, are you referring to Nagios groups?
  
  This playbook isn’t set up in a way to put all servers into the same groups, nor does Nagios by design want you to do this.
  Typically you want to put servers you are monitoring underneath the appropriate Ansible inventory group that corresponds with the server, device or service type you are monitoring.
  
  On the Nagios side this lets you organize sets of similar servers doing the same function into the same Nagios Host Group based on their function, but it’s also why you probably want a separate Nagios instance per environment. This has other added benefits like separate alerting, contact and notification behavior that you probably want to have between different environments (e.g. you probably don’t want full alerting on in development but you probably do want it in prod).
  
  In your case with different environments it might make more sense to have separate Nagios instances per environment: nagiosdev, nagiosqa and nagiosprod.
  
  You could collapse all of this if you really wanted to, but you’d need to edit and manage the hostgroup_name variable in the templates section of the playbook and do some restructuring of how that maps to the Ansible inventory groups:
  
  https://github.com/sadsfae/ansible-nagios/tree/master/install/roles/nagios/templates
  
  As things are set up this is designed to have Ansible inventory groups map to Nagios Host groups, adding each server underneath only one inventory group that describes it best. You can see that some of the generic health checks are inherited and only differentiate based on the primary service (e.g. jenkins, DNS, etc).
  
  I hope this helps.
  
Narpet says:

March 12, 2018 at 3:29 pm

Thanks. This is great. I am starting to test this.

Narpet says:

March 12, 2018 at 5:57 pm

How do i disable installing nagios core. Also i cannot access any repo outside the company. I have to install the linux-nrpe-agent.tar.gz package. How do i automate this method. Please advise.

- Will Foster says:
  
  March 14, 2018 at 4:53 am
  
  I think you’re better off pointing your system to some local RPM repository location if you cannot access public mirrors.
  
  You’d want to comment this out and lay down a repository config that points to some local location where the EPEL / Nagios RPMs are located.
  
  https://github.com/sadsfae/ansible-nagios/blob/master/install/roles/nagios/tasks/main.yml#L18:21
  
  I’d run through the playbook on a host you can use to have access externally to get a list of all the packages you’d need for that repository.
  Then you’d want to create an RPM repository out of that location.
  
  Lastly, you can use the Ansible yum_repository module:
  
  http://docs.ansible.com/ansible/latest/yum_repository_module.html
  
  So far as nagios-core, I guess you mean nagios and nagios-common? If you don’t want to install certain packages you can edit the items array here; however, I would advise against this because everything in it is required for proper Nagios operation.
  
  https://github.com/sadsfae/ansible-nagios/blob/master/install/roles/nagios/tasks/main.yml#L52
  
  Your best bet is to just ensure all needed packages are located somewhere locally inside your security requirements and install that way, managing a local RPM repository and copying the packages there is probably the easiest way.
  
  - Bivabari says:
    
    April 15, 2019 at 10:04 am
    
    Hello, I already have an existing Nagios core server.
    I only want to deploy the NRPE client and plugins. Could you please suggest how?
    
  - Will Foster says:
    
    April 15, 2019 at 6:47 pm
    
    Hi Bivabari, the playbook I maintain has both server + client packages tightly integrated. You can decouple these and try running only the nagios-client playbook, but you may be better off just using Ansible to install the packages and drop any configs in a standalone way.
    
    e.g. to install them on a set of hosts, make an inventory file called hosts with a section called [nagios_clients]:
    
    ansible nagios_clients -m yum -a "name=nrpe state=latest"
    
    Now if you wanted to drop the configs (assuming the same config can be deployed everywhere), have a copy of it locally called nrpe.cfg:
    
    ansible nagios_clients -m copy -a 'src=nrpe.cfg dest=/etc/nagios/nrpe.cfg'
    
    Now set nrpe to start and to start on boot:
    
    ansible nagios_clients -m service -a "name=nrpe state=started"
    
    ansible nagios_clients -m service -a "name=nrpe enabled=yes"
    
dontknow says:

October 1, 2018 at 10:52 am

What if you just want to add a bunch of hosts to a already up and running nagios?

- Will Foster says:
  
  October 1, 2018 at 1:31 pm
  
  > What if you just want to add a bunch of hosts to a already up and running nagios?
  
  This is really for a new installation because it makes a lot of changes to your Nagios configs and templates them, and the ServiceGroup and HostGroup changes may not jibe with what you’re currently using.
  
Bivabari says:

April 17, 2019 at 9:43 am

Hello,
Thanks for the update.
ansible nagios_clients -m yum -a "name=nrpe state=latest"
This needs a little clarification: is nagios_clients a yml file? Can you please help with what the format inside nagios_clients would be?

Also, can I use your package for this by removing the Nagios core part and keeping only the nagios_client code? It’s failing in my case.

[root@goldfinch install]# cat nagios.yml
---
#
# Playbook to install nagios server, clients and
# generate service checks based on Ansible inventory
#

# we need to collect facts from all hosts we reference
# https://github.com/ansible/ansible/issues/9260
# we skip switches/oobservers because they normally don't
# have python installed.

- hosts: all
  remote_user: "{{ ansible_system_user }}"
  tasks: []

# role for nagios clients via NRPE
- hosts: all
  remote_user: "{{ ansible_system_user }}"
  roles:
    - { role: nagios-client }
    - { role: firewall_client, when: manage_firewall_client }

- Will Foster says:
  
  April 17, 2019 at 1:12 pm
  
  nagios_clients refers to a group in an inventory file containing the hosts you want to run Ansible against:
  
  https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html
  
  This is an INI-style file containing the destination hosts underneath a header like this, with the filename being “hosts”:
  
  [nagios_clients]
  host01
  host02
  host03
  
  You can try this, but I would probably run this standalone using the ad-hoc module commands in my previous reply since all you’re doing is installing it, dropping a configuration file and starting the service. This would assume you have a config file already setup.
  
  ansible nagios_clients -m yum -a "name=nrpe state=latest"
  ansible nagios_clients -m copy -a 'src=nrpe.cfg dest=/etc/nagios/nrpe.cfg'
  ansible nagios_clients -m service -a "name=nrpe state=started"
  ansible nagios_clients -m service -a "name=nrpe enabled=yes"
  
  I don’t think I can help you much deviating from the scope of the playbook as the two are tightly integrated together, with the NRPE checks being generated/managed on the Nagios server side with the rest of the Ansible playbooks, however if you can get things to work this way that’s great.
  
varun rayala says:

May 15, 2019 at 4:57 am

I already have a Nagios server and now I need to write an Ansible playbook to schedule downtime and
silence and unsilence checks on other hosts.
I tried the config below but I’m not sure where I need to define the Nagios server, URL, user and password details:
# set 30 minutes of apache downtime
- nagios:
    action: downtime
    minutes: 30
    service: httpd
    host: '{{ inventory_hostname }}'

- Will Foster says:
  
  May 17, 2019 at 12:40 am
  
  My hunch is you’d want to look into the uri module for Ansible, it will let you perform POST requests – you’d want to model the API/URL for Nagios per service and probably pass the service and host(s) you want to silence or unsilence as a variable(s) when running the playbook.
  
  https://docs.ansible.com/ansible/latest/modules/uri_module.html
  
  This is something you’d write yourself and isn’t in the scope of the playbook here.
  
  You might also want to look at interacting with the API directly, there’s a library or two floating around for this but I haven’t tried it.
  
  https://pypi.org/project/nagios-api/
  
LL says:

June 17, 2020 at 8:46 am

Hi, thanks for this. I’ve noticed httpd and Nagios being installed on ALL servers too. Is it just my setup ?

- Will Foster says:
  
  June 18, 2020 at 12:29 pm
  
  Hi LL, you should only have one server and everything else is a client. (unless you want separate Nagios servers per sub-domain/environment etc).
  
  e.g.
  
  [nagios]
  my-nagios-server
  
  Everything else would be a client of some sort below this in the example inventory file, corresponding to the checks or category they fit into. Only use one unique category/client/inventory group per host (don’t list it in more than one location; this is why there are some overlaps in checks, like the “server” role being inherited plus other functionality).
  
  HTTPD is required because Nagios does not have its own web server; the web interface is just CGI.
  
  Hope this helps, if you have any more questions please let me know.
  
txfunnkkvy says:

November 20, 2021 at 9:09 am

Thank you very much. How can I log in?

LL says:

September 25, 2023 at 4:45 am

Thanks, any chance of adding PnP4Nagios ?

- Will Foster says:
  
  September 25, 2023 at 1:56 pm
  
  > Thanks, any chance of adding PnP4Nagios ?
  
  Hi LL, I took a look at PNP4Nagios and it looks useful but I don’t think it’s maintained anymore?
  
  https://github.com/lingej/pnp4nagios
  
  - LL says:
    
    September 27, 2023 at 12:38 am
    
    hi, thanks for the reply and consideration. something I saw that looks interesting is Nagios Core – Performance Graphs Using InfluxDB + Nagflux + Grafana + Histou
    https://support.nagios.com/kb/article/nagios-core-performance-graphs-using-influxdb-nagflux-grafana-histou-802.html#RHEL
    I may give that a go :)
    