This article discusses High Availability and Disaster Recovery (HA/DR) for the NetBackup Master Server. NetBackup is a scalable, heterogeneous backup system that can back up both the primary and the secondary site. DR for the NetBackup Master Server should be planned in advance and can be based on the following options:
Disaster Recovery with full or partial catalog recovery
Full catalog recovery is the simplest DR option. It is primarily used to recover the catalog if the data is corrupted or storage is lost at the production site. Full catalog recovery is recommended for single-domain configurations and is used if the DR site has the same number of media servers, with the same names, as the production site. Media servers that do not exist in the DR environment are deactivated to avoid unnecessary polling. All device records are removed because the device configuration at the DR site can differ from the production site. Device discovery is then run to update the EMM database.
Partial catalog recovery is recommended for multi-domain configurations and is used for DR sites where the server layout differs from the production site (fewer media servers, different library types, etc.). Partial catalog recovery recovers only the flat-file components and not the relational database. The details of the existing infrastructure (servers, devices, etc.) at the DR site are not lost during the recovery process.
For both types of catalog recovery, client and catalog backups must be transferred from the production/primary site to the secondary site. This can be done via replication, tape duplication and import.
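The deactivation step described above comes down to comparing two host lists. A minimal sketch, assuming you already have the media server names for both sites as plain lists (the function name and the host names are hypothetical, and this does not call any NetBackup command):

```python
def media_servers_to_deactivate(production_servers, dr_servers):
    """Return production media servers that have no counterpart at the DR site.

    Host names are compared case-insensitively (an assumption of this sketch).
    """
    dr_names = {name.lower() for name in dr_servers}
    missing = {name.lower() for name in production_servers} - dr_names
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    production = ["media01", "media02", "media03"]
    dr_site = ["media01", "media03"]
    print(media_servers_to_deactivate(production, dr_site))
```

The resulting list is only a planning aid. The actual deactivation is done with the NetBackup tools during the recovery procedure.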
NetBackup Clustered Master Server with catalog replication
NetBackup supports clustering of the master servers using the following clustering technologies:
- Microsoft Cluster Server
- Veritas Cluster Server on Windows 2003, 2008, 2008R2
- Veritas Cluster Server on UNIX/Linux
- HP Serviceguard
The NetBackup master server nodes at both sites must be configured as clustered master servers, although they can be single-node clusters at each site. The NetBackup master server can only run on one node of the cluster at any one time. In a replicated environment, the cluster members at both sites effectively form a single cluster.
NetBackup supports a maximum of four nodes per clustered Master Server, so the following configurations are possible:
- Single node cluster at both sites - this configuration requires two nodes, one at each site.
- Dual node on primary site and single node on secondary site - this configuration requires three nodes: two at the primary site and one at the secondary site.
- Dual nodes at both sites - this configuration requires four nodes, two at each site.
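The three layouts above can be written down as a small lookup, which is handy when documenting or sanity-checking a planned design. This is only a sketch: the names `SUPPORTED_LAYOUTS` and `required_nodes` are made up for illustration, and the table encodes just the cases listed in this article.

```python
# Layouts listed in the article, keyed by (primary_nodes, secondary_nodes).
SUPPORTED_LAYOUTS = {
    (1, 1): "single node cluster at both sites",
    (2, 1): "dual node at primary site, single node at secondary site",
    (2, 2): "dual nodes at both sites",
}


def required_nodes(primary_nodes, secondary_nodes):
    """Return the total node count for a layout listed in the article.

    Raises ValueError for any layout that is not one of the listed cases.
    """
    layout = (primary_nodes, secondary_nodes)
    if layout not in SUPPORTED_LAYOUTS:
        raise ValueError(f"layout {layout} is not one of the listed configurations")
    return primary_nodes + secondary_nodes


if __name__ == "__main__":
    for layout, description in SUPPORTED_LAYOUTS.items():
        print(description, "->", required_nodes(*layout), "nodes")
```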
Catalog replication can be done via, for example, mirroring, synchronous array-based replication (stretched SAN), or Veritas Volume Replicator (VVR).
NetBackup Non-Clustered Master Server with catalog replication
NetBackup can also be installed as non-clustered Master Servers, with catalog replication handled by a third-party solution (e.g. synchronous array-based replication). In practice, there are two master servers, one at each site, but only one is active, and the catalog is replicated to the second site. The NetBackup Master Server is installed under a virtual host name, so a DNS alias must be created for the Master Server. Repointing the DNS alias is what enables failover to the secondary master server.
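Since failover in this setup relies on the DNS alias, it is useful to verify where the alias currently resolves before and after a switch. A minimal sketch using only the Python standard library; the alias and the IP addresses are hypothetical placeholders, and the `resolve` parameter exists only so the check can be exercised without real DNS:

```python
import socket


def alias_points_to(alias, expected_ip, resolve=socket.gethostbyname):
    """Return True if the alias currently resolves to expected_ip.

    `resolve` defaults to the system resolver; a lookup failure counts
    as "not pointing there".
    """
    try:
        return resolve(alias) == expected_ip
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    # Hypothetical alias and addresses (documentation ranges), for illustration only.
    alias = "nbu-master.example.com"
    primary_ip = "192.0.2.10"
    secondary_ip = "192.0.2.20"
    print("primary active:", alias_points_to(alias, primary_ip))
    print("secondary active:", alias_points_to(alias, secondary_ip))
```

Keep in mind that clients and media servers may cache the old address until the DNS record's TTL expires.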
NetBackup Master Server as a virtual machine secured by VMware DR solutions
Recently, more customers have implemented a virtualized Master Server on VMware. This is a very useful option because you can protect the NetBackup VM using VMware High Availability (HA) and Site Recovery Manager (SRM).
VMware High Availability (HA) monitors ESXi hosts and/or VMs. It automatically restarts the NetBackup Master Server VM on another ESXi host when a host failure is detected, and restarts the VM when an operating system failure is detected. The NetBackup Master Server VM can therefore be up and running again in roughly the time it takes to reboot.
Site Recovery Manager (SRM) is used to fail over VMs to the secondary site in case the primary site goes down. SRM uses array-based replication or vSphere Replication (replication by the ESXi hosts).
NetBackup Master Server Standby
Symantec supports keeping a Master Server in a "cold standby" state. This is a NetBackup Master Server that is installed and configured exactly like the active Master Server (hostname, IP, hardware, NBU version) but remains shut down until it is needed. You do not need an additional license.
You need to transfer backup images (client backups and NetBackup catalog backups) to the second site, generally on tape.
This is the cheapest DR option for small NetBackup environments (one master/media server) with active/passive sites. It is often used when the following are not available:
- Stretched SAN
- Array-based replication, AIR or VVR
- Stretched LAN or good WAN
When you need to fail over to the second site, you run a standard catalog recovery from tape at the secondary site.
NetBackup Media Server reinstalled as a Master Server
It is possible to take one Media Server at the DR site, reinstall it as a Master Server and restore the catalog. This is a time-consuming process because you need to uninstall the NetBackup Media Server binaries and install NetBackup again as a Master Server. In practice, Master and Media Servers use the same binaries (and it is possible to "convert" a master into a media-only server...), but the officially supported way is reinstallation.
There is one requirement: you must be able to use the same Master Server hostname at the DR site. I have had a situation where my customer could not use the same hostname because of security policies. Simply put, the hostname was specific to the site and the hardware.
NetBackup DR possibilities - comparison
DR Solution | Requirements | Pros | Cons |
---|---|---|---|
Full Catalog Restore | - Catalog backup needs to be transferred via replication or off-site vault - the same Master Server hostname needs to be used at the DR site | - good for small, single NBU domain environments | - RTO depends on size of Catalog - procedure: manual failover |
Partial Catalog Restore | - Catalog backup needs to be transferred via replication or off-site vault | - good for multiple NBU domains | - RTO depends on size of Catalog |
Clustered Master Server | - Windows edition with clustering support (e.g. Windows Enterprise 2008 or 2012 Standard) or Veritas Cluster Server - Stretched SAN or VVR | - good RTO and RPO - easy failover and maintenance (e.g. site) | - expensive: infrastructure and software |
Non-Clustered Master Server | - third-party replication (array-based, VVR) - DNS alias name | - cost (no additional license required, e.g. for clustering) - good RTO and RPO | - procedure: manual failover |
Master - Cold Stand-by | - an idle NBU server with the same hostname etc. at the secondary site | - good for active/passive sites - no additional NetBackup license required | - RTO depends on size of Catalog - manual failover via Catalog restore - wasted hardware (idle server) |
Media Server in DR reinstalled as Master Server | - the same Master Server hostname needs to be used at the DR site | - good for single NBU domain environment | - long RTO: depends on the time needed to reinstall and reconfigure the Media Server as a Master Server and on the size of the Catalog (restore) - procedure: manual |
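Several rows above note that the RTO depends on the size of the catalog. A rough back-of-the-envelope estimate can make that concrete. The sketch below is an assumption-laden simplification, not a NetBackup formula: it assumes a constant restore throughput plus a fixed overhead for manual steps, and all numbers are invented for illustration.

```python
def estimate_restore_hours(catalog_size_gb, throughput_gb_per_hour, fixed_overhead_hours=0.0):
    """Estimate catalog restore time as size / throughput + fixed overhead.

    All inputs are planning assumptions; measure real values in a DR test.
    """
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return catalog_size_gb / throughput_gb_per_hour + fixed_overhead_hours


if __name__ == "__main__":
    # Hypothetical figures: 500 GB catalog, 250 GB/h restore rate,
    # 1 hour for manual steps such as mounting tapes and starting services.
    print(estimate_restore_hours(500, 250, fixed_overhead_hours=1.0), "hours")
```

For the reinstall option in the last row, the fixed overhead would also have to cover the uninstall and reinstall of NetBackup.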
Conclusion
As described above, there are several ways to implement DR for the NetBackup Master Server. The best solution should be chosen based on:
- Budget 😉
- Recovery Time Objective (RTO) and Recovery Point Objective (RPO) requirements
- Type of NetBackup domain: single or multiple
It is not unusual for a NetBackup Disaster Recovery Plan (DRP) to have to change. For example, one of my customers wanted to migrate a standalone Master Server to a cluster (clustering a standalone Master Server) to get easier failover between sites and better RTO and RPO. This is possible, but it can only be done using NetBackup Catalog Manipulation services with the assistance of an authorized consultant.