Exchange 2007 Cluster Adventures

In late 2009, I was hired by a company to be their senior server engineer. One of my first projects was to implement Exchange Server 2007 on a Windows Server 2008 Single Copy Cluster for high availability. That was the easy part; read on to learn how life progressed through the next few years.

Hardware

The company purchased the server hardware before I was hired. Fortunately, it was reasonable hardware and would work for the time being, though I would have liked to work with slightly better hardware.

| As purchased | Would have preferred |
| --- | --- |
| Intel Xeon E5504 2.0 GHz | Dual Intel Xeon E5504 2.0 GHz |
| 4 GB RAM | 12 GB RAM |
| RAID 1 disk for O/S and Applications | RAID 1+0 for O/S and Applications |
| 4 Gb Fiber to SAN | Dual 8 Gb Fiber to SAN |
| Redundant Power Supply | Redundant Power Supply |

Exchange, by nature of the beast, will consume every drop of available memory, just like SQL Server. Four gigabytes of memory was barely enough for Exchange 2007 and the operating system to behave. As time progressed and user demand became more intense, the memory was upgraded to 20 GB on each node of the cluster.

The Intel Xeon E5504 2.0 GHz is a quad-core processor that would be sufficient for a smaller environment. Higher core count processors weren’t available at the time, so two processors would have been a better choice for the size of our environment. Today, this server struggles to meet the demands of our user base and we frequently receive complaints.

When it comes to disk arrays, more spindles (disks) in an array typically means better performance. RAID 1+0 allows the use of multiple smaller, faster disks to help improve O/S performance and add resiliency.

Since the servers were being used as a cluster, management felt that only a single SAN connection was necessary from each node. Using two connections, along with appropriate driver software, would provide redundancy in case of fiber or HBA failure plus the possibility of doubling the bandwidth. Like any database, Exchange is disk I/O intensive; dual paths can increase throughput and improve the user experience.

Redundant power supplies are essential for critical systems. Under normal circumstances, if a power supply fails, the remaining power supplies will continue powering the server without any downtime. Hot-pluggable redundant power supplies can even be replaced without shutting down the system.

Initial Exchange 2007 Cluster Deployment

Exchange Server 2007 would not install on Windows Server 2008; it was designed and released prior to Windows Server 2008 and therefore wanted Windows Server 2003. With Windows Server 2003 six years past its release, I wasn’t willing to deploy on it.

I had to slipstream Service Pack 1 into the installer to install on Windows Server 2008. I had not slipstreamed a service pack into an installer before, so that was my first adventure. Thank the geeks of the world for the internet, where I found the instructions for slipstreaming SP1 into the Exchange install source.

Drive Configuration

LUN = Logical Unit Number (“Drive” space on a SAN [Storage Area Network])

Using Microsoft’s TechNet guide for Exchange 2007 cluster storage design and the mailbox information in the Exchange 2003 server, I was able to do a little math and determine that four private information stores plus one public information store would work well. LUNs were created on our SAN so that each information store would have its own “Log” volume and “Store” volume.

| Drive | Purpose |
| --- | --- |
| A | Reserved by OS |
| B | Reserved by OS |
| C | Local drive for OS |
| D | Local drive for Apps |
| E | SAN LUN for Private Store 1 |
| F | SAN LUN for Private Store 1 Logs |
| G | <unused> |
| H | Reserved for Home Drive mapping |
| I | SAN LUN for Private Store 2 |
| J | SAN LUN for Private Store 2 Logs |
| K | SAN LUN for Private Store 3 |
| L | SAN LUN for Private Store 3 Logs |
| M | SAN LUN for Private Store 4 |
| N | SAN LUN for Private Store 4 Logs |
| O | <unused> |
| P | <unused> |
| Q | SAN LUN for Cluster Quorum |
| R | <unused> |
| S | <unused> |
| T | <unused> |
| U | <unused> |
| V | <unused> |
| W | <unused> |
| X | <unused> |
| Y | <unused> |
| Z | Local Optical Drive |

As you see, we used more than half the letters of the alphabet right off the bat.
Note: Windows Cluster will not allow the use of letters A and B for clustered drives.
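
To see how quickly drive letters run out, here is a small Python sketch that tallies the initial layout from the table above. It covers only the assignments listed there and assumes every additional information store needs two letters (one for the store, one for its logs), as the first four did. Later changes to the layout are not reflected.

```python
import string

# Letters assigned in the initial layout, transcribed from the drive table.
assigned = {
    "A": "Reserved by OS",
    "B": "Reserved by OS",
    "C": "Local drive for OS",
    "D": "Local drive for Apps",
    "E": "Private Store 1",
    "F": "Private Store 1 Logs",
    "H": "Reserved for Home Drive mapping",
    "I": "Private Store 2",
    "J": "Private Store 2 Logs",
    "K": "Private Store 3",
    "L": "Private Store 3 Logs",
    "M": "Private Store 4",
    "N": "Private Store 4 Logs",
    "Q": "Cluster Quorum",
    "Z": "Local Optical Drive",
}

# Every letter of the alphabet that the table marks as unused.
free = [letter for letter in string.ascii_uppercase if letter not in assigned]

# Assumption: each new store takes two letters (store volume + log volume).
extra_stores = len(free) // 2

print(f"{len(assigned)} of 26 letters assigned")
print(f"Free letters: {''.join(free)}")
print(f"Room for {extra_stores} more store/log pairs")
```

With only eleven letters free on day one, there is not much headroom once stores start multiplying.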

Using SAN for drive space instead of local disk allows disks to be shared between systems for SCC (Single Copy Cluster) and allows drives to be expanded as space requirements change.

Mailbox Limits

Our initial four information stores kept growing, and it became necessary to add more drives for storing out-of-control mailboxes. As two new drives were mounted on the server for store number five (and its logs), I pleaded with company management to allow mailbox limits to be enabled. Users were not managing their mail and the mailboxes were out of control.

Microsoft recommends that an Exchange Server 2007 Information Store not exceed 100 GB to ensure reasonable backup and recovery ability when not using continuous replication. [1]

Using Microsoft’s guidelines on maximum database size as a starting point, I did a thorough analysis of the number of mailboxes and current mailbox sizes, plus planning for 5% growth and 10% exceptions. I recommended a 500 MB mailbox limit with “Manager Approved” exceptions of 750 MB; anything higher would require executive approval. This would allow our five private information stores to serve the user base for several years, maintaining high performance and availability.

Culture shock is the biggest barrier when implementing mailbox and/or message size limits. It shows up as the “We’ve always done it this way” attitude of people who simply aren’t able to compromise for the good of the organization. I was approved to implement a 5 GB mailbox limit. That is a far cry from the 500 MB I proposed, and there are not enough letters in the alphabet to provide enough information stores given the 100 GB database recommendation.
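
The gap between the two limits is easier to appreciate with numbers. The following Python sketch is a rough worst-case estimate, not the original analysis: the mailbox count of 900 is hypothetical, and it assumes that "5% growth" means growth in mailbox count, that "10% exceptions" means one mailbox in ten gets the higher limit, and that every mailbox fills its quota. Database overhead such as white space and deleted item retention is ignored.

```python
import math

def databases_needed(mailboxes, limit_gb, exception_limit_gb,
                     exception_share=0.10, growth=0.05, max_db_gb=100):
    """Worst-case database count if every mailbox fills its quota."""
    planned_mailboxes = mailboxes * (1 + growth)
    # Average quota per mailbox, weighted by the share of exceptions.
    avg_quota_gb = ((1 - exception_share) * limit_gb
                    + exception_share * exception_limit_gb)
    total_gb = planned_mailboxes * avg_quota_gb
    return math.ceil(total_gb / max_db_gb)

MAILBOXES = 900  # hypothetical figure, chosen only for illustration

proposed = databases_needed(MAILBOXES, limit_gb=0.5, exception_limit_gb=0.75)
approved = databases_needed(MAILBOXES, limit_gb=5.0, exception_limit_gb=5.0)

print(f"500 MB limit: {proposed} databases, {proposed * 2} drive letters")
print(f"5 GB limit:   {approved} databases, {approved * 2} drive letters")
```

Under these assumed numbers, the 500 MB proposal fits in 5 databases, while a 5 GB limit could in the worst case call for 48 databases, or 96 drive letters at two per store.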

Today we have ten private information stores, with the largest store at 170 GB. There are no letters left in the alphabet. I am considering using mount points for our future Exchange 2013 environment, but that’s a conversation for another day.

Redundancy for Disaster Recovery

Single Copy Cluster is just what the name suggests: there is only one copy of the data, on disks shared between the two systems in the cluster. If the SAN fails or some other catastrophic event occurs, e-mail functionality still needs to resume promptly. Replication is needed to copy the mailbox data somewhere else for safekeeping.

We have a secondary data center located about one thousand miles away with a 100 Mbps connection to our main data center. Since it is in a different subnet and AD site, we decided to deploy Standby Continuous Replication (SCR).

Initial testing showed that the 100 Mbps connection to the standby server would be too slow for our growing information stores. After a delay of a couple of years, the company finally allowed us to upgrade the pipe to a 500 Mbps connection. We had much better bandwidth and tolerable lag (26 ms); I was ready to begin setting up the standby server.

I built the remote data center CAS and HUB servers, plus the mailbox server as a standalone system (not clustered). I set up SCR from the cluster to the standby server and received the typical notice that the database must be seeded before synchronization can occur. Great, the initial setup was complete; now I just needed to seed the database and we’d be golden.

Seeding the database wasn’t as easy as one would expect. When I tried to start the process, I’d get an error saying the target machine is invalid. Ah – I’m supposed to initiate the seeding from the target server; okay. I connect to the target server and initiate the process from there. The seeding couldn’t occur because the standby (target) server was not part of the mailbox cluster (source).

Readiness Check Failure

I hadn’t set up a cross-site cluster before. This would be a good learning opportunity. Exchange was uninstalled from the standby server and the server was added into the cluster as a third node. This was looking great: I had a three-node cross-site cluster. Well, until I went to reinstall Exchange Server. Attempting to install Exchange as a passive node in the cluster failed the prerequisite checks, with one error that kept me from proceeding.

“This cluster spans multiple Active Directory sites. Exchange Server 2007 cannot be installed.”

A passive node that is designated as an SCR target must be a member of a failover cluster that does not have any clustered mailbox servers. This is referred to as a standby cluster. [2]

Unfortunately, clustering is not a common enough Windows technology for there to be sufficient online references, which would have saved me all this time. I was about to give up and had started looking into third-party software when I suddenly remembered something I had read: the standby server isn’t supposed to be in the production cluster, but in a standby cluster of its own.

Knowing that clustered mailbox servers cannot contain the HUB and CAS roles, I spun up a virtual machine for these roles in the secondary site. My next step was to make the SCR target server a single-node standby cluster. I have been working with Windows clusters for more than a decade; this part was easy. I made sure to install Exchange as a “Passive” mailbox role.

I started with my smallest information store: I enabled replication to the standby server, then went to the standby server and issued an update command to seed the database. Lo and behold, forty minutes later this 25 GB information store was replicated to the standby server. Because of their large size, the remaining stores were handled off hours to ensure replication traffic didn’t impact production.
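
That first seed gives a useful yardstick for scheduling the rest. The Python sketch below turns the observed 25 GB in forty minutes into a transfer rate and extrapolates to a larger store. It assumes the rate stays constant and that sizes are in decimal gigabytes; the 170 GB figure is the largest store size mentioned earlier, used here only as an example.

```python
def rate_gb_per_min(size_gb, minutes):
    """Observed seeding rate in GB per minute."""
    return size_gb / minutes

def estimate_seed_minutes(size_gb, gb_per_min):
    """Projected seeding time, assuming the observed rate holds."""
    return size_gb / gb_per_min

# Observed: the 25 GB store seeded in 40 minutes.
rate = rate_gb_per_min(25, 40)

# Convert to megabits per second (decimal units: 1 GB = 8000 megabits).
mbps = rate * 8 * 1000 / 60

# Extrapolate to a 170 GB store.
minutes = estimate_seed_minutes(170, rate)
hours, remainder = divmod(int(minutes), 60)

print(f"Observed rate: {rate:.3f} GB/min (about {mbps:.1f} Mbps)")
print(f"Estimated seed time for 170 GB: {hours} h {remainder} min")
```

At that rate a 170 GB store would take roughly four and a half hours, which is why seeding the larger stores belongs in an off-hours window.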

Lessons Learned

There are definitely a number of key items learned over the years with this environment.

Don’t Go Cheap

Performance hardware is expensive, but e-mail is critical to most organizations and downtime costs the company more money than a proper deployment. Hot-swappable RAID drives and redundant power supplies are essential to uptime.

After determining the kind of hardware required for your implementation, enhance it dramatically. You aren’t just building the server for today’s business, but that of several years from now. Make sure that the hardware being used for your Exchange Server exceeds your current needs and can survive growth of the business and updates to the system software.

Research for Proper Architecture

With a slow processor and barely enough memory, Exchange Server will keep running; slowly, but running. When the information store outgrows the storage, Exchange stops and you have a serious problem. Review Microsoft’s documentation for the version of Exchange Server being implemented for guidance on storage sizing and configuration.

Exchange Server 2007
Exchange Server 2010
Exchange Server 2013

Establish and Enforce Controls

Storage management involves accounting for mailbox data, search indexes, log files and other essential data for Exchange Server to survive. Establishing a fair and appropriate mailbox (and message) size limit will help keep the storage under control without crippling the enterprise. Executive buy-in is critical for success.

Have a Good DR Plan

For a DR plan to be a good one, it needs to be set up and it needs to work. That means setting everything up and testing it. Learn from the test and fix any problems before a live situation occurs. Search the internet and hopefully someone has a fix, but sometimes you just have to roll up your sleeves and figure it out yourself.

SCR Works on Single Copy Cluster

The key to making SCR work with a Single Copy Cluster source is to make the target server a single-node standby cluster. If the standby cluster is in another site, then a HUB and CAS server will need to be created as a prerequisite to support that site.

Final Word

The purpose of this article is to describe the experiences with this Exchange 2007 implementation, including long term effects.

Although Exchange Server 2007 is a couple of versions behind and nearing the end of its support life [3] with Microsoft, much of the information in this post can be extrapolated to other Exchange versions and environments, so that people can learn from our suffering.

This article was originally authored by me on a technology website that was popular at the time. That site has met its demise, but I saved the article to share with the populace since Exchange 2007 is still in use in the world.

  1. Recommendations for Configuring Storage Groups and Databases
  2. Exchange 2007 Help: Planning for Standby Continuous Replication
  3. Extended support ends April 2017
