Clustering has been a major part of designing and supporting an effective architecturally
sound high availability environment for a long time and from what I understand it’s
not a large part of the MCSA testing. However that fact is somewhat irrelevant to
me, being a seeker of knowledge and skills rather than simply obtaining certificates.
Thus the concept of learning to design and
implement technology that is a large part of a real world application of
Windows Server 2012 is very appealing to me. I realize this may seem silly as
im unemployed and hoping to possibly get a job at some point and that
certifications certainly do improve the odds of that. But whatever I’m a scholar
you have never heard of Failover Clustering you may be wondering what the basic
premise of the technology is. A failover cluster is a group of independent
computers (known as nodes for our purposes) that work together to increase
availability and scalability of clustered roles (https://technet.microsoft.com/en-us/library/Hh831579.aspx). We (implying both IT professionals and desktop
users in corporate environments, well really even Google users) rely on FoC for
high availability for almost any critical applications such as Exchange Server
and Sql that require connections to non-local information systems (meaning not
stored on the local machines hard disk). In the past we used multiple physical
servers usually connected to a single storage unit that was also disk fault
tolerant using a raid array and SCSI connected hard disks. There have not been
many updates to this basic premise however the technology is now easier to use
than ever thanks to technology known as virtualization and branded by Microsoft
as Hyper-V. Now we have physical hard disks configured in fault tolerant arrays
hosting virtualized hard disk’s known as VHD or VHDx files that are also set up
in a fault tolerant array. This provides for two layers of information fail
over support, if a physical hard disk crashes we have a physical back up of the
data and if a virtual disk becomes corrupt we also have a failover copy of that
information as well. This allows
administrators to provide uptimes approaching 99.99% for critical applications
in order to meet the high standards of today’s business needs.
Clustered nodes can be connected using physical hardware or
virtualized hardware. A basic example (fig.1) would include three computers
each with 3 NIC cards, one talking to the other nodes in the cluster, one to
the database known as a cluster shared volume or CSV for short or the quorum
resource) containing the information about the cluster configuration (and one
taking incoming traffic from the network. One downside to this model was that
if the quorum disk failed, so did the cluster. A legacy two node cluster could
not function without it. So if just the disk failed but both nodes remained,
the cluster would cease to function. The
data on the quorum resource (CSV) includes a set of cluster configuration
information plus records (Sometimes called checkpoints) of the most recent
changes made to that configuration. A node coming online after an outage can
use the quorum resource as the definitive source for recent changes in the
configuration. It is also possible to set up fail over nodes in a configuration
using multiple local volumes and skipping the CSV (fig. 2). This also has
benefits but requires more replication across servers to ensure that every node
has a similar database. The point of this being that in case one of the nodes fails
for some reason one of the other two nodes would notice a problem with the
faulty node and seamlessly pick up the role that node was hosting (which machine
picks it up is determined by using something called quorum votes, more on this
later). This will obviously cause an
increase in network traffic to the node picking up the role which is certainly something
to consider when designing hardware specifications to ensure a functional level
of NLB (Network Load Balancing). However the node may or may not have been a
node that was previously hosting that role for the rest of the network and in
that case the hardware impact would be less critical. Clustered nodes should be
heavily monitored in a proactive fashion to verify that they are working and
general best practice is considered to be using a Microsoft product known as
System Center that alerts network administrators to any potential issues that
may occur resulting in a node fail over situation. However this product costs
as well so budget restraints could be a factor. If you are using System Center
and a node fails for some reason an administrator is automatically notified of
the failure while System Center attempts to resolve the issue (service is hung,
the machine freezes, ect.). If System Center fails to resolve the issue the
administrator can then machine can be restart, rebuild or take whatever action
is necessary to repair the node and as mentioned previously, the role will be
shifted to another node as long as the cluster is properly configured.
of this sounds very confusing for several reasons however a primary reason being
that there are two layers of technology involved, a virtualized layer, known as a guest cluster, that set
up almost exactly like a physical layer that’s sitting inside a server install that’s on a physical server. If
you’re like me you may need a more relatable explanation or visualization of
this. So here’s a picture (in-case you havent seen it) of something some genius programmer created. You can
play the video game Doom from a laptop while actually inside the videogame. So
its like playing doom doom. Maybe that helps? If your playing the game its
really obvious which layer of the game your interacting with. Like sitting at a
server interacting with Hyper-V machines that are essentially set up the same
way you would set up a physical machine.
So where kind of left with more than a few questions here
but me being a part of the omfg wtf r u doing here nubsauce train to fail town
users group and basically taking educated guesses as to how this technology
works only enables me to talk about a few things. Besides the fact that entire
technical manuals could be written on the subject not to mention the countless
technet articles and youtube videos on the subject. Maybe in the future I’ll add
addendums/updates to this post but for now we will ramble on as we can. One of
the obvious things is how the servers know that they are functioning? The most
basic way that the servers know that the other servers are still online is
through the use of something called a “heartbeart” the way that I understand
this technology is fairly basic. A server pings the other server on their
private network and says hey you still there and the server responds with something
like “yeah bro im still here stop buggin me bro”and this happens every second. If
this fails then the process of quorum voting comes into play. This seems like a
very mysterious process that involves a bunch of math and im not exactly sure
how the servers are self-aware (see HAL) enough to assume that they have the
extra processing power or know that another node would have enough processing
power but apparently they are able to do this without much trouble (aside from programmer
and technological explanation headaches). There is a default setting that Microsoft
has configured in Failover Cluster Manager as well as a few custom options
however the default is obviously recommended unless you’re a mathematician or
something because im convinced that the process involved in quorum voting is
nothing short of wizard magic, same for dns resolution.
So if your computational status is anything like my nubsauce
w/ x-tra Polynesian self and are convinced that computers are full of wizard
magic and mystery math then you’ll probably get really excited by the notion of
the appropriately named High Availability Wizard. This marvelous device will
help you set up and configure failover clustering as such:
In the High
Availability Wizard, you can choose from the generic options described in the
previous note, or you can choose from the following services and applications:
- DFS Namespace Server:
Provides a virtual view of shared folders in an organization. When a user
views the namespace, the folders appear to reside on a single hard disk.
Users can navigate the namespace without needing to know the server names
or shared folders that are hosting the data.
- DHCP Server:
Automatically provides client computers and other TCP/IP-based network
devices with valid IP addresses.
- Distributed Transaction Coordinator (DTC): Supports distributed applications that perform
transactions. A transaction is a set of related tasks, such as updates to
databases, that either succeed or fail as a unit.
- File Server:
Provides a central location on your network where you can store and share
files with users.
- Internet Storage Name Service (iSNS) Server: Provides a directory of iSCSI targets.
- Message Queuing:
Enables distributed applications that are running at different times to
communicate across heterogeneous networks and with computers that may be
- Other Server:
Provides a client access point and storage only. Add an application after
completing the wizard.
- Print Server:
Manages a queue of print jobs for a shared printer.
- Remote Desktop Connection Broker (formerly TS Session Broker): Supports session
load balancing and session reconnection in a load-balanced remote desktop
server farm. RD Connection Broker is also used to provide users access to
RemoteApp programs and virtual desktops through RemoteApp and Desktop
- Virtual Machine:
Runs on a physical computer as a virtualized computer system. Multiple
virtual machines can run on one computer.
- WINS Server:
Enables users to access resources by a NetBIOS name instead of requiring
them to use IP addresses that are difficult to recognize and remember
As noted in Technet article: https://technet.microsoft.com/en-us/library/Cc731960.aspx
are also a few youtube videos that display how to walk through this wizard but
some of them aren’t in English. If interested google is ur friend. But heres a
few that I liek any way.
– good info, skip the 3rd party nonsense.
– hommie sounds like the Pastor Rod Parsley and talks to the beat of Ghetto D so
if your into that and wana go to choych watch this. Also in a more serious
sense it was very helpful for understanding quorum voting.
And that friends, is the basic understanding of how ive
wasted time studying failover clustering. 15 pages of the book im cureently
. Several days of actual studification of online resources.
Update: so this is cool but i cant get it to frame into this post correctly so click the link and figure out how to watch it if your interested
These guys really know what they are talking about they have a useful way of speaking, meaning its actually understandable.
Update 3: the more flashcards I make the more info I come across! good times any way. This is seems like some basic info from Microsoft with lots of info on fail over clustering. So far it doesnt seem as useful in a pratical sense as the powershell videos but proably worth watching none the less Server 2012 Jumpstart