Yes, that’s right! Our Datacentre is built from open-source technologies. We do run Microsoft (everyone needs Microsoft for something), but the meat and bones of our Datacentre are built entirely with free software.
Read on to see how we did it!
A Datacentre is made up of many technologies: compute, storage, networking, security, monitoring, and the list goes on. So here we go: how did we do it?
OpenStack is a cloud platform that unifies, controls and coordinates the various processes and tasks used to administer and manage a Datacentre. The OpenStack project was initially started by Rackspace to meet their own requirements and has since been developed further by many large organisations and the open-source community.
OpenStack is built on a modular design, allowing compute, storage and network technologies to be mixed and matched, with an open API into the platform.
OpenStack is really an article (or many articles) in its own right, so we won’t dive too far into it here. OpenStack is very much the operating system of our Datacentre. Normally an operating system is limited to one physical computer, but because everything in a Datacentre needs interconnecting, this operating system spans multiple servers and networking devices.
What is Compute? It would be rather expensive, for our customers and for us, to provide one physical server per customer! Compute allows virtualisation technologies to be used; these enable a single server’s processor and memory to be shared by multiple customers. Each customer is assigned what is known as a Virtual Machine: an allocation of resources that is isolated and independent from other customers’ Virtual Machines. Within these Virtual Machines, or VMs for short, customers can install their own operating systems, applications and data.
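The idea of carving one physical server into isolated customer allocations can be sketched in a few lines. This is a toy model only; the names (`Host`, `allocate`) are illustrative and not part of any real hypervisor or OpenStack API:

```python
# Toy model of a compute node sharing its physical CPU and memory
# between isolated customer VMs. Illustrative only, not a real API.

class Host:
    def __init__(self, vcpus, mem_gb):
        self.free_vcpus = vcpus
        self.free_mem_gb = mem_gb
        self.vms = {}

    def allocate(self, name, vcpus, mem_gb):
        # Refuse the request rather than oversubscribe the node.
        if vcpus > self.free_vcpus or mem_gb > self.free_mem_gb:
            raise RuntimeError(f"insufficient capacity for {name}")
        self.free_vcpus -= vcpus
        self.free_mem_gb -= mem_gb
        self.vms[name] = {"vcpus": vcpus, "mem_gb": mem_gb}

host = Host(vcpus=32, mem_gb=128)
host.allocate("customer-a", vcpus=4, mem_gb=16)
host.allocate("customer-b", vcpus=8, mem_gb=32)
print(host.free_vcpus, host.free_mem_gb)  # 20 80
```

Real schedulers are far more sophisticated (oversubscription ratios, NUMA placement, live migration), but the core bookkeeping is this simple.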
Each Virtual Machine is seen as a file by our Compute nodes, and these files need to be stored somewhere. If they were stored locally on a Compute node and that node were to fail, the customer’s VMs would become inaccessible. This leads us to Shared Storage, which ensures customers’ VMs remain accessible at all times.
The tool on our Compute nodes that manages the allocation of CPU and memory is known as a Hypervisor. There are many hypervisors out there, each with its own advantages and disadvantages.
For Microsoft VMs we have chosen Microsoft Hyper-V (Datacenter edition); Linux-based VMs run on Red Hat’s KVM hypervisor.
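Before committing to KVM on a given server, it is worth confirming the CPU exposes hardware virtualisation extensions, which KVM requires. A quick check on a Linux compute node looks something like this (a sketch using standard tools; `vmx` is Intel VT-x, `svm` is AMD-V):

```shell
# Count the virtualisation flags in the CPU feature list.
# grep -c exits non-zero when there are no matches, hence "|| true".
count=$(grep -E -c '(vmx|svm)' /proc/cpuinfo || true)
if [ "$count" -gt 0 ]; then
    echo "hardware virtualisation available"
else
    echo "no VT-x/AMD-V flags found"
fi
```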
This was one of the most important decisions we had to make; storage could easily have become the primary bottleneck in our Datacentre.
When choosing our storage we had the following requirements:
- A shared storage model
- It needed to be highly available
- Data replication was needed to send backups over to our secondary Datacentre
- We needed a hybrid model, combining SSDs with larger-capacity mechanical storage
- Software RAID
With the above requirements in mind, our journey started with researching various open-source technologies. We eventually discovered Gluster, a storage clustering solution. Gluster enabled us to use high-availability mechanisms and replicate storage across an entire cluster of servers. So, with our data now spread across our servers, we needed technologies to get the best performance out of them.
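To give a feel for how simple a replicated Gluster setup is, here is a sketch using hypothetical server names (`stor1`, `stor2`) and brick paths; `replica 2` keeps a full copy of every file on both nodes, so VMs stay reachable if either server fails:

```shell
# Join the second storage server to the trusted pool.
gluster peer probe stor2

# Create a 2-way replicated volume from one brick on each server.
gluster volume create vmstore replica 2 \
    stor1:/bricks/vmstore stor2:/bricks/vmstore
gluster volume start vmstore

# Compute nodes then mount the replicated volume over the network.
mount -t glusterfs stor1:/vmstore /var/lib/libvirt/images
```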
Sourcing a solution for RAID, we discovered a fantastic technology called ZFS. ZFS is a file system that allows data to be replicated across the hard disks in a single server, ensuring the server remains online during a hard-disk failure and again increasing uptime. After learning more about ZFS and testing it in our lab, we started working with its tiered-storage functions. These let us use a combination of storage levels; Tier 1: RAM, Tier 2: SSD and Tier 3: mechanical storage.
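In ZFS terms, those tiers map onto the ARC (RAM cache, automatic), an L2ARC cache device (SSD) and the main pool vdevs (mechanical disks). A minimal sketch with hypothetical device names:

```shell
# Bulk tier: two mechanical disks mirrored for redundancy (software RAID).
zpool create tank mirror /dev/sda /dev/sdb

# SSD read-cache tier (L2ARC); RAM-based ARC needs no configuration.
zpool add tank cache /dev/nvme0n1

# Optional: SSD log device (SLOG) to absorb synchronous writes.
zpool add tank log /dev/nvme1n1
```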
Using a tiered-storage model allows binary data that is accessed regularly to be cached in faster, more responsive storage, while data that is accessed less often resides in slower mechanical storage with greater capacity. When working with storage systems, data is looked at simply as binary, not as files. OK, so now you’re thinking we need a lot of SSDs? Not quite!
ZFS and its fantastic suite of technologies has another element to it: deduplication! And remember, ZFS only sees 1s and 0s; it isn’t interested in files at this level. With our storage cluster holding Virtual Machines, there will be many duplicates: shared parts of operating systems, or staff downloading and uploading the same files to different locations, are just some examples of duplicate binary data residing on our storage system. ZFS to the rescue: it ensures only one instance of each chunk of binary data is stored, and every re-occurrence is replaced with a simple reference back to that chunk.
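The mechanism can be illustrated in miniature: split the data into blocks, hash each block, and store each unique block once while keeping per-block references. The block size and structures below are illustrative only, not ZFS’s actual on-disk format:

```python
import hashlib

BLOCK = 4  # tiny block size, just for the example

def dedup_store(data: bytes):
    store = {}  # hash -> block contents, stored exactly once
    refs = []   # one reference per logical block
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        h = hashlib.sha256(block).hexdigest()
        store.setdefault(h, block)  # only the first occurrence is kept
        refs.append(h)
    return store, refs

data = b"ABCDABCDABCDXYZ!"  # the block "ABCD" repeats three times
store, refs = dedup_store(data)
print(len(refs))   # 4 logical blocks referenced
print(len(store))  # 2 unique blocks actually stored
```

The original data is fully recoverable by following the references in order, which is why deduplication is transparent to everything above the storage layer.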
So, for example, our Datacentre may have 100 installations of Windows Server across 100 Virtual Machines. Can you imagine having 100 full copies of Windows Server stored in our Storage Cluster? There would be duplicate binary data for the kernel, the Start Menu, Notepad and just about the entire operating system. Instead, ZFS stores one copy of Windows Server on each server across our Storage Cluster, saving us almost 90% in storage utilisation. This is a money saver: it means we can use smaller Tier 1 and Tier 2 caches and make more effective use of our larger-capacity Tier 3 storage.
This allows us to maximise our storage density and provide fantastic performance. But let’s stop there; it’s not all good!
Deduplication consumes many resources and is very processor-intensive. Because of this additional overhead, our storage servers needed very powerful processors and a lot of RAM. We had to determine how much overhead deduplication would add and factor that into the running specification used to serve our customers.
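The RAM cost comes largely from the deduplication table (DDT), which must ideally be held in memory. A commonly quoted approximation is around 320 bytes of RAM per block tracked; the sketch below uses that figure and an assumed 64 KiB average block size, both rough rules of thumb rather than exact numbers:

```python
# Back-of-envelope estimate of RAM needed for the ZFS dedup table.
# 320 bytes/entry and 64 KiB average block size are rough rules of
# thumb; real usage depends on the pool's actual block-size mix.

def ddt_ram_gb(pool_tb, avg_block_kb=64, bytes_per_entry=320):
    blocks = (pool_tb * 1024**4) / (avg_block_kb * 1024)
    return blocks * bytes_per_entry / 1024**3

print(round(ddt_ram_gb(10), 1))  # 50.0 GB of RAM for a 10 TB pool
```

That works out to roughly 5 GB of RAM per terabyte of deduplicated storage, which is why we had to specify our storage servers so generously.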