Editor’s Note: As of January 2022, iland is now 11:11 Systems, a managed infrastructure solutions provider at the forefront of cloud, connectivity, and security. As a legacy iland.com blog post, this article likely contains information that is no longer relevant. For the most up-to-date product information and resources, or if you have further questions, please refer to the 11:11 Systems Success Center or contact us directly.
What is data seeding?
Many companies are experiencing the benefits of cloud backup and Disaster Recovery solutions with the use of Veeam Backup and Replication, Zerto, and Double-Take. However, each of these solutions requires an initial full backup or replica of your servers, and this can be a huge speed bump for many customers. Customers will often ask to seed the data in an effort to speed up the initial sync, and I’ve had a lot of experience helping customers through this process at 11:11 Systems. Usually, this is accomplished by creating backups or transferring VM files to an encrypted disk provided by 11:11. The disk is shipped back to 11:11, and the VM files, backups, or replica data are transferred to 11:11 servers and used as a seed for the initial backup/replica.
In theory, this seems like a quick workaround for customers with limited bandwidth and/or customers with a large amount of data to replicate. However, many times this seeding process has created more headaches and delays to replication than desired. 11:11 still uses this method in certain cases, but seeding has become a last resort for new backup and replication customers.
Why would I want to try and manually seed my replica/backup?
Regardless of which solution you use, you have to replicate or back up your servers for the first time. This process generally requires a significant amount of time, depending on the bandwidth you have available and the amount of data you need to replicate.
A good benchmark: with 100Mbps of bandwidth available, you can typically replicate or transfer about 1TB per day.
You can use online tools to estimate the amount of time required for a backup or replica, such as this Cloud Calculator.
Let’s work through an example: say you have 10 servers totaling 5TB of used storage and 100Mbps of bandwidth. Technically, it’s possible to replicate all of these servers within a week. However, many customers must split their available bandwidth and reserve a portion for their production environment. DR solutions can consume a large amount of bandwidth, especially on their initial syncs, and this may cause issues for your end-users or employees during work hours. So, for your 5TB of data, you may only be able to allocate 50Mbps or 25Mbps of your bandwidth to replication, stretching the initial sync to 10 or 20 days, respectively. If you are using Zerto or Double-Take, you can expect this to take even longer, as these solutions use real-time replication: while your servers are replicating their data for the first time, new data and changes on those servers get added to the amount of data to be replicated. For some customers, replication progress can even move backward, as new data is created faster than it can be replicated.
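The arithmetic above can be sketched in a few lines. This is a rough estimate only (`sync_days` is a hypothetical helper, not part of any vendor tooling), built on the 100Mbps ≈ 1TB/day rule of thumb; it ignores compression, deduplication, and the new changes that pile up while the sync runs.

```python
def sync_days(data_tb, bandwidth_mbps):
    """Rough initial-sync estimate: data size in TB over sustained Mbps.

    Ignores compression, deduplication, protocol overhead, and the
    changes that accumulate while the first sync is still running.
    """
    seconds = (data_tb * 8e12) / (bandwidth_mbps * 1e6)  # TB -> bits, Mbps -> bits/s
    return seconds / 86400  # seconds per day

print(round(sync_days(5, 100), 1))  # full 100Mbps link -> 4.6 days
print(round(sync_days(5, 25), 1))   # throttled to 25Mbps -> 18.5 days
```

The throttled figure lines up with the "20 days" estimate above once you account for production traffic and real-world overhead eating into the remaining headroom.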
For Veeam customers, Veeam creates a snapshot and replicates that image of the server; new changes are not added to the current job. This means the servers being replicated will carry a snapshot for several days and may take a performance hit once it is removed. If this Veeam replica/backup takes five days, the second run will have five days’ worth of changes to replicate, which can be a significant amount. This time can be shorter or longer depending on the compression and deduplication used during a replica or backup. Regardless, the initial replica can be very time-consuming, and some customers may not have the bandwidth or time to work through it.
What are the problems with data seeding?
Many customers who deal with the issues described above see data seeding as the only possible way to complete their initial syncs. This is often due to time constraints: they need a DR environment or backup ready before a specific deadline. In other cases, their available bandwidth simply cannot keep pace with the amount of data they need to replicate or back up.
Many customers trying to complete the initial sync faster with data seeding find that the seeding process often takes longer. How is this possible? The first step is for 11:11 to ship out an encrypted storage device, usually a disk, QNAP, or other SAN device. Once the device arrives at your office or data center, your data needs to be transferred onto it. Many methods can be used for this, whether it’s a VM backup with Veeam or another application, an OVA export, or simply copying VMDK and VMX files from your VMware infrastructure onto the disks. How you transfer these files to the disk will determine the amount of time needed.
For our example above of 10 VMs and 5TB of used storage, a transfer to a USB 2.0 device would take about 23 hours at best; that figure assumes USB 2.0’s theoretical 480Mbps, and real-world throughput is usually lower. Let’s say all data is on the disk by October 17; the next step is to ship the drive to an 11:11 data center. With a time constraint, you may wish to use overnight shipping, which can incur considerable additional costs. Once it arrives at an 11:11 facility, we then have to mount the disk and transfer the data. Ideally, the data will be at the target site and ready to seed on October 19, but this could take longer depending on shipping time, when the disk can be mounted, and the transfer from the disk to the target repository or datastores. The last step is to set up the replica job to map to the seeded data.
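As a quick sanity check on that 23-hour figure, here is a sketch (`copy_hours` is a hypothetical helper): 480Mbps is USB 2.0’s theoretical signaling rate, so the result is a best case, and halving the throughput doubles the copy time.

```python
def copy_hours(data_tb, throughput_mbps=480):
    """Hours to copy data_tb terabytes at a sustained throughput in Mbps.

    480Mbps is the USB 2.0 theoretical maximum; practical throughput
    is often half that or less, so treat the default as a best case.
    """
    seconds = (data_tb * 8e12) / (throughput_mbps * 1e6)
    return seconds / 3600

print(round(copy_hours(5)))       # theoretical best case -> 23 hours
print(round(copy_hours(5, 240)))  # a more realistic 240Mbps -> 46 hours
```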
Once all of that is set, we are ready to begin the first replica/backup with the seeded data. However, the seeded data is now at least two or three days old. Veeam, Zerto, and Double-Take all perform a “mirroring” process during a seed, where the software compares the source server with the target server. This requires checking each block on the target and source, finding the differences, and replicating any changes.
So, if this is a Zerto replica, you will see a Delta Sync process start, and once it starts, you are not able to failover until it completes. Zerto compares both sides, matches the differences, and adds in changed data just as it does with the initial sync. Again, this process depends on the amount of data seeded, the number of changes, and available bandwidth. So for my 5TB, say about 500GB of data has changed (about 10% of my used data). If I can only allow 25% of my bandwidth to be used for replication, I can expect the Delta Sync to take at least two days to replicate that 500GB after Zerto has compared both sides. And again, current changes keep being added on top of this. Essentially, after paying for shipping and data center costs, waiting for data to transfer, and finally seeding my replicas, I may be in the same situation as with the initial sync.
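The Delta Sync math works out the same way as the initial-sync estimate. In this sketch, `delta_sync_days` is a hypothetical helper and the 10% change rate is an illustrative assumption; the block-comparison pass itself adds further time on top of this.

```python
def delta_sync_days(changed_gb, bandwidth_mbps):
    """Days to replicate the changes that accumulated while the seed
    was in transit. Excludes the time Zerto spends comparing blocks."""
    seconds = (changed_gb * 8e9) / (bandwidth_mbps * 1e6)
    return seconds / 86400

# Assume 10% of 5TB changed while the disk was shipped and mounted,
# replicated over a 25Mbps allocation (25% of a 100Mbps link):
print(round(delta_sync_days(500, 25), 1))  # -> 1.9 days
```

That nearly two-day figure is before the compare pass and before any new changes created during the Delta Sync itself, which is why seeding can land you close to where an over-the-wire sync would have.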
How do I complete an initial sync without seeding?
If I don’t use seeding, how do I complete the initial sync? Bandwidth, optimization, and patience. The simplest solution, though often the hardest to achieve, is to upgrade your bandwidth. Obviously, the bigger the pipe, the faster you can replicate, and the easier it is to keep up as you add more and more servers. However, upgrading may not be possible for some customers. In that case, you can work with 11:11 to optimize the solution during your initial syncs. We can configure the backup or replication job to allocate bandwidth to specific servers and stagger the initial syncs. For instance, instead of trying to replicate all 10 servers and 5TB of data from my example at once, we can focus on two or three VMs at a time, complete their sync, and progress from there. It may also be best to wait for a weekend, when you can remove any throttles and use all of the available bandwidth without affecting production performance for end-users. Once the replication or backups have been optimized and scheduled as well as possible, the last step is to give the job time and monitor it.
For Veeam customers, ensure that your local jobs and your offsite jobs to 11:11 do not overlap. If you are replicating a server to 11:11, a local backup of that same server could cause snapshot issues or break the replica job. Similarly, when performing a backup or backup copy job to 11:11, a local backup may lock the server’s VBK files and break the offsite job. Refraining from local backups or replicas while sending data to 11:11 prevents Veeam from locking backup files and causing a job failure.
Support and advice are key to a successful setup
11:11 Systems typically suggests completing the initial sync over the wire whenever possible to avoid further complications and delays. The initial sync can be a time- and resource-consuming process, but it is an essential step for your Disaster Recovery and/or Backup Solution. When beginning replication and backups with 11:11, our support team is always available to help and make recommendations to complete this process as quickly and easily as possible. And even with the possible issues involved, data seeding always remains an option.