| Deduplication: Doing More with Less
Andy Kratzer, Senior Software Engineer, EAGLE Software

Many of our customers are looking for ways to make the most of their storage infrastructure. Deduplication technologies can drastically reduce the amount of data stored on disk-based backup, with reduction ratios of up to 40:1. We are integrating these technologies into customer sites that are challenged by data growth, shrinking budgets and overtaxed administrative resources. In addition, deduplication technology helps storage managers address the growing data needs created by mandatory archival regulations. Here's a look at deduplication methods and the deduplication products that EAGLE offers.

File Level Deduplication

Reducing duplicate file copies is a limited form of deduplication sometimes called single instance storage, or SIS. File level deduplication eliminates redundant (duplicate) files on a storage system by saving only a single instance of each file. If you change the title of a 2 MB Microsoft Word document, SIS retains the first copy of the document and also stores the modified document in its entirety. Any change to a file requires that the entire changed file be stored, so frequently changed files do not benefit from SIS. In many environments, however, duplicate DLL files or files copied by users are stored multiple times. SIS works well in this case and stores only a single copy of each duplicate file.

CommVault's Single Instance Storage

Single Instance Storage uses file level deduplication. CommVault addresses the storage problem by identifying the duplicate items in a data protection operation and maintaining references to them. When a data protection operation is performed for the first time, all of the data is physically stored. If the same data is subsequently identified in another data protection operation, it is stored as a reference to the existing data, and the data itself is not physically stored again.
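The store-once, reference-afterward behavior described above can be illustrated with a minimal Python sketch. It is not CommVault's implementation: the `SingleInstanceStore` class, its method names and the use of a SHA-256 digest to recognize identical files are all assumptions made for this example.

```python
import hashlib


class SingleInstanceStore:
    """Toy file-level deduplicating store (hypothetical, for illustration)."""

    def __init__(self):
        self.blobs = {}    # content digest -> file bytes, stored once
        self.catalog = {}  # file name -> content digest (the "reference")

    def put(self, name, data):
        """Store a file; return True only if its bytes were physically stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data  # first time this content is seen
        self.catalog[name] = digest    # every file gets a reference
        return is_new

    def get(self, name):
        """Return the file contents by following the reference."""
        return self.blobs[self.catalog[name]]

    def stored_bytes(self):
        """Bytes physically held, counting each unique file once."""
        return sum(len(blob) for blob in self.blobs.values())


store = SingleInstanceStore()
report = b"quarterly report body " * 1000
retitled = b"New Title\n" + report

first = store.put("alice/report.doc", report)       # stored physically
second = store.put("bob/report.doc", report)        # duplicate: reference only
third = store.put("bob/report_v2.doc", retitled)    # small change: stored whole
```

In this sketch the exact duplicate costs no additional space, while the retitled copy is stored in full even though nearly all of its content is unchanged. That is the limitation of file level deduplication noted earlier.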
The single-instanced data are stored in specially designed container files to increase system throughput and scalability. Single Instance Storage is supported by both backup and data archival products, specifically for the file system and email attachments, so that storage is optimized when copies of the same data are backed up and stored.

Block Level Deduplication

Block level deduplication is a more granular approach to finding duplicate data. It segments the incoming data stream, uniquely identifies the data blocks, and then compares the blocks to previously stored data. If an incoming data block is a duplicate of what has already been stored, the block is not stored again, but a reference to it is created. If the block is unique, it is stored on disk.

Data Domain

Data Domain is an application-independent storage system (attachable as a file server over Ethernet or a VTL over Fibre Channel). No client software or other configuration is required. As a result, Data Domain's deduplication is invisible to backup and recovery and other nearline storage processes. It works easily with various data movers and workloads, including non-backup data like e-mail archives, reference data and engineering revision libraries.

Data Domain's inline process deduplicates data as it is received and then writes it to disk. It looks for redundancy of very large sequences of bytes across very large comparison windows. Long (8 KB+) sequences are compared to the history of other such sequences, and where possible, the first uniquely stored version of a sequence is referenced rather than stored again. In a storage system, this is all hidden from users and applications, so the whole file is readable after having been written.

Spectra Logic's nTier Appliance

Spectra Logic's new line of nTier secondary storage solutions now offers deduplication. Deduplication occurs post-processing, so the data is cached to disk, then deduplicated.
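To make the cache-then-deduplicate sequence concrete, here is a minimal Python sketch that combines it with the block level comparison described earlier. It does not represent any vendor's product: the `BlockStore` class, the fixed 8 KB block size and the SHA-256 block identifiers are assumptions chosen to keep the example short.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # assumption: fixed 8 KB blocks, for simplicity


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Toy post-process, block-level deduplicating store (hypothetical)."""

    def __init__(self):
        self.cache = {}    # backup name -> raw bytes awaiting deduplication
        self.blocks = {}   # block digest -> block bytes, stored once
        self.recipes = {}  # backup name -> ordered list of block digests

    def ingest(self, name, data):
        """Step 1: land the backup in the cache untouched."""
        self.cache[name] = data

    def deduplicate(self):
        """Step 2: later, replace cached data with block references."""
        for name, data in list(self.cache.items()):
            recipe = []
            for block in split_blocks(data):
                digest = hashlib.sha256(block).hexdigest()
                self.blocks.setdefault(digest, block)  # store only if unseen
                recipe.append(digest)                  # always keep a reference
            self.recipes[name] = recipe
            del self.cache[name]                       # free the cache space

    def restore(self, name):
        """Rebuild the original backup from its block references."""
        return b"".join(self.blocks[d] for d in self.recipes[name])


# Monday's backup: four distinct 8 KB blocks.
monday = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
# Tuesday's backup: identical except for the last block.
tuesday = monday[:3 * BLOCK_SIZE] + bytes([9]) * BLOCK_SIZE

store = BlockStore()
store.ingest("monday", monday)
store.ingest("tuesday", tuesday)
store.deduplicate()
```

The two backups contain eight blocks in total, but only five unique blocks are kept, and both backups can still be restored in full. Fixed-size blocks are used here only for brevity; data inserted in the middle of a file shifts every later block boundary, so commercial systems may segment data differently.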
The advantages that Spectra Logic sees in the post-process method center on the backup process. Post-processing shortens the backup window, is more secure in most cases, and provides parallel processing on cached data. Slightly more disk is required for caching, however, and there is an increased chance of error because data is handled twice. nTier's deduplication is powered by FalconStor's VTL interface, which integrates easily into an existing backup environment.

Will deduplication benefit your storage environment?

It might, depending on the types of data you are backing up. Multimedia files, digital images, and engineering drawings are not easily compressed. The amount of commonality between such files, for deduplication purposes, is often close to zero. The initial full backup and subsequent backups result in very little, if any, reduction in required disk space.

Another variable is retention time. Organizations that retain data on disk for a long time benefit the most from deduplication. The large reduction ratios often mentioned with deduplication systems rely on a long retention time (typically months) on disk. If an organization stores only a few weeks of backups on disk, the benefit is reduced. Because most restore requests occur within the first two weeks after data creation, some organizations may not keep backups on disk for more than two weeks.

Lastly, replication is a bonus feature of deduplication at some sites. Deduplication removes the cost and speed issues that previously kept organizations from replicating backed-up data between sites.

Call EAGLE for more information on deduplication and storage technologies. EAGLE distributes a full line of storage and backup and recovery products, including tape library backup solutions, backup and recovery software packages, and SAN and NAS storage solutions. EAGLE also provides pre-sales evaluation and design, integration and support services.
For more information on EAGLE's products and services, contact: EAGLE Software, 123 Indiana Ave., Salina, KS 67401; Phone (800) 477-5432; Fax: (785) 823-6185; email: contact@eaglesoft.com; website: http://www.storagebyeagle.com. ###
|