Data Deduplication Software Technology

Data deduplication refers to the elimination of redundant data stored in a database or in a data storage system.

By: suvo

It is often called "intelligent compression" or "single-instance storage." Data deduplication is a method of reducing storage needs by eliminating duplicate data: each redundant copy is replaced with a reference to a single stored unique copy. Data deduplication can be carried out in several ways, and the main approaches are listed below:

- Chunking and deduplication overview
- Chunking methods
- Source versus target deduplication
- Client backup deduplication
- Post-process deduplication
- In-line deduplication
- Primary storage vs. secondary storage deduplication

Redundant data wastes storage space. For example, suppose an e-mail contains a 1 MB attachment. If 100 copies of that e-mail with the same attachment are stored, they take up 100 MB, which is hardly an efficient system. With data deduplication, only one copy is saved and every subsequent instance is just a reference back to that saved copy, so the attachments use only about 1 MB of space.
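The e-mail example can be sketched in Python as a minimal single-instance store. This is an illustration under stated assumptions, not how any particular product works: the class name `SingleInstanceStore`, the use of SHA-256 digests as content identifiers, and the all-zero 1 MiB attachment are hypothetical choices made for the example.

```python
import hashlib

class SingleInstanceStore:
    """Hypothetical store that keeps one physical copy per unique content."""

    def __init__(self):
        self.blobs = {}  # digest -> bytes (one physical copy per unique content)
        self.refs = {}   # logical name -> digest (a reference, not a copy)

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data  # first instance: store the data itself
        self.refs[name] = digest       # every instance: record only a reference

    def get(self, name):
        return self.blobs[self.refs[name]]

    def logical_bytes(self):
        """Bytes the users think they have stored."""
        return sum(len(self.blobs[d]) for d in self.refs.values())

    def physical_bytes(self):
        """Bytes actually kept on disk."""
        return sum(len(b) for b in self.blobs.values())


attachment = bytes(1024 * 1024)  # a 1 MiB attachment (all zero bytes, for illustration)
store = SingleInstanceStore()
for i in range(100):
    store.put(f"mail-{i:03}/attachment.bin", attachment)

print(store.logical_bytes() // (1024 * 1024), "MiB logical")    # 100 MiB logical
print(store.physical_bytes() // (1024 * 1024), "MiB physical")  # 1 MiB physical
```

Each logical name holds only a digest, so the 99 later copies cost one small index entry each rather than another megabyte.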

Data deduplication offers other benefits as well. Lower storage requirements reduce cost. More efficient use of disk space allows longer disk retention periods, which provide a better RTO (Recovery Time Objective) and reduce the need for tape backups. Data deduplication also reduces the amount of data sent over a WAN, which can dramatically increase the effective data transfer rate.
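The WAN saving rests on the same principle: if the sender knows which chunk hashes the receiving site already holds, it only needs to transmit the chunks that are new, plus a list of hashes describing how to reassemble the data. The sketch below assumes fixed 4 KiB chunks, SHA-1 digests, and a remote index that is already known to the sender (a real system would have to exchange that information over the link); the function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size

def chunk_hashes(data, size=CHUNK_SIZE):
    """Split data into fixed-size chunks and return (SHA-1 digest, chunk) pairs."""
    return [(hashlib.sha1(data[i:i + size]).hexdigest(), data[i:i + size])
            for i in range(0, len(data), size)]

def plan_transfer(data, remote_index):
    """Return a recipe of digests plus only the chunks the remote side lacks."""
    recipe, to_send = [], {}
    for digest, chunk in chunk_hashes(data):
        recipe.append(digest)
        if digest not in remote_index and digest not in to_send:
            to_send[digest] = chunk  # new chunk: must cross the WAN
    return recipe, to_send

def rebuild(recipe, store):
    """Reassemble the data on the remote side from its chunk store."""
    return b"".join(store[d] for d in recipe)

# Remote site already holds yesterday's backup (10 distinct chunks).
yesterday = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
remote_store = dict(chunk_hashes(yesterday))

# Today's backup differs from yesterday's in exactly one chunk.
today = yesterday[:5 * CHUNK_SIZE] + b"X" * CHUNK_SIZE + yesterday[6 * CHUNK_SIZE:]
recipe, to_send = plan_transfer(today, remote_store)
print(len(recipe), "chunks,", len(to_send), "sent")  # 10 chunks, 1 sent

remote_store.update(to_send)                   # receiver stores the new chunk
print(rebuild(recipe, remote_store) == today)  # True
```

Only one 4 KiB chunk and ten short digests cross the link instead of the full 40 KiB.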

Data deduplication can operate at the file, block, or even bit level. File-level deduplication eliminates duplicate files. Block- and bit-level deduplication look within a file and save unique iterations of each block or bit. Each chunk of data is processed with a hash algorithm such as MD5 or SHA-1, which generates a number (a hash value) that is, for practical purposes, unique to each piece; this number is then stored in an index. If a file is updated, only the changed data is saved. That is, if only a few bytes of a document or presentation are changed, only the changed blocks or bytes are saved, and the changes don't constitute an entirely new file. This behavior makes block- and bit-level deduplication far more efficient. However, block- and bit-level deduplication take more processing power and use a much larger index to track the individual pieces.
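The difference between file-level and block-level deduplication can be shown with a small Python sketch. It assumes fixed-size 4 KiB blocks and SHA-1 as the hash (one of the algorithms mentioned above); the class names and block size are hypothetical, and many products use other chunking methods. A 1 MiB document is saved, then saved again after five bytes are edited.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size

class FileLevelStore:
    def __init__(self):
        self.index = {}  # SHA-1 of the whole file -> file bytes

    def save(self, data):
        self.index.setdefault(hashlib.sha1(data).hexdigest(), data)

    def stored_bytes(self):
        return sum(len(v) for v in self.index.values())

class BlockLevelStore:
    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.index = {}  # SHA-1 of one block -> block bytes

    def save(self, data):
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha1(block).hexdigest()
            self.index.setdefault(digest, block)  # store only unseen blocks
            recipe.append(digest)
        return recipe  # digests needed to rebuild this file

    def load(self, recipe):
        return b"".join(self.index[d] for d in recipe)

    def stored_bytes(self):
        return sum(len(v) for v in self.index.values())

# A 1 MiB "document" made of 256 distinct blocks, then a copy with 5 bytes edited.
original = b"".join(i.to_bytes(2, "big") * (BLOCK_SIZE // 2) for i in range(256))
edited = bytearray(original)
edited[10_000:10_005] = b"hello"  # falls inside block 2 (bytes 8192..12287)
edited = bytes(edited)

files, blocks = FileLevelStore(), BlockLevelStore()
for version in (original, edited):
    files.save(version)
    recipe = blocks.save(version)

print(files.stored_bytes())           # 2097152: two whole 1 MiB files
print(blocks.stored_bytes())          # 1052672: 256 blocks + 1 changed block
print(blocks.load(recipe) == edited)  # True
```

The file-level store keeps a second full copy because the whole-file hash changed, while the block-level store adds only the one block containing the edit. The price is a larger index: 257 entries instead of 2.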

Hash collisions are a potential problem with data deduplication. When a piece of data receives a hash number, that number is compared with the index of existing hash numbers. If the hash number is already in the index, the piece of data is considered a duplicate and does not need to be stored again. Otherwise, the new hash number is added to the index and the new data is stored. When a hash collision occurs, the system won't store the new data because it sees that its hash number already exists in the index. This is called a false positive, and it can result in data loss. Some vendors combine hash algorithms to reduce the possibility of a hash collision. Some vendors also examine metadata to identify data and prevent collisions.
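The lookup logic, and the false positive it can produce, can be illustrated with a deliberately weak hash. In this hypothetical sketch, `weak_hash` is an 8-bit checksum (sum of bytes modulo 256) chosen only so that a collision is easy to trigger: `b"ab"` and `b"ba"` get the same value. `combined_hash` pairs it with SHA-256 to mimic the idea of combining hash algorithms; it is not any vendor's actual scheme.

```python
import hashlib

def weak_hash(data):
    """Deliberately tiny 8-bit 'hash' (sum of bytes mod 256) so collisions are easy."""
    return sum(data) % 256

def combined_hash(data):
    """Combine two hash functions into a single index key."""
    return (weak_hash(data), hashlib.sha256(data).hexdigest())

class DedupIndex:
    def __init__(self, hash_fn):
        self.hash_fn = hash_fn
        self.index = {}  # hash value -> stored chunk

    def write(self, chunk):
        key = self.hash_fn(chunk)
        if key in self.index:
            return key            # seen as a duplicate: the new chunk is NOT stored
        self.index[key] = chunk   # new hash value: add it to the index, store the data
        return key

    def read(self, key):
        return self.index[key]

for fn in (weak_hash, combined_hash):
    idx = DedupIndex(fn)
    idx.write(b"ab")
    key_b = idx.write(b"ba")  # same byte sum as b"ab", so weak_hash collides
    print(fn.__name__, idx.read(key_b))
# prints: weak_hash b'ab'       (false positive: b"ba" was silently lost)
# prints: combined_hash b'ba'
```

With the weak hash alone, the second write is mistaken for a duplicate and reading it back returns the wrong data, which is the data-loss scenario described above. Combining hashes makes such a collision far less likely but does not rule it out; some systems additionally compare the stored bytes before treating a chunk as a duplicate.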

Data deduplication also has some drawbacks and concerns. Deduplication solutions rely on cryptographic hash functions to identify duplicate segments of data. A collision would result in data loss (in practice, a chunk of data would be replaced by incorrect data). Because of this, vendors have devised various ways of tackling the problem. Most vendors, however, use very large hash values, and statistically there is a far greater chance of hardware failure than of a hash collision. Another major drawback of data deduplication is the intensive computing power it requires. Its interaction with compression and encryption is also considered a drawback, since data that has already been compressed or encrypted usually deduplicates poorly.
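Why very large hash values make accidental collisions so unlikely can be estimated with the standard birthday-bound approximation. It assumes a hash of b bits whose outputs behave like uniformly random values and n unique chunks in the index; p is the probability that at least two chunks share a hash. The example figures (2^40 unique chunks of 8 KiB, about 8 PiB of data, indexed with 160-bit SHA-1 digests) are hypothetical.

```latex
p \approx 1 - e^{-n(n-1)/2^{b+1}} \approx \frac{n^{2}}{2^{b+1}}

n = 2^{40},\; b = 160: \quad p \approx \frac{2^{80}}{2^{161}} = 2^{-81} \approx 4.1 \times 10^{-25}
```

This estimate covers accidental collisions only; it says nothing about collisions crafted deliberately, which are known to be feasible for MD5 and SHA-1.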

Currently working as an SEO expert at ibacs.co.uk.








