A data deduplication algorithm is designed to scan all of the information stored in a file-sharing network, locate identical copies of the same data, eliminate the redundancy, and maintain a single copy of that data. Even though the process is invisible to end users, it is an effective way to maximize storage space. The system scans for redundancy, and if a duplicate is found, one copy is removed and replaced with a reference to the main copy. In effect, the total amount of data stored on the network shrinks.
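To make the idea concrete, here is a minimal sketch of that "keep one copy, reference the rest" step in Python. It assumes whole-file deduplication with SHA-256 hashing (real products usually work on smaller blocks), and the function and dictionary names are illustrative rather than taken from any particular system.

```python
# Minimal sketch: keep one copy per unique content hash, turn every other
# identical file into a reference to the copy that was kept.
import hashlib

def deduplicate(files):
    """Map each file name to either its content (first copy seen)
    or a reference to the file that already holds that content."""
    store = {}    # content hash -> name of the file keeping the single copy
    catalog = {}  # file name -> ("data", content) or ("ref", original name)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in store:
            catalog[name] = ("ref", store[digest])  # duplicate: keep a reference only
        else:
            store[digest] = name
            catalog[name] = ("data", content)       # first copy: keep the data itself
    return catalog

files = {
    "alice/report.txt": b"quarterly numbers",
    "bob/report_copy.txt": b"quarterly numbers",  # identical content
    "carol/notes.txt": b"meeting notes",
}
print(deduplicate(files))
```

Running this stores the duplicated report once and records "bob/report_copy.txt" as a reference, which is exactly the space saving the paragraph above describes.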
Consider an internal e-mail network. There is a high probability that multiple copies of a single e-mail will be kept on the network even if only one person sent it, because a copy may be saved in every recipient's inbox. If the entire network had to be backed up, multiple copies of the same e-mail would be backed up with it, and if a thousand such e-mails are sent, the wasted storage multiplies a thousandfold. This is where data deduplication comes in: it removes the duplicate e-mails and retains only one copy as the main reference. A well-designed data deduplication algorithm will save you time and money when it comes to data backup.

One way to check for duplicate copies is to let users create new entries freely and then scan the network after a few days or weeks. The key advantage of this approach is that data creation stays fast and easy. The drawback is that the amount of space actually in use can never be precisely measured; your figures will always be out of date unless you run the algorithm daily.
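Below is a hedged sketch of that post-process ("scan later") approach applied to the e-mail example. The mailbox layout, the scan function, and the idea of replacing a duplicate body with a pointer are all illustrative assumptions, not the behavior of any specific mail server.

```python
# Post-process deduplication sketch: duplicates accumulate as messages are
# delivered, and a scheduled scan later collapses them into one retained copy.
import hashlib

def scan_and_dedupe(mailboxes):
    """mailboxes: {user: {message_id: body}}. Returns bytes reclaimed."""
    seen = {}       # body hash -> (user, message_id) of the retained copy
    reclaimed = 0
    for user, inbox in mailboxes.items():
        for msg_id, body in inbox.items():
            digest = hashlib.sha256(body).hexdigest()
            if digest in seen:
                inbox[msg_id] = seen[digest]  # duplicate: keep only a pointer
                reclaimed += len(body)
            else:
                seen[digest] = (user, msg_id)
    return reclaimed

mailboxes = {
    "alice": {"m1": b"All-hands meeting moved to 3pm."},
    "bob":   {"m7": b"All-hands meeting moved to 3pm."},
    "carol": {"m2": b"All-hands meeting moved to 3pm."},
}
print(scan_and_dedupe(mailboxes), "bytes reclaimed")
```

Until this scan runs, the system genuinely does not know how much space the duplicates are wasting, which is the staleness problem noted above.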
Another way is to check for duplicates whenever someone writes data into the system. This can be time-consuming, because deduplication systems perform hash calculations to determine whether a piece of data already has an existing match, and those calculations add noticeable time to every write. Both methods are considered valid, and there is ongoing debate among computer scientists as to which one is better.
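For contrast with the periodic scan, here is a minimal sketch of the inline approach, where the hash check happens on every write. The class name, the in-memory index, and the SHA-256 choice are assumptions made for illustration.

```python
# Inline deduplication sketch: every write pays for a hash calculation,
# but a duplicate is caught immediately and stored only once.
import hashlib

class InlineDedupStore:
    def __init__(self):
        self._blocks = {}  # hash -> data, kept once
        self._refs = {}    # key -> hash of the stored block

    def write(self, key, data):
        # Hashing on every write is the source of the extra latency
        # mentioned in the paragraph above.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blocks:
            self._blocks[digest] = data  # new data: store it
        self._refs[key] = digest         # duplicate or not, record a reference
        return digest

    def read(self, key):
        return self._blocks[self._refs[key]]

store = InlineDedupStore()
store.write("doc-a", b"same payload")
store.write("doc-b", b"same payload")  # detected at write time, stored once
print(len(store._blocks), "unique block(s) stored")
print(store.read("doc-b"))
```

The trade-off between the two sketches mirrors the debate in the text: the periodic scan keeps writes fast but lets waste accumulate, while the inline check keeps storage accurate at the cost of slower writes.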
The algorithm has been criticized on a few points, and the use of hash calculations sits at the center of most of them. Although the chance that two different pieces of data produce an identical hash is very slight, it still exists. Because most deduplication systems do not compare data bit by bit, two different pieces of data that happen to share the same hash can be treated as duplicates, and that can corrupt data. Deduplication is also not advisable for networks that deliberately rely on redundancy; to cite good examples, the multi-level networks used by large corporations and government agencies may not be appropriate candidates.
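One common safeguard against that collision risk is to verify the raw bytes after a hash match before discarding the new copy. The sketch below is an assumption about how such a check might look; the `verify` flag and function name are illustrative, not part of any specific product.

```python
# Collision safeguard sketch: trust the hash only after an optional
# byte-for-byte comparison of the two pieces of data.
import hashlib

def is_duplicate(existing, candidate, verify=True):
    """Treat candidate as a duplicate of existing only if the hashes match
    and (optionally) the raw bytes match as well."""
    same_hash = (hashlib.sha256(existing).digest() ==
                 hashlib.sha256(candidate).digest())
    if not same_hash:
        return False
    # Without this extra comparison, a hash collision between two different
    # pieces of data would silently corrupt one of them.
    return existing == candidate if verify else True

print(is_duplicate(b"payroll Q3", b"payroll Q3"))  # True: genuine duplicate
print(is_duplicate(b"payroll Q3", b"payroll Q4"))  # False: different data
```

The extra comparison costs time, which is why some systems skip it and accept the tiny collision risk instead.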
A network system with many users usually ends up holding duplicate entries of identical data, and without proper control this duplication can grow very fast. Many IT experts deploy a program to remove the duplicates and maintain only a single copy of the necessary data. This process is known as data deduplication, and it is essential for larger networks.
Discover which data deduplication service has the "must have" features. Visit http://www.druva.com to find out what features you need before you purchase any backup program. Try Druva inSync for free today!