Monday, March 4, 2019

Word of the Day: data deduplication

data deduplication

Data deduplication -- often called intelligent compression or single-instance storage -- is a process that eliminates redundant copies of data and reduces storage overhead. Data deduplication techniques ensure that only one unique instance of data is retained on storage media, such as disk, flash or tape. Redundant data blocks are replaced with a pointer to the unique data copy. In that way, data deduplication closely aligns with incremental backup, which copies only the data that has changed since the previous backup.

For example, a typical email system might contain 100 instances of the same 1 megabyte (MB) file attachment. If the email platform is backed up or archived, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is stored; each subsequent instance is referenced back to the one saved copy. In this example, a 100 MB storage demand drops to 1 MB.
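The mechanics can be illustrated with a short, hypothetical sketch (the class and names below are illustrative, not any vendor's implementation): each block of data is hashed, a unique block is stored only once, and every duplicate becomes a pointer to that single stored copy.

```python
import hashlib
import os

class DedupeStore:
    """Toy single-instance store: each unique block is kept once,
    keyed by its content hash; duplicate blocks become pointers."""

    def __init__(self):
        self.blocks = {}   # content hash -> unique block bytes
        self.files = {}    # file name -> list of block hashes (pointers)

    def write(self, name, data, block_size=4096):
        pointers = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)   # store only if unseen
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name):
        # Reassemble (rehydrate) the file from its pointers.
        return b"".join(self.blocks[h] for h in self.files[name])


store = DedupeStore()
attachment = os.urandom(1_000_000)      # a 1 MB attachment
for i in range(100):                    # 100 mailboxes hold the same file
    store.write(f"mailbox-{i}.bin", attachment)

# Only ~1 MB of unique blocks is kept, even though 100 MB was "written".
print(len(store.blocks), "unique blocks stored")
print(store.read("mailbox-42.bin") == attachment)
```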

Target vs. source deduplication

Data deduplication can occur at the source or target level.

Source-based dedupe removes redundant blocks at the client or server level, before data is transmitted to a backup target, so no additional hardware is required. Deduplicating at the source reduces both bandwidth and storage use.

In target-based dedupe, backups are transmitted across a network to disk-based hardware in a remote location, and redundancies are removed there. Using deduplication targets increases hardware costs, but the approach generally provides a performance advantage over source dedupe, particularly for petabyte-scale data sets.
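The bandwidth savings of source-side dedupe come from the client doing the hashing and sending only blocks the target does not already hold. A rough sketch of that exchange, assuming a simple hash-lookup protocol (all names here are hypothetical):

```python
import hashlib
import os

def chunk(data, size=4096):
    return [data[i:i + size] for i in range(0, len(data), size)]

def source_side_backup(data, target_blocks):
    """Hypothetical source-side dedupe: hash locally, ask the target which
    hashes it already holds, and transmit only the missing blocks."""
    blocks = {hashlib.sha256(b).hexdigest(): b for b in chunk(data)}
    missing = set(blocks) - set(target_blocks)      # "which of these do you lack?"
    for digest in missing:
        target_blocks[digest] = blocks[digest]      # only new blocks cross the wire
    return len(missing), len(blocks)

target = {}                                          # stand-in for the remote backup target
monday = os.urandom(40960)
sent, total = source_side_backup(monday, target)
print(f"first backup: sent {sent} of {total} blocks")

tuesday = monday[:36864] + os.urandom(4096)          # one block changed since Monday
sent, total = source_side_backup(tuesday, target)
print(f"second backup: sent {sent} of {total} blocks")
```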

Techniques to deduplicate data

There are two main methods used to deduplicate redundant data: inline and post-processing deduplication. Your backup environment will dictate which method you use.

Inline deduplication analyzes data as it is ingested into a backup system. Redundancies are removed as the data is written to backup storage. Inline dedupe requires less backup storage but can create processing bottlenecks. Storage array vendors recommend that their inline data deduplication tools be turned off for high-performance primary storage.

Post-processing dedupe is an asynchronous backup process that removes redundant data after it is written to storage. Duplicate data is removed and replaced with a pointer to the first iteration of the block. The post-processing approach gives users the flexibility to dedupe specific workloads and to quickly recover the most recent backup without rehydration. The tradeoff is that it requires a larger backup storage capacity than inline deduplication.
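The difference between the two methods is largely a question of when the hashing work is done and when full-size landing capacity is needed. A minimal sketch, with illustrative function names rather than any product's API:

```python
import hashlib

def sha(block):
    return hashlib.sha256(block).hexdigest()

def inline_backup(blocks):
    """Inline: hash in the write path; only unique blocks ever hit backup storage."""
    store, pointers = {}, []
    for block in blocks:
        h = sha(block)
        store.setdefault(h, block)      # duplicates are never written to disk
        pointers.append(h)
    return store, pointers

def post_process_backup(blocks):
    """Post-processing: land the full stream first, then dedupe asynchronously."""
    landing_zone = list(blocks)         # needs capacity for the full, undeduped backup
    store, pointers = {}, []
    for block in landing_zone:          # later background pass removes duplicates
        h = sha(block)
        store.setdefault(h, block)
        pointers.append(h)
    return store, pointers

data = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
inline_store, _ = inline_backup(data)
post_store, _ = post_process_backup(data)
print(len(inline_store), len(post_store))   # both end with 2 unique blocks; the
                                            # difference is when the work and capacity are spent
```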

Quote of the Day

"Data reduction methods, such as compression or deduplication, can have a positive effect on performance, but the significant CPU resources required could overshadow the benefits." - Scott D. Lowe

Learning Center


How do I decide when to deduplicate data and where?
It may not always make sense to deduplicate data. But when it does, you should know which types of deduplication work best for different workloads and times.

Compression, deduplication and encryption: What's the difference?
Compression, deduplication and encryption are important data protection technologies. Learn the distinctions among the three, as well as best practices with each approach.

How do compression and deduplication affect performance?
Now a necessary element of any modern storage system, compression and deduplication can help manage capacity and positively affect performance. However, do the benefits outweigh potential disadvantages?

Inline deduplication vs. post-processing: Data dedupe best practices
Find out how inline deduplication and post-processing dedupe differ, which vendors feature those technologies and how they can work with other data protection capabilities.

What are some new data deduplication techniques?
Data deduplication techniques have seen a number of recent innovations, and storage vendors are taking notice.

Quiz Yourself

If you don't _______ your data effectively, you risk losing it.
a. backup
b. back up

Answer

Stay in Touch

For feedback about any of our definitions or to suggest a new definition, please contact me at: mrouse@techtarget.com

Visit the Word of the Day Archives and catch up on what you've missed!
