Duplicate files are very common regardless of where you store your documents. They take up more than their fair share of storage, but the real problem is that duplicate files mean you have lost a single version of the truth.
If that sounds a little dramatic, this article explores what happens when copies of a document are made across an organization (and then shows how to fix the problem for good).
Organizations anticipate massive data growth; on average, they expect the volume of data to grow from X to 4.2X in the next two years.
Clearly, storing multiple copies of a document uses up storage unnecessarily. Most organizations could save significant costs by reducing duplicate files. Most migration projects would be significantly shorter (and therefore cheaper) if duplicate files were removed as part of the project.
The true cost of storage: Although storage has become much cheaper over the years, it is often forgotten that the true cost of storage includes back-ups, anti-virus, replication, and many other factors. The estimated true cost of storage is $450 per user per year.
The loss of a single version of the truth is a problem of governance and productivity. Let's imagine an organization creates a support guide document for a new product. The file is stored in a single location where all the support engineers can access it.
For convenience, employees across the organization download a copy and store it on their departmental shares and in their personal file stores. Within a month, we have sixty copies of the guide across the organization.
At this point an error is found in the product guide, and the original document is updated to include the correction. We now have one correct document and fifty-nine incorrect documents. If the document is amended a second time, the problem compounds: any copies taken after the first correction are now out of date as well.
This simple example shows how an organization can rapidly lose control of critical information due to duplicate documents.
First, to find out whether your organization has a problem with duplicate files, how many duplicates you have, and how much storage they are using, you can run an audit of your file shares and SharePoint sites.
Audit reports are available in the free version of Ocrato and you can run as many audits as you wish.
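To give a feel for what such an audit measures, here is a minimal Python sketch that walks a folder, groups files by a SHA-256 hash of their content, and reports how many duplicate files exist and how much storage they use. It is an illustration only, not how Ocrato works internally; the function names `file_digest` and `audit_duplicates` are hypothetical, and the sketch covers a local or mounted file share rather than SharePoint.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_duplicates(root):
    """Count duplicate files under root and the storage they consume.

    Hypothetical helper: one file in each group of identical files is
    treated as the original, and the rest are counted as duplicates.
    """
    groups = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                groups[file_digest(path)].append(path)

    duplicate_files = 0
    wasted_bytes = 0
    for paths in groups.values():
        extra = len(paths) - 1
        if extra > 0:
            duplicate_files += extra
            wasted_bytes += extra * os.path.getsize(paths[0])
    return {"duplicate_files": duplicate_files, "wasted_bytes": wasted_bytes}
```

Run against a share holding sixty identical copies of the support guide, this sketch would report fifty-nine duplicate files and fifty-nine times the size of the guide in wasted storage.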
Once you have identified your duplicate files, Ocrato can resolve the problem for you.
Ocrato uses multiple techniques to rapidly discover duplicate files, including a full comparison of the content, so that no mistakes are made when identifying duplicates (we don't, for example, simply look at file names or file sizes).
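The difference between a quick check and a full content comparison can be sketched in a few lines of Python. The example below is an assumption about how such a check might be layered, not a description of Ocrato's implementation: file size is used only as a cheap pre-filter, and two files count as duplicates only if the standard library's `filecmp.cmp` with `shallow=False` finds their contents identical. The function name `find_identical_groups` is hypothetical.

```python
import filecmp
import os
from collections import defaultdict


def find_identical_groups(paths):
    """Group paths whose contents are byte-for-byte identical."""
    # Step 1: bucket by size. Files of different sizes cannot be identical,
    # but files of the same size are only candidates, not confirmed matches.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue  # a unique size cannot have a duplicate
        clusters = []  # each cluster holds files with identical content
        for path in candidates:
            for cluster in clusters:
                # Step 2: shallow=False forces a comparison of file contents
                # rather than relying on size and modification time.
                if filecmp.cmp(cluster[0], path, shallow=False):
                    cluster.append(path)
                    break
            else:
                clusters.append([path])
        groups.extend(c for c in clusters if len(c) > 1)
    return groups
```

Two files with the same size but different text end up in separate clusters here, which is exactly the mistake a name-only or size-only check would make.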
You can identify the original document using one of the two methods described, based on the file attributes:
Once the original document is identified you then have two options in terms of the action to be taken on the duplicate files: