Migration projects are always under time and cost pressure, but there is a single step that most project teams miss that makes a huge difference to the outcome. Including this step will:
The Association for Information and Image Management (AIIM) reports that, on average, half of an organization’s retained information has no business value, and the Compliance, Governance and Oversight Council (CGOC) estimates that a large company with 10 petabytes of data could be spending as much as $34.5 million on data that could be deleted.
It’s time to stop the ROT. A typical file share or SharePoint site that has been around for a while will be riddled with redundant, old, or trivial (ROT) files. Choosing to migrate ROT files will make the project take longer (and therefore cost more), take up storage space, and add noise to enterprise search.
Before you press the button on your migration tool and begin moving files, take a little time to review what you will be moving. Some questions to ask before you move a single file:
Redundant information is unneeded or duplicated information. First, let’s focus on duplicate documents. Many organizations are surprised to discover how many duplicate files they have and how much space those files are using. Duplicate files tend to be on the larger side and take up a lot of storage space.
The free version of Ocrato includes duplicate file information in the audit report, which shows how many duplicates you have and how much space they are using. With the enterprise version of the tool, you can remove duplicate files automatically or replace the duplicates with a link to the original document. For further details of de-duplication please look here.
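To get a rough feel for how many duplicates a folder tree holds before running a full audit, the idea can be sketched in a few lines of Python. This is a minimal illustration, not how Ocrato works internally: the function names `find_duplicates` and `wasted_bytes` are hypothetical, and the sketch assumes that two files count as duplicates only when their bytes are identical (same SHA-256 digest).

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root whose content is byte-for-byte identical.

    Returns a list of groups. Each group is a sorted list of paths,
    and only groups with two or more files are returned.
    """
    # Step 1: bucket by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


def wasted_bytes(groups):
    """Bytes that would be freed by keeping one copy from each group."""
    return sum(os.path.getsize(group[0]) * (len(group) - 1) for group in groups)
```

Calling `find_duplicates` on the root of a file share returns the duplicate groups, and passing the result to `wasted_bytes` gives an estimate of the space those extra copies occupy. Comparing sizes first keeps the expensive hashing step to the files that could plausibly match.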
Old or obsolete files could be defined as files created before a certain date (for example, any file created more than ten years ago) or files that have not been changed or accessed for a long time. Some organizations will be required to keep some files for a given period (known as retention), and equally there may be some files that cannot be kept after a certain date (known as disposition).
Generally speaking, files that have not been accessed for a very long time should be considered for archiving and not migrated (or migrated to a separate area). Ocrato has an archiving option allowing you to move or delete files based on the date they were created, last modified, or last accessed. If you are creating a separate archive, Ocrato can transform the documents into read-only PDF documents and compress them as part of the process.
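The date-based rule described above can be sketched as a small report script. This is an illustration under stated assumptions rather than Ocrato's archiving feature: the function name `find_stale_files` and the `max_age_days` threshold are hypothetical, and age is measured from the last-modified time by default because many systems update access times lazily or not at all.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def find_stale_files(root, max_age_days, now=None, use_access_time=False):
    """Return (path, age_in_days) pairs for files older than max_age_days.

    Age is measured from the last modification by default. Set
    use_access_time=True to measure from the last access instead.
    """
    now = time.time() if now is None else now
    stale = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            stamp = info.st_atime if use_access_time else info.st_mtime
            age_days = (now - stamp) / SECONDS_PER_DAY
            if age_days > max_age_days:
                stale.append((path, int(age_days)))
    # Oldest first, so the strongest archive candidates lead the report.
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

For the ten-year example, `find_stale_files(root, max_age_days=3650)` lists every file last modified more than roughly ten years ago. The output is only a list of candidates: retention and disposition rules still decide what may actually be archived or deleted.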
Trivial files are files that do not contain useful information. They may have been useful at one time, but their use has passed. For example, migrating the lunch menu from January 1989 is unlikely to be worthwhile. Many trivial files are found in personal folders, which by their nature are intended for miscellaneous files.
A useful migration strategy for personal folders is to ask users to move the documents they wish to keep to the new area themselves. You can give them a set time period in which to move any content they want. Alternatively, you can create a folder where they place any files they wish to keep and then migrate just the chosen files as part of your project. This simple plan can dramatically reduce the number of trivial files that are migrated.
Another option to discover trivial files is to report on files that have not been accessed for a long period of time. The archiving feature in Ocrato allows you to report and take action on files that have not been accessed for a long time.
35% of organizations believe that over 40% of their information is ROT (Redundant, Old, Trivial).
Many organizations store a vast number of scanned documents (as image files such as TIFF or JPG, or as image-only PDFs). Although these files look like they contain text, they are actually just regular image files. The issue with an image file is that a search engine cannot see the information it contains. A search engine needs text for the document to be discoverable. The upshot is that users cannot find their documents using search.
Fortunately, the solution is simple. Using optical character recognition (OCR), the image can be processed and the recognized text added to the document, which is then saved as a new file in the PDF/A format. The files will be smaller, and search engines will now be able to crawl the documents correctly.
If you are unsure whether the documents you store have this issue, you can run an audit of your file shares and SharePoint sites using the free version of Ocrato. If you find you do need to convert the files, Ocrato can take care of that for you. This is ideally done before migration but can also be done afterwards.
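As a very rough first pass ahead of a proper audit, you can count how many files in a folder tree are in formats that commonly hold scans. The sketch below is an assumption-laden illustration, not the Ocrato audit: the function name `count_ocr_candidates` and the extension lists are hypothetical, and a PDF counted here may already contain searchable text, since confirming that requires opening the file with a PDF library.

```python
import os
from collections import Counter

# Assumption: these extensions mark files worth checking for missing text.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg"}
PDF_EXTENSIONS = {".pdf"}


def count_ocr_candidates(root):
    """Count files under root that may be scans without searchable text.

    Returns a Counter with the keys "image" and "pdf". Image files never
    carry searchable text. PDFs are only candidates, because this sketch
    does not inspect their content.
    """
    counts = Counter()
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            extension = os.path.splitext(name)[1].lower()
            if extension in IMAGE_EXTENSIONS:
                counts["image"] += 1
            elif extension in PDF_EXTENSIONS:
                counts["pdf"] += 1
    return counts
```

A high `"image"` count is a strong hint that OCR is needed, while a high `"pdf"` count only says that a closer look is warranted.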
The larger the file, the longer it will take to migrate. Additionally, it will use up more storage and take more time for your users to download. Large PDF and image files will typically be some of the largest files you migrate. Scanned documents can be as much as 2MB per page, and some documents run to hundreds of pages, so this can quickly become a time drain on the migration project.
Ocrato has industry-leading capabilities to hyper-compress PDF and image files with spectacular results. A typical one-page scanned document would be reduced from 1.6MB to less than 30KB (a reduction of about 98%, or more than 50 times smaller). Hyper-compression will dramatically speed up your migration project whilst saving on storage and bandwidth costs.
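To put a change such as 1.6MB to 30KB in perspective, the saving can be expressed either as a percentage reduction or as a compression ratio. The short sketch below shows both calculations. The function names are hypothetical, and it assumes 1MB = 1000KB.

```python
def reduction_percent(original_kb, compressed_kb):
    """Percentage of the original size that was removed (at most 100)."""
    return (1 - compressed_kb / original_kb) * 100


def compression_ratio(original_kb, compressed_kb):
    """How many times smaller the compressed file is than the original."""
    return original_kb / compressed_kb


# The one-page scan from the text: 1.6MB (1600KB) down to 30KB.
print(f"{reduction_percent(1600, 30):.1f}% smaller")   # 98.1% smaller
print(f"{compression_ratio(1600, 30):.1f}x ratio")     # 53.3x ratio
```

A percentage reduction can never exceed 100%, so very large savings are often easier to communicate as a ratio.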