
ImageMAKER Discovery Assistant Readme

 

ImageMAKER Discovery Assistant automates the process of preparing documentation for legal discovery.  Supported features include document conversion to TIFF and PDF, Bates stamping, extraction of metadata, OCR text extraction, printing, and export to the Summation and Concordance case management tools.

Product Features include:

·         Scalable to millions of conversions per day.

·         Simple to install and use

·         Powerful customizable feature set.

·         Custom Development and support available.

·         Tools to manage the processing of terabytes of data.

·         Support for extracting OLE embedded documents

·         Optional support for text searchable PDF, and color TIFF output.

·         Simple to use database front end, capable of scaling to millions of documents across multiple  machines. 

·         Support for most common document types, including Word, Excel, PowerPoint, PDF, HTML, TXT, JPEG, and RTF.

·         Also includes support for converting Microsoft Outlook MSG and PST files, and Lotus Notes email files.

·         Flexible built-in Bates stamper supports writing Bates labels to any of the four corners of the output TIFF file.

·         Creates TIFF files, metadata, TXT output, and a conversion log for each file converted.

·         Add-on upgrade to convert to scanned PDF.

Contents

Quick Overview
Installation
Supported File Types
Incorrect Document Extensions
Office 12 / 2007 Support
 
MSG and PST Handling
Handling Outlook Security issues
Support for De-Duplication
Support for De-Blanking
Bates Stamping
Advanced Excel Spreadsheet Processing
Support for proper DATE and TIME settings in Word and Excel
Database Format for the Discovery Assistant project
Support for Scalability
Support for Lotus Notes (NSF)
Support for PaperPort .MAX files
Forensics Tools
Support for OCR
Support for TIFF Pass-through
Support for Scanned TIFF Files
Handling WordPerfect files using Word (WPD)
Algorithm to check for and assign duplicates
Handling Terabytes of Data
System Requirements
Handling Parent Child relationships
Exported Ranges of DOCID and BATES Numbers
Support for UTC Date/Time
Loading in a 'Selection Set'
Estimating Page Counts
Benchmarks tests on Discovery Assistant
Error Conversion Codes
Support for Conversation Topic and Conversation Index in Outlook
Setting up Discovery Assistant to do conversions running as a service
Handling Password Protected Files
Setting up Distributed Processing
Support for RAR files
Using Discovery Assistant as a preview tool
MSG Handling
Questions and Answers
Application Details
Adding a new file association
Quality Control Support in Discovery Assistant
Supported File Types
Adobe Acrobat 7.0 (PDF)
Internet Explorer (HTML)
.GIF and .JPG
Outlook (PST and MSG)
Outlook Express (EML and DBX)
Access Database (MDB)
Autocad (DWG, DXF and DWF)
VectorWorks (MCD)
QuickView support (converting unsupported file types)
7-Zip
Contact Information
Appendix

Quick Overview:

Discovery Assistant can be downloaded from http://www.discoveryassistant.com/Download/Downloads.php

The basic idea is that the product can take any set of computer-generated documents, including email, Word documents, PDF files, spreadsheets, and/or scanned documents (from hard drives and/or CD-ROMs), and convert them to TIFF or PDF with associated metadata, suitable for import into case management software.

The product can also directly output Bates Stamped TIFF and PDF for direct production of documents for legal discovery.

Discovery Assistant currently exports to the following Case Management systems:

 Summation

DII Class I  - tiff and text
DII Class II - source files

 Concordance

IPRO-Tech    - images are loadable into Ipro
Opticon      - images are loadable into Opticon

 Comma Separated Value (CSV)

                currently only supports TAB output

 Ringtail

                Ringtail Legal from FTI

 Introspect IDX (Zantaz)

IDX file format

Some distinguishing Discovery Assistant features include:

·         conversion of all printable document types to TIFF or PDF.

·         emphasis on conversion speed. 

·         support for removal of duplicates.

·         extraction of TEXT and metadata.

·         integrated OCR support for extracting text from scanned images.

·         proper Date/Time stamped values for macros in Word and Excel.

·         integrated Bates Stamping.

·         blank page removal

·         file management features, including a 'MOVE' capability.

·         ability to auto preview the conversion results.

·         specialized Excel formatting controls, including 'fit to pages' feature.

·         ability to identify document file types by content rather than file extension.

·         upgrade to output in color (jpeg compressed TIFF)

·         upgrade to output to postscript (color postscript) for conversion to PDF or direct printing

·         support utilities to handle the processing of Terabytes of data.

·         Export to Concordance and Summation.

·         Export to CSV, Introspect, Ringtail, IPRO.

Our company focus is to provide a high-quality, easy-to-use product.  We provide quick turn-around on reported problems, and offer custom development services in the event that custom features are required, ensuring that the Discovery Assistant product meets our customers' exact needs.

We are the developers of the underlying core technology used in the Discovery Assistant product, and have been in business selling print drivers, viewers, and document conversion products since 1990.

Installation:

1.       Take a quick look at the User Manual

2.       Install the Microsoft .NET Framework Version 1.1 Redistributable Package
http://www.microsoft.com/downloads/details.aspx?familyid=262D25E3-F589-4842-8157-034D1E7CF3A3&displaylang=en

Install the Microsoft .NET Framework Version 2.0 Redistributable Package (x86)   http://www.microsoft.com/downloads/details.aspx?familyid=0856EACB-4362-4B0D-8EDD-AAB15C5E04F5&displaylang=en

OR

Install the Microsoft .NET Framework Version 2.0 Redistributable Package (x64)   http://www.microsoft.com/downloads/details.aspx?familyid=B44A0000-ACF8-4FA1-AFFB-40E78D788B00&displaylang=en

                 (still trying to determine if we need one or both).

3.        Install Microsoft Outlook, and Lotus Notes (if required).

4.        Install the latest Microsoft Office updates: http://office.microsoft.com/OfficeUpdate/default.aspx

5.        Install the DiscoveryAssistant application  (unzip XDCAssistant.zip and run Setup.exe).

6.    Call your technical contact (Ken Davies) at ImageMAKER Development for a walkthrough instruction.

 (604) 525-2108.  Pacific Standard Time.

Quick Start Instructions:

1.       Download and install the .NET runtime version from Microsoft.
http://www.microsoft.com/downloads/details.aspx?familyid=262D25E3-F589-4842-8157-034D1E7CF3A3&displaylang=en

Also, confirm you have the latest Office updates from Microsoft:        http://office.microsoft.com/OfficeUpdate/default.aspx

Make sure you have installed the Office Tools \ Microsoft Office Document Imaging product (contains the OCR engine that Discovery Assistant uses).

Also, if running Windows Server 2008, make sure you have installed the Desktop Experience feature (Image Viewer for JPEG).

[To install Desktop Experience, from the Start Button, go to Administrative Tools, and click on Server Manager.  In the Features Summary section of Server Manager, click Add Features.  In the Add Features dialog, select the 'Desktop Experience’ check box, and then click Next, and Install.]

2.       Unzip the Discovery.zip file and run DiscoverySetup.exe.  Setup automatically installs the print drivers, Discovery Assistant.exe, and the PST, MSG, Lotus Notes, and ZIP crackers.

3.       Select the ‘All Files’ tab in Discovery Assistant, and select ‘Scan’.  Use this interface to select the directory containing the files to be converted.  After the scan has completed, you can sort the files based on filename, full path/filename, date, size, type, or whether it is convertible or not.

4.       Switch tabs to view ‘Files that can be converted’.  Then use the display buttons to Queue 1, or Queue All files for conversion.

5.       Switch tabs again to ‘Queued’ for conversion, and again, from the buttons choose to convert one, or convert all files.  Current suggestion is to first choose only one file to convert.

6.       As the files are converted the first time, one or two dialogs may pop up.  If Discovery Assistant does not properly time out and shut down the problem application, you can close these dialogs yourself.  There is a second application (XDCAdmin.exe) that can be used to ‘auto-close’ these dialogs.  Details on how to train XDCAdmin to auto-close, auto-shutdown, and auto-kill follow in the detailed notes.
(see UserManual.doc)

7.       Once you have some success, and have identified what problems (if any) there are with the test conversions, set things up to do a full conversion of all queued files.

8.       Switch tabs again to ‘Converted’ and ‘Failed’.  You can look at the resultant files using the interface provided.  Use this interface to investigate any errors that might have come up.  We can help at this phase to identify what fixes are necessary.

9.       After all conversions are done, you can perform the following operations from the  Converted Files tab:

·         Deblank the output files - remove blank pages from selected files.

·         Assign Bates Numbers.

·         Assign Document ID's.

·         OCR any image files (scanned PDF files).

·         Bates Stamp the resultant TIFF files.

10.    Once you are done with conversions, you can export the conversion results to Concordance / Summation / IPRO / or a comma separated value file (CSV file) for inclusion in another database, spreadsheet or document management format.

If converting XLS files, we recommend choosing between the following settings:

11.   Under Admin / Configure / Excel Settings, users have the choice to force output to a specified number of pages (print to fit), or to print at full size. In either case the complete spreadsheet is rendered, not just the last saved print range.  Default is to print the entire spreadsheet at full size.

If converting PST and MSG files, we recommend the following additional steps:

12.    If you are planning on converting PST or MSG files, there may be an issue with the Outlook security dialog.  We provide a tool to automatically close these dialogs, but if there is still a problem, the Outlook Security Dialog can be turned off permanently by opening Outlook 2007, and choosing:  Tools / Trust Center / Programmatic Access / "Never warn me about suspicious activity".

13.   Conversion will go quicker if you have the Outlook client open.  This reduces the time we take opening and closing Outlook.

14.   There is tremendous variety in PST/MSG files.  If you do have a conversion failure, we can likely comment on (and fix) the problem if you send us a log file.  Log files can be generated by running imgLOG.exe before loading files to be converted.
 (Start / Programs / Discovery Assistant / imgLog.exe).

To remove date headers and footers from MSG, TXT, HTML:

15.   MSG files are formatted as TXT, RTF, or HTML.

For each of these file types we use a different application to do the printing.
TXT -> notepad
 RTF -> Word
HTML -> Internet Explorer.

For Notepad and Internet Explorer, under the File/PageSetup dialog, there are header/footer strings.

To remove headers/footers from HTML and MSG that are rendered as HTML do the following:

1.       Open Internet Explorer

2.       Go to File/PageSetup

3.       Delete the strings in the Header and Footer edit boxes

4.       Click 'ok'

5.       Exit Internet Explorer

Headers/footers can be removed from TXT rendered images the same way, using the File/PageSetup dialog in Notepad.

To Get Internet Header extraction working:

16.   First make sure that Outlook CDO (Collaboration Data Objects) is installed. 

Pre Office 2007: CDO comes standard with your Office installation, but must be selected ON to be installed.    

Office 2007: you must download and install a separate add-on from Microsoft:  Collaboration Data Objects, Version 1.2.1
http://www.microsoft.com/downloads/details.aspx?familyid=2714320d-c997-4de1-986f-24f081725d36&displaylang=en

To check if CDO is already installed, Look in the registry for: HKEY_CLASSES_ROOT\CDO.Message

And look in your system folder for:  C:\WINDOWS\system32\cdosys.dll
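The two checks above can also be scripted. The following Python sketch is only an illustration and is not part of Discovery Assistant; the function names are hypothetical. It uses the standard `winreg` module, which exists only on Windows, so the registry check reports `None` on other systems.

```python
import os

try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None


def cdo_registry_key_present():
    # Looks for HKEY_CLASSES_ROOT\CDO.Message.
    # Returns True/False on Windows, or None when the registry is unavailable.
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, "CDO.Message"):
            return True
    except OSError:
        return False


def cdo_dll_present(system_dir=r"C:\WINDOWS\system32"):
    # Looks for cdosys.dll in the given system folder.
    return os.path.isfile(os.path.join(system_dir, "cdosys.dll"))


if __name__ == "__main__":
    print("Registry key CDO.Message:", cdo_registry_key_present())
    print("cdosys.dll found:", cdo_dll_present())
```

If either check fails, follow the installation steps for CDO.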

To install CDO (Collaboration Data Objects)...

1.       Get the Office installation disk.

2.       Run Office Installation.  As you already have Office installed, choose 'update'.

3.       Select advanced...

4.       Expand the Outlook distribution list, look for and enable the ‘Collaboration Data Objects' value.  Default is to change the 'x' to read 'my computer'.

5.       Select OK.

6.       Installation should ADD CDO, and not remove any other Office applications.

You must also turn 'Internet Headers' on from the Project Options / Outlook tab
(default is ON).

Internet headers are extracted to the .MTF file (Metadata File) during conversion. You can open the Metadata file in the Converted tab to review the contents. 

Internet headers are appended to the bottom of the metadata file.

Headers will either be marked as:
                Internet Header:<UNAVAILABLE>          
or:
                Internet Header:...lots of data...

Note: Only messages that have been received have Internet Headers.  Messages that have been sent (and not received) have no headers.
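As a rough illustration of checking the result, the Python sketch below pulls the Internet Header entry out of the text of a metadata file. It assumes the .MTF file is plain text and that everything after the 'Internet Header:' marker belongs to the header. Neither assumption is confirmed here, and the function name is hypothetical.

```python
MARKER = "Internet Header:"


def internet_header(metadata_text):
    # Returns the header text, or None when the marker is missing
    # or the header is flagged as <UNAVAILABLE>.
    pos = metadata_text.find(MARKER)
    if pos < 0:
        return None
    value = metadata_text[pos + len(MARKER):].strip()
    if not value or value == "<UNAVAILABLE>":
        return None
    return value
```

A sent message that was never received would be expected to return `None` here, matching the note above.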

To View the Internet Headers in Outlook, open the source MSG file (use the Open Source button in Discovery Assistant), then select 'View Options'.

To export Internet Headers, be sure to select the INETHEADER field on (near or at bottom of the list).

To convert WordPerfect Office files (WPD and WB3):

17.   Quattro Pro and WordPerfect require some simple setup before doing any conversion of those file formats.

The fix is to open each application, and load a sample file (you can create a simple file if you want to).  Next, make sure that the default printer is the ImageMaker XDC Service1 driver, do a print, and then do a file save.

Leave these applications open, but minimized, or reduce the window footprint to a smaller portion of the screen.

See comments further on down about using Word to handle WordPerfect files.

To get Lotus Notes support working:

18.   At startup, Discovery Assistant looks for Notes.exe in the \program files\Lotus\Notes\ directory.   It then forces a 'path' change to the local logged in user to make sure that the lotus directory is part of the system path in order for the dll's to work.  (No idea why Lotus does not do this as part of its own installation).

If you've just recently installed Lotus Notes, then you need to re-install  Discovery Assistant to set the path information.

Support for OCR:

19.   To enable Discovery Assistant to use the Microsoft Office 2003 OCR engine (recommended), first confirm that OCR is working by running the Microsoft Office Tools / Microsoft Office Document Imaging product.  Open a TIFF file, and then choose OCR to confirm the OCR engine is working.

Support for Search:

20.   Full text search can be done either before conversion, or after conversion.  See notes on dtSearch (end of this file).  Requires that you separately download dtSearch from the dtSearch website.

Support for Distributed Processing:

21.   Discovery Assistant allows multiple machines to be controlled from a master machine to provide faster throughput for job conversions.

To set up Distributed Processing, see notes near the end of the file entitled:  “Setting up Distributed Processing".

Support for Searchable PDF:

22.   To get searchable PDF working, you need to do the following:

1.       Install a Postscript print driver onto your machine.  Recommendation is to install an HP LaserJet PS, or Apple LaserWriter (either color or B&W). Can do this by doing an Add Printer from the Printer's dialog.

2.       Download and install the Discovery Assistant Postscript update from http://www.discoveryassistant.com/Download/Downloads.php.

3.       When installing the postscript update, you will be prompted to download and install GhostScript and GhostView - two open source products that convert Postscript to PDF.

4.       Re-start Discovery Assistant and re-queue files for conversion.  When converting, choose 'Postscript' as the output file type.

5.       Review, Bates stamp, and export as Searchable PDF.

Note: if the input file is a scanned image, the output file will also come out as a scanned image.  The only way to get text out of a scanned image is to convert to TIFF, then use Discovery Assistant to OCR it.

Switching from Demo to Release:

23.   If you have processed documents in DEMO mode, and have now licensed the product, and want to export files, you need to:                     

1.       From the Project menu item, select 'remove temp files'.

2.       Re-queue the converted files, and re-convert.  This removes the demo stamp.

Avoiding memory problems:

24.    Make sure the following aren't running:

·         Google Desktop.  Turn this off as it consumes vast amounts of CPU.

·         Microsoft Office tablet service (WISPTIS.EXE).  Turn this off as it eats memory like crazy on every file open command.  (CiceroUIWndFrame message crash)

Supported File Types (quick overview):

Discovery Assistant supports file formats based on file extension. 

To check for an associated application for any given extension, you can:

1.       Try to open the file by double clicking on the file icon

2.       Try printing the file by dragging the file over onto a printer icon.

Discovery Assistant also lists associated files:

1.       Use the DA_Sysinfo application to list supported file types.

2.       Use the Discovery Assistant / Admin / Configure / Document types to modify supported file types.

If you still can't figure out the owner application:

Check file extension at http://filext.com.

If you want to add support for a new file type, first ensure that the appropriate application is installed.  As long as that application registers a 'PrintTo' or 'Print' file association, we should be able to convert the file content to TIFF and TXT.

Here are the steps to take to add a new file association:

1.        First, check whether the file extension type has a 'print' or 'printto' association.  You can do this by right clicking on the file, and seeing if there is a 'print' menu item.  You can also try dragging the file from Windows Explorer onto a printer icon, and seeing if it prints.

Discovery Assistant lists all print and printto associations in the output generated by DA_SysInfo (installed in the ImageMAKER Discovery Assistant program group).

We use the registered file associations first before looking for other ways to print.  Associations are normally registered as command line strings. You can interactively review and modify file associations by opening Windows Explorer, and choosing Tools / Folder Options / File Types.

2.       Sometimes the owner application supports printing from the command line, but doesn't properly register that fact.  Applications may require you to activate a 'register' button before it sets the file associations. 

For example, Internet Explorer requires you to select Internet Options /Programs / Reset Web Settings.

3.       If there is no registered application, and you don't know what application opens the file type, then you can search the file type extension database: http://filext.com for the proper application.   Acquire the application, and register the file associations (step 1 or 2).

4.       If there is no file association for the file type, but you know of an application that supports this file type, then there are a number  of things you can do: (all of which are somewhat messy - but permanent).

·          from Windows Explorer, manually add the file type.

·          from Discovery Assistant / Admin / Documents, add the file type

·         run "DA_Sysinfo xyz-txt"  where 'xyz' is the new type, and 'txt'  is the equivalent file type.

Then, stop, and re-start Discovery Assistant, and do a re-check on that file type.

If you want a one time solution:

·         from the Discovery Assistant non-convertible tab, use Assign Type.

5.       In some cases, we do custom development to support the file type in question. Custom file types that we've written converters for include zip, pst, msg, eml, doc, xls, ppt, and pdf.

6.       If the application supports Open, but does not have a command line Print capability, you might still be able to get things working using a macro recorder.
 
Suggested product: Macro Expert - http://www.macro-expert.com/buyall.htm

Incorrect Document Extensions:

Discovery Assistant will also detect and handle file types named with an incorrect extension.  For example, if a Word Document has an extension ".BAK" , Discovery Assistant will detect and treat that file type extension as ".DOC".

File types that we can identify using binary contents are:

File type                                          Extension

Microsoft Excel                                    .xls
Ami Pro                                            .sam
WordPro                                            .lwp
Freelance                                          .prz
Word                                               .doc
Word 2007                                          .docx
Write                                              .wri
Word Perfect                                       .wpd
Lotus 1-2-3                                        .wk3
Microsoft PowerPoint                               .ppt
Microsoft Project                                  .mpp
Microsoft Outlook                                  .msg
Microsoft Outlook Express                          .eml
Calendar                                           .cal
Bitmap File                                        .bmp
PNG File                                           .png
JetForm Data                                       .dat
Sound Wave                                         .wav
Postscript                                         .ps
EDIFACT document                                   .edi
PKZIP archive file                                 .zip
G3/G4/etc. TIFF                                    .tiff
Pfs:
   Windows Works                                   .wpd
   Winworks document                               .wpl
RTF                                                .rtf
Adobe Illustrator                                  .ai
Adobe Acrobat                                      .pdf
MaXware support form                               .msu
Action Multimedia Player                           .acp
“Pretty Good Privacy”, RSA encrypted files         .asc
DES encrypted files                                .des
CorelDRAW                                          .cdr
JPEG                                               .jpg
GIF                                                .gif
HTML                                               .htm
OLE 2 Compound document
XML Compound Documents (Office 2007)
Microsoft Office Binder document
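To illustrate the general idea of identifying a file by content rather than by extension, here is a minimal Python sketch that compares the first bytes of a file against a few widely known file signatures. This is not the detection code used by Discovery Assistant. The signature table covers only a handful of the types listed, and the function names are hypothetical.

```python
# Well-known leading byte signatures for a few of the listed types.
SIGNATURES = [
    (b"%PDF", ".pdf"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"{\\rtf", ".rtf"),
    (b"%!PS", ".ps"),
    (b"PK\x03\x04", ".zip"),  # also the container used by Office 2007 files
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound document"),
    (b"BM", ".bmp"),  # very short signature, so checked last
]


def sniff_type(data):
    # Returns the detected type for the given leading bytes, or None.
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    return None


def sniff_file(path):
    # Reads only the first 16 bytes; the longest signature above is 8 bytes.
    with open(path, "rb") as f:
        return sniff_type(f.read(16))
```

A Word document renamed to ".BAK" would be reported here as an OLE2 compound document. Telling .doc apart from .xls, .ppt, or .msg needs a further look inside the compound file, which this sketch does not attempt.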

 

Office12 / Office 2007 support

The URL for the Office 2007 compatibility pack is http://www.microsoft.com/downloads/details.aspx?FamilyId=941b3470-3ae9-4aee-8f43-c6bb74cd1466&displaylang=en.

If you add in the compatibility pack, you should be able to open and process Office 12 documents (PPTX, DOCX, XLSX) running on an Office 2003 or Office 2000 machine.

MSG and PST handling.

Discovery Assistant excels at handling MSG and PST file formats.

Some things to do to ensure your system is running efficiently:

1.       The Outlook Security Dialog can be turned off permanently by opening Outlook 2007, and choosing: Tools / Trust Center / Programmatic Access / "Never warn me about suspicious activity".

2.       If you are having any troubles scanning a PST file, there is a Microsoft validation tool that can be used to repair PST files:

1.       Exit Outlook if it is running.

2.       Double-click Scanpst.exe, located at drive:\Program Files\Microsoft Office\OFFICE12.

3.       In the Enter the name of the file you want to scan box, enter the name of the .pst or .ost file that you want to check, or click Browse to search for the file.

4.       To specify the scan log options, click Options, and then click the option that you want.

5.       Click Start.

3.        If you've installed Office XP, be sure to disable the Speech and Handwriting Recognition software, as this eats system resources.  To disable, go to:

1.       "Control Panel"

2.       "Add/Remove Programs"

3.       "Microsoft Office," click on the "Change" button

4.       Browse to "Office Shared Features," "Alternative User Input," and select for Speech and Handwriting Recognition (both) "Not available" from the drop-down box.

 

Very neat trick:

To extract multiple MSG files from a PST file for testing, debugging, message ordering etc...

1.       Open Outlook mailbox

2.       Sort messages

3.       Select multiple messages, and then from the outlook menu, choose 'copy'.

4.       Open Windows Explorer and create a new directory.

5.       Select 'paste'.

6.       The files are written to the output directory in the same order as they are listed in Outlook.

7.       To then convert these files, 'drag' the msg files from the Explorer interface into the 'All Files' tab of Discovery Assistant.  Message order will be maintained.

Handling Outlook Security issues.

The Outlook Security Dialog can be turned off permanently by opening Outlook 2007, and choosing: Tools / Trust Center / Programmatic Access / "Never warn me about suspicious activity".

More details at: http://msdn2.microsoft.com/en-us/library/bb226709.aspx - "Code Security Changes in Outlook 2007 - MSDN Library / Office Development / 2007 Microsoft Office System / Outlook 2007 / Technical Articles".

Support for De-Duplication

Many file sets contain multiple copies of the same file.  The de-duplication feature is designed to spot these duplicate files, and ensure that only one copy is converted.

Duplicates are identified by a unique Hash Value, that is calculated for every file, message, and attachment at time of import.

For message files, the hash value is based on the 'text' content of the email message, not the binary contents of the MSG file.  The MSG binary file may contain additional unique information related to how it's stored in the PST file.  The binary file will also contain all the binary attachments.  (i.e. we don't hash the MSG file, but the extracted TEXT portion only.)

If two files have the same hash value, then we do a binary comparison just to be sure the files are both equal (one last final check).  If the files do not compare, then the hash value is modified to include an extension.
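The hash-then-compare check can be sketched in a few lines of Python. This is an illustration under assumptions, not the product's implementation: the hash algorithm (SHA-256 here), the "-1", "-2" form of the extension, and the function name are all hypothetical, and file contents are held in memory only to keep the example short.

```python
import hashlib


def register(index, name, data, hash_fn=lambda d: hashlib.sha256(d).hexdigest()):
    # index maps a hash key to (name, bytes) of the first file stored under it.
    # Returns (key, duplicate_of); duplicate_of is None for a new file.
    base = hash_fn(data)
    key, n = base, 0
    while key in index:
        first_name, first_data = index[key]
        if first_data == data:      # binary comparison confirms the duplicate
            return key, first_name
        n += 1                      # same hash, different content:
        key = f"{base}-{n}"         # extend the hash value and try again
    index[key] = (name, data)
    return key, None
```

For example, registering "a.doc" and then an identical "b.doc" returns "a.doc" as the original for the second file. Passing a deliberately weak `hash_fn` (such as one based on file length) is an easy way to exercise the extension path.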

Every file in the project is marked 'true' if there is a duplicate. (see 'Local Duplicate' column in AllFiles).

At any time before conversion, you can also link projects to a 'global' project, and can identify global duplicates this way.  One advantage of 'global' deduplication is it will differentiate between 'primary', and 'secondary' duplicate.  (Local deduplication will flag primary and duplicate as both being duplicates).

The de-duping feature is controlled from the Options / De-duping tab. 

Settings are as follows:

·         skip local duplicates when converting

·         skip global duplicates when converting

·         don't skip children unless parent is skipped

·         if duplicate is NOT skipped, then copy output files rather than converting.

 Normal default is to enable the top three choices.  The fourth choice is OFF, and skipped (duplicate) files are not copied to the converted directory.

Explanation of settings:

·         Skip Duplicates means that if it is a duplicate, we don't process any further.

·         Skip Global Duplicates means that if it is a global duplicate, don't process any further.

·         Don't skip children unless parent skipped means that parent and all other children must also be a duplicate before we bother skipping that whole email chain.

·         Copy Duplicates means that we copy the resultant TIFF files from a previously converted copy.  This saves conversion time, but does not save drive space.
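One way to read how the four settings combine is sketched below in Python. This is our interpretation of the descriptions above, not the product's code. The class, field, and function names are hypothetical, and `family_all_dups` stands for "the parent and all of its children are duplicates".

```python
from dataclasses import dataclass


@dataclass
class DedupSettings:
    skip_local: bool = True
    skip_global: bool = True
    keep_children_unless_parent_skipped: bool = True
    copy_unskipped_duplicates: bool = False


def action_for(is_local_dup, is_global_dup, family_all_dups, s):
    # Returns "convert", "skip", or "copy" for one file.
    if not (is_local_dup or is_global_dup):
        return "convert"
    wants_skip = (is_local_dup and s.skip_local) or (is_global_dup and s.skip_global)
    # A duplicate inside an email family is only skipped when the whole
    # family (parent and all children) is duplicated.
    if wants_skip and s.keep_children_unless_parent_skipped and not family_all_dups:
        wants_skip = False
    if wants_skip:
        return "skip"
    return "copy" if s.copy_unskipped_duplicates else "convert"
```

With the default settings, a duplicate attachment whose parent email is not a duplicate is converted again rather than skipped.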

User Example:

1.       User chooses a list of files to convert.  Discovery Assistant loads the list into memory, creating a unique hash code for each file scanned.  As files are added, they are compared to the list of existing hash codes already generated.  If there is a hash-code match, then both the source and the potential duplicate are binary compared (ensuring an exact match).

2.       At time of conversion, if the file is a duplicate, and has already been converted,  then we ignore (and duplicate again), skip (don't convert), or 'copy' over the duplicated TIFF file rather than do the conversion again.

3.       User selects a range of files to Export to one of the common formats.  If the selected list contains a duplicate, and the de-duplicate setting is set to 'linked', then we create an entry for the file in the output list, but point backwards in the list to the TIFF and META data of the original file.

entry 1, tiff file 1, meta file 1, text file 1

entry 2, tiff file 2, meta file 2, text file 2

 entry 3, tiff file 1, meta file 1, text file 1     <----- duplicate of entry 1

 entry 4, tiff file 4, meta file 4, text file 4
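The 'linked' export list above can be reproduced with a short Python sketch. The function name, the input shape (file name plus hash key), and the row fields are hypothetical and only mirror the example entries.

```python
def build_export_rows(files):
    # files: list of (name, hash_key) in export order.
    # A duplicate gets its own entry but points back at the output
    # files of the first entry with the same hash key.
    first_entry = {}
    rows = []
    for entry_no, (name, hash_key) in enumerate(files, start=1):
        source_no = first_entry.setdefault(hash_key, entry_no)
        rows.append({
            "entry": entry_no,
            "name": name,
            "tiff": f"tiff file {source_no}",
            "meta": f"meta file {source_no}",
            "text": f"text file {source_no}",
            "duplicate_of": None if source_no == entry_no else source_no,
        })
    return rows
```

Feeding in four files where the third has the same hash key as the first gives the four entries listed above, with entry 3 pointing at the files of entry 1.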

 

The XML files keep track of what files are skipped at time of conversion. (these are marked as skipped instead of converted).  The XML file can be exported as a MDB or XLS file for documentation purposes.

Support for De-Blanking - removal of blank pages.

Discovery Assistant defaults to print the entire spreadsheet, not just the last defined range. When printing the entire spreadsheet, it is possible that blank pages will be produced.

After conversion, from the Converted Tab, users can select 'deBlank' to remove blank pages.

Deblanking goes through each page and looks for black bits in a 10x10 cell grid. If there are more than 200 black bits in any cell, then the page is not blank.

If blank pages are discovered...

Discovery Assistant updates the MetaData to indicate which pages have been removed, and creates a 'cleaned' and a 'removed' output file.

User can then look at the 'cleaned' file, and the 'removed pages' file to confirm that we've not made any mistakes.  Cleaned + removed = total
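The blank page test described above can be sketched in Python as follows. This is a simplified illustration, not the shipped code: a page is assumed to be a list of rows of 0/1 values (1 = black bit), the page is divided evenly into the 10x10 grid, and the function names are hypothetical.

```python
def is_blank(page, grid=10, threshold=200):
    # A page is NOT blank as soon as any grid cell holds more than
    # `threshold` black bits.
    height = len(page)
    width = len(page[0]) if height else 0
    counts = [[0] * grid for _ in range(grid)]
    for y, row in enumerate(page):
        cy = y * grid // height
        for x, bit in enumerate(row):
            if bit:
                cx = x * grid // width
                counts[cy][cx] += 1
                if counts[cy][cx] > threshold:
                    return False
    return True


def split_blank_pages(pages):
    # Returns (cleaned, removed); len(cleaned) + len(removed) == len(pages).
    cleaned, removed = [], []
    for page in pages:
        (removed if is_blank(page) else cleaned).append(page)
    return cleaned, removed
```

Counting per cell rather than per page means that light specks spread across the sheet do not make a page count as non-blank, while a small dense mark in one spot does.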

Bates Stamping     

To ensure that the Bates Stamp does not obscure any important information:

1.       Confirm that the ImageMAKER XDC Service1 printer has the proper unprintable region margins set. 

In most cases it doesn't make much difference as most business documents do not print right to the edge.  However, if you are converting image files (TIFF / fax / JPEG pictures / etc.)  the converting application may print right to the border edge.

To check/change the printer borders, go to the printers dialog, and select the print properties for the ImageMAKER XDC Service1 printer.  In the Device Settings tab, look for and set the unprintable regions.  Recommend a border of .25 inches.  It may be useful to set the top margin to 0, and the bottom margin to .5 in order to get more room for the bates stamp.

2.       Convert from the 'Queued' directory tab, to the 'Converted' directory tab.

Confirm that the output images have a white space border.

3.       Set up the Bates Stamp.  Margins are defined in the setup area. 

BatesStamp the output.

4.       Review the resulting files.

5.       To print the resulting TIFF images to a hard copy printer, you have two choices:

1.       Print to edges

2.       Scale to fit the printable region.

If you are looking to get the Bates Labels as tight to the outside printable region as possible, then you can set the imgview.exe application (what we use to print TIFF files) to 'print to edge'. To do this, open one of the tiff files in imgview.exe by double clicking on the thumbnail image in Discovery Assistant.  Then choose menu / Options / Print to edge, and close the imgview application.

Bates stamping images with no image scaling or compression:

If you are looking to Bates Stamp TIFF images without any additional scaling, then the conversion from 'queued' to 'converted' must be run through our imgview.exe application AND the setting 'print to edge' must be set on.  To make the imgview.exe application the default TIFF print application, from the same imgview.exe options menu, select 'set as default viewer'.  Then stop and re-start Discovery Assistant to pick up the new file association.  You can confirm what the current default viewer is by doing a 'view source'.

Advanced Excel Spreadsheet Processing:

Under the Admin / Configure / Excel tab, the user can set up the following preferences:

Orientation: Default / Portrait / Landscape

Scale: Default / Fit-To (pages wide / pages high) / Zoom To (% of normal size)

Show Comments: Default / None / At end of sheet / As displayed on sheet

Page Order: Default / Down then over / Over then down

Print Quality: Default / 200 / 300 / 400 dpi

Paper Size: Default / Standard paper sizes

Turn headers/footers off.

Print just the last saved print range, or the whole spreadsheet.

Set all worksheets to active before converting.

Clear print area before converting (print all cells).

Scale:

To limit the number of pages when printed, suggest setting the default size to Fit to 1 page wide, 10 pages high.  Special case printing can then be done based on the thumbnail output images produced.

If you have a wide variety of Excel spreadsheets, some with lots of pages, others with only a few pages, our recommendation is to print Excel at less than 100% size. Things still look very good at 75% scaling, and you can easily go as low as 50% scaling. This reduces the number of pages, and gives you a better chance of getting more meaningful information on each page.

Print Area:

Default is to print the entire spreadsheet, not just the print area.  When printing all cells, you need to look for and remove blank pages afterwards.

Set all Worksheets to Active:

Default is to set all worksheets to active, so that every sheet prints.  If this option is turned off, only the active sheets are printed.

Disable Macros, Re-calculate:

Macros and auto-recalculate are disabled.

Currently in testing:

Ability to unhide cells, columns, rows, extract formulas as part of the MetaData, set column width.

Support for proper DATE and TIME settings in Word and Excel.

Word and Excel contain macros and functions that return the 'current' date and time.  The expectation is that these date/time values are properly set when the user creates, prints, or saves a work document.

When submitting a document into discovery, the date/time printed in the TIFF image has to match the date/time the document was last saved, not the date/time of the conversion.

Discovery Assistant's solution to the date/time problem is to set the system date/time to the document's last saved date/time before doing the conversion.  To enable this feature, go to the Admin/Configure screen.

There you will see the following:

IMPORTANT: Some document headers and footers will render the current date and time.  If you need this to reflect the LastWrite time of the file being converted, check the box below.

Warning: This option may have unpredictable effects on the system  and other applications

X  Reset System Time to file LastWrite Time before conversion.

The only 'unpredictable effect' we can currently think of is that the Discovery Assistant application is 'killed' during conversion, and does not re-set the system date/time back to current. This can easily be solved by going into the Control Panel Date/Time applet, and re-setting the system time.

The visible indication that we are changing the system time can be seen when the computer time value changes in the bottom right hand corner of the computer's task bar.  We always make sure to set the time back to the exact correct value by keeping track of (and accounting for) the elapsed time since changing the system clock value.
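The elapsed-time bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions, not Discovery Assistant's actual code: the function name restored_wall_time is hypothetical, and the sketch computes the value to restore rather than changing the system clock. It assumes a monotonic counter (such as Python's time.monotonic(), which is not affected by system clock changes) is read before and after the conversion.

```python
from datetime import datetime, timedelta

def restored_wall_time(saved_wall: datetime, mono_at_save: float, mono_now: float) -> datetime:
    """Return the wall-clock time to restore after a conversion (hypothetical helper).

    saved_wall   -- real wall-clock time captured just before the clock was changed
    mono_at_save -- monotonic counter reading (seconds) taken at the same moment
    mono_now     -- monotonic counter reading (seconds) taken when the conversion ends
    """
    elapsed = timedelta(seconds=mono_now - mono_at_save)
    return saved_wall + elapsed

# Example: the clock was saved at 14:00:00 and the conversion took 42.5 seconds.
saved = datetime(2008, 6, 27, 14, 0, 0)
print(restored_wall_time(saved, 1000.0, 1042.5))
```

Because the elapsed time comes from a counter that does not follow the system clock, the restored value stays correct even though the clock itself was moved during the conversion.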

Database Format for the Discovery Assistant project:

Quick background on our database structure is as follows:

Discovery Assistant uses XML as the data storage format.  Records are read into memory, manipulated in memory, then saved every 100 or so conversions, or when the user closes the file.  Otherwise, all database activity is done in-memory, using .NET controls.
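A minimal sketch of this "work in memory, save periodically" pattern is shown here in Python for illustration (Discovery Assistant itself uses .NET controls). The class name InMemoryProject, the element and attribute names, and the exact save interval of 100 are all assumptions made for the example, not the product's real schema.

```python
import xml.etree.ElementTree as ET

SAVE_EVERY = 100  # assumed interval, based on "every 100 or so conversions"

class InMemoryProject:
    """Hypothetical in-memory record store that is flushed to an XML file periodically."""

    def __init__(self, path):
        self.path = path
        self.root = ET.Element("Project")  # element names are illustrative only
        self.unsaved = 0

    def add_record(self, file_id, status):
        # Records live only in memory until save() is called.
        ET.SubElement(self.root, "Item", FileID=str(file_id), Status=status)

    def mark_converted(self, file_id):
        for item in self.root.iter("Item"):
            if item.get("FileID") == str(file_id):
                item.set("Status", "Converted")
        self.unsaved += 1
        if self.unsaved >= SAVE_EVERY:
            self.save()

    def save(self):
        ET.ElementTree(self.root).write(self.path, encoding="utf-8", xml_declaration=True)
        self.unsaved = 0
```

Anything changed after the most recent save() exists only in memory, which is why a crash loses data back to the last saved version.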

 The advantage of running the database completely in memory is speed.  Things that traditionally take a long time using a transaction based database run 1000's of times faster in a 'memory loaded' database.  Traditional time consuming activities include:

·         Generating and re-viewing different data views of the same data set.

·         Changing a status value for each record in the database (queued status, bates number, document ID)

·         Operations that add 1000's of records at a time.

 

Other advantages of the XML format are:

1.       Universal format can be converted to any other format with a wide variety of available tools.

2.       .NET controls work with XML natively.

3.       Very compact way of storing variable length data.

4.       Can be read/searched by humans using a simple text editor.

5.       Can be repaired if corrupted by an external process/activity (like power failure during a file save).

6.       Can be manually edited in a text editor if users want to do a general search and replace.

 

The disadvantages of having the data stored in memory are:

·         takes up memory  (best to limit projects to 500,000 record items or less).

·         if the application crashes, you lose data back to the 'last saved'  version.  (need to save after major activity).

·         Can't multiplex access to the same data from more than one machine. (currently not an issue).

 Our rule of thumb is to limit project sizes to 1 or 2 gigs per project, to a maximum of 100,000 to 200,000 items.

If you have data sets bigger than 200,000 items, or larger than 2 gigs in size, then best to break the data down into multiple projects, possibly spread across multiple machines.

 Another rough rule of thumb: A single machine running Discovery Assistant can process on average 1 gig of data per day.

 For really large projects (Terabytes in size):

We provide an Access Database (MDB) tool we call TeraBite, that enumerates all the files in a given directory tree or server share, then creates a database containing that list of files to process.  The database list can then be written out as multiple text based Load Lists for further processing by Discovery Assistant.  The contents of each Load List are limited by a maximum number of files, or a maximum cumulative file size.  Load Lists are serially processed by the service provider in batches across multiple computers.  As loads are completed, they are exported out to a format suitable for review by the customer. This way data flows through the process in chunks, and delivery of the first chunk can happen a single day (or less) after the start of conversion.
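The splitting of an enumerated file list into Load Lists can be sketched as follows. This is not TeraBite's code: the function name build_load_lists is hypothetical, and applying both limits at the same time (file count and cumulative size) is an assumption made for the example.

```python
def build_load_lists(files, max_files, max_bytes):
    """Split (path, size_in_bytes) pairs into consecutive batches (hypothetical helper).

    A new batch is started when adding the next file would exceed either limit.
    A single file larger than max_bytes still gets a batch of its own.
    """
    batches, current, current_bytes = [], [], 0
    for path, size in files:
        if current and (len(current) >= max_files or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        batches.append(current)
    return batches

# Example with made-up file names and sizes.
sample = [("a.doc", 400), ("b.xls", 400), ("c.pdf", 400), ("d.txt", 50)]
print(build_load_lists(sample, max_files=3, max_bytes=1000))
```

Each resulting batch could then be written out as one text file, one path per line, and handed to a separate machine.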

 Support for Scalability

Additional conversion machines can be added to improve overall throughput.

Large conversion requests can be broken down into a set of smaller jobs, each of which is run on a different computer. 

To ensure that each computer has the same conversion settings, we recommend saving the DiscoveryAssistant.xml file, and the HKLM\Software\Imagemaker registry settings, and then duplicating these two files across the various machines.

The discoveryAssistant.xml file (installed in the same directory as the discoveryAssistant.exe file, \program files\imagemaker\discovery assistant\discoveryAssistant.xml) contains all the global project settings, and is text readable.

The remaining settings (that control document formatting) are saved in the registry.  You can export the hive HKLM\Software\ImageMAKER to a .REG file, and use this as the other settings file.

To match a second machine's settings:

·         install Discovery Assistant

·         copy over the DiscoveryAssistant.xml file

·         double-click on the saved imageMAKER.reg file

Support for Lotus Notes (NSF)

Discovery Assistant supports loading Lotus Notes NSF files natively.

Before installing Discovery Assistant, make sure you have a Lotus Notes client installed first (we've tested using Lotus Notes Domino Designer 6.0.3).

If Discovery Assistant is already installed, install Lotus Notes, then re-install Discovery Assistant.

At startup, Discovery Assistant looks for Notes.exe in the \program files\Lotus\Notes\ directory. It then forces a 'path' change to the local logged in user to make sure that the lotus directory is part of the system path in order for the dll's to work.  (No idea why Lotus does not do this as part of its own  installation).

If you've just recently installed Lotus Notes, then you need to re-install Discovery Assistant to set the path information.

Download site for Lotus Notes client:

                http://www-128.ibm.com/developerworks/downloads/

Lotus Notes®, Domino Designer, and Domino Administrator clients V8 or later.

 Here is the direct link to the notes client: (these link names change over time)

http://www.ibm.com/developerworks/downloads/ls/lsndad/?S_TACT=105AGX28&S_CMP=DLMAIN

Lotus Notes Metadata

Type: Lotus Notes Document

 ID:B0DB4E68D9BF457B86256FBA00621AE9

From: CN=Helmuth X Fendel/OU=LAKE/OU=CORP/O=ABBOTT

To:CN=Giorgio Martellino/OU=ADDITN11/OU=ADD_ITL_HUB/OU=ADD_EURO_HUB/OU=ADD_HUB/O=ADD/C=US@ABBOTT;"Karrer, Roberto (INT'L)" <Roberto.Karrer@ace-ina.com>

Cc:Bryan.Willcox@ace-ina.com;CN=Charles M Santora/OU=LAKE/OU=CORP/O=ABBOTT@ABBOTT

Bcc:

Subject:Re: Sibutramina - Privileged & Confidential

Sent:2002-04-10 10:23:54

Received:2002-04-10 10:23:56

Date Modified:2005-03-04 09:51:34

Date Created:2005-03-04 09:51:34

Date Accessed:2005-03-04 09:51:34

 Size:18255

 Importance:1

 Priority:1

Mood:0

PreventCopying:0

ReturnReceipt:0

IsSentByAgent:0

Number of Attachments:0

Body: Message Contents

EndBody:

 

Known Problems:

On some NSF files, we have trouble extracting attachments.  The fix seems to be to stop, then re-start Discovery Assistant, then re-import the NSF file (or do a re-check if already imported).

Alternate solution:

Convert Lotus Notes messages to PST

http://www.lotus-notes-export.com/XitNotes.asp

Problem #2

A user with appropriate Access Control List (ACL) rights receives the following error when attempting to open a local replica of a database:

"The database has local access protection and you are not authorized to access it locally."

                Solution

This will occur in cases where a user other than the current user created the local replica. This occurs because the Notes client has a default security setting to encrypt local replicas. This setting is accessed via File -> Security -> User Security -> Notes Data -> Databases. To create local replicas that are not encrypted, select "Do not locally encrypt" rather than the default "Locally encrypt using".

                Alternate Solution:

Switch to the user supplied Notes id file (filename.id); then open the database, enter the password, and go to File/Access Control and set everything to Manager; then go to File/Application/Properties/Encryption Settings and check the "do not encrypt" box; and then compact the database.

Related information

How to Determine Which Databases Are Encrypted

Encrypting PAB causes error 'Unable to create location'

Error Accessing Server Database "This Database Has Loca

http://www-1.ibm.com/support/docview.wss?rs=0&uid=swg21088323

Problem #3

If you encounter problems, run the Logger (red button on top right of Discovery Assistant application).  You can then email us the log contents for further analysis. 

example problem:

[08-06-27 11:35:47 AM DA::OpenNsfDatabase()] System.Runtime.InteropServices.COMException (0x80040154): COM object with CLSID {5FB98ACD-8EAA-4E2D-A980-9B1C678B8C4D} is either not valid or not registered.

possible resolution:

1.       From DOS prompt, type 'path'.  Make sure that the path contains a pointer to the nnotes.dll file (c:\program files\lotus\notes\NNOTES.DLL)

2.       re-register the nsfCracker.dll: regsvr32 "c:\Program Files\Imagemaker\Discovery Assistant\NSFCRACKER.DLL"
Make a note of any reported problems registering the DLL.

If installing on Vista, make sure the command prompt is running in Admin mode (right-click on Command Prompt, and choose 'Run as administrator').

Problem #4

Notes error: "You must supply the bulk decryption key in order to extract this file object."

Resolution:

This occurs when the Encrypt incoming mail field is set to Yes in the Mail section of the user's Address Book entry. Changing this field to "No" resolves the problem.

Problem #5

You attempt to read newly-encrypted mail (i.e. with a new key) with an old backup ID file that does not contain the new key, and the following error occurs:

"Specified Private Key Does Not Exist."

Additionally, if an encrypted message has an attachment and you attempt to open it with an old backup ID, the message above is generated and an empty message with the attachment is displayed. If you then try to launch or detach the attachment, the following error occurs:

"You Must Supply the Bulk Decryption Key in Order to Extract This File Object <path\filename>".

or (in Notes 4.6x):

"The encrypted data has been modified or the wrong key was used to decrypt it: Could not detach to file <path\filename>"

or (in Notes 5.x):

"You cannot access portions of this document because it is encrypted and you do not have any of the keys: Could not detach to file <path\filename>"

This issue only occurs if an old ID is being used. To avoid the issue, use a current ID.

The fact that the error messages could be more descriptive has been reported to Lotus Quality Engineering.

Possible Resolution:

This issue might occur when a user's ID has been updated with a new Public key, and the user is using an older version of their ID which contains the old Public Key. A user can initiate the updating of their Public key by using the menu options: File, Tools, User ID, More Options, New Public Key. The ID will then need to be recertified.

Support for Novell GroupWise

Unfortunately, we currently do not support GroupWise directly.

However... there is a product that might be able to help with migration: http://www.transend.com/products_transend_migrator.asp

There is a special Transend Migrator Forensic Edition license for use in forensic environments for eDiscovery. One license per workstation allows the conversion of an unlimited number of data files/mailboxes. Please contact us for more information on Transend Migrator Forensic Edition.

Transend Corporation, 

                225 Emerson Street, Palo Alto, CA 94301

                Phone: 650-324-5370

 

Converts Messages/Folders, Attachments, Archives, Address Books, Calendars and Tasks Between Virtually All Email Systems/Clients.  Includes support for:

                Lotus Notes

 Outlook/Exchange (server or .pst file)

                Outlook MSG Files

                GroupWise (5.5+ for GW Archives)

                IMAP4 Server

                HTML

                Eudora

                Netscape/Mozilla/Thunderbird

                AOL

                CompuServe 2.0+

                Outlook Express

                Pegasus

                Notework

                ExpressIT (Native and SMTP)

                cc:Mail

                DaVinci 3.0+

                MHS/SMF-70

                Calipso Archive

                Transport File (Transend proprietary format)

                Sun One (via IMAP)

                Pop Server

                http://www.transend.com/supported_mail_systems.asp

Support for PaperPort .MAX files

If you have the Paperport application installed, and it supports printing MAX files, then Discovery Assistant supports converting MAX files to TIFF. (tested and works).

CommandLine: <"D:\Program Files\ScanSoft\PaperPort\PPPAGEVW.EXE" /p "z:\web_test_files\5pages.max">

Forensics Tools:

  OST -> MSG   Advanced Exchange Recovery.  http://www.exchange-recovery.com/.  $600

  PST -> MSG   Aid4Mail  http://www.aid4mail.com

  Hard drive / USB write protection   Logicube Hard Drive & Media Duplication  http://www.logicube.com/logicube/pressreleases/writeprotect.asp

Support for OCR

Discovery Assistant now supports a native OCR feature.  The default is to use the Microsoft Office 2003 MODI control if available; otherwise, it uses a SimpleOCR package shipped with Discovery Assistant.

Microsoft MODI OCR uses the OmniPage SDK engine from Nuance software (and is our current best of breed recommendation).  To confirm you have MODI installed, run Microsoft Office Document Imaging application, load a TIFF file, and select 'OCR'.  The application will install OCR if not already installed.

Discovery Assistant extracts text from documents during the conversion process. Extraction is extremely accurate as the text is generated by the print driver during the print process.

If source documents are scanned images though, there is no text extraction when printing.  In this case, you can manually OCR those documents you would like the text from by selecting the OCR button in the Conversion Tab.

Fixes:

OCR was not successful (no text was found) on one or more pages.   http://support.microsoft.com/kb/918215/en-us

Support for TIFF Pass-through

If the source documents are scanned TIFF files that are already in the proper format (scanned in B&W, at a standard dpi), then processing can be sped up by selecting 'Enable no-Print convert on images'.  You can select this flag from the Admin / Configure screen.

If this flag is set, then Discovery Assistant does not print the image, but creates an exact copy of the source image, ready for further processing.

Support for Scanned TIFF Files

For Discovery Assistant to properly fill in the Custodian / Box / Folder information at time of export, the scanner operator must save the original TIFF files according to the following rules:

1.       Each Custodian gets their own output directory.  An example custodian would be John Smith.

2.       Within the custodian directories are subdirectories that correspond to each of the boxes.  ie: if John Smith's documents come in three boxes, then there are three BOX folders in the John Smith directory.

c:\...\John Smith\Box1

c:\...\John Smith\Box2

c:\...\John Smith\Box3

 

3.        Within the box directories are the folder names.  If a folder contains sub folders, then the sub folder names are appended to the folder name.  ie: if Box1 contains 3 folders: January, February, March, and the January Folder contains two sub folders: Invoices, Receipts - then the output scanned TIFF files will be placed in the following directories:

c:\...\John Smith\Box1\January\*.tif

c:\...\John Smith\Box1\January-Invoices\*.tif

c:\...\John Smith\Box1\January-Receipts\*.tif

c:\...\John Smith\Box1\February\*.tif

c:\...\John Smith\Box1\March\*.tif

 

Discovery Assistant then uses the following logic to automatically generate the Custodian / Box / Folder export information:

1.       TIFF filename, "c:\...\%1\%2\%3\filename.tif" is broken back into the following sub-directories:

                sub directory %1 is Custodian name

                sub directory %2 is Box name

                sub directory %3 is folder name

 

This way, no one has to hand-code any information. 
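The path logic above can be sketched in a few lines. This is only an illustration, not Discovery Assistant's implementation: the function name custodian_box_folder is hypothetical, and c:\scans stands in for the elided c:\... prefix. It assumes the Custodian, Box, and Folder directories are always the three levels directly above the TIFF file.

```python
from pathlib import PureWindowsPath

def custodian_box_folder(tiff_path):
    """Return (custodian, box, folder) from the three directories above the file."""
    parts = PureWindowsPath(tiff_path).parent.parts
    if len(parts) < 3:
        raise ValueError("expected Custodian/Box/Folder directories above the file")
    return parts[-3], parts[-2], parts[-1]

# 'c:\scans' is a made-up prefix used only for this example.
print(custodian_box_folder(r"c:\scans\John Smith\Box1\January-Invoices\0001.tif"))
```

PureWindowsPath is used so that the backslash-separated path is parsed the same way on any operating system.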

 The Discovery Assistant operator checks that the scanner folder is correct when adding files into the project (checks that the scanner operator has done their job).  If there is any confusion as to where documents came from, they can be immediately traced back to the original folder by using the displayed source path name.  The source path name is reduced to Custodian / Box / Folder at time of export.

 If at some later date the Discovery Assistant operator is processing a PST file, or multiple folders of data, then the same rules apply when extracting the Custodian, Box, and Folder.  (ie: everything is consistent moving forward).

Handling WordPerfect files using Word: (WPD)

If installing WPD support for the first time in Word, you may need to install the WPD plug-in.

Next, you need to set up a file association for WPD files.  Easiest way to do this is to right-click on a WPD file, then associate Microsoft Word with that file type.  (Open association).

Next, to get print and printto file associations established, easiest way to do this is to run the DA_Sysinfo.exe application and use it to copy across the DOC file associations.  \program files\imagemaker\discovery assistant\DA_Sysinfo.exe  wpd-doc

To switch to using WordPrintTo to handle WordPerfect, go to Discovery Assistant Admin / Configure / Documents, and go to .DOC to get the over-ride settings.

 The Override setting for WPD will look something like:

     "C:\Program Files\ImageMaker\Discovery Assistant\WordPrintTo.exe" /pt "%1" "%2" "%3" "%4"

Copy these same settings to the WPD entry in the Documents dialog.

Calculation of MD5 Hash code to detect duplicates:

 Under Options / De-Duping, users can set the following values:

 Hash Code Sample Size(KB) 100  (set to 0 for entire file)

 On conversion:

·          Ignore Duplicates   (process as usual)

·          Skip Duplicates     (don't convert)

·          Copy Duplicates     (copy the TIFF file from previous conversion) <---- recommended

·          Link Duplicates     (point to the TIFF file from previous conversions)

Hash codes are generated when the file is first entered into the database. To speed things up, users can set hash code generation to just the first K bytes of a file.  Default is 100K.

Duplicates are detected as files are added to the database.  If a duplicate is found, the duplicate file, and the file being added are both marked as 'duplicate'.

Email files are binary files with unique index values within them (MessageID). To compare if emails are duplicates, we extract and check only the text contents of the message.

Algorithm to check for and assign duplicates:

For each new file being processed:

    If Message File, then extract message body (as text).

    Calculate MD5 hash code for first (x) bytes of file.  (multiple of 1K)

    Convert hash code to a string.

    Loop until Done:

        Search existing database for first matching MD5 hash code.

        If no matching MD5 hash code

            Add new hash code.

            Done

        Else

            Binary compare the two matching files.

            If files match

                Mark both as duplicates.

                Done

            Else

                Add a character extension to the hash value to make it unique, and loop

    End Loop
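The algorithm above can be sketched in Python as follows. This is an illustrative sketch, not the product's code: the function names sample_hash and add_file, the in-memory dictionary standing in for the database, and the numeric "-1", "-2" suffix used as the "character extension" are all assumptions. Extracting the message body for email files is left out.

```python
import hashlib

def sample_hash(data: bytes, sample_kb: int = 100) -> str:
    """MD5 of the first sample_kb kilobytes, as a hex string (0 means the entire content)."""
    sample = data if sample_kb == 0 else data[: sample_kb * 1024]
    return hashlib.md5(sample).hexdigest()

def add_file(index: dict, name: str, data: bytes, duplicates: set, sample_kb: int = 100) -> None:
    """Insert a file into the index, marking byte-identical files as duplicates.

    index maps a hash key to (name, data); keeping the data in memory is a
    simplification so that the sketch stays self-contained.
    """
    key = sample_hash(data, sample_kb)
    suffix = 0
    while True:
        probe = key if suffix == 0 else f"{key}-{suffix}"
        if probe not in index:
            index[probe] = (name, data)            # no matching hash code: add it
            return
        other_name, other_data = index[probe]
        if other_data == data:                     # binary compare of the two files
            duplicates.update({name, other_name})  # mark both as duplicates
            return
        suffix += 1                                # same sample hash, different file: extend key and loop
```

The binary compare matters when only the first part of each file is hashed: two files can share their first 100K and still differ later on.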

Handling Terabytes of Data:

We rate our product at a gig per day per machine.  1 gig of data averages out to approximately 70,000 pages, and about 5 gigs of storage space.  Actual conversion speeds are rated at 3,500 pages per hour of straight conversion (20 hours a day), plus an additional 4 hours a day to handle the other house-keeping tasks, like file import / de-duplication / deblanking / bates labeling / exporting etc.

In addition to straight conversion is the time to:

·         Set up the machines and install all appropriate software.

·         Quality control review of output data.

·         Exception handling.

·         Trouble-shooting.

The Discovery Assistant operating philosophy is that if you want to convert Terabytes of data, you need multiple-Terabytes of storage space and lots and lots of computers. 

To handle terabytes of data requires:

·         Enumerating what files are to be converted using our TeraBite application.

·         Breaking the project down into 1 or 2 gig Batch files.

·         Optionally run a global 'de-dup' check

·         Process the Batch files across multiple machines.

·         Export the resultant files back into a case management system for additional processing

 

Assuming each GIG of data yields approximately 70,000 tiff pages, rough estimates as to time to process the data are as follows:

Pages per TeraByte:

70,000 pages per gig  * 1000 gigs = 70,000,000 pages

Computer time to process a TB (assumes an average of 1 second per page):

70,000,000 seconds * 1/60  =  approximately 1,200,000 minutes

This is approximately 20,000 hours, or approximately 1000 days at 20 hours of conversion per day.
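The time estimate can be reproduced with a short calculation. The constants below are the figures quoted in this section (70,000 pages per gig, 1 second per page, 20 conversion hours per day); the function name days_to_process is hypothetical, and treating additional machines as simple linear scaling is an assumption.

```python
PAGES_PER_GIG = 70_000   # from the text
GIGS_PER_TB = 1_000      # from the text
SECONDS_PER_PAGE = 1     # from the text
HOURS_PER_DAY = 20       # conversion hours per day, as rated in this section

def days_to_process(terabytes: float, machines: int) -> float:
    """Estimated days of conversion, assuming work divides evenly across machines."""
    pages = terabytes * GIGS_PER_TB * PAGES_PER_GIG
    hours = pages * SECONDS_PER_PAGE / 3600
    return hours / HOURS_PER_DAY / machines

print(round(days_to_process(1, 1)))       # 972, which the text rounds to 1000 days
print(round(days_to_process(1, 100), 1))  # 9.7, which the text rounds to 10 days
```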

Standard outsourcing prices per TB (3 cents a page):

                                70,000,000 * .03 = $2,100,000

If you had 100 computers on-site processing the data, 1TB would take 10 days to process.

With two operators running the machines, costs to do a TB would be:

        $20K amortization of computers  (10% of $200,000 worth of equipment)

        $20K amortization of software (10% of $200,000 worth of ImageMAKER, Office, etc)

     + $10K for operator costs

        $50K

 

If you were to outsource the same job to a third party (at the 3 cents a page rate above), expected costs would be:     $2.1 million

System Requirements:

Windows 2000, Windows XP, Windows 2003 (client, server, or WTS).

1 gig of ram.

30+ gigs of hard drive space (for output files).

Microsoft Office, Acrobat, IE6, and any other file type application  pre-installed.

 Preferred Computer Configuration:

3 Gig memory

Dual 64 bit AMD 2 GHz processor running Windows XP

                200 GIG hard drive.

                Gigabit network connection

Discovery Assistant comfortably handles the conversion of up to 100,000 files per project. For example, if you have one million files to convert, then our recommendation is to break them down into 10 separate projects.

Sample Hard Drive requirements:

3.3 Gigs NSF file

expands to:

15284 files.

9 Gigs of source files

13 Gigs of TIFF/Text/Metadata

Handling Parent Child relationships:

When we load in PST / MSG/ ZIP files, we keep track of all parent/child relationships between the related files.

Specialized parent/child relationships handling is done at:

·         Queuing for conversion.

·         Assigning doc id's and bates numbers

·         Time of export

·         User interface can identify parent / child / sibling  of any item in the queue.

 

In addition, there are a number of metadata fields designed explicitly to identify ranges of parent/children.

The current methodology for handling parents and children is as follows: we first load the parent, then we enumerate each of the children, assigning sequential FileID numbers as we go along. Next we process the children's attachments, assigning FileID's as we move along, and so forth until the message is processed.  We then go on to the next message.

'Child next' order turned OFF is how we load the files into Discovery Assistant.

0(0-6)
   |
   --------------------------------------------------------
   |           |           |        |       |             |
1(0-6)      2(2-9*)     3(0-6)      4       5         6(6-12*)
               |                                          |
       -----------------                        ---------------------
       |       |       |                        |         |         |
       7   8(8-15*)    9                       10        11     12(6-12*)
               |
       -----------------
       |       |       |
      13      14      15

 

'Child next' order turned ON is how we assign Bates Numbers, Document ID's in preparation for export.

0(0-15)
   |
   --------------------------------------------------------
   |           |           |        |       |             |
   1           2           9       10      11            12
               |                                          |
       -----------------                        ---------------------
       |       |       |                        |         |         |
       3       4       8                       13        14        15
               |
       -----------------
       |       |       |
       5       6       7

 

Assumptions:

1.       There is only one parent, everything else is a child.

2.       every child has the same parent (0), and all files have the same range (0-15).

3.       when we hit a PST file, that breaks the cycle.  PST files are not parents. If a msg file contains a PST file, then we don't keep track of children.

4.       The diagram contains items, and range values in brackets.  Any range value with an asterisk in it is an incorrect 'child next' grouping.
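The two numbering orders correspond to a breadth-first walk ('child next' OFF, the load order) and a depth-first walk ('child next' ON, the Bates / Document ID order) of the same tree. The sketch below illustrates this with the tree shape read from the 'child next' ON diagram; the node labels and function names are hypothetical, and this is not the product's code.

```python
from collections import deque

# Tree shape as read from the 'child next' ON diagram; labels are arbitrary.
TREE = {
    "msg": ["a", "b", "c", "d", "e", "f"],
    "b": ["b1", "b2", "b3"],
    "b2": ["b2x", "b2y", "b2z"],
    "f": ["f1", "f2", "f3"],
}

def load_order(root):
    """Breadth-first numbering ('child next' OFF): all children before any grandchildren."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(TREE.get(node, []))
    return {node: i for i, node in enumerate(order)}

def child_next_order(root):
    """Depth-first (pre-order) numbering ('child next' ON): each item is followed by its own children."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(TREE.get(node, [])))
    return {node: i for i, node in enumerate(order)}

print(load_order("msg")["b2"], child_next_order("msg")["b2"])
```

With depth-first numbering, every item and all of its descendants occupy one contiguous range of numbers, which is what makes the exported ranges meaningful.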

Parent Child relationships are reported in the export files through the following MetaTags:

(Summation example)

@ATTACHRANGE         "filename" based

@C GROUPRANGE        "filename" based

@C BATESGROUPRANGE   "bates number" based

@C BEGATTACH         "filename" based

@C ENDATTACH         "filename" based

 

Where "filename" can be any combination of  DocID, FileID, Title, Bates Number etc.  (naming scheme).

 

If your original files are all named by their DOCID's, and you want to preserve that information at time of export, then set "filename" to %TITLE% in the naming scheme, and all exported files, and file ranges (except for BATESGROUPRANGE) will be based on the original filenames.

Exported Ranges of DOCID and BATES Numbers:

DocID Export fields for Summation:

   Parent: DOCID_00016

   Attachments DOCID_00017-DOCID_00022

 

   @ATTACH DOCID_00017; DOCID_00018; DOCID_00019; DOCID_00020; DOCID_00021; DOCID_00022

   @ATTACHRANGE DOCID_00017-DOCID_00022

   @C BEGDOC DOCID_000160001

   @C ENDDOC DOCID_000160002

   @C GROUPRANGE DOCID_00016-DOCID_00022

   @C BEGATTACH DOCID_00016

   @C ENDATTACH DOCID_00022

 

Bates export fields for Summation:

   Test Data:

   Parent:  NTR00033-NTR00034

   first attachment NTR00035-NTR00035

   last attachment: NTR00041-NTR00053

 

   @BATESBEG NTR00033

   @BATESEND NTR00034

   @C BATESGROUPRANGE NTR00033-NTR00053

   @C BATESGBEG NTR00033

   @C BATESGEND NTR00053

 

Note: @ATTACHRANGE and @ATTACH are the only export fields that list JUST the attachments.  All other fields include the mail message and attachments within the range.
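How the DocID fields in the Summation example relate to one parent and its attachments can be sketched as follows. The function name summation_attach_fields is hypothetical, the sketch assumes at least one attachment, and it only mirrors the sample values shown above rather than the full export format.

```python
def summation_attach_fields(parent_id, attachment_ids):
    """Build the attachment-related fields for one parent and its attachments."""
    first, last = attachment_ids[0], attachment_ids[-1]
    return {
        "@ATTACH": "; ".join(attachment_ids),    # attachments only
        "@ATTACHRANGE": f"{first}-{last}",       # attachments only
        "@C GROUPRANGE": f"{parent_id}-{last}",  # parent through last attachment
        "@C BEGATTACH": parent_id,
        "@C ENDATTACH": last,
    }

attachments = [f"DOCID_{n:05d}" for n in range(17, 23)]
for tag, value in summation_attach_fields("DOCID_00016", attachments).items():
    print(tag, value)
```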

Support for UTC Date/Time

All dates are UTC based (coordinated universal time), but expressed in the local time zone of the machine that is doing the conversion.

Here are the exceptions:

·         All files have a Created, Modified and Accessed date stored in UTC format.  These date/times are displayed in local time by the machine that is accessing them.  For example, if the files are written to a hard drive in London at 9:00 AM (Local time is UTC-0), and that hard drive is then read in Vancouver (Local Pacific time is UTC -8), the time stamp will be reported as 1:00 AM.

·         If files are 'copied' to another directory or filename, then the Create and Last accessed dates of the new file will change to today's date and time. 

·         If files are 'zipped', then 'unzipped', the Created Date, and Last Accessed Date will be set to today's date/time.  Only modified date will be retained.  Same goes for RAR compressed files - when uncompressed, only the 'modified' date will be correct.

·         Email received/sent date/time values are stored UTC, and reported in local time.

·         Word / Excel / Powerpoint, Acrobat store the following additional UTC dates in their MetaData:

Date Created

Date Last Printed

Date Last Saved

When we extract these date/times, we use these values to over-ride the operating system values for Created/Modified. 

Note: most simple file types (such as TXT, HTML, JPEG) have operating system date/time values for Created, Modified and Accessed, and do not contain any embedded Date/Time Metadata.
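The London / Vancouver example can be checked with a few lines of Python. The fixed offsets below are the ones quoted in the example (UTC-0 and UTC-8), so daylight saving time is ignored, and the calendar date is arbitrary.

```python
from datetime import datetime, timedelta, timezone

LONDON = timezone(timedelta(hours=0), "UTC+0")      # offset as given in the example
VANCOUVER = timezone(timedelta(hours=-8), "UTC-8")  # fixed offset, ignores daylight saving

written = datetime(2008, 1, 15, 9, 0, tzinfo=LONDON)  # file written at 9:00 AM London time
stored_utc = written.astimezone(timezone.utc)         # what is stored with the file
print(stored_utc.astimezone(VANCOUVER).strftime("%H:%M"))  # 01:00
```

The stored value never changes; only the local rendering differs from machine to machine.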

Loading in a 'Selection Set'

Discovery Assistant supports two different selection sets: Document ID, and FileID. (FileID's are internally assigned numbers.  Document ID's are user defined strings).

To define a selection set, create a TEXT file containing the FileID's, or DocumentID's, one ID per line in the text file, then use the 'select' button to load that file in.

Items in file list are marked 'selected'.
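The effect of loading a selection set can be sketched as follows. The function name load_selection_set and the list-of-dictionaries data structure are hypothetical stand-ins for the project database; returning the IDs that were not found is an addition made for the example.

```python
def load_selection_set(lines, items):
    """Mark items whose ID appears in the selection file; return IDs that matched nothing."""
    wanted = {line.strip() for line in lines if line.strip()}  # one ID per line, blanks ignored
    for item in items:
        item["selected"] = item["id"] in wanted
    return sorted(wanted - {item["id"] for item in items})

items = [{"id": "DOCID_00016"}, {"id": "DOCID_00017"}, {"id": "DOCID_00018"}]
missing = load_selection_set(["DOCID_00016\n", "\n", "DOCID_00099\n"], items)
print([item["selected"] for item in items], missing)
```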

Practical use:

1.       Do a first pass-through to produce converted documents.

2.       Assign DocumentID, and export DocumentID with data set.

3.       Customer reviews data, and provides you with a list of DocID's to produce.

4.       Load selection set in 'converted' tab.

5.       Assign Bates Numbers to selection set.  Choose 'child recursive' to get children.

6.       Sort on Bates Numbers, and select only those that have been assigned bates numbers.  Parents of children are identified by 'bates range'.

7.       Select which files you want to bates stamp.

8.       Bates Stamp 'selected' files and export OR Export unstamped Selected files.

 

 

Estimating Page Counts

If you convert to TIFF, and then use the 'summary report' on the all files tab, you get a CSV file containing formatted information about:

·         file types

·         number of pages per file

·         total size of files by type

·         number of files converted / passthrough / failed /skipped.

 

If you don't want to invest time converting the files, BUT still want an estimated page count (for billing purposes), then queue the data for MetaData conversion only.   Then convert (metadata only).  Discovery Assistant then estimates page count based on file size if the count is not already included in the metadata.

Values used to 'estimate' bytes per page, broken down by file type, are stored in the registry at: HKLM\Software\ImageMAKER\DiscoveryAssistant\Settings\PageCountEstimates. If you do change these values, you need to stop/re-start Discovery Assistant for them to take effect.  These values are also stored in the setup.ini file, and re-set at installation time.

Actual values for TIFF files are calculated.

Metadata values for Word and PDF are used if available.

All other file types default to 10,000 bytes per page.

Some file types (like zip, nsf, pst, msg) are estimated at 1 page per file.
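The estimation rules above can be sketched as follows. This is an illustration only: the function name estimate_pages is hypothetical, the per-type overrides dictionary stands in for the registry values, and rounding up to a minimum of one page is an assumption.

```python
import math

DEFAULT_BYTES_PER_PAGE = 10_000                # default stated in the text
ONE_PAGE_TYPES = {"zip", "nsf", "pst", "msg"}  # estimated at 1 page per file

def estimate_pages(extension, size_bytes, metadata_pages=None, bytes_per_page=None):
    """Estimate a page count without converting the file."""
    if metadata_pages is not None:             # page count already in the metadata
        return metadata_pages
    ext = extension.lower().lstrip(".")
    if ext in ONE_PAGE_TYPES:
        return 1
    per_page = (bytes_per_page or {}).get(ext, DEFAULT_BYTES_PER_PAGE)
    return max(1, math.ceil(size_bytes / per_page))  # rounding up is an assumption

print(estimate_pages(".txt", 25_000))                  # 3
print(estimate_pages("pst", 5_000_000))                # 1
print(estimate_pages("doc", 80_000, metadata_pages=12))  # 12
```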

For more information

ImageMAKER Development Inc.
416 Sixth Street, Suite 102
New Westminster, BC
Canada V3L 3B2
http://www.imgmaker.com
Copyright © 2004-2008
To contact us from overseas:

Sales: 1.604.525.2170
Local (Pacific) time: GMT-8
ImageMAKER Development Inc.

Sales: toll free (866) 525-2170
or (604) 525-2170
Support: (604) 525-2108
Fax: (604) 520-0029
Email: sales@imgmaker.com
support@imgmaker.com