
Appendix A:  Export Format Notes

Export Naming Conventions:

Exported files can be named using any combination of the following:

%ProjectID%      Three letter project ID
%FileID%         Internal file ID
%TITLE%          Original file name (includes parent if zip/msg/eml)
%SHORT_TITLE%    Guaranteed 32 char unique name
%EXT%            Original file extension
%BATESSTART%     Starting Bates sequence for file
%BATESEND%       Ending Bates sequence for file
%PAGE%           Page number
%BATES%          Bates number for page
%DOCID%          User assigned document ID
ASCII_STRING     Any ASCII string
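The token substitution described above can be sketched as follows. This is only an illustration, not Discovery Assistant's own code: the function name expand_name, the template string, and the sample values are all hypothetical, and the sketch assumes every token is written with a leading and trailing '%'.

```python
import re

def expand_name(template: str, values: dict[str, str]) -> str:
    """Replace each %TOKEN% in the template with its value.

    Text outside of %...% markers is treated as a literal ASCII string.
    """
    def substitute(match: re.Match) -> str:
        token = match.group(1)
        if token not in values:
            raise KeyError(f"no value supplied for %{token}%")
        return values[token]

    return re.sub(r"%([A-Za-z_]+)%", substitute, template)

# Hypothetical template and values, for illustration only.
example = expand_name(
    "%ProjectID%_%BATESSTART%-%BATESEND%.%EXT%",
    {"ProjectID": "TST", "BATESSTART": "00000001",
     "BATESEND": "00000003", "EXT": "tif"},
)
print(example)
```

Raising an error for an unknown token is a design choice of this sketch; the product may behave differently.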

Export File Formats:

Export Directory Structure Options:

Given the following set of source files:

Source Files                            Assigned Name
custodian1\list.doc                     file1
custodian2\Folder1\sample.pdf           file2
custodian2\Folder2\sales.xls            file3
custodian3\Box1\Folder3\january.doc     file4
custodian3\Box1\Folder3\february.doc    file5
custodian3\Box1\Folder3\march.doc       file6

the export options produce the following directory structures:

Flat:

---------

 

|---Output

|      File1.tif

|      File2.tif

|      File3.tif

|      File4.tif

|      File5.tif

|      File6.tif

|

|      File1.txt

|      File2.txt

|      File3.txt

|      File4.txt

|      File5.txt

|      File6.txt

|---Source

       File1.doc

       File2.pdf

       File3.xls

       File4.doc

       File5.doc

       File6.doc

 

 

 

Mirror:

---------

 

|---Custodian1

|   |---source

|   |      File1.doc

|   File1.tif

|   File1.txt

|

|---Custodian2

|   |---Folder1

|   |   |---source

|   |   |      File2.pdf

|   |   file2.tif

|   |   file2.txt

|   |

|   |---Folder2

|       |---source

|       |      File3.xls

|       file3.tif

|       file3.txt

|

|---Custodian3

    |---Box1

        |---Folder3

            |---source

            |      File4.doc

            |      File5.doc

            |      File6.doc

            File4.tif

            File5.tif

            File6.tif

            File4.txt

            File5.txt

            File6.txt

 

 

Bates:

 

----OUTPUT

|   ----Bates_file1

|   |      File1.tif 

|   ----Bates_file2

|   |      File2.tif 

|   ----Bates_file3

|   |      File3.tif 

|   ----Bates_file4

|   |      File4.tif 

|   ----Bates_file5

|   |      File5.tif 

|   ----Bates_file6

|          File6.tif 

----SOURCE

|   ----Bates_file1

|   |      list.doc

|   ----Bates_file2

|   |      sample.pdf

|   ----Bates_file3

|   |      sales.xls

|   ----Bates_file4

|   |      january.doc

|   ----Bates_file5

|   |      february.doc

|   ----Bates_file6

|   |      march.doc

----TEXT

|   ----Bates_file1

|   |      File1.txt

|   ----Bates_file2

|   |      File2.txt 

|   ----Bates_file3

|   |      File3.txt

|   ----Bates_file4

|   |      File4.txt

|   ----Bates_file5

|   |      File5.txt 

|   ----Bates_file6

           File6.txt

 

 

 

Vol/Box

 

----VOL0001

    ----BOX0001

    |   |----source

    |   |      File1.doc

    |   |      File2.pdf

    |   File1.tif

    |   File2.tif

    |   File1.txt

    |   File2.txt

    |

    |---BOX0002

    |   |----source

    |   |      File3.xls

    |   |      File4.doc

    |   File3.tif

    |   File4.tif

    |   File3.txt

    |   File4.txt

    |

    |---BOX0003

    |   |----source

    |   |      File5.doc

    |   |      File6.doc

    |   File5.tif

    |   File6.tif

    |   File5.txt

    |   File6.txt
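
As a rough illustration of how the Flat and Mirror options differ, the sketch below computes the relative export paths for one source file. It is only a reading of the directory trees above, not the product's implementation: the function names are hypothetical, and letter case is passed through unchanged (the trees above vary in capitalization).

```python
from pathlib import PureWindowsPath

def flat_paths(assigned_name: str, source: str) -> dict[str, str]:
    """Flat: all images and text go in Output, all native files in Source."""
    ext = PureWindowsPath(source).suffix
    return {
        "image": f"Output\\{assigned_name}.tif",
        "text": f"Output\\{assigned_name}.txt",
        "source": f"Source\\{assigned_name}{ext}",
    }

def mirror_paths(assigned_name: str, source: str) -> dict[str, str]:
    """Mirror: keep the source folder tree; native files go in a 'source' subfolder."""
    src = PureWindowsPath(source)
    folder = src.parent
    return {
        "image": str(folder / f"{assigned_name}.tif"),
        "text": str(folder / f"{assigned_name}.txt"),
        "source": str(folder / "source" / f"{assigned_name}{src.suffix}"),
    }

print(flat_paths("file3", r"custodian2\Folder2\sales.xls"))
print(mirror_paths("file2", r"custodian2\Folder1\sample.pdf"))
```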


Summation DII notes:

Classifications of DII Files

Summation created a batch load file format and protocol that service bureaus can use to facilitate the processing and delivery of eDiscovery that will be loaded into a Summation case. Service bureaus can provide eDiscovery using three different types of DII files:

* Class I DII file - This class is geared toward traditional paper discovery service bureaus that scan paper documents and use Optical Character Recognition (OCR) technology on the resulting imaged documents. Also, in this model, e-mail messages and electronic documents (received in either paper or native electronic format) are converted or petrified by a service bureau to TIFF or PDF image formats, and the text and metadata are extracted. When loaded into a Summation case, the image information is loaded into the ImgInfo table, the full-text is loaded into the ocrBase, and generated metadata is loaded into the Core Database. The difference between a Class I DII file and a DII file prepared for previous versions of Summation is the ability of the Class I DII file to more easily maintain the parent/child relationships of compound documents.

* Class II DII file - This file is geared toward forensic-oriented service bureaus that extract or parse metadata and e-mail message information for loading into designated Summation Core Database fields. Native electronic files are copied to the eDocs repository specified in the case directory structure. Once the files are copied and the data loaded, the user can take advantage of Summation's multi-file format index, search, and retrieval functions to produce electronic documents in their native formats. These Class II DII file attributes will allow users to narrow or winnow down a collection of electronic data, such as e-mail messages, to only disclose relevant non-privileged data to the requesting party. The Class II DII file also facilitates the preservation of the parent/child relationships of compound documents.

* Class III DII file - This file is a combination of the Classes I and II DII file formats.

The above DII load file classes give Summation users the ultimate flexibility for applying the varying formats and protocols used to acquire, process, deliver, and deploy digital information underlying litigation, regulatory compliance, and risk management.

Note: The above DII load file formats are also acceptable formats to deliver electronic data that will be loaded into CaseVault, the litigation hosting service and subsidiary of Summation Legal Technologies. CaseVault can be used as a winnowing platform for cases that include large volumes of electronic data. Once the set is culled and reduced, the electronic data can be loaded into a Summation system for additional review and case preparation.

Note:

Tokens can be longer than 8 characters, but field names cannot. For example, the @ATTACHRANGE token is 11 characters long, but it populates the ATTRANGE field, which is only 8. Custom tokens must be no longer than 8 characters because the fields they populate are limited to 8 characters.

ImageMAKER custom defined additional fields in the Summation Export DII file:

   @C FILENAME calendar.zip

   @C FILEPATH Z:\Web_test_files\calendar.zip

   @C ISDUP True

   @C DUPPATHS C:\test\test.HTM; C:\test\testcopy.htm.

   @C PGCOUNT 10

 

Details:

FILENAME - name of file at time of conversion.

FILEPATH - original source path for file (when being converted).

PGCOUNT  - number of pages in the converted file.

Defaults to 1 if the record is not defined in the data set, or to the last value defined if it is not defined in a FileID record.

If files are exported as a single page per file, this value indicates the total number of exported pages for the source file.

PgCount is already defined as a custom data field in the Summation database.

ISDUP - defines whether the record has any other duplicates in the exported data set. 

This information is used when reviewing the data - and indicates that there are other copies of the same information elsewhere in the data set.  (Field name lengths are limited to 8 chars).

Supported values are 'True' and 'False'

DUPPATHS - lists the 'filePath' source file names that are in the duplicate set.

This value lists source filenames of the duplicate files, not DocIDs, and gives an immediate indication of where the duplicate data is stored. FilePaths are separated by a '; ' character pair (semicolon/space).

If there are no duplicates, then the character string 'NA' is required.
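
The rules above can be sketched as a small helper that emits the five custom field lines for one record. This is an illustration under assumptions, not the exporter's code: the function name and parameters are hypothetical, and the line order simply follows the sample record in this appendix.

```python
def custom_dii_fields(filename: str, filepath: str, page_count: int,
                      duplicate_paths: list[str]) -> list[str]:
    """Build the ImageMAKER custom '@C' lines for one DII record."""
    is_dup = bool(duplicate_paths)
    # Duplicate paths are joined with semicolon + space; 'NA' when there are none.
    dup_value = "; ".join(duplicate_paths) if is_dup else "NA"
    return [
        f"@C FILENAME {filename}",
        f"@C FILEPATH {filepath}",
        f"@C PGCOUNT {page_count}",
        f"@C ISDUP {'True' if is_dup else 'False'}",
        f"@C DUPPATHS {dup_value}",
    ]

for line in custom_dii_fields("calendar.zip",
                              r"Z:\Web_test_files\calendar.zip", 1, []):
    print(line)
```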

Sample DII File:

; Summation DII Class I File

; Created on 7/20/2005 2:55:29 PM

; Created by DiscoveryAssistant version 3.2 build 1095

; Copyright © 2004,2005 ImageMaker Development Inc.

;

; Machine Name: BLAISE

; Project Path: F:\Work\TEST.xml

; Project Name: TEST

; Project ID: TM

 

@FULLTEXT DOC

@T 0000038

@DOCID 0000038

@MEDIA eDoc

@APPLICATION WinZip File

@C FILENAME calendar.zip

@C FILEPATH Z:\Web_test_files\calendar.zip

@C PGCOUNT 1

@C ISDUP False

@C DUPPATHS NA

@ATTACH 0000039; 0000040; 0000041; 0000043; 0000044; 0000045; 0000046; 0000047; 0000048; 0000049; 0000050; 0000051; 0000052; 0000053

@ATTACHCOUNT 14

@DATESAVED 7/21/2005

@DATECREATED 7/21/2005

@D @I\

0000038.tif

 

@T 0000039

@DOCID 0000039

@MEDIA eMail

@MSGID

@C PGCOUNT 1

@C ISDUP True

@C DUPPATHS Z:\Web_test_files\calendar.zip\calendar.pst\Personal Folders\Tasks\a second task request.msg;C:\imgmaker\temp1\a second task request.msg

@SUBJECT a second task request

@EMAIL-BODY separate task item in a separate task list.

@EMAIL-END

@ATTACHCOUNT 0

@PARENTID 0000038

@D @I\

0000039.tif

 

Available MetaData Fields for Summation:

     @C BEGDOC: Export file title of first page

     @C ENDDOC: Export file title of last page

     @APPLICATION: Name of creating application

     @C ATTCOUNT: Count of attachments

     @ATTACH: List of export file titles of attachments

     @ATTACHRANGE: Range of export file titles of attachments

     @C GROUPRANGE: Range of export file titles that belong as a group.  e.g. an email and its attachments or a zip file and its contents

     @C BATESGROUPRANGE: Range of Bates Numbers that belong as a group.  e.g. an email and its attachments or a zip file and its contents

     @C BEGATTACH: Export file title of first page of group.  e.g. an email and its attachments or a zip file and its contents

     @C ENDATTACH: Export file title of last page of group.  e.g. an email and its attachments or a zip file and its contents

     @C ATTTITLE: File title of attachment

     @FROM: Document author

     @BATESBEG: Beginning Bates number

     @BATESEND: Ending Bates number

     @C BATESGBEG: Beginning Bates number for group. e.g. an email and its attachments or a zip file and its contents

     @C BATESGEND: Ending Bates number for group.  e.g. an email and its attachments or a zip file and its contents

     @BCC: Blind Carbon Copy recipient

     @CC: Carbon Copy recipient

     @C DACOMMNT: Discovery Assistant PassThru comment

     @DATECREATED: Source document creation date

     @TIMECREATED: Source document creation time

     @DATERCVD: Email received date

     @TIMERCVD: Email received time

     @DATESAVED: Source document modified date

     @TIMESAVED: Source document modified time

     @DATESENT: Email sent date

     @TIMESENT: Email sent time

     @C DATEACC: Source Document Last Access Date

     @C TIMEACC: Source Document Last Access Time

     @C DOCTITLE: Document Title

     @C DUPPATHS: Source document paths of duplicate items

     @EMAIL-BODY: Body of email

     @C FILEEXT: Source file extension

     @C FILEPATH: Source file path

     @C XSFPATH: Exported source file path

     @C FTITLE: Source file title

     @C FILENAME: Source file name (including extension)

     @C FTYPENAME: Source file type name

     @FOLDERNAME: Email parent folder name

     @FROM: Email From address

     @C HASHCODE: MD5 hash code value for source document

     @C ISDUP: True/False is duplicate

     @C ITEMID: Discovery Assistant file ID

     @MSGID: Email message ID

     @C PGCOUNT: Output file page count

     @PARENTID: Export file title of parent item

     @C SFTITLE: Short file title

     @C SIZEDISK: Source file size on disk

     @STOREID: Message store identifier

     @C STORNAME: Message store source file name

     @SUBJECT: Email subject

     @TO: Email To address

     @C ITEMINDX: Item Index

     @C INETHDR: Internet Header

     @C DOCID: Document ID

     @C ALTRCALW: Alternate Recipient Allowed

     @C AUTOFWD: Auto Forwarded

     @C BILLINFO: Billing Information

     @C CATEGOR: Categories

     @C COMPNIES: Companies

     @C DATEDFDL: Deferred Delivery Date

     @C TIMEDFDL: Deferred Delivery Time

     @C DELAFSUB: Delete After Submit

     @C DATEEXP: Expiry Date

     @C TIMEEXP: Expiry Time

     @MULTILINE HTMLBODY: HTML Message Body

     @C IMPRTNCE: Importance

     @C MSGCLASS: Message Class

     @C MSGMLG: Message Mileage

     @C NOAGING: No Aging

     @C DLVRPTRQ: Originator Delivery Report Requested

     @C OLINTVER: Outlook Internal Version

     @C OLVER: Outlook Version

     @C RDRECREQ: Read Receipt Requested

     @C RCVBYNAM: Received By Name

     @C RCVBENAM: Received On Behalf Of Name

     @C RCPREPRO: Recipient Reassignment Prohibited

     @MULTILINE REPRECIP: Reply Recipients

     @C SAVED: Saved

     @C SENSI: Sensitivity

     @C SENT: Sent

     @C SNTBENAM: Sent On Behalf Of Name

     @C SUBMTTED: Submitted

     @READ: Message read y/n?

     @C UNREAD: UnRead

     @C VOTOPT: Voting Options

     @C VOTRESP: Voting Response

     @C GLBLPRM: 'Yes' if this is the first occurrence of this item in the global table.

     @C GLBLCNT: Count of occurrences of this item in the Global Project table.

     @C SRCCUSTOD: Source Custodian.  Obtained from third to last directory name in source file path.

     @C SRCBOX: Source Box.  Obtained from second to last directory name in source file path.

     @C SRCFOLDER: Source Folder.  Obtained from last directory name in source file path.

     @C DATEPRNT: Source Document Last Print Date

     @C TIMEPRNT: Source Document Last Print Time
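
The SRCCUSTOD, SRCBOX and SRCFOLDER derivation described in the list above can be sketched as follows. The function name is hypothetical, and returning an empty string when the path has too few directory levels is an assumption of this sketch; the source does not say what the exporter does in that case.

```python
from pathlib import PureWindowsPath

def source_location(source_path: str) -> dict[str, str]:
    """Derive custodian/box/folder from the last three directory names."""
    parent = PureWindowsPath(source_path).parent
    # Ignore a drive or UNC anchor such as 'C:\' when counting directories.
    dirs = parent.parts[1:] if parent.anchor else parent.parts

    def nth_from_end(n: int) -> str:
        # Assumption: a missing level yields an empty string.
        return dirs[-n] if len(dirs) >= n else ""

    return {
        "SRCCUSTOD": nth_from_end(3),  # third to last directory
        "SRCBOX": nth_from_end(2),     # second to last directory
        "SRCFOLDER": nth_from_end(1),  # last directory
    }

print(source_location(r"custodian3\Box1\Folder3\january.doc"))
```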

Concordance Export File Format:

Source documents are to be generated into single page TIFF files, single page TXT files, and a meta-data file.

Meta data and the single page TXT file are then combined to create a single DAT file per page for import.  Each data file is assigned a unique ID (Bates Number). 

Concordance imports all the DAT files from a given directory into the database.

The image files are listed in the .LOG file.  There is a unique TIFF file for each DAT file created.  The image files are all imported at the same time through the Opticon viewer interface.

Detailed Requirements:

Create the following files:

1.       Multi-line .DAT files containing information for each page of each file.

2.       A multi-line .LOG file containing a list of TIFF images (Opticon load images) that are associated with each defined page.

.DAT File Description:

The .DAT file contains file meta data, with the exported text as the last field.

Export fields for the data are defined in the 'export fields' section (below).

Sample data are also provided in the 'sample data' data section (below).

The .DAT file contains a single delimited list of fields.

Rather than the common

    "field1","field2","field3"

notation, fields are delimited by substituting decimal 20 for ',' and decimal 254 for '"'.

Decimal 20 and decimal 254 are explicitly defined to NOT occur in any imported text.

Newline values in the imported text are replaced with decimal 174.

.DAT File Sample:

The sample data:

   to:Ken Davies

   from:Sales

   Subject:The year ahead

   Text: A long discussion about the year ahead.

      Looking forward to your comments.

      Call me if you want to do lunch.

 

becomes:

   (254)Ken Davies(254)(20)(254)Sales(254)(20)(254)The year ahead(254)(20)(254)A long discussion about the year ahead.(174)      Looking forward to your comments.(174)      Call me if you want to do lunch.(174)(254)

where the values in brackets (254) (20) (174) are decimal byte values in the data stream.

The data fields in this example are pre-defined to be "to","from","subject","text".
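
A minimal sketch of this encoding, using the delimiter values stated in the .DAT File Description (decimal 20 between fields, decimal 254 around each field, decimal 174 for newlines). The function name and the choice of Latin-1 as the byte encoding are assumptions made for illustration.

```python
FIELD_SEP = bytes([20])    # replaces ','
QUOTE = bytes([254])       # replaces '"'
NEWLINE = bytes([174])     # replaces newlines inside field text

def encode_dat_record(fields: list[str], encoding: str = "latin-1") -> bytes:
    """Encode one record as a single delimited line of bytes."""
    encoded = []
    for value in fields:
        raw = value.replace("\r\n", "\n").encode(encoding)
        # The delimiters are defined to never occur in imported text.
        if FIELD_SEP in raw or QUOTE in raw:
            raise ValueError("field text contains a reserved delimiter byte")
        encoded.append(QUOTE + raw.replace(b"\n", NEWLINE) + QUOTE)
    return FIELD_SEP.join(encoded)

record = encode_dat_record(
    ["Ken Davies", "Sales", "The year ahead",
     "A long discussion about the year ahead.\nLooking forward to your comments."]
)
print(record)
```

Working on bytes rather than strings keeps the three delimiter values exact regardless of how the text is later displayed.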

.DAT file fields:

Field Name          Sample Data                              Populated
------------------------------------------------------------------------------------------
STARTPAGE           00010002                                 YES
ENDPAGE             00010002                                 YES
DATE                20041219                                 YES [Date Accessed/Sent Date]
DOCTYPE             Doc extension                            YES [SourceFile Ext]
TITLE               Untitled                                 YES [Title from MetaData]
AUTHOR              Simmons;RC / McMurrian;HP                YES [Author/From: from MetaData]
AUTHORORG           Cole Evans and Peterson                  NO
RECIPIENT           McCorman;SL                              YES [To: from MetaData]
RECIPORG            Cowco                                    NO
CC                  ""                                       YES [Cc: from MetaData]
SUMMARY             ""                                       NO
CONDITION           ""                                       NO
ATTACH_TYPE         ""                                       NO
LEAD_DOC            ""                                       NO
ATTACHMENTS         ""                                       NO
PRIMARYDATE         19831220                                 YES [Date Created]
PAGES               3                                        YES
CCORG               ""                                       NO
ATT                 ""                                       NO
ATTORG              ""                                       NO
OCR1                *** 0010002 **** …. contents of page…    NO
OCR2                ""                                       NO
OCR3                ""                                       NO
OCR4                ""                                       NO
OCR5                ""                                       NO
RENUMBER            161                                      NO
ISSUE               ""                                       NO
DISC_STATUS         ""                                       NO
SOURCE_FILE_NAME    C:\fname.doc                             YES
SOURCE_FILE_SIZE    104456                                   YES

 

Hyperlinked Source documents:

XSPATHNAME   .\SOURCE\TST00002.msg

TIFF file destination:

XIPATHNAME    OUTPUT

XIFILENAME       TST00002.tif

.LOG file Sample:

00010001,Data,E:\DATABASE\COWCO\001\00010001.TIF,Y,,,

00010002,Data,E:\DATABASE\COWCO\001\00010002.TIF,,,,

00010003,Data,E:\DATABASE\COWCO\001\00010003.TIF,,,,

00010004,Data,E:\DATABASE\COWCO\001\00010004.TIF,Y,,,

00010005,Data,E:\DATABASE\COWCO\001\00010005.TIF,,,,

00010006,Data,E:\DATABASE\COWCO\001\00010006.TIF,,,,

00010007,Data,E:\DATABASE\COWCO\001\00010007.TIF,Y,,,

00010008,Data,E:\DATABASE\COWCO\001\00010008.TIF,Y,,,

00010009,Data,E:\DATABASE\COWCO\001\00010009.TIF,Y,,,

00010010,Data,E:\DATABASE\COWCO\001\00010010.TIF,,,,

00010011,Data,E:\DATABASE\COWCO\001\00010011.TIF,,,,

 

.LOG file fields:

Field  1:  "Production Number" -- This is a text field which contains the "Production" or "Control" or Bates number for that page of the document.  It is a unique value and is the load file "key".

Field  2:  "Volume ID" -- This is also a text field.  It should contain the Volume ID of the CD on which the images are delivered.

Field  3:  "Full DOS Path" -- This is a text field containing the full DOS path to the image file.

Field  4:  "Document Break" -- This is a text field.  If this particular image is the first page of a document, this field should contain a "Y" (Yes).

Field  5:  "Folder Break" -- This is a text field.  It is rarely used, but when used it is intended to work just like Document Break, i.e. it would contain a "Y" if this is the first page of a new folder.

Field  6:  "Box Break" -- This is a text field.  Also rarely used but intended to work like Doc and Folder Break...would contain a "Y" if this is the first page of a new box.

Field  7:  "Pages" -- This is a text field although it contains numeric data.  If this is the first page of a new document, "Document Break" will contain a "Y" and this field will show the number of pages for the document.  (This field is a "nice to have" as after the images are loaded, Opticon will calculate the number of pages based on the database.)
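
The seven-field layout above can be sketched as a generator of .LOG lines. This is an illustration only: the function name and parameters are hypothetical, Folder Break, Box Break and Pages are left empty as in the sample, and the page numbering scheme (zero-padded, consecutive) is inferred from the sample lines.

```python
def log_lines(page_counts: list[int], volume: str, image_dir: str,
              first_number: int, width: int = 8) -> list[str]:
    """Build one .LOG line per page; 'Y' marks the first page of each document."""
    lines = []
    number = first_number
    for page_count in page_counts:
        for page in range(page_count):
            key = str(number).zfill(width)        # Field 1: production number
            doc_break = "Y" if page == 0 else ""  # Field 4: document break
            # Fields 5-7 (folder break, box break, pages) are left empty here.
            lines.append(f"{key},{volume},{image_dir}\\{key}.TIF,{doc_break},,,")
            number += 1
    return lines

for line in log_lines([3, 3], "Data", r"E:\DATABASE\COWCO\001", 10001):
    print(line)
```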

Contents of import directory for an 11 page file:

00010001.dat

00010001.tif

00010002.dat

00010002.tif

00010003.dat

00010003.tif

00010004.dat

00010004.tif

00010005.dat

00010005.tif

00010006.dat

00010006.tif

00010007.dat

00010007.tif

00010008.dat

00010008.tif

00010009.dat

00010009.tif

00010010.dat

00010010.tif

00010011.dat

00010011.tif

images.opt

IPRO LFP Export File Format:

(source: http://www.ediscovery.org/litigation-support/technical-standards_4_02_IPRO.htm)

To convert from Opticon format, download iConvert from http://www.IproCorp.com. (free)

Example 1: Single Page .TIF files

IM,MSC00014,D,0,@MSC001;IMAGES\00\00;MSC00014.TIF;2

IM,MSC00015,,0,@MSC001;IMAGES\00\00;MSC00015.TIF;2

IM,MSC00016,D,0,@MSC001;IMAGES\00\00;MSC00016.TIF;2

IM,MSC00017,,0,@MSC001;IMAGES\00\00;MSC00017.TIF;2

 

Example  2: Multi Page .TIF file

IM,MSC00014,D,1,@MSC001;IMAGES\00\00;MSC00014.TIF;2

IM,MSC00015,,2,@MSC001;IMAGES\00\00;MSC00014.TIF;2

IM,MSC00016,D,1,@MSC001;IMAGES\00\00;MSC00016.TIF;2

 

Note: Because the files are multi-page, the entire Bates range (or image key range) must point to the same .TIF file. For example, MSC00014.TIF contains both pages "14" and "15". Therefore, to view page 15, the computer must display MSC00014.TIF.

 The following provides a breakdown of the fields:

IM              Import code identifier (Importing New Page/Image database record)

MSC00014        The image key/document id number

D               Document designation; only designate the first page of each document.

0               Offset to the Tiff file.  Always 0 for single page tiff files.  When creating
                Multi-Page Tiff files, this number will increment for the pages within the file.
                (If there is an 11 page document, the offset would start at 1 and end at 11,
                and the next tiff file would start over at 1.)

@MSC001         CD volume name

IMAGES\00\00    Directory path on the CD for the image

MSC00014.TIF    Filename for the image.

;2              Image type* of the file, e.g. TIFF, PDF

 

*Supported Image Types and their specification in the LFP file are:

1.       Type 1 is for IPRO Tech image from DOS-Based version, still supported (.IMG)

2.       Type 2 is for Standard single and multiple page black & white or color TIFF (.TIF)

3.       Type 3 is for IPRO Tech stacked TIFF (.STF)

4.       Type 4 is for Color image (.BMP, .PCX, .JPEG or .PNG)

5.       Type 5 is for black & white .PDF

6.       Type 6 is for Color .PDF

7.       Type 7 is to Auto-detect the .PDF type, e.g. Color or Black & White
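
The single-page and multi-page layouts shown in Examples 1 and 2 can be sketched with one helper. This is an illustration inferred from those examples, not IPRO's or Discovery Assistant's code; the function name and parameters are hypothetical.

```python
def lfp_lines(page_counts: list[int], volume: str, directory: str,
              prefix: str, first_number: int, width: int = 5,
              multi_page: bool = False, image_type: int = 2) -> list[str]:
    """Build one 'IM' line per page for the given documents."""
    lines = []
    number = first_number
    for page_count in page_counts:
        first_key = f"{prefix}{number:0{width}d}"
        for page in range(page_count):
            key = f"{prefix}{number:0{width}d}"
            doc_flag = "D" if page == 0 else ""  # only the first page is flagged
            if multi_page:
                # Every page points at the document's single TIFF; offset counts from 1.
                offset, filename = page + 1, f"{first_key}.TIF"
            else:
                offset, filename = 0, f"{key}.TIF"
            lines.append(
                f"IM,{key},{doc_flag},{offset},@{volume};{directory};{filename};{image_type}"
            )
            number += 1
    return lines

for line in lfp_lines([2, 1], "MSC001", r"IMAGES\00\00", "MSC", 14, multi_page=True):
    print(line)
```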

RINGTAIL Support

Exporting to RingTail:

1.       Export to Ringtail from Discovery Assistant.

2.       Load the CSV file into the Ringtail Flat File converter to convert to MDB, then run it through the Validator.

Reference Docs: (these seem to overlap)

   CaseBook_Data_Standards_Manual_v602r5.pdf

   Ringtail Legal Data Standards Manual v2[1].1.2.pdf

 

Tools Provided by FTI Ringtail

1.       Data Standards Manual: outlines the Ringtail load file

2.       Flat File Converter: a tool used to convert a flat-file database to a Ringtail load file; and

3.       Validator: a tool used to verify the integrity of a Ringtail load file. 

                NO load file should be loaded to Ringtail without first being run through this free tool.

To access these free tools, browse to our support website http://support.ftiringtail.com .  From there, click the button to LOGIN AS GUEST, then access the Downloads tab.

Ringtail Flat file converter Notes:

The validator does not understand Office 2007.  You need to run on an Office 2003 machine. Time fields are not supported in Ringtail.  Any time fields should be set to TEXT. Boolean fields are TEXT.  We don't currently convert to T/F.

MAIN tab:

 

ImageMAKER field name     Ringtail           Description
----------------------------------------------------------------------
Main_Document_ID          Document_ID        User assigned Document ID
Main_Document_Date        Document_Date      Source document create date, otherwise received date, otherwise sent date (in that order)
Main_Document_Time        ???                Source document create time, otherwise received time, otherwise sent time (in that order)
Main_Document_Type        Document_Type      Source file type name
Main_Title_docTitle       Title              Document Title
Main_Title_DocSubject     Description        Email/Document subject
Main_Host_Reference       Host_Reference     Export file title of parent item
0                         Estimated

 

Notes:

   use "0" for Estimated (all dates are exact)

   There are no time fields in Ringtail

 

 

PAGES tab:

 

ImageMAKER field name     Ringtail                  Description
----------------------------------------------------------------------
Pages_Page_Start          Page_Start                Export file title of first page
Pages_Page_End            Page_End                  Export file title of last page
Pages_Image_File_Name     ??                        Export file name [image] with extension.
.tif                      Page_Extension
Pages_Num_Pages           Total_Number_of_Pages
???                       Page_Range

 

Notes:

   choose 'Use Page Range' (not 'Use Image_File_Name') when matching fields.

   use ".tif" for Page_Extension.

 

   Missing:

   no values for Page_Range.  Suggest using Pages_Num_Pages.

 

 

PARTIES tab:

 

ImageMAKER field name         Ringtail  type: to, from, between, cc, bcc, userDefined

----------------------------------

Parties_People_From_Author        Document author
Parties_People_From_LastAuthor    Last Document author
Parties_People_From_Sender        Email From address
Parties_People_To                 Email To address
Parties_People_CC                 Carbon Copy recipient
Parties_People_BCC                Blind Carbon Copy recipient

 

Notes:

   assigned to 'people'

   one to many

   delimiter is the ';' character (semicolon).

   no concatenate string

 

 

 

LEVELS tab:

 

ImageMAKER field name Ringtail

----------------------------------

Levels_Levels Fields Level Fields [1-10] Export file path (image)

 

 

 

EXTRAS tab:

 

ImageMAKER field name Ringtail   (BOOL DATE NUMB PICK TEXT MEMO UTEXT UMEMO)

----------------------------------

Extras_ALTRCPALLOW TEXT(T/F) Alternate Recipient Allowed

Extras_APPLICATION_NAME TEXT   Name of creating application

Extras_ATTACHLIST TEXT   List of export file titles of attachments

Extras_ATTACHMENTRANGE TEXT   Range of export file titles of attachments

Extras_ATTACHMENTSCOUNT NUMB   Count of attachments

Extras_ATTACHTITLE TEXT   File title of attachment

Extras_AUTOFWD  TEXT(T/F) Auto Forwarded

Extras_BATESBEG  TEXT   Beginning Bates number

Extras_BATESBEGGROUP TEXT   Beginning Bates number for group. e.g. an email and its attachments or a zip file and its contents

Extras_BATESEND  TEXT   Ending Bates number

Extras_BATESENDGROUP TEXT   Ending Bates number for group.  e.g. an email and its attachments or a zip file and its contents

Extras_BATESGROUPRANGE TEXT   Range of Bates Numbers that belong as a group.  e.g. an email and its attachments or a zip file and its contents

Extras_BEGATTACH TEXT   Export file title of first page of group.  e.g. an email and its attachments or a zip file and its contents

Extras_BILLINFO  TEXT   Billing Information

Extras_BODY  MEMO   Body of email

Extras_CATEGOR  TEXT   Categories

Extras_CNVINDEX  TEXT   Conversation Index

Extras_CNVTOPIC  TEXT   Conversation Topic

Extras_COMPANIES TEXT   Companies

Extras_DACOMMENT TEXT   Discovery Assistant PassThru comment

Extras_DEFDLVDATE TEXT(T/F) Deferred Delivery Date

Extras_DEFDLVTIME TEXT(T/F) Deferred Delivery Time

Extras_DELAFTSUB TEXT(T/F) Delete After Submit

Extras_DLVRPTREQ TEXT(T/F) Originator Delivery Report Requested

Extras_DOCTEXT  MEMO   Document Text

Extras_DUPPATHS  TEXT   Source document paths of duplicate items

Extras_ENDATTACH TEXT   Export file title of last page of group.  e.g. an email and its attachments or a zip file and its contents

Extras_EXPIRYDATE DATE   Expiry Date

Extras_EXPIRYTIME TEXT(HMS) Expiry Time

Extras_EXPORTDATE DATE   Export start date

Extras_EXPORTEDSOURCEFILEPATHNAME TEXT Exported source file path

Extras_EXPORTTIME TEXT(HMS) Export start time

Extras_FILEACCESSDATE DATE   Source document Last Access Date

Extras_FILEACCESSTIME TEXT(HMS) Source document Last Access Time

Extras_FILECREATIONDATE DATE   Source document creation date

Extras_FILECREATIONTIME    Source document creation time

Extras_FILEDISPLAYNAME TEXT   Source file title

Extras_FILEEXTENSION TEXT   Source file extension

Extras_FILEMODIFYDATE DATE   Source document modified date

Extras_FILEMODIFYTIME TEXT(HMS) Source document modified time

Extras_FILENAME  TEXT   Source file name (including extension)

Extras_FILEPATHNAME TEXT   Source file path

Extras_FILEPRINTDATE DATE   Source document Last Print Date

Extras_FILEPRINTTIME TEXT(HMS) Source document Last Print Time

Extras_GLOBALCOUNT NUMB   Count of occurrences of this item in the Global Project table.

Extras_GLOBALPRIMARY TEXT(T/F) 'Yes' if this is the first occurrence of this item in the global table.

Extras_GROUPRANGE TEXT   Range of export file titles that belong as a group.  e.g. an email and its attachments or a zip file and its contents

Extras_HASHCODE  TEXT   MD5 hash code value for source document

Extras_HTMLBODY  MEMO   HTML Message Body

Extras_IMPORTANCE TEXT   Importance

Extras_INETHEADER TEXT   Internet Header

Extras_ISDUP  TEXT(T/F) True/False is duplicate

Extras_ITEMID  TEXT   Discovery Assistant file ID

Extras_ITEMINDEX NUMB   Item Index

Extras_LASTSAVEDDATE DATE   Source document Last Saved date

Extras_LASTSAVEDTIME TEXT(HMS) Source document Last Saved time

Extras_MSGCLASS  TEXT   Message Class

Extras_MSGID  TEXT   Email message ID

Extras_MSGMLG  TEXT   Message Mileage

Extras_NOAGING  TEXT(T/F) No Aging

Extras_OBJECTSIZE NUMB      Source file size on disk

Extras_OLINTVER  TEXT      Outlook Internal Version

Extras_OLVER  TEXT      Outlook Version

Extras_PAGECOUNT        NUMB   Number of pages in TIFF file

Extras_PARENT  TEXT   Email parent folder name

Extras_PARENTCREATIONDATE  DATE   Parent document create date

Extras_PARENTCREATIONTIME  TEXT(HMS) Parent document create time

Extras_PARENTMODIFYDATE DATE   Parent document modified date

Extras_PARENTMODIFYTIME TEXT(HMS) Parent document modified time

Extras_PARENTRECEIVEDDATE  DATE   Parent email received date

Extras_PARENTRECEIVEDTIME  TEXT(HMS) Parent email received time

Extras_PARENTSENTDATE DATE   Parent email sent date

Extras_PARENTSENTTIME TEXT(HMS) Parent email sent time

Extras_RCPREASSPROHIB BOOL   Recipient Reassignment Prohibited

Extras_RCVBYNAME TEXT   Received By Name

Extras_RCVONBEHALFNAME TEXT   Received On Behalf Of Name

Extras_RDRECREQ  TEXT(T/F) Read Receipt Requested

Extras_READ  TEXT(Y/N) Message read y/n?

Extras_RECEIVEDDATE DATE   Email received date

Extras_RECEIVEDTIME TEXT(HMS) Email received time

Extras_REPLRECIPS TEXT   Reply Recipients

Extras_REVNUM  TEXT   Last Document author

Extras_SAVED  BOOL   Saved

Extras_SENSITIVITY TEXT   Sensitivity

Extras_SENT  TEXT(T/F) Sent

Extras_SENTDATE  DATE   Email sent date

Extras_SENTTIME  TEXT(HMS) Email sent time

Extras_SHORTFILETITLE TEXT   Short file title

Extras_SNTONBEHALFNAME TEXT   Sent On Behalf Of Name

Extras_SOURCELABEL TEXT   Source volume label

Extras_SOURCEPAGECOUNT TEXT   Source document page count

Extras_SRCBOX  TEXT   Source Box.  Obtained from second to last directory name in source file path.

Extras_SRCCUSTOD TEXT   Source Custodian.  Obtained from third to last directory name in source file path.

Extras_SRCFOLDER TEXT   Source Folder.  Obtained from last directory name in source file path.

Extras_STOREID  TEXT   Message store identifier

Extras_STORENAME TEXT   Message store source file name

Extras_SUBMITTED TEXT(T/F) Submitted

Extras_UNREAD  TEXT(T/F) UnRead

Extras_VOTINGOPT TEXT   Voting Options

Extras_VOTINGRESP TEXT   Voting Response

 

 

Notes:

   All fields are one-to-one

            

dtSearch: Notes on Searching using dtSearch

One problem with using dtSearch is that it does not handle NSF files. A second problem is how to extract the responsive files from a PST while keeping all the metadata and parent/child relationships intact.

Current solution is to:

1.       Load files into Discovery Assistant, use the COPY button to write the files back out numbered by FileID, then dtSearch the fileset.

2.       Use the 'mark' and 'select' buttons

3.       Use the 'user field' button to keep track of what search strings were used to find these files.

OR

1.       Convert all the files

2.       Search the 'projectname.cnvt' directory TXT files

3.       Use the 'mark', 'select' and 'user field' buttons to track responsive files.

To Download and install dtSearch:

http://www.dtsearch.com/download.html   file: dtSearchEval750.exe

cost: $200 to buy, 1 month free evaluation.

Quick guide to converting and searching:

1.       SETUP: import files into Discovery Assistant.

2.       SEARCH SOURCE: dtSearch the source files.

3.       SEARCH TIFF/TEXT: dtSearch the converted project files.

4.       EXPORT: load dtSearch selection set, and export msg files that contain search items.

 

 

SETUP: Import files into Discovery Assistant

a.       Create a Discovery Assistant project, and add in one or more NSF/PST/Folder directories.  Contents of imported email and documents are enumerated.   Global and Local duplicates are identified at this point.

b.      If you want to search source files, you can do so by exporting a 'copy' of each  file (using the 'Copy' button) to a separate search directory.   Copied files are identified by their fileID.  

c.       If you want to search converted files, you can do so by queuing the files  for conversion, then converting. 

d.      When converting files, user options should be:  skip local duplicates,  don't skip children unless parent is skipped.

e.      On completion of conversion, remove the NSF and PST records from the Converted tab.  You can queue these for re-conversion to get them out of the way.  (Note: do not delete them from the project.)  [This restriction will be removed at a later date.]

f.        If your files contain images and you want text from those images, select OCR and, in the dialog, select 'OCR only those items without text'.

Note: requires that 'Microsoft Office Imaging' 2003 or 2007 is installed.  We use the Microsoft provided OCR engine to do the text extraction.  (Can install this from the Office installation disks - under Tools).   Re-save project.

g.       Sort on FileID, assign Document IDs (string: %COUNT1%), and save the project.

h.      If your files contain spreadsheets, there is a good chance there are blank pages that should be removed.  To remove blank pages, select: DeBlank.  Re-save project.

SEARCH SOURCE: dtSearch the source files.

a.       From the All Files tab, select 'copy' All.  Select a destination directory using the browse button.  Best to choose somewhere that has a lot of available space.

Copied files are named by their FileID, with the original file extension.

b.       Use dtSearch to search the source files.  See comments below (Search Tiff/Text) for how to proceed.  Basic idea is to generate a list of files to be queued for  conversion, without having to convert all the other files.

SEARCH TIFF/TEXT: dtSearch the converted project files.

a.       Use dtSearch to index the project.CNVT directory - *.TXT files only.  (The .mtf, .tif, and .log files need to be excluded.)

b.       Enter one or more search terms in dtSearch to create individual search results.  Enable stemming, phonic search, and fuzzy search to find similar words.  (You can check results using the Browse Words button.)

For individual search terms:  save each result as a project_searchterm.CSV.

For all search terms:  save 'all strings' search result as project_all.CSV.

Save search results by choosing "File / Save As" - choose CSV format.

 Generate a report by choosing "Search / search report".

c.       When done, open the project_all.CSV file, select Column E (display name), and copy to clipboard

d.      Open Notepad, paste the clipboard into Notepad, then do a search and replace:

Replace the first 3 letters [abc] of each name with nothing [].

Replace [F.tif.txt] with nothing [].

Delete the header line and the blank line at the end of the file.

Save as project_all.txt in the project
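The Notepad clean-up in steps c and d can also be scripted.  The following is a minimal Python sketch, not part of Discovery Assistant or dtSearch.  It assumes that the display name is the fifth CSV column (Column E), that the file has one header line, and that each display name looks like abc000123F.tif.txt, where "abc" stands for the three leading letters to be removed.  The function name and the sample names are hypothetical.

```python
import csv
import io


def display_names_to_file_ids(csv_text, column=4, prefix_len=3, suffix="F.tif.txt"):
    """Turn the display-name column of a saved search result into FileIDs.

    Assumptions (not confirmed by the product documentation):
    column 4 (Column E) holds the display name, the first row is a header,
    and each name is <3 letters><FileID>F.tif.txt.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    file_ids = []
    for row in rows[1:]:                  # skip the header line
        if len(row) <= column:
            continue                      # skip blank or short lines
        name = row[column].strip()
        if not name:
            continue
        name = name[prefix_len:]          # drop the first three letters
        if name.endswith(suffix):
            name = name[: -len(suffix)]   # drop the trailing F.tif.txt
        file_ids.append(name)
    return file_ids


if __name__ == "__main__":
    # Hypothetical sample in place of a real project_all.CSV
    sample = (
        "A,B,C,D,Display Name\n"
        "1,2,3,4,abc000123F.tif.txt\n"
        "1,2,3,4,abc000456F.tif.txt\n"
        "\n"
    )
    print("\n".join(display_names_to_file_ids(sample)))
```

To use it on a real result, read project_all.CSV into a string, pass it to the function, and write the returned FileIDs one per line to project_all.txt.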

Notes on using dtSearch:

dtSearch evaluation copy can be downloaded from: http://www.dtsearch.com/download.html

Stemming: searches grammatical variations of the words in your search request.  For example, with stemming enabled a search for apply would also find applies.

Phonic: search finds words that sound similar to words in your request, like Smith and Smythe.

Fuzzy search: sifts through scanning and typographical errors.  Fuzziness adjusts from 1 to 10 depending on the degree of misspellings.  (Try starting with 3.)

Synonym search: tells dtSearch to use a thesaurus to find synonyms of words in your search request. 

dtSearch provides three ways to perform synonym searching:      

§  Check the User thesaurus box to find synonyms that you have defined in your own thesaurus.     

§  Check the WordNet thesaurus box to find synonyms using the WordNet concept network included with dtSearch.     

§  Check the WordNet related words box to find related words from the WordNet concept network.  

EXPORT: Load dtSearch selection set, and export files:

a.       Go back to the same project in Discovery Assistant, go to the Converted tab, choose 'Select / by FileIDList', and select the project_all.txt file.

b.      From the Converted tab, do the following:

Select / Parent of selected items

Select / Children of selected items

Select Mark / Selected

You can choose 'User Fields' to assign a text string to selected items.  One use for this feature is to define what search term was used to select the record.

Save project.

This ensures that we are exporting any file that matches a string (has the string in it) PLUS its parent, PLUS any siblings of that file.

At any point from now on, you can choose 'Select / Marked Items' and get back the items to export as a selection set.

At any point, you can also 'sort' on the left hand column (marked) to see  what items are marked.

If for whatever reason you have incorrectly marked items and want to start over, choose Select / Marked Items, then Toggle Mark / Selected.  This will clear all marked items.

c.        To export the selected items:

Choose 'Select / Marked Items', sort on Document ID, and then Export / Selected.

Naming convention is "%ProjectID%.%DOCID%.%PAGE%" 

Other settings are:

Destination - location that files are going to be exported to.

Format - choose Summation DII Class I

Note: Press Options to choose metadata fields to export.

Directory Structure - flat is recommended

Other files to include - select Text files.

Whew!  You are now done....
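The reason the two selection commands in step b pick up siblings can be shown with a small Python sketch.  This is an illustration only, not how Discovery Assistant stores its records: the parent_of mapping and the item names are hypothetical, and the sketch applies each selection command once (one level of parents, then one level of children).

```python
def expand_selection(hits, parent_of):
    """Return the search hits plus their parents plus the children of both.

    parent_of maps each item to its parent, or to None for a top-level item.
    """
    selected = set(hits)
    # Select / Parent of selected items
    selected |= {parent_of[item] for item in hits if parent_of.get(item) is not None}
    # Select / Children of selected items: the parents are now selected,
    # so their other children (the siblings of each hit) are picked up too.
    selected |= {child for child, parent in parent_of.items() if parent in selected}
    return selected


if __name__ == "__main__":
    # Hypothetical project: two messages, three attachments
    parent_of = {
        "msg1": None, "att1": "msg1", "att2": "msg1",
        "msg2": None, "att3": "msg2",
    }
    print(sorted(expand_selection({"att1"}, parent_of)))
```

With a search hit on att1 only, the result contains att1, its parent msg1, and its sibling att2, while msg2 and att3 stay unselected.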

Internal notes:  ImageMAKER optimizations.

   We will be making code changes to remove the following steps:

                Load: item 3:  will not have to remove NSF or PST

Search: item 3 and 4: will not have to create a TXT file (will use CSV directly)

Export: item 2: will simplify the selection and marking functionality.

                Next step will be to integrate dtSearch engine directly into Discovery Assistant.

Embedded Files: XML, PDF, and OLE linking and Embedded files support:

Discovery Assistant extracts embedded files from OLE containers (DOC, XLS, PPT), XML containers (DOCX, XLSX, PPTX), and RTF files.  Support for PDF files is in development (early January 2008).

Supported Microsoft Office formats include: Office 95, Office 97, Office 2000, Office XP, Office 2003, and Office 2007.

Linked files are noted (in the warnings), but not extracted or enumerated.

Basic extraction logic is as follows:

·         determine if the file is an XML or OLE container type, RTF, or PDF.

·         do a quick check to see if there are embedded files.

·         if there are embedded files, attempt to extract the files from the native document.

·         if there is a failure condition, convert the document to Office 2007 format (zipped XML) using the Office 2007 migration tool, and then re-attempt the extraction.
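The first two steps of this logic can be sketched in Python.  This is an illustration under stated assumptions, not Discovery Assistant's actual code: it identifies the container type from the well-known leading bytes of each format, and for zipped XML files it treats any entry in an "embeddings" folder (for example word/embeddings/) as a sign of embedded files.  The function names are hypothetical.

```python
import io
import zipfile

# Leading bytes of an OLE compound file (DOC, XLS, PPT)
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def container_type(data: bytes) -> str:
    """Classify a file by its leading bytes."""
    if data.startswith(OLE_MAGIC):
        return "OLE"
    if data.startswith(b"PK\x03\x04"):   # ZIP archive, used by DOCX/XLSX/PPTX
        return "XML"
    if data.startswith(b"{\\rtf"):
        return "RTF"
    if data.startswith(b"%PDF"):
        return "PDF"
    return "other"


def has_embedded_parts(data: bytes) -> bool:
    """Quick check on a zipped XML file: is anything stored under an embeddings folder?"""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return any("/embeddings/" in name for name in archive.namelist())
```

A ZIP signature alone does not prove that a file is an Office 2007 document, so a real implementation would also need to inspect the archive contents before attempting extraction.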

Discovery Assistant uses two tools provided by Microsoft to help with extraction:

Microsoft Office 2007 Compatibility Pack: http://www.microsoft.com/downloads/details.aspx?FamilyId=941b3470-3ae9-4aee-8f43-c6bb74cd1466&displaylang=en

Microsoft Office 2007 Migration Tool: http://www.microsoft.com/downloads/details.aspx?familyid=13580cd7-a8bc-40ef-8281-dd2c325a5a81&displaylang=en

These tools must be installed in order for everything to work correctly. The Options / Embedded tab contains links to both of these tools.

When downloading and installing the MigrationPlanningManager.exe tool, you need to specify an installation directory.  Then, after installation,  from the Options / embedded tab / Settings, specify the installation directory.

Other notes:

In the Options / Embedded / Settings tab, you can also specify the prefix used for all extracted files.  Current default is EMB_1, EMB_2, and EMB_3 (represents different types of embedded files).  After loading your file set into Discovery Assistant, if you sort on name, you should be able to group all the extracted embedded files.

You can conditionally turn file handling off for certain file types by selecting the file type from the Settings dialog, then hit the modify button.

Speed / Size of files.

For optimum speed and size, best to convert everything to B&W G4 TIFF.

When exporting to different file types, here are some of the speed/size metrics.

46,462 pages, 4592 tiff files, exported as:

Format          Size      Time

TIFF (G4)       1.1 GB    20 minutes (doesn’t require reading/writing the files)

Scanned PDF     1.5 GB    2 hours (uses 8 bit Flat compression)

24 bit LZW      3.0 GB    4.5 hours
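To compare the three formats on a per-page basis, the figures above can be converted into kilobytes per page and pages per minute.  The Python sketch below does only that arithmetic.  It assumes 1 GB = 1024 x 1024 KB and uses the 46,462-page count for every format.

```python
PAGES = 46_462

# format: (size in GB, export time in minutes), taken from the figures above
RUNS = {
    "TIFF (G4)": (1.1, 20),
    "Scanned PDF": (1.5, 120),
    "24 bit LZW": (3.0, 270),
}


def per_page_metrics(size_gb, minutes, pages=PAGES):
    """Return (KB per page, pages per minute) for one export run."""
    kb_per_page = size_gb * 1024 * 1024 / pages
    pages_per_minute = pages / minutes
    return kb_per_page, pages_per_minute


if __name__ == "__main__":
    for name, (size_gb, minutes) in RUNS.items():
        kb, ppm = per_page_metrics(size_gb, minutes)
        print(f"{name}: {kb:.0f} KB/page, {ppm:.0f} pages/minute")
```

This works out to about 25, 34, and 68 KB per page, and roughly 2,300, 390, and 170 pages per minute respectively.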

For more information

ImageMAKER Development Inc.
416 Sixth Street, Suite 102
New Westminster, BC
Canada V3L 3B2
http://www.imgmaker.com
Copyright © 2004-2008
To contact us from overseas:

Sales: 1.604.525.2170
Local (Pacific) time: GMT-8

Sales: toll free (866) 525-2170
or (604) 525-2170
Support: (604) 525-2108
Fax: (604) 520-0029
Email: sales@imgmaker.com
support@imgmaker.com