How to Search a PST file
Discovery Assistant is an eDiscovery application that can
identify keyword search terms contained in many file types, including Microsoft Word files, Excel
spreadsheets, plain text, web pages, and email messages, among others. In this article, we will examine
searching email messages, and PST files in particular. Properly using keyword
search terms is a powerful way to identify relevant documents contained in a large set of
unstructured documents and email folders.
Discovery Assistant Search includes the ability to search
through compound nested emails, attachments, scanned documents (PDF), loose electronic documents,
zip files, faxed images (TIFF) and metadata.
Matching documents may also have multiple parent, sibling, and child relationships
to other documents that form the family group. It’s important that the family relationships are
maintained during the document search and identification process.
Documents that match one or more search terms can be produced either as native
documents or as PDF or TIFF images, and exported for further review along with the extracted
text and embedded metadata.
Preparing a PST file for Search:
To prepare a PST file for searching, perform the following tasks with
Discovery Assistant:
- Load the PST file into Discovery Assistant.
  - This will extract email and attachments from within the PST file.
  - Additionally, it will extract zip contents and embedded objects (OLE).
- Identify and remove duplicate email messages.
- Perform Optical Character Recognition (OCR) on any scanned documents.
- Extract text from electronic documents.
- Index the extracted text.
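The deduplication and indexing steps above can be illustrated with a minimal Python sketch. This is not Discovery Assistant's implementation: the message dictionaries, the choice of fields used for the fingerprint, and the function names `message_fingerprint`, `deduplicate`, and `build_index` are all assumptions made for illustration.

```python
import hashlib
import re
from collections import defaultdict

def message_fingerprint(msg):
    """Hash the fields assumed to identify a message (hypothetical field choice)."""
    key = "\n".join([msg["from"], msg["to"], msg["subject"], msg["body"]])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def deduplicate(messages):
    """Keep only the first message seen for each fingerprint."""
    seen = set()
    unique = []
    for msg in messages:
        fp = message_fingerprint(msg)
        if fp not in seen:
            seen.add(fp)
            unique.append(msg)
    return unique

def build_index(messages):
    """Map each lowercased word to the set of message ids whose body contains it."""
    index = defaultdict(set)
    for msg in messages:
        for word in re.findall(r"\w+", msg["body"].lower()):
            index[word].add(msg["id"])
    return index

# Illustrative data only: messages 1 and 2 are exact duplicates.
messages = [
    {"id": 1, "from": "a@example.com", "to": "b@example.com",
     "subject": "Fruit", "body": "Apple and pear order"},
    {"id": 2, "from": "a@example.com", "to": "b@example.com",
     "subject": "Fruit", "body": "Apple and pear order"},
    {"id": 3, "from": "c@example.com", "to": "b@example.com",
     "subject": "Invoice", "body": "Pear invoice attached"},
]

unique = deduplicate(messages)
index = build_index(unique)
print(sorted(index["pear"]))  # prints [1, 3]
```

Once such an index exists, a keyword lookup is a dictionary access rather than a scan of every document, which is the general reason indexed searches return quickly.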
Searchable electronic document types include PST, MSG, EML, Lotus NSF, Microsoft
Office documents, PDF, HTML, TXT, Scanned documents, and other common electronically stored file
formats.
Searching a PST file for Keywords:
Discovery Assistant includes a powerful search engine
that is capable of indexing and searching terabytes of data.
Once the contents of the PST file have been indexed, searching takes less than a
second to return all matching documents.
Supported Features include:
- Batch Searching (multiple keywords).
- Compound Boolean search (AND / OR / NOT).
- Proximity search (e.g. terms must occur within 2 words of each other).
- Fuzzy Search (sounds like, but not exactly the same as).
- WildCard search (e.g. all files that start with MULTI*).
- Metadata Search:
  - Limited to a certain date range.
  - Contents of Subject field.
  - Contents of To/From/CC/BCC field.
  - Many other metadata values extracted at load time.
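As a rough illustration of how a wildcard term can be combined with metadata restrictions, the sketch below filters a small list of messages by date range, subject text, and a `multi*` pattern. The message layout and the helper names `matches_wildcard` and `filter_messages` are hypothetical and do not reflect the product's own interface; the wildcard matching uses Python's standard `fnmatch` module.

```python
from datetime import date
from fnmatch import fnmatchcase

def matches_wildcard(word, pattern):
    """Case-insensitive wildcard match, e.g. 'multi*' matches 'Multiple'."""
    return fnmatchcase(word.lower(), pattern.lower())

def filter_messages(messages, pattern, start, end, subject_contains):
    """Return ids of messages inside the date range whose subject contains
    the given text and whose body has at least one word matching the pattern."""
    hits = []
    for msg in messages:
        in_range = start <= msg["sent"] <= end
        subject_ok = subject_contains.lower() in msg["subject"].lower()
        body_ok = any(matches_wildcard(w, pattern) for w in msg["body"].split())
        if in_range and subject_ok and body_ok:
            hits.append(msg["id"])
    return hits

# Illustrative data only.
messages = [
    {"id": 1, "sent": date(2020, 3, 1), "subject": "Smith contract",
     "body": "Multiple parties signed"},
    {"id": 2, "sent": date(2021, 6, 15), "subject": "Smith contract",
     "body": "Multinational terms"},
    {"id": 3, "sent": date(2020, 5, 10), "subject": "Lunch",
     "body": "Multigrain bread"},
]

hits = filter_messages(messages, "multi*", date(2020, 1, 1),
                       date(2020, 12, 31), "smith")
print(hits)  # prints [1]
```

Message 2 falls outside the date range and message 3 fails the subject test, so only message 1 is returned.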
Example Uses for Keyword Search:
- Respond to a legal request for files that contain one or more search terms.
- Respond to a freedom of information request on a specific topic.
- Initial preparation of relevant documents for litigation review.
- Context searching, near duplicate identification, email thread identification.
- Data culling.
- Pre-discovery review.
- Search for privileged and responsive documents.
- Evidence collection.
Example Keyword Search Requests:
| Search request | Meaning |
| --- | --- |
| apple and pear | both words must be present |
| apple or pear | either word can be present |
| apple w/5 pear | apple must occur within 5 words of pear |
| apple not w/12 pear | apple must occur, but not within 12 words of pear |
| apple and not pear | apple must be present and pear must not be present |
| subject contains smith | the field subject must contain smith |
| apple w/5 xfirstword | apple must occur in the first five words |
| apple w/5 xlastword | apple must occur in the last five words |
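To make the proximity requests concrete, here is a small Python sketch of how `w/N` and `not w/N` could be evaluated over a tokenized document. It is an illustration only: the function names are hypothetical, distance is assumed to be the difference between word positions, and `not w/N` is assumed to mean that at least one occurrence of the first term has no occurrence of the second term within N words. The search engine's actual rules may differ.

```python
import re

def tokenize(text):
    """Split text into lowercased words."""
    return re.findall(r"\w+", text.lower())

def positions(tokens, term):
    """Return every position at which the term occurs."""
    return [i for i, t in enumerate(tokens) if t == term]

def within(tokens, a, b, n):
    """'a w/n b': some occurrence of a lies within n words of some occurrence of b."""
    return any(abs(i - j) <= n
               for i in positions(tokens, a)
               for j in positions(tokens, b))

def not_within(tokens, a, b, n):
    """'a not w/n b': a occurs at least once with no occurrence of b within n words."""
    pos_b = positions(tokens, b)
    return any(all(abs(i - j) > n for j in pos_b)
               for i in positions(tokens, a))

# 'apple' is at position 1 and 'pear' at position 7, so they are 6 words apart.
tokens = tokenize("The apple was shipped with one ripe pear yesterday")

print(within(tokens, "apple", "pear", 5))       # prints False
print(within(tokens, "apple", "pear", 12))      # prints True
print(not_within(tokens, "apple", "pear", 5))   # prints True
print(not_within(tokens, "apple", "pear", 12))  # prints False
```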
Search terms may include the following special characters:
Tech Specs:
- Speed (on typical hardware):
| Task | Speed |
| --- | --- |
| Loading | 1 GB per hour |
| OCR Preparation | 1 second per page |
| OCR'ing | 1 second per page |
| Building the index | 1 GB per hour |
| Searching | 1 second per term |
- Scalability:
  - Ability to scale to multiple projects across multiple machines.
- Support for Unicode, UTF-8, ANSI:
  - Text is converted to UTF-8 before the words are indexed. This way, Unicode, ANSI, and UTF-8 formatted files all return the same search results.
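The effect of converting everything to UTF-8 before indexing can be shown with a short Python sketch. It assumes the source encoding of each file is already known (detection is not covered here), treats "ANSI" as Windows code page 1252 and "Unicode" as UTF-16, and uses hypothetical helper names; none of this is taken from the product itself.

```python
import re

def to_utf8(raw, encoding):
    """Decode bytes from their source encoding and re-encode them as UTF-8."""
    return raw.decode(encoding).encode("utf-8")

def index_terms(utf8_bytes):
    """Return the set of lowercased words found in UTF-8 encoded text."""
    return set(re.findall(r"\w+", utf8_bytes.decode("utf-8").lower()))

text = "Café résumé apple"

# The same text stored in three different encodings (illustrative data).
sources = {
    "utf-16": text.encode("utf-16"),   # often labelled "Unicode" on Windows
    "cp1252": text.encode("cp1252"),   # a common "ANSI" code page
    "utf-8": text.encode("utf-8"),
}

term_sets = [index_terms(to_utf8(raw, enc)) for enc, raw in sources.items()]

# All three files produce identical index terms after normalization.
print(all(terms == term_sets[0] for terms in term_sets))  # prints True
```

Because the raw bytes differ between encodings, indexing them without this normalization step would store different byte sequences for the same word, and a search could miss some of the files.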