Proximity searches - A better understanding

What is a Proximity Search?

A proximity search finds items in which the search terms occur within a specified maximum number of words of each other in the item's text. For example, to find all items with the words `desktop` and `application` within 10 words of each other, use the following syntax:

  • "desktop application"~10
Proximity searches differ from phrase searches in that the order of the search terms does not matter. Both of the following passages match the proximity search above:

  • "You must turn on your desktop computer before you can open an application."
  • "I have copied the shortcut for the application onto the desktop."

[Screenshot: a proximity search being used in Intella]

To illustrate how proximity search syntax works, consider the search "vound connect intella"~3. The number after the tilde is the maximum total number of other words allowed between the search terms.
This search would match the following text strings:
  • "vound intella connect" (with 0 words in between)
  • "vound extra words here connect intella" (with 3 words in between)
  • "vound some words connect separated intella" (with 3 words in between)
  • "intella vound connect" (with 0 words in between)
However, it would not match the text strings:
  • "vound too many extra words here connect intella" (with 5 words in between)
  • "vound some words connect further separated intella" (with 4 words in between)
This example demonstrates how proximity search syntax can be used to find items based on words that are within a specified maximum distance from each other in the item's text, regardless of the order in which they appear.
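
The counting rule used in the examples above can be modelled with a short Python sketch. It is only an illustration of that rule (total extra words between the terms, in any order) and not Intella's search engine. The function names `min_extra_words` and `proximity_match` are made up for this article, and the sketch assumes the search terms are distinct single words.

```python
import re


def tokenize(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_extra_words(text, terms):
    """Return the fewest non-search words inside any window of the text
    that contains every term, or None if a term is missing."""
    tokens = tokenize(text)
    wanted = {t.lower() for t in terms}
    seen = {}      # term -> occurrences inside the current window
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in wanted:
            seen[tok] = seen.get(tok, 0) + 1
        # Shrink the window from the left while it still holds every term.
        while len(seen) == len(wanted):
            extra = (right - left + 1) - len(wanted)
            if best is None or extra < best:
                best = extra
            left_tok = tokens[left]
            if left_tok in seen:
                seen[left_tok] -= 1
                if seen[left_tok] == 0:
                    del seen[left_tok]
            left += 1
    return best


def proximity_match(text, terms, distance):
    """True if all terms occur with at most `distance` other words between them."""
    extra = min_extra_words(text, terms)
    return extra is not None and extra <= distance


terms = ["vound", "connect", "intella"]
print(proximity_match("intella vound connect", terms, 3))
print(proximity_match("vound some words connect further separated intella", terms, 3))
```

Under this model the first call reports a match (0 extra words) and the second does not (4 extra words), in line with the lists above.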

Using the Correct Proximity Search Syntax

When provided with customer-created search syntax, we often find it to be complex, lengthy, and prone to errors. Below, we discuss three examples of incorrect syntax and demonstrate how to improve them.

Example 1

Incorrect syntax:
  • (Baxter Jason) ~20 (article) OR (paper) OR (presentation) OR (public) OR (report)
Issues:
  1. Search terms are not encased in double quotes
  2. The maximum distance (~20) is not placed at the end of the proximity search, after the closing double quote
Improved syntax:
  • "(Baxter OR Jason) (article OR paper OR presentation OR public OR report)"~20
Here, each group of alternative keywords is placed in parentheses with its terms separated by the OR operator, the whole expression is encased in double quotes, and the maximum distance (~20) follows the closing quote.
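
If search strings are assembled from term lists, the quoting and grouping rules above can be applied mechanically. The following Python sketch is one possible way to do that; `build_proximity_query` is a hypothetical helper written for this article, not part of Intella.

```python
def build_proximity_query(groups, distance):
    """Build a proximity search string from groups of alternative terms.

    Each group with several terms becomes "(a OR b ...)"; the whole
    expression is wrapped in double quotes and followed by ~distance.
    """
    parts = []
    for group in groups:
        if len(group) == 1:
            parts.append(group[0])
        else:
            parts.append("(" + " OR ".join(group) + ")")
    return '"' + " ".join(parts) + '"~' + str(distance)


query = build_proximity_query(
    [["Baxter", "Jason"],
     ["article", "paper", "presentation", "public", "report"]],
    20,
)
print(query)
```

For the term groups shown, the printed string is the improved syntax from this example.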


Example 2

Incorrect syntax:
  • "national OR fire OR service"~30 (truck) OR (department)
Issue:
  1. Double quotes do not encase all search terms
Improved syntax:
  • "(national OR fire OR service) (truck OR department)"~30
Again, we use parentheses to group the search terms and ensure all terms are encased in double quotes.


Example 3

Incorrect syntax:
  • reading w2 glasses
Issue:
  1. The 'w' (within) and 'n' (near) operators are not supported in Intella
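Improved syntax:
  • "reading glasses"~2
Valid syntax for Intella follows the format shown in the previous examples: the terms are encased in double quotes, followed by a tilde and the maximum distance.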
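
The three mistakes above can be caught with a rough pre-check before a search string is run. The Python sketch below is a heuristic written for this article, not an Intella feature: `check_proximity_query` and both patterns are assumptions, it only looks at a standalone proximity search, and the operator pattern can give false positives on genuine search terms that look like `w2` or `n95`.

```python
import re

# A standalone proximity search: one double-quoted expression followed by ~N.
PROXIMITY_PATTERN = re.compile(r'"[^"]+"~\d+')
# Operators such as w2, w/2, n5 or n/5, as used by some other search tools.
UNSUPPORTED_OPERATOR = re.compile(r"\b[wn]/?\d+\b", re.IGNORECASE)


def check_proximity_query(query):
    """Return a list of likely problems; an empty list means none were found."""
    problems = []
    q = query.strip()
    if UNSUPPORTED_OPERATOR.search(q):
        problems.append("contains a w/n style operator")
    if not PROXIMITY_PATTERN.fullmatch(q):
        problems.append('not of the form "terms"~N with all terms inside the quotes')
    return problems


for q in [
    '(Baxter Jason) ~20 (article) OR (paper) OR (presentation) OR (public) OR (report)',
    '"national OR fire OR service"~30 (truck) OR (department)',
    'reading w2 glasses',
    '"(national OR fire OR service) (truck OR department)"~30',
]:
    print(q, "->", check_proximity_query(q) or "no problems found")
```

Only the last string in the loop passes both checks.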

Using Phrases within Proximity Searches

Starting from version 2.5.0, Intella supports the use of phrases within proximity searches. To use a phrase, enclose it in single quotes ('). Here are two examples:

Example 1

Search for evidence of suspected stolen documents, finding the phrase 'chemical formula' within 50 words of either 'copied' OR 'attached':
  • "'chemical formula' (copied OR attached)"~50


Example 2

Search for a fraudulent invoice, looking for the names of two firms:
  • "'abc engineering' 'bobs construction'"~20
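
When search strings are generated from lists, phrases can be handled by wrapping any multi-word term in single quotes before the terms are grouped. The Python sketch below shows one way to do this; `format_term` and `build_proximity_query` are hypothetical helpers written for this article, and the sketch assumes a term counts as a phrase whenever it contains a space.

```python
def format_term(term):
    """Wrap multi-word terms (phrases) in single quotes; leave single words as-is."""
    return "'" + term + "'" if " " in term else term


def build_proximity_query(groups, distance):
    """Build a proximity search string from groups of alternative terms or phrases."""
    parts = []
    for group in groups:
        terms = [format_term(t) for t in group]
        if len(terms) == 1:
            parts.append(terms[0])
        else:
            parts.append("(" + " OR ".join(terms) + ")")
    return '"' + " ".join(parts) + '"~' + str(distance)


print(build_proximity_query([["chemical formula"], ["copied", "attached"]], 50))
print(build_proximity_query([["abc engineering"], ["bobs construction"]], 20))
```

The two printed strings correspond to the two phrase examples above.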


Limitations and Workarounds

Constructing complex proximity searches with 40+ words or extensive use of wildcards is not recommended, as it can lead to poor performance and troubleshooting difficulties. To manage complex searches more efficiently, consider breaking down the search string or using keyword lists.

Breaking down the search string

A complex search string can be split into shorter proximity search strings, one for each pair of terms, which can then be placed into a keyword list:

  • "Baxter article"~20
  • "Baxter paper"~20
  • "Baxter presentation"~20
  • "Baxter public"~20
  • "Baxter report"~20
The searches for any other term in the first group, such as Jason, follow the same pattern. Intella can process such a list more efficiently than a single large, complex search string.
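
Writing out every pair by hand becomes tedious as the groups grow, so the list can be generated instead. This Python sketch assumes a search with two groups of single-word terms; `split_proximity_search` is a hypothetical helper written for this article.

```python
from itertools import product


def split_proximity_search(group_a, group_b, distance):
    """Return one short proximity search per pair of terms from the two groups."""
    return ['"{} {}"~{}'.format(a, b, distance)
            for a, b in product(group_a, group_b)]


queries = split_proximity_search(
    ["Baxter", "Jason"],
    ["article", "paper", "presentation", "public", "report"],
    20,
)
# One search per line, ready to paste into a keyword list file.
print("\n".join(queries))
```

For these two groups the result is ten short searches: the five Baxter searches listed above, followed by the same five for Jason.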

Using keyword lists

Keyword lists can reduce the number of items the proximity search has to run against. Create two keyword lists, one for each group in the proximity search:

Keyword list 1    Keyword list 2
Baxter            article
Jason             paper
                  presentation
                  public
                  report

Next, run the two keyword lists and tag the overlapping cluster, which contains the items that have search terms from both keyword lists. Set this tag as a Require search and then run the proximity search. This is faster because the proximity search no longer runs over the entire dataset. However, be aware that hit highlighting can still be slow or have issues if the proximity search is complex and contains many wildcards.
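
The reason this workaround helps can be seen in a small Python model of the idea. It is a conceptual sketch only: the item texts are invented sample data, `items_matching_any` is a hypothetical helper, and none of it reflects how Intella implements keyword lists, tags, or Require searches.

```python
import re


def items_matching_any(items, keywords):
    """Return the IDs of items whose text contains at least one keyword."""
    wanted = {k.lower() for k in keywords}
    return {item_id for item_id, text in items.items()
            if wanted & set(re.findall(r"[a-z0-9]+", text.lower()))}


# Invented sample data: item ID -> extracted text.
items = {
    1: "Baxter sent the report on Monday",
    2: "Jason is on leave",
    3: "The paper was published",
    4: "Jason reviewed the presentation and the paper",
}

hits_list_1 = items_matching_any(items, ["Baxter", "Jason"])
hits_list_2 = items_matching_any(
    items, ["article", "paper", "presentation", "public", "report"])

# The "overlapping cluster": items with at least one hit from each list.
overlap = hits_list_1 & hits_list_2
print(sorted(overlap))
```

Only the items in the overlap can possibly satisfy the proximity search, so restricting the search to them skips every item that is missing one of the two groups entirely.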

Conclusion

Understanding the correct syntax for proximity searches in Intella is critical for finding relevant evidence. By following the guidelines and best practices shared in this guide, you can create efficient and accurate proximity searches to enhance your investigations. Remember to update to the latest version of Intella to take advantage of all available features, and consider breaking down complex search strings and using keyword lists for optimal performance.

Updated August 2023