I had a query from a user about Text not retrieving the expected data for a given bit of Text

The Text field (‘Solution’) contains:

1. What type of OS can be used with the xxx?
2. What are the minimum requirements of a PC for use with xxx?
3. What versions of browser can be used?
4. What modem is recommended in PPP (Point-to-Point Protocol)?
Solution
1. Recommended Operating Systems:
· Windows 95/98
· Windows NT 4.0 with service pack 5 or later
· Windows 2000
Note:The Mac OS is not supported because, Mac version browsers do not have some key functions
(related to server push).
2. PC Requirements:
· Minimum requirements: 400MHz Pentium III with 64MB RAM
3. Recommended Browsers:
· Internet Explore 5.0 (or higher)
Note:The xxx does not require a special plug-in because the picture images use JPEG compression. However, some versions of Internet Explorer do not support 'Server Push' functions and the 'Active X' component will need to be installed . supplied with the xxx.
4. The xxx supports the following modems:
· 3Com 56k Faxmodem
· US Robotics Sportster Flash
· Ericsson K56 DTV
· Diamond Supra Express 56e PRO

The user found that:

  • Search ‘What are the minimum requirements of a PC for use with’ finds document
  • Search ‘What type of OS can be used with the’ finds document
  • Search ‘What type os’ doesn’t
  • Search ‘What type OS’ doesn’t
  • Search ‘What os’ does
  • Search ‘What type of OS can be used with the xxx-yyy’ doesn’t
  • Search ‘What type of OS can be used with the xxx yyy’ does

As words are ored together these should all find the relevant document

Just going to paste my response in ‘as is’ for the time being:

First there’s a couple of points which I haven’t necessarily got chapter and verse on, but (currently) believe are true. These are related to:

  • or-ing
  • stopwords

Or-ing

To get the search terms to be ‘OR’-ed within the query I think you have to use the '' syntax. Spaces on there own don’t do it. The ‘$’ enables stemming, I think.

So running:

select document_id, SOLUTION from sc_solutions where contains(solution, ‘&1’) > 0 and document_id = 15526

SQL> / Enter value for 1: Flash Sportster old 4: contains(solution, ‘&1’) > 0 new 4: contains(solution, ‘Flash Sportster’) > 0

no rows selected

SQL> / Enter value for 1: FlashSportster old 4: contains(solution, ‘&1’) > 0 new 4: contains(solution, ‘FlashSportster’) > 0

15526

  1. What type of OS can be used with the xxx?
  2. What are the minimum requirements of a PC for use with xxx?
  3. What versions of browser can be used?
  4. What modem is recommended in

Note: not hugely relevant but I was tyring to prove this using ‘requirements minimum’. It appeared to be or-ing it in any case, but eventually noticed the text in green below. I think it was matching this.

Stoplist words

The way Oracle seems to process stopwords is, I reckon, just weird. If you have a stoplist in the search string in effect it turns it into a wild card word matching any other word (stopword or not).

So:

SQL> / Enter value for 1: requirements be what PC old 4: contains(solution, ‘&1’) > 0 new 4: contains(solution, ‘requirements be what PC’) > 0

15526

  1. What type of OS can be used with the xxx?
  2. What are the minimum requirements of a PC for use with xxx?
  3. What versions of browser can be used?
  4. What modem is recommended in

and also

SQL> / Enter value for 1: Mac is browsers old 4: contains(solution, ‘&1’) > 0 new 4: contains(solution, ‘Mac is browsers’) > 0

15526

  1. What type of OS can be used with the xxx?
  2. What are the minimum requirements of a PC for use with xxx?
  3. What versions of browser can be used?
  4. What modem is recommended in

[ its matching ‘Mac version browsers’ ]

But it doesn’t match if you leave the stop words out of the search:

SQL> / Enter value for 1: requirements PC old 4: contains(solution, ‘&1’) > 0 new 4: contains(solution, ‘requirements PC’) > 0

no rows selected

As I say this seems to me to be weird. But it seems to be as per the documentation. At:

http://www.stanford.edu/dept/itss/docs/oracle/9i/text.920 /a96517/query.htm (text manual)

It says

When you include a stopword within your query phrase, the stopword matches any word. For example, the query: ‘Jack was big’ matches phrases such as Jack is big and Jack grew big assuming was is a stopword.

Examples

So re: the examples below, currently I’d make the following assumptions.

Search ‘What are the minimum requirements of a PC for use with’ finds document

  • as expected

Search ‘What type of OS can be used with the’ finds document

  • as expected

Search ‘What type os’ doesn’t

As per the points above re ‘or-ing’ and stoplists, I believe you either need to explicitly ‘or’ it:

SQL> / Enter value for 1: whattypeos old 4: contains(solution, ‘&1’) > 0 new 4: contains(solution, ‘whattypeos’) > 0

15526

  1. What type of OS can be used with the xxx?
  2. What are the minimum requirements of a PC for use with xxx?
  3. What versions of browser can be used?
  4. What modem is recommended in

Or explicitly have a stopword in the place where the ‘of’ is. I used ‘what’ again here.

SQL> / Enter value for 1: what type what os old 4: contains(solution, ‘&1’) > 0 new 4: contains(solution, ‘what type what os’) > 0

15526

  1. What type of OS can be used with the xxx?
  2. What are the minimum requirements of a PC for use with xxx?
  3. What versions of browser can be used?
  4. What modem is recommended in

Search ‘What type OS’ doesn’t

  • as above, I think

Search ‘What os’ does

  • ‘What’ being a stopword is behaving like a wildcard. So it would also get a match with any other stopword I think. As per:

SQL> / Enter value for 1: mz os old 4: contains(solution, ‘&1’) > 0 new 4: contains(solution, ‘mz os’) > 0

15526

  1. What type of OS can be used with the xxx?
  2. What are the minimum requirements of a PC for use with xxx?
  3. What versions of browser can be used?
  4. What modem is recommended in

Note:‘mz’ is an Oracle default stopword, for some reason.