silverorange Code


NateGoSearch.NateGoSearchQuery
/NateGoSearch/NateGoSearchQuery.php at line 59

Class NateGoSearchQuery

public class NateGoSearchQuery

Perform queries using a NateGoSearch index

This is the class used to actually search indexed keywords. Instances of this class may search the index using the NateGoSearchQuery::query() method. For example, to search a database table called Article indexed with a document type of article, use the following code:

<?php
// Search the index for documents of type 'article'.
$query = new NateGoSearchQuery($db);
$query->addDocumentType('article');
$result = $query->query('some keywords');

// Join the Article table against the search results table.
$sql = 'select id, title from Article ' .
    'inner join %s on Article.id = %s.document_id and ' .
    '%s.unique_id = \'%s\' and %s.document_type = %s';

$sql = sprintf($sql,
    $result->getResultTable(),
    $result->getResultTable(),
    $result->getResultTable(),
    $result->getUniqueId(),
    $result->getResultTable(),
    $result->getDocumentType('article'));

$articles = $db->query($sql);
?>

Because of the specific PL/pgSQL implementation of the search algorithm, the query() method may only be called once per page request.
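One way to respect the once-per-request limit is to route every search through a small guard that remembers the first result. The following is only a sketch: SearchOnce is a hypothetical helper that is not part of NateGoSearch, and it accepts a callable so that the example does not depend on a database connection.

```php
<?php
// Hypothetical helper (not part of NateGoSearch): runs a search callback at
// most once and returns the cached result on every later call. Keywords
// passed to later calls are ignored.
final class SearchOnce
{
    private $has_run = false;
    private $result = null;

    public function run(callable $search, string $keywords)
    {
        if (!$this->has_run) {
            $this->result = $search($keywords);
            $this->has_run = true;
        }
        return $this->result;
    }
}

// Usage with a stand-in callback; in real code the callback would return
// $query->query($keywords).
$calls = 0;
$guard = new SearchOnce();
$search = function (string $keywords) use (&$calls) {
    $calls++;
    return 'result for ' . $keywords;
};
$first = $guard->run($search, 'some keywords');
$second = $guard->run($search, 'other keywords');
```

The second call returns the cached first result, so the underlying search runs only once.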

If the PECL stem package is loaded, English stemming is applied to all query keywords. See http://pecl.php.net/package/stem/ for details about the PECL stem package. Support for stemming in other languages may be added in later releases of NateGoSearch.

Otherwise, if a PorterStemmer class is defined, it is applied to all query keywords. The most commonly available PHP implementation of the Porter stemming algorithm is licensed under the GPL, and thus cannot be distributed with the LGPL-licensed NateGoSearch.
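The fallback order described above might look roughly like the following sketch. The function name stemKeyword() is hypothetical. It assumes that the PECL stem package provides stem_english() and that the PorterStemmer class exposes a static Stem() method; both calls are guarded, so the code runs unchanged when neither is available.

```php
<?php
// Sketch of the stemmer fallback order. stem_english() and
// PorterStemmer::Stem() are assumptions about the two optional stemmers and
// are only called when they are actually present.
function stemKeyword(string $word): string
{
    if (extension_loaded('stem') && function_exists('stem_english')) {
        return stem_english($word);
    }
    if (class_exists('PorterStemmer') && method_exists('PorterStemmer', 'Stem')) {
        return PorterStemmer::Stem($word);
    }
    // No stemmer available: leave the keyword unchanged.
    return $word;
}

$stemmed = array_map('stemKeyword', array('searching', 'indexes'));
```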

Copyright:
2006-2007 silverorange
License:
http://www.gnu.org/copyleft/lesser.html LGPL License 2.1

Field Summary
protected array

$blocked_words

Keywords that should not be included in the search performed by this query.

protected MDB2_Driver_Common

$db

The MDB2 database driver to use.

protected array

$document_types

The document types searched by this query.

protected array

$popular_words

Popular keywords that are searched often on the site.

protected NateGoSearchSpellChecker

$spell_checker

Spell checker used to check the spelling of keywords used in this query.

Constructor Summary

NateGoSearchQuery(MDB2_Driver_Common db)

Creates a new NateGoSearch fulltext query.

Method Summary
void

addBlockedWords(string|array words)

Adds words to the list of words that are not to be searched.

void

addDocumentType(string type_shortname)

Adds a document type to be searched by this query.

void

addPopularWords(string|array words)

Adds words to the list of popular words that should be suggested.

protected array

cleanPopularResults(array results)

Clean the popular words list.

static array

getDefaultBlockedWords()

Gets a default list of words that are not searched by a search query.

protected void

getPopularReplacements(string keywords, array misspellings)

Get popular replacements for words in the search keywords.

protected boolean

isPopularMatch(string word1, string word2)

Checks if two words are similar.

protected string

normalizeKeywordsForSearching(string text)

Performs additional normalization of a query string suitable for searching.

protected string

normalizeKeywordsForSpelling(string text)

Performs initial normalization of a query string suitable for spell-checking.

NateGoSearchResult

query(string keywords)

Queries the NateGoSearch index with a set of keywords.

private string

quoteArray(array array, string type)

Quotes a PHP array into a PostgreSQL array.

void

setSpellChecker(NateGoSearchSpellChecker spell_checker)

Sets the spell checker used by this query.

Field Detail

/NateGoSearch/NateGoSearchQuery.php at line 83

blocked_words

protected array $blocked_words = array()

Keywords that should not be included in the search performed by this query

This is an array of blocked keywords.

See Also:
NateGoSearchQuery::addBlockedWords()

/NateGoSearch/NateGoSearchQuery.php at line 117

db

protected MDB2_Driver_Common $db

The MDB2 database driver to use

Currently, NateGoSearch only supports PostgreSQL.

See Also:
NateGoSearchQuery::__construct()

/NateGoSearch/NateGoSearchQuery.php at line 71

document_types

protected array $document_types = array()

The document types searched by this query

See Also:
NateGoSearch::getDocumentType()
NateGoSearchQuery::addDocumentType()

/NateGoSearch/NateGoSearchQuery.php at line 94

popular_words

protected array $popular_words = array()

Popular keywords that are searched often on the site

This is an array of popular keywords.

See Also:
NateGoSearchQuery::getPopularWords()

/NateGoSearch/NateGoSearchQuery.php at line 106

spell_checker

protected NateGoSearchSpellChecker $spell_checker

Spell checker used to check the spelling of keywords used in this query

Null by default, meaning no spell-checking is done.

See Also:
NateGoSearchQuery::setSpellChecker()

Constructor Detail

/NateGoSearch/NateGoSearchQuery.php at line 127

NateGoSearchQuery

public NateGoSearchQuery(MDB2_Driver_Common db)

Creates a new NateGoSearch fulltext query

Parameters:
db - the database driver to use.

Method Detail

/NateGoSearch/NateGoSearchQuery.php at line 173

addBlockedWords

public void addBlockedWords(string|array words)

Adds words to the list of words that are not to be searched

These may be words such as 'the', 'and' and 'a'.

Parameters:
words - the list of words not to be searched.
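A parameter typed string|array can be handled by wrapping a single string in an array before merging. The sketch below illustrates this with a hypothetical stand-alone function; the real method stores the words on the query object, and removing duplicates is an assumption made for the example.

```php
<?php
// Hypothetical stand-alone version of a string|array "add words" method.
function addWordsSketch(array $list, $words): array
{
    // Accept either a single word or a list of words.
    if (!is_array($words)) {
        $words = array((string) $words);
    }
    // Merge and drop duplicates (deduplication is an assumption).
    return array_values(array_unique(array_merge($list, $words)));
}

$blocked = addWordsSketch(array(), 'the');
$blocked = addWordsSketch($blocked, array('and', 'a', 'the'));
```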

/NateGoSearch/NateGoSearchQuery.php at line 145

addDocumentType

public void addDocumentType(string type_shortname)

Adds a document type to be searched by this query

Parameters:
type_shortname - the shortname of the document type to add.
See Also:
NateGoSearch::createDocumentType()
Throws:
if the document type shortname does not exist.

/NateGoSearch/NateGoSearchQuery.php at line 333

addPopularWords

public void addPopularWords(string|array words)

Adds words to the list of popular words that should be suggested

Parameters:
words - the list of popular words.

/NateGoSearch/NateGoSearchQuery.php at line 504

cleanPopularResults

protected array cleanPopularResults(array results)

Clean the popular words list

This is used to clean up the results queried in NateGoSearchQuery::getPopularWords() into a list of unique words that contains only one word per array entry. Numbers and common words found in NateGoSearchQuery::blocked_words are also removed from the list.

Parameters:
results - an array of results to clean
Returns:
an array of cleaned results.
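The cleaning steps described above can be sketched as follows. cleanPopularSketch() is a hypothetical name, the input is assumed to be a list of raw search strings, and lower-casing the words is an assumption not stated in this documentation.

```php
<?php
// Sketch only: splits results into single words, drops numbers and blocked
// words, and returns each remaining word once.
function cleanPopularSketch(array $results, array $blocked_words): array
{
    $words = array();
    foreach ($results as $result) {
        // Split multi-word entries so each array entry holds one word.
        $parts = preg_split('/\s+/', strtolower(trim($result)), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($parts as $word) {
            if (is_numeric($word) || in_array($word, $blocked_words, true)) {
                continue; // drop numbers and blocked words
            }
            $words[$word] = true; // array keys keep the list unique
        }
    }
    return array_keys($words);
}

$clean = cleanPopularSketch(
    array('the red bicycle', 'bicycle 2007', 'Red helmet'),
    array('the', 'and', 'a')
);
```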

/NateGoSearch/NateGoSearchQuery.php at line 352

getDefaultBlockedWords

public static array getDefaultBlockedWords()

Gets a default list of words that are not searched by a search query

These words may be passed directly to the NateGoSearchQuery::addBlockedWords() method.

Returns:
a default list of words not to index.

/NateGoSearch/NateGoSearchQuery.php at line 459

getPopularReplacements

protected void getPopularReplacements(string keywords, array misspellings)

Get popular replacements for words in the search keywords.

This is used to check search keywords, along with their corresponding spelling suggestions, for matches in the popular words list. If a match is found, we either replace the current misspelling, if one exists, or add an entry to the misspelling list with the new popular suggestion.

Parameters:
keywords - the keywords to check for improved suggestions
misspellings - the misspellings for the given $keywords
misspellings - the misspellings with added suggestions for popular words
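The replace-or-add behaviour described above might be sketched as follows. Everything here is an assumption made for illustration: the function name, the shape of the misspellings list (a map of keyword to suggestion, passed by reference), and the use of an edit distance of at most 1 as the match test.

```php
<?php
// Sketch only: checks each keyword and its spelling suggestion against the
// popular words, then replaces or adds the suggestion for that keyword.
function popularReplacementsSketch(string $keywords, array &$misspellings, array $popular_words): void
{
    foreach (explode(' ', $keywords) as $keyword) {
        if ($keyword === '' || in_array($keyword, $popular_words, true)) {
            continue; // nothing to improve
        }
        // Compare both the keyword and its existing suggestion, if any.
        $candidates = array($keyword);
        if (isset($misspellings[$keyword])) {
            $candidates[] = $misspellings[$keyword];
        }
        foreach ($popular_words as $popular) {
            foreach ($candidates as $candidate) {
                if (levenshtein($candidate, $popular) <= 1) {
                    // Replaces an existing entry or adds a new one.
                    $misspellings[$keyword] = $popular;
                    continue 3; // move on to the next keyword
                }
            }
        }
    }
}

$misspellings = array('bycicle' => 'bicycle');
popularReplacementsSketch('red bycicle helmat', $misspellings, array('bicycles', 'helmet'));
```

In this example the existing suggestion for 'bycicle' is replaced by the popular word 'bicycles', and a new entry is added for 'helmat'.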

/NateGoSearch/NateGoSearchQuery.php at line 541

isPopularMatch

protected boolean isPopularMatch(string word1, string word2)

Checks if two words are similar

This is used to check if one word matches another word from the popular word list. Used to improve search suggestions by confirming whether two words are similar in sound and/or spelling.

Parameters:
word1 - the first word being compared for similarities
word2 - the second word being compared for similarities
Returns:
whether or not the strings are similar
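A similarity test on sound and/or spelling can be sketched with PHP's built-in metaphone() and levenshtein() functions. This is a minimal illustration under assumed thresholds; the function name is hypothetical, and this documentation does not state which comparisons the real method uses.

```php
<?php
// Sketch only: two words count as similar when they sound alike (same
// metaphone key) or are spelled alike (at most one edit apart).
function isSimilarSketch(string $word1, string $word2): bool
{
    $word1 = strtolower($word1);
    $word2 = strtolower($word2);

    // Similar in sound: identical metaphone keys.
    if (metaphone($word1) === metaphone($word2)) {
        return true;
    }

    // Similar in spelling: the threshold of 1 is an assumption.
    return levenshtein($word1, $word2) <= 1;
}

$same = isSimilarSketch('search', 'search');
$close = isSimilarSketch('search', 'serch');
$different = isSimilarSketch('cat', 'elephant');
```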

/NateGoSearch/NateGoSearchQuery.php at line 430

normalizeKeywordsForSearching

protected string normalizeKeywordsForSearching(string text)

Performs additional normalization of a query string suitable for searching

This converts all words to lower-case and removes the trailing apostrophe-s ('s) from all words. Keywords should have already been partially normalized using NateGoSearchQuery::normalizeKeywordsForSpelling().

Parameters:
text - the string to be normalized.
Returns:
the normalized string.
See Also:
NateGoSearchQuery::normalizeKeywordsForSpelling()
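The two steps described above (lower-casing and removing apostrophe-s) can be sketched as follows. The function name and the regular expression are assumptions for illustration, not the library's actual implementation.

```php
<?php
// Sketch only: lower-cases the text and strips a trailing 's from words.
function normalizeForSearchingSketch(string $text): string
{
    $text = strtolower($text);
    // Remove 's at the end of a word, so "bob's" becomes "bob".
    return preg_replace("/'s\\b/", '', $text);
}

$normalized = normalizeForSearchingSketch("Bob's Burgers");
```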

/NateGoSearch/NateGoSearchQuery.php at line 388

normalizeKeywordsForSpelling

protected string normalizeKeywordsForSpelling(string text)

Performs initial normalization of a query string suitable for spell-checking

This removes excess punctuation and markup. The resulting string may be tokenized by spaces. Before searching, query strings should be further normalized using NateGoSearchQuery::normalizeKeywordsForSearching().

Parameters:
text - the string to be normalized.
Returns:
the normalized string.
See Also:
NateGoSearchQuery::normalizeKeywordsForSearching()
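Removing markup and excess punctuation so that the result can be tokenized by spaces might look like the sketch below. The function name and the exact set of characters that are kept (letters, digits, apostrophes and whitespace) are assumptions for illustration.

```php
<?php
// Sketch only: strips markup and punctuation, then collapses whitespace so
// the result can be split on single spaces.
function normalizeForSpellingSketch(string $text): string
{
    // Remove HTML/XML markup.
    $text = strip_tags($text);
    // Replace runs of punctuation with a space; apostrophes are kept so a
    // later step can handle possessives.
    $text = preg_replace('/[^\p{L}\p{N}\'\s]+/u', ' ', $text);
    // Collapse repeated whitespace into single spaces.
    $text = preg_replace('/\s+/u', ' ', $text);
    return trim($text);
}

$normalized = normalizeForSpellingSketch('<b>Hello</b>,   world!!');
$tokens = explode(' ', $normalized);
```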

/NateGoSearch/NateGoSearchQuery.php at line 197

query

public NateGoSearchResult query(string keywords)

Queries the NateGoSearch index with a set of keywords

Querying does not directly return a set of results. This is due to the way NateGoSearch is designed. The document ids from this search are stored in a results table and accessed through a unique identifier.

Parameters:
keywords - the search string to query.
Returns:
an object containing result information.
See Also:
NateGoSearchResult::getUniqueId()

/NateGoSearch/NateGoSearchQuery.php at line 311

quoteArray

private string quoteArray(array array, string type)

Quotes a PHP array into a PostgreSQL array

This is used to quote the list of document types used in the internal SQL query.

Parameters:
array - the array to quote.
type - the SQL data type to use. The type is 'integer' by default.
Returns:
the array quoted as an SQL array.
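Quoting a PHP array as a PostgreSQL array can be sketched with the ARRAY[...] constructor syntax. This is an illustration only: the function name is hypothetical, the output format of the real method is not documented here, and production code would normally delegate value quoting to the database driver rather than escape strings by hand.

```php
<?php
// Sketch only: builds a PostgreSQL ARRAY[...] expression with a type cast.
// $type is inserted as-is and must come from trusted code, not user input.
function quoteArraySketch(array $array, string $type = 'integer'): string
{
    $quoted = array();
    foreach ($array as $value) {
        if ($type === 'integer') {
            $quoted[] = (string) intval($value);
        } else {
            // Double single quotes, the standard SQL string escape.
            $quoted[] = "'" . str_replace("'", "''", (string) $value) . "'";
        }
    }
    return 'ARRAY[' . implode(', ', $quoted) . ']::' . $type . '[]';
}

$sql_integers = quoteArraySketch(array(1, 2, 3));
$sql_text = quoteArraySketch(array("it's", 'ok'), 'text');
```

The explicit cast keeps the expression valid even for an empty array, where PostgreSQL cannot infer the element type on its own.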

/NateGoSearch/NateGoSearchQuery.php at line 290

setSpellChecker

public void setSpellChecker(NateGoSearchSpellChecker spell_checker)

Sets the spell checker used by this query

Parameters:
spell_checker - optional. The spell checker to use for this query. If not specified or specified as null, no spell checking is performed.
