Use cases

Let us see what we can do with those concepts in sheraf. We do not claim that sheraf can replace tools like Whoosh; the point is to explore the flexibility sheraf offers.

Name completion

Imagine we have a cowboy database, and we would like to find cowboys by their names. We have an input field, and we would like to suggest valid cowboy names as the user types. Then we can require the user to pick one of the valid names for the search. The idea is to find the cowboy names that are prefixed by the user input. For instance, if we have a cowboy named George Abitbol, he should appear in the suggestion box if we type Geor or abit.
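Before any indexing, the matching rule can be stated as a brute-force check on a single name: the input must be the beginning of some ordering of the name's words, ignoring case and accents. This is a minimal sketch; `normalize` and `matches` are hypothetical helpers, and accent removal uses the standard `unicodedata` module as a stand-in for unidecode.

```python
import itertools
import unicodedata


def normalize(text):
    # Lowercase, then drop combining accents (stdlib stand-in for unidecode).
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def matches(name, user_input):
    # True if some ordering of the words of `name` starts with `user_input`.
    words = normalize(name).split()
    query = normalize(user_input)
    return any(
        " ".join(ordering).startswith(query)
        for ordering in itertools.permutations(words)
    )


# Brute force: every keystroke would have to scan every cowboy this way.
for typed in ["Geor", "abit", "George Abi", "eorge"]:
    print(typed, matches("George Abitbol", typed))
```

Running this check against every cowboy on each keystroke does not scale, which is why precomputing the matching keys in an index is attractive.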

Note

This is just an example. In a real situation, querying the database each time a user presses a key does not seem to be a good idea. Periodically generating and caching the valid data you want to suggest sounds like a better way to achieve this.
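One possible way to follow this advice, sketched with the standard library only: rebuild a sorted cache of normalized word orderings from time to time (the scheduling is not shown), then answer each keystroke with a binary search instead of a database query. `build_suggestion_cache`, `suggest` and `normalize` are hypothetical helpers, not sheraf APIs, and the names in the usage lines are made up.

```python
import bisect
import itertools
import unicodedata


def normalize(text):
    # Lowercase and strip accents (stdlib stand-in for unidecode).
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def build_suggestion_cache(names):
    # One (key, name) pair per word ordering of each name, kept sorted.
    entries = set()
    for name in names:
        words = normalize(name).split()
        for ordering in itertools.permutations(words):
            entries.add((" ".join(ordering), name))
    return sorted(entries)


def suggest(cache, user_input):
    # Keys sharing a prefix form one contiguous run in sorted order,
    # so we jump to the first candidate and stop at the first non-match.
    query = normalize(user_input)
    start = bisect.bisect_left(cache, (query,))
    found = []
    for key, name in cache[start:]:
        if not key.startswith(query):
            break
        if name not in found:
            found.append(name)
    return found


cache = build_suggestion_cache(["George Abitbol", "Georgette Abilène"])
print(suggest(cache, "Geor"))
print(suggest(cache, "abit"))
```

The cache trades freshness for speed: names added after the last rebuild are not suggested until the next one.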

  • We do not have to understand a whole natural language like English, because proper nouns will not appear in a dictionary. Also, each name stands for a unique person, and there are no name synonyms. In that case, stemming or lemmatization seem useless.

  • We can assume a search query matches each cowboy at most once. Thus, we can do without relevance ranking algorithms.

  • We just want to find approximate matches, so case and accents won’t matter. Thus, we can use alphabet reduction techniques.

  • We suggest valid names before users have a chance to make a typo, so typo correction algorithms are not needed here.
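Alphabet reduction maps many spellings onto one canonical form, so that "George", "GEORGE" and "Géorge" all produce the same key. A minimal stdlib sketch follows; `reduce_alphabet` is a hypothetical name, and it only approximates what unidecode does: NFKD decomposition removes accents such as "é", but letters without a decomposition (for instance "ø") are left untouched, whereas unidecode transliterates them.

```python
import unicodedata


def reduce_alphabet(text):
    # casefold() is a more aggressive lower(): "ß" becomes "ss", for instance.
    folded = text.casefold()
    # NFKD splits "é" into "e" followed by a combining acute accent...
    decomposed = unicodedata.normalize("NFKD", folded)
    # ...and dropping combining characters keeps only the base letters.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


for spelling in ["George", "GEORGE", "Géorge", "gÉorgE"]:
    print(spelling, "->", reduce_alphabet(spelling))
```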

>>> import itertools
>>> import sheraf
>>> import unidecode
>>> def cowboy_indexation(string):
...     # Lowercase the name and strip its accents
...     lowercase = string.lower()
...     unaccented = unidecode.unidecode(lowercase)
...     # Every ordering of the words: "george abitbol", "abitbol george"
...     words = unaccented.split()
...     orderings = {
...         " ".join(perm)
...         for perm in itertools.permutations(words)
...     }
...     # Every non-empty prefix of every ordering, full ordering included
...     return {
...         ordering[:x]
...         for ordering in orderings
...         for x in range(1, len(ordering) + 1)
...     }
...
>>> def cowboy_query(string):
...     # The query is only lowercased and unaccented, not split into prefixes
...     lowercase = string.lower()
...     return {unidecode.unidecode(lowercase)}
...
>>> class Cowboy(sheraf.Model):
...     table = "cowboys_prefixes"
...     name = sheraf.StringAttribute().index(
...         index_keys_func=cowboy_indexation,
...         search_keys_func=cowboy_query,
...     )
...

The indexation function lowercases the name, removes the accents, then builds every possible ordering of the words in the name (because we want the user to be able to type George Abitbol or Abitbol George), and finally builds every prefix of those orderings. The query function only lowercases and unaccents the user input, so a search finds a cowboy when the input is exactly one of those prefixes.
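To see which keys such an indexation produces, here is a stdlib-only sketch of the same idea; `prefix_keys` and `normalize` are hypothetical helpers, and `unicodedata` replaces unidecode. It generates every non-empty prefix of every word ordering, up to and including the full ordering.

```python
import itertools
import unicodedata


def normalize(text):
    # Lowercase and strip accents (stdlib stand-in for unidecode).
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def prefix_keys(name):
    # Every non-empty prefix of every word ordering, full orderings included.
    words = normalize(name).split()
    orderings = {" ".join(p) for p in itertools.permutations(words)}
    return {
        ordering[:length]
        for ordering in orderings
        for length in range(1, len(ordering) + 1)
    }


print(sorted(prefix_keys("Jo Bo")))

# The key count grows with the factorial of the number of words.
for name in ["George Abitbol", "George Abitbol Junior"]:
    print(name, len(prefix_keys(name)))
```

With two words there are 2 orderings, with three words 6, with four words 24: this is affordable for short names, but it is one reason the technique suits names rather than arbitrary text.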

>>> with sheraf.connection(commit=True):
...     george = Cowboy.create(name="George Abitbol")
...
...     assert [george] == Cowboy.search(name="George")
...     assert [george] == Cowboy.search(name="gEoRgE")
...     assert [george] == Cowboy.search(name="Abitbol")
...     assert [george] == Cowboy.search(name="geo")
...     assert [george] == Cowboy.search(name="abi")
...     assert [george] == Cowboy.search(name="George Abi")
...     assert [george] == Cowboy.search(name="Abitbol Geo")
...
...     assert [] == Cowboy.search(name="Peter")
...     assert [] == Cowboy.search(name="eorge")

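We can see that typing the beginning of any word of the name, in any case, is enough to find the cowboy back, while a fragment taken from the middle of a word, such as eorge, finds nothing.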
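To illustrate how an indexation function and a query function cooperate, here is a toy in-memory model: keys produced when a name is added point to that name, and a search looks up the keys produced from the user input. This is a simplification for illustration only, not sheraf's actual storage; `PrefixIndex`, `index_keys`, `search_keys` and `normalize` are hypothetical names, and `unicodedata` replaces unidecode.

```python
import itertools
import unicodedata
from collections import defaultdict


def normalize(text):
    # Lowercase and strip accents (stdlib stand-in for unidecode).
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def index_keys(name):
    # Same role as cowboy_indexation: every prefix of every word ordering.
    words = normalize(name).split()
    return {
        ordering[:length]
        for ordering in (" ".join(p) for p in itertools.permutations(words))
        for length in range(1, len(ordering) + 1)
    }


def search_keys(user_input):
    # Same role as cowboy_query: the normalized input is the only key.
    return {normalize(user_input)}


class PrefixIndex:
    """Toy index mapping each key to the set of names that produced it."""

    def __init__(self):
        self.table = defaultdict(set)

    def add(self, name):
        for key in index_keys(name):
            self.table[key].add(name)

    def search(self, user_input):
        results = set()
        for key in search_keys(user_input):
            # .get() avoids creating empty entries for unknown keys.
            results |= self.table.get(key, set())
        return sorted(results)


index = PrefixIndex()
index.add("George Abitbol")
print(index.search("Abitbol Geo"))
print(index.search("eorge"))
```

All the work happens at indexation time; a search is a single key lookup, which is what makes the prefix approach cheap to query.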