Semantic MediaWiki and related extensions

https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/Elastic/docs/usage.md "Usage" | https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/Elastic/docs/config.md "Settings" | https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/Elastic/docs/technical.md "Technical notes" | https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/Elastic/docs/faq.md "FAQ"

#ask queries are system-agnostic, meaning that queries that worked with the SQLStore (or SPARQLStore) are expected to work equally with ElasticStore and do not require any modifications to a query or its syntax.

By default, the ElasticStore executes queries in a compatibility mode (compat.mode) in which queries are expected to return the same results as with the SQLStore. Without it, Elasticsearch could return different results in some instances, especially in connection with boolean query operators. The compatibility mode ensures that results retrieved from the ElasticStore are consistent with those from the SQLStore (which is important when running the same set of integration tests against each store).

Filter and query context

Most searches with a discrete value in Semantic MediaWiki are classified as structured searches that operate in a filter context, while full-text or proximity searches use a query context that assigns relevancy scores to a match pool. A filter context always yields a relevancy score of 1 because it is translated into a boolean operation that either includes or excludes a result as part of a set.
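
The difference between the two contexts can be sketched as plain Elasticsearch request bodies built in Python. This is only an illustration: the field names `has_number` and `has_text` are hypothetical and do not reflect the field layout Semantic MediaWiki actually uses. `constant_score`, `term`, and `match` are standard Elasticsearch query DSL constructs.

```python
def filter_context(field, value):
    """Exact-value condition: a document either matches or not.

    Wrapping the filter in constant_score gives every hit the same
    score (the default boost of 1.0), so no relevancy is computed.
    """
    return {"query": {"constant_score": {"filter": {"term": {field: value}}}}}


def query_context(field, text):
    """Full-text condition: Elasticsearch computes a relevancy score per hit."""
    return {"query": {"match": {field: text}}}


# Hypothetical field names, for illustration only.
structured = filter_context("has_number", 42)
fulltext = query_context("has_text", "some foo")
```

Only the second body produces scores that differ from hit to hit, which is why relevancy sorting is tied to the query context.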

Prefixes

To improve the handling of proximity searches, the following expressions can be used:

| Expression | Interpret as | Description | Note |
|---|---|---|---|
| `in: ...` | `~~* ... *` or `~* ... *` | Find anything that contains `...` | The `in:` expression can also be combined with a property and depending on the type, context will be interpreted differently. |
| `phrase: ...` | `~~" ... "` or `~" ... "` | Find anything that contains `...` in the exact same order | The `phrase:` expression is only relevant for literal components such as text or page titles as well as unstructured text. |
| `not: ...` | `!~~...` or `!~...` | Do not match any entity that matches `...` | The `not:` expression is intended to only match the exact entered term. It can be extended using `*` if necessary (e.g. `[[Has text::not:foo*]]`) |

A wide proximity search is expressed with ~~ and is intended for searches where no specific property is known. With Elasticsearch it can extend the search to fields that have not been annotated or processed by Semantic MediaWiki prior to a query request (see indexer.raw.text and experimental.file.ingest).
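
As a rough illustration of the "Interpret as" column for text-like values, the rewriting can be sketched as a small string transformation. `expand_prefix` is a hypothetical helper written for this page, not a Semantic MediaWiki function, and it ignores the type-dependent handling of numbers and dates.

```python
def expand_prefix(value, has_property=False):
    """Rewrite an in:/phrase:/not: expression into its ~ form.

    Without a property the wide proximity operator (~~) is used,
    with a property the regular one (~).
    """
    tilde = "~" if has_property else "~~"
    if value.startswith("in:"):
        return f"{tilde}*{value[len('in:'):]}*"
    if value.startswith("phrase:"):
        return f'{tilde}"{value[len("phrase:"):]}"'
    if value.startswith("not:"):
        return f"!{tilde}{value[len('not:'):]}"
    return value  # no known prefix: leave the value untouched


wide = expand_prefix("in:some foo")                          # [[in:some foo]]
narrow = expand_prefix("in:some foo", has_property=True)     # [[Has text::in:some foo]]
```

Here `wide` corresponds to a query without a property and `narrow` to the same expression attached to a property such as `Has text`.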

| Type | #ask | Interpret as |
|---|---|---|
| - | `[[in:some foo]]` | `[[~~*some foo*]]` |
| Text | `[[Has text::in:some foo]]` | `[[Has text::~*some foo*]]` |
| Page | `[[Has page::in:foo]]` | `[[Has page::~*foo*]]` |
| Number | `[[Has number::in:99]]` | `[[Has number:: [[≥0]] [[≤99]] ]]` |
| | `[[Has number::in:-100]]` | `[[Has number:: [[≥-100]] [[≤0]] ]]` |
| Time | `[[Has date::in:2000]]` | `[[Has date:: <q>[[≥2000]] [[<<1 January 2001 00:00:00]]</q> ]]` |
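
For numbers and dates, the rows above suggest that `in:` is turned into a range rather than a wildcard match. The following sketch reproduces only those rows. Generalizing them to "the span between 0 and the value" and "the whole calendar year" is an assumption, and both function names are hypothetical.

```python
from datetime import datetime


def number_in_bounds(value):
    """Inclusive (lower, upper) bounds for in:<number>: the span between 0 and value."""
    return (min(0, value), max(0, value))


def year_in_bounds(year):
    """(inclusive lower, exclusive upper) bounds for in:<year> on a date property."""
    return (datetime(year, 1, 1), datetime(year + 1, 1, 1))


positive = number_in_bounds(99)    # [[≥0]] [[≤99]]
negative = number_in_bounds(-100)  # [[≥-100]] [[≤0]]
year = year_in_bounds(2000)        # [[≥2000]] [[<<1 January 2001 00:00:00]]
```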

Relevancy and scores

Relevancy sorting is a topic of its own (and is only provided by Elasticsearch and the ElasticStore). In order to sort results by a score, the #ask query needs to signal that a different context is required during the query execution. The es.score sortkey (see score.sortfield which is used as a convention key) signals to the QueryEngine that, for a non-filtered context, score tracking is to be enabled.

Only query constructs that use a non-filtered context (~/!~/in/phrase/not) provide scores that are expressive enough for sorting results. In a filter context all results carry the same score, so sorting by it has no meaningful effect.

// Find entities that contain "some text" in the property `Has text` and sort
// by the score returned for each matched document:
{{#ask: [[Has text::in:some text]]
 |sort=es.score
 |order=desc
}}

Property chains, paths, and subqueries

ES does not support subqueries or join constructs natively, but it provides a so-called terms lookup, which is used to execute a path (chain of properties) as an iterative process. Each step creates a set of results that match a path condition (e.g. Foo.bar.foobar), with each element holding a restricted list of results from the previous execution to traverse the property path.

This process allows matching the SQLStore behaviour for path queries, where the QueryEngine splits each path and computes a list of elements. To avoid issues with a vast list of matches, Semantic MediaWiki will "park" those results in a separate lookup index. The subquery.terms.lookup.index.write.threshold setting (the default is 100) determines when results are moved into that lookup index.
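
The iterative traversal can be sketched over a toy in-memory data set. This is a simplified model, not the QueryEngine implementation: the triple format, the right-to-left evaluation order, the `lookup_index` dictionary, and the key naming are all assumptions made for illustration.

```python
def resolve_path(triples, path, value, lookup_index, threshold=100):
    """Resolve a property path such as Foo.bar.foobar::value step by step.

    triples      -- list of (subject, property, object) tuples (toy data)
    lookup_index -- dict standing in for the separate lookup index
    threshold    -- result-set size above which a step is "parked"
    """
    allowed = {value}
    # Start at the last property and walk back towards the first one;
    # each step is restricted to the results of the previous step.
    for depth, prop in enumerate(reversed(path)):
        matches = sorted({s for (s, p, o) in triples if p == prop and o in allowed})
        if len(matches) > threshold:
            # Large intermediate result: store it under a key so that a real
            # query could reference it instead of inlining every value.
            lookup_index[f"lookup:{depth}:{prop}"] = matches
        allowed = set(matches)
    return sorted(allowed)


triples = [
    ("Page A", "Foo", "Page B"),
    ("Page B", "bar", "Page C"),
    ("Page C", "foobar", "x"),
    ("Page D", "foobar", "y"),
]
parked = {}
subjects = resolve_path(triples, ["Foo", "bar", "foobar"], "x", parked)
```

With the default threshold nothing is parked for such a small data set. Lowering `threshold` to 0 makes every non-empty intermediate result go through the `lookup_index` branch.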

Hierarchies

Property and category hierarchies are supported by relying on a conjunctive boolean expression for hierarchy members that are computed outside of the Elasticsearch framework (the Elasticsearch parent join type is not used for this).
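
Computing hierarchy members outside of Elasticsearch can be pictured as a plain graph traversal. The sketch below assumes a hypothetical `children` mapping from a category (or property) to its direct subcategories (or subproperties). How the resulting members are then combined into the boolean expression is not shown.

```python
from collections import deque


def hierarchy_members(root, children):
    """Return root and all of its descendants in breadth-first order."""
    seen = {root}
    ordered = [root]
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # guards against cycles and duplicates
                seen.add(child)
                ordered.append(child)
                queue.append(child)
    return ordered


# Hypothetical category tree, for illustration only.
children = {"Vehicle": ["Car", "Bike"], "Car": ["Sports car"]}
members = hierarchy_members("Vehicle", children)
```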

Examples

File attachment

If file ingestion is enabled and the processing has provided the File attachment property, the content of that property can be accessed using the property chain notation.

```
// Find all subjects (aka. files) that were ingested and indexed with an `image/png`
// content type
{{#ask: [[File attachment.Content type::image/png]]
 |?File attachment.Content title
}}

// Find all subjects (aka. files) that were ingested and indexed with an `application/pdf`
// content type and where the indexed content contains the `brown fox` text
{{#ask: [[File attachment.Content type::application/pdf]] [[in:brown fox]]
 |?File attachment.Content title
 |?File attachment.Content author
}}
```

Query debugging

format=debug will output a detailed description of the #ask and Elasticsearch DSL used for a query response, making it possible to analyze and retrieve explanations from Elasticsearch about a query request.

Special:Search integration

In the event SMWSearch is enabled, it is possible to retrieve highlighted text snippets for matched entities from Elasticsearch if highlight.fragment.type is set to one of the declared types (plain, unified, and fvh). Type plain can be used without any specific requirements; for the other types please consult the Elasticsearch documentation.

