Understanding Solr Configuration And Panl Integration
This chapter builds on the Panl Fundamentals and Solr Fundamentals chapters and covers information that is required to understand the integration points between the Solr and Panl servers. Many books, articles, blog posts, and sites have been written and published on the Apache Solr project and server, which are useful for understanding and fine-tuning the Solr search server. This chapter will not go into enough detail to even come close to replacing those resources.
Querying Data
For any search engine, there are two ways to query the data, either through a keyword search, or through faceting. The keyword matches text found within the indexed fields, whilst facets refine results by selecting matching document attributes.
For example, in the Mechanical Pencils dataset, the Faber-Castell Goldfaber wooden pencil has the following data:
{
  "brand": "Faber-Castell",
  "name": "Goldfaber",
  "mechanism_type": "None",
  "nib_shape": "Tapered",
  "body_shape": "Hexagonal",
  "grip_type": "None",
  "grip_shape": "Hexagonal",
  "cap_shape": "None",
  "category": "Everyone",
  "length": "175",
  "relative_length": "175",
  "diameter": "8",
  "weight": "4",
  "relative_weight": "5",
  "lead_length": "120",
  "disassemble": false,
  "nib_material": "Wood",
  "mechanism_material": "N/A",
  "grip_material": "Wood",
  "body_material": "Wood",
  "tubing_material": "None",
  "clip_material": "None",
  "cap_material": "None",
  "hardness_indicator": "Etched on body",
  "lead_size_indicator": "No",
  "in_built_eraser": false,
  "in_built_sharpener": false,
  "variants": [
    "Blue and gold pinstriping"
  ],
  "colours": [
    "Blue"
  ],
  "description": "The basic, everyday pencil from Faber-Castell Goldfaber. A classic hexagonal wooden pencil with gorgeous blue and gold pinstriping. Not a mechanical pencil, but always worth including as a reference point.",
  "id": 4,
  "images": [
    "400-goldfaber-blue.jpg"
  ]
}
The description field is used for the keyword search, whilst the other data is used as the attributes for faceting and fields. In the implementation used in the Panl example setup, other fields are copied to an additional Solr field named text, which is indexed and analysed by Solr and then used for the keyword to be searched against.
Each of the JSON keys in the above example JSON file must be defined in the Solr managed schema with the correct type.
There are two ways that the Solr index is queried with a keyword.
- The default query search
- The Specific Solr Keyword Field search
The Solr Query Operand
IMPORTANT: If a query operand LPSE code is within the URL, then this WILL ALWAYS override the default query operand configured in the <panl_collection_url>.panl.properties file.
Additionally, the query operand works between ALL keywords, and ALL specific fields - i.e. you cannot have a keyword phrase with the words ORed and the specific fields ANDed (and vice versa).
The Solr query operand defines how the keywords are used when searching against Solr fields. By default, the query operand is set to 'OR' - i.e. a document matches if it contains any of the keywords.
This default query operand can be set per collection in the <panl_collection_url>.panl.properties configuration file with the property solr.default.query.operand.
To set the query operand to OR (which is the default), the property becomes:
solr.default.query.operand=-
To set the query operand to AND, the property becomes:
solr.default.query.operand=+
If the query operand is set to OR, then the keywords will be ORed together - i.e. when there is more than one keyword, then it will select keyword_one OR keyword_two OR ...
If the query operand is set to AND then the keywords will be ANDed together - i.e. when there is more than one keyword, then it will select keyword_one AND keyword_two AND ...
Note: This only affects keyword searches with multiple words and single or multiple word searches across Specific Solr Field Searches.
The second Panl configuration item in the <panl_collection_url>.panl.properties file is the panl.param.query.operand which is the LPSE code to place in the URL. If required, this can be set by the user and is returned in the JSON response including valid addition and removal URLs. By default, the generator will prompt that this is set to the LPSE code of 'o'. The property is defined within the configuration files in this book as:
panl.param.query.operand=o
The above two properties are referenced in the next section.
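The override rule described above (an operand in the URL always wins over the configured default) can be sketched in a few lines of Python. This is only an illustration and not the Panl server's code: the function name is hypothetical, while the '+' and '-' values are those used by the solr.default.query.operand property.

```python
# Maps the property / URL symbol to the value sent to Solr as q.op
OPERANDS = {"+": "AND", "-": "OR"}


def effective_query_operand(default_property_value, url_operand=None):
    """Return the q.op value ('AND' or 'OR').

    default_property_value: value of solr.default.query.operand ('+' or '-')
    url_operand: the symbol following the 'o' LPSE code in the URL, or None
                 when the URL does not contain the query operand LPSE code.
    """
    # An operand present in the URL always overrides the configured default
    symbol = url_operand if url_operand is not None else default_property_value
    return OPERANDS[symbol]


print(effective_query_operand("-"))       # no 'o' LPSE code in the URL
print(effective_query_operand("-", "+"))  # URL contains 'o+'
```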
The Default Keyword Search
The default keyword search works only on a single Solr field (which in the examples in the book is the text Solr field) in conjunction with the Solr query operand (in the examples this is mapped to the 'o' LPSE code). This default field has the author, title, description, genre, and series Solr fields copied to it so it will use all of these values in the search.
Single Keyword Example
For the Bookstore example a simple search of 'Mary'
http://localhost:8181/panl-results-viewer/book-store/default/Mary/q/
Will send through the Solr query of
q="Mary"&q.op=OR
And returns 6 results.
The q.op is not defined in the URL (there is no 'o' LPSE code) so it defaults to the query operand in the <panl_collection_url>.panl.properties file (as set by the solr.default.query.operand property).
As there is only one keyword, setting the solr.default.query.operand property will have no effect; it only matters when multiple keywords are passed through.
Multiple Keywords Example
The Bookstore example with a keyword search of 'Mary sequel'
http://localhost:8181/panl-results-viewer/book-store/default/Mary%20sequel/q/
Will send through the Solr query of
q="Mary"+"sequel"&q.op=OR
And returns 7 results
Note: the keyword has been split on the whitespace character and will be ORed on the field - i.e. the keyword 'Mary' OR 'sequel' must be in the default field - hence more results are returned.
If the default query operand is set to AND (i.e. solr.default.query.operand=+), or the LPSE code for the Panl query operand parameter (panl.param.query.operand=o) is in the URL with a value of '+' (i.e. 'o+'), then the keywords will be ANDed together.
For the Bookstore URL:
http://localhost:8181/panl-results-viewer/book-store/default/Mary%20sequel/qo+/
Will send through the Solr query of:
q="Mary"+"sequel"&q.op=AND
Returns only 1 result, as there is only one document that contains both of the words 'Mary' and 'sequel' in the 'text' Solr field.
Keyword Phrases Searches
To perform a keyword phrase search, the user will need to enter the keyword phrase in double quotation marks. For a search on 'fiction novel' (without double quotation marks), the URL
http://localhost:8181/panl-results-viewer/book-store/default/fiction%20novel/q/
Will send through the Solr query of
q="fiction"+"novel"&q.op=OR
And returns 31 results - i.e. the documents that have either 'fiction' OR 'novel' in the default search field.
However, for the actual phrase "fiction novel" (with double quotation marks), the keyword search of
http://localhost:8181/panl-results-viewer/book-store/default/%22fiction%20novel%22/q/
Will send through the Solr query of
q="fiction+novel"&q.op=OR
And returns 3 results - i.e. there are only 3 results which have the exact phrase "fiction novel" in the default search field.
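The way the examples in this section turn a keyword into a Solr query (split on whitespace, keep double-quoted phrases together, wrap every term in quotation marks) can be reproduced with a short Python sketch. The function names are hypothetical and this is not the Panl implementation; it only mirrors the query strings shown above.

```python
import re


def split_keywords(keyword):
    """Split on whitespace, keeping double-quoted phrases as one term."""
    # Each match is either a quoted phrase (group 1) or a bare word (group 2)
    parts = re.findall(r'"([^"]+)"|(\S+)', keyword)
    return [phrase or word for phrase, word in parts]


def default_query(keyword, operand="OR"):
    """Build the q and q.op parameters in the form shown in the examples."""
    terms = []
    for term in split_keywords(keyword):
        # Whitespace inside a phrase is written as '+', as in the examples
        terms.append('"' + re.sub(r"\s+", "+", term) + '"')
    return "q=" + "+".join(terms) + "&q.op=" + operand


print(default_query("Mary sequel"))
print(default_query('"fiction novel"'))
```

Passing operand="AND" corresponds to the 'o+' LPSE code in the URL.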
Lookahead Searches
Lookahead searches are lightweight searches and always rely on the configured default query operand, irrespective of whether there is an LPSE code that changes the Solr query operand.
The lookahead search is mapped to the URL:
http://localhost:8181/panl-lookahead/<panl_collection>/<fieldset>
The lookahead search ONLY searches on the default field - you CANNOT use the Specific Solr Field Searches for the lookahead.
The implementation is the same as the default search above.
The Specific Solr Field Keyword Search
The Specific Solr Field keyword search works on one or more Solr fields (which in the examples in the Bookstore collection is text_author, title, and description) in conjunction with the Solr query operand (in the examples this is mapped to the 'o' LPSE code).
Single Keyword Example
For the Bookstore example a simple search of 'Mary'
http://localhost:8181/panl-results-viewer/book-store/default/Mary/q(tTd)/
Will send through the Solr query of
q=title:"Mary"+OR+text_author:"Mary"^4+OR+description:"Mary"
And returns 6 results.
The q.op is not defined in the URL (there is no 'o' LPSE code) so it defaults to the query operand in the <panl_collection_url>.panl.properties file (as set by the solr.default.query.operand property - which in the Bookstore example is OR).
Notes: The same results are returned as in the single keyword search on the default field, however the order will be different as there is a boost on the text_author field of 4 - as designated by text_author:"Mary"^4
Multiple Keywords Example
The Bookstore example with a keyword search of 'Mary sequel'
http://localhost:8181/panl-results-viewer/book-store/default/Mary%20sequel/q(tTd)/
Will send through the Solr query of
q=title:"Mary"+OR+title:"sequel"+OR+text_author:"Mary"^4+OR+text_author:"sequel"^4+OR+description:"Mary"+OR+description:"sequel"
And returns 7 results
Note: the keyword has been split on the whitespace character and will be ORed on the field - i.e. the keyword 'Mary' OR 'sequel' must be in one of the specific fields - hence more results are returned.
Notes: The same results are returned as in the multiple keyword search on the default field, however the order will be different as there is a boost on the text_author field of 4 for both of the keywords - as designated by text_author:"Mary"^4 and text_author:"sequel"^4
If the default query operand is set to AND (i.e. solr.default.query.operand=+), or the LPSE code for the Panl query operand parameter (panl.param.query.operand=o) is in the URL with a value of '+' (i.e. 'o+'), then the keywords will be ANDed together.
For the Bookstore URL:
http://localhost:8181/panl-results-viewer/book-store/default/Mary%20sequel/q(tTd)o+/
Will send through the Solr query of:
q=title:"Mary"+AND+title:"sequel"+AND+text_author:"Mary"^4+AND+text_author:"sequel"^4+AND+description:"Mary"+AND+description:"sequel"
Returns no results, as there are no documents that contain both of the words 'Mary' AND 'sequel' in each of the 'text_author', 'title', and 'description' Solr fields.
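The Solr queries shown in the specific field examples above follow a regular pattern: every field is paired with every keyword, boosted fields get a ^boost suffix, and the clauses are joined with the query operand. A minimal Python sketch of that pattern follows; the function name and the BOOKSTORE_FIELDS list are hypothetical, with the field order and the boost of 4 taken from the example queries.

```python
def specific_field_query(terms, fields, operand="OR"):
    """Build the q parameter for a specific field search.

    terms:  list of keywords (a phrase is a single term containing spaces)
    fields: list of (solr_field_name, boost_or_None) in output order
    """
    clauses = []
    for field, boost in fields:
        for term in terms:
            quoted = '"' + term.replace(" ", "+") + '"'
            clause = field + ":" + quoted
            if boost is not None:
                clause += "^" + str(boost)
            clauses.append(clause)
    # The same operand is used between ALL keywords and ALL fields
    return "q=" + ("+" + operand + "+").join(clauses)


# Field order and boost as they appear in the Bookstore example queries
BOOKSTORE_FIELDS = [("title", None), ("text_author", 4), ("description", None)]

print(specific_field_query(["Mary", "sequel"], BOOKSTORE_FIELDS, "AND"))
```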
Keyword Phrases Searches
To perform a keyword phrase search, the user will need to enter the keyword phrase in double quotation marks. For a search on 'fiction novel' (without double quotation marks):
http://localhost:8181/panl-results-viewer/book-store/default/fiction%20novel/q(tTd)/
31 results - i.e. the specific search fields have either fiction OR novel in the fields
q=title:"fiction"+OR+title:"novel"+OR+text_author:"fiction"^4+OR+text_author:"novel"^4+OR+description:"fiction"+OR+description:"novel"
Returns 27 results
With the actual phrase "fiction novel" (with double quotation marks), the keyword search of
http://localhost:8181/panl-results-viewer/book-store/default/%22fiction%20novel%22/q(tTd)o-/
3 results - i.e. there are only 3 results which have the exact phrase "fiction novel" in the specific search fields.
q=title:"fiction+novel"+OR+text_author:"fiction+novel"^4+OR+description:"fiction+novel"
With the query operand set to AND, zero results are returned in both cases (i.e. with and without double quotation marks):
Without double quotation marks:
http://localhost:8181/panl-results-viewer/book-store/default/fiction%20novel/q(tTd)o+/
Solr query
q=title:"fiction"+AND+title:"novel"+AND+text_author:"fiction"^4+AND+text_author:"novel"^4+AND+description:"fiction"+AND+description:"novel"
With double quotation marks:
http://localhost:8181/panl-results-viewer/book-store/default/%22fiction%20novel%22/q(tTd)o+/
Solr query
q=title:"fiction+novel"+AND+text_author:"fiction+novel"^4+AND+description:"fiction+novel"
Which makes sense as it would be highly unlikely to have all of the words in the title and the author.
As a final example, the table below shows searches for Mary Poppins - both with and without double quotation marks - first in the title and description fields, then in the text_author, title, and description fields.
Panl URL (title and description fields only) | Quote? | Operand | # Results
Searching on the title and description fields only | | |
http://localhost:8181/panl-results-viewer/book-store/default/mary%20poppins/q(td)/ | No | OR | 6
http://localhost:8181/panl-results-viewer/book-store/default/mary%20poppins/q(td)o+/ | No | AND | 2
http://localhost:8181/panl-results-viewer/book-store/default/%22mary%20poppins%22/q(td)/ | Yes | OR | 2
http://localhost:8181/panl-results-viewer/book-store/default/%22mary%20poppins%22/q(td)o+/ | Yes | AND | 2
Searching on the title, description, and text_author fields | | |
http://localhost:8181/panl-results-viewer/book-store/default/mary%20poppins/q(tTd)/ | No | OR | 6
http://localhost:8181/panl-results-viewer/book-store/default/mary%20poppins/q(tTd)o+/ | No | AND | 0
http://localhost:8181/panl-results-viewer/book-store/default/%22mary%20poppins%22/q(tTd)o-/ | Yes | OR | 2
http://localhost:8181/panl-results-viewer/book-store/default/%22mary%20poppins%22/q(tTd)o+/ | Yes | AND | 0
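The final path segment of the URLs in this chapter (q/, q(td)/, q(tTd)o+/ and so on) can be read mechanically. The Python sketch below handles only the forms that appear in this chapter, assuming single-character field codes; it is not a general LPSE decoder, and the function name and pattern are hypothetical.

```python
import re

# 'q', then optional '(field codes)', then optional 'o' followed by '+' or '-'
LPSE_PATTERN = re.compile(r"^q(?:\(([A-Za-z]+)\))?(?:o([+-]))?$")


def parse_keyword_lpse(url):
    """Return (field_codes, operand_symbol) from the last URL path segment.

    field_codes is None for the default field search; operand_symbol is
    None when the URL does not carry the query operand LPSE code.
    """
    segment = url.rstrip("/").rsplit("/", 1)[-1]
    match = LPSE_PATTERN.match(segment)
    if match is None:
        raise ValueError("Unsupported LPSE segment: " + repr(segment))
    codes, operand = match.groups()
    return (list(codes) if codes else None, operand)


print(parse_keyword_lpse(
    "http://localhost:8181/panl-results-viewer/book-store/default/mary%20poppins/q(td)o+/"))
```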
The Solr Managed Schema
Included within the downloaded Panl release is an example managed schema file for the mechanical pencils collection.
PANL_INSTALL_DIRECTORY/sample/solr/mechanical-pencils/managed-schema.xml
For the Solr server version 9 - this file can also be viewed on the GitHub repository:
Whilst there are many configuration options within the schema file, this file has three major parts of interest to the Panl server, namely:
- The schema name attribute on the top level element (<schema name="mechanical-pencils" />) which is configured to map to the Panl collection URL name,
- The field definitions including the field's type (<field /> elements) which may either be facets or fields, and
- The field type definitions (<fieldType /> elements) which drive validation and in-place text replacements for the Panl server
In the listing below, formatting has been added for readability, which means that the original schema file line numbers will not match the line numbers below:
01 <?xml version="1.0" encoding="UTF-8" ?>
02 <schema name="mechanical-pencils" version="1.6">
03   <field name="_version_" type="plong" indexed="false" stored="false" docValues="true" />
04   <field name="id" type="string" indexed="false" stored="true" required="true" multiValued="false" />
05   <field name="brand" type="string" indexed="true" stored="true" multiValued="false" />
06   <field name="disassemble" type="boolean" indexed="true" stored="true" multiValued="false" />
07   <field name="description" type="text_general" indexed="true" stored="true" multiValued="false" />
08   <field name="manufacturer_link" type="string" indexed="false" stored="true" multiValued="false" />
09   <field name="colours" type="string" indexed="true" stored="true" multiValued="true" />
10
11   <!-- additional field definitions -->
12
13   <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
14   <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
15
16   <!-- additional field type definitions -->
17
18 </schema>
Line 1:
Is the standard XML definition
Line 2:
The start of the schema definition for the mechanical-pencils data. Note: The name attribute of the schema XML element (i.e. name="mechanical-pencils") will be used by the Panl generator as the filename prefix for the <panl_collection_url>.panl.properties file, which will also form part of the URL that Panl will bind this collection to. For the above example schema file, it will generate a properties file named mechanical-pencils.panl.properties, and place a property in the panl.properties file of
panl.collection.mechanical-pencils=mechanical-pencils.panl.properties
Which will then be bound to the URL:
http://localhost:8080/mechanical-pencils/*
IMPORTANT: When connecting to a collection in the Solr server (i.e. by having a panl.collection.<solr_collection_name>=<panl_collection_url>.<properties_file_name> entry in the panl.properties file), the <solr_collection_name> part of the property key MUST match the name of the Solr collection to connect to. The <properties_file_name> may be any name, noting that the <panl_collection_url> first part of the file name is the URL path that the Panl server will respond to, and MUST be unique amongst all URL paths registered by the Panl server.
For example, the following line in the panl.properties file:
panl.collection.mechanical-pencils=mechanical-pencils.panl.properties
Will be parsed as follows:
- The Solr collection name of mechanical-pencils will be taken from the property key: panl.collection.mechanical-pencils
- Panl will read the configuration from the properties file and will use the first prefix of the filename as the Panl URL path to bind to; for the above example the URL path would be mechanical-pencils/*
Lines 3-9:
These are the field definitions. Seven fields are defined in the above example; there are many more fields in the actual managed schema file.
- _version_ - required by Solr - this is an internal field that is used by the partial update procedure, the update log process, and by SolrCloud.
- id - this is the identifier of the result, and must be unique across the collection.
- brand - the field that stores the brand of the mechanical pencil
- disassemble - the field that stores whether the mechanical pencil can be easily disassembled.
- description - the field that stores the description of the pencil
- manufacturer_link - the field that stores the link to the manufacturer of the mechanical pencil
- colours - the field that can store multiple colour values for the specific mechanical pencil.
Line 11:
For brevity, additional field definitions were removed and replaced with a comment.
Lines 13-14:
Solr field type definitions, which the Panl generator will look for to determine how validation and prefix-suffix replacement will be done.
Note that the solr.BoolField will also allow boolean value replacement (along with optional prefixes and suffixes).
Line 16:
For brevity, additional field type definitions were removed and replaced with a comment.
Line 18:
The end of the Solr managed schema.
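The way a panl.collection property line is split into the Solr collection name and the Panl URL path, as described in the walkthrough for line 2, can be sketched as follows. The function name is hypothetical and this is not the Panl server's own parsing code; it simply follows the rule that the key suffix is the Solr collection and the first part of the file name is the URL path.

```python
def parse_collection_property(line):
    """Return (solr_collection_name, panl_url_path) for one property line."""
    key, _, filename = line.strip().partition("=")
    prefix = "panl.collection."
    if not key.startswith(prefix) or not filename:
        raise ValueError("Not a panl.collection property: " + repr(line))
    # The part of the key after the prefix is the Solr collection name
    solr_collection = key[len(prefix):]
    # The first part of the file name (up to the first '.') is the URL path
    url_path = filename.split(".", 1)[0]
    return solr_collection, url_path


print(parse_collection_property(
    "panl.collection.mechanical-pencils=mechanical-pencils.panl.properties"))
```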
Determining the Appropriate FieldType and Attributes
A quick overview of the decisions around choosing the field type, and setting the attributes:
Image: The decision chart for determining the field types and attributes
The attributes for every field definition in the Solr managed schema are as follows:
- name - the Solr field name for this piece of data
- type - the type of the data to be indexed, this will determine the configuration options that are available through the Panl server. Note: this 'type' is used to reference the Solr field type. It is this field type that determines how it is to be indexed by Solr.
- indexed - whether to index the data so that it may be searched/faceted upon
- stored - whether the data will be stored in the Solr index and available to be returned with the results documents.
- multiValued - whether this field may contain more than one value
For example:
In the sample file, the brand field is configured as a type of string
<field name="brand" type="string" indexed="true" stored="true" multiValued="false" />
The value of the type attribute is then mapped to a Solr class. Further down the Solr managed schema file are the definitions of the field types which match the above type attribute (i.e. the type attribute of the field above matches the name attribute of the fieldType below). For the above field, the matching fieldType definition is below.
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
This will return the facet values for the brand data exactly as they are stored, with NO ANALYSIS done on the field.
They will be displayed on the Panl Results Viewer:
http://localhost:8181/panl-results-viewer/mechanical-pencils/brandandname
And return the facet values in full (including the configured prefix and suffix):
Brand (b)
- Manufactured by Koh-i-Noor Company (11)
- Manufactured by Caran d'Ache Company (4)
- Manufactured by Faber-Castell Company (4)
- Manufactured by Pacific Arc Company (4)
- Manufactured by Alvin Company (3)
- Manufactured by Kaweco Company (3)
- Manufactured by Rotring Company (3)
- Manufactured by Hightide Penco Company (2)
- Manufactured by Kita-Boshi Company (2)
- Manufactured by Küelox Company (2)
- Manufactured by Mitsubishi Company (2)
- Manufactured by OHTO Company (2)
- Manufactured by Scrikks Company (2)
- Manufactured by Staedtler Company (2)
- Manufactured by BIC Company (1)
- Manufactured by DEDEDEPRAISE Company (1)
- Manufactured by Ito-Ya Company (1)
- Manufactured by Mr. Pen Company (1)
- Manufactured by Muji Company (1)
- Manufactured by Redcircle Company (1)
- Manufactured by Unbranded Company (1)
- Manufactured by WSD Company (1)
- Manufactured by YStudio Company (1)
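The link between a field's type attribute and the fieldType's name attribute, described above, can be checked programmatically. The Python sketch below uses the standard library XML parser on a cut-down schema; the SCHEMA string is an abbreviated stand-in for the real managed-schema.xml and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the managed schema (not the full sample file)
SCHEMA = """<schema name="mechanical-pencils" version="1.6">
  <field name="brand" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="description" type="text_general" indexed="true" stored="true" multiValued="false" />
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" />
</schema>"""


def field_classes(schema_xml):
    """Map each field name to the Solr class of its field type."""
    root = ET.fromstring(schema_xml)
    # fieldType name -> Solr class
    types = {ft.get("name"): ft.get("class") for ft in root.iter("fieldType")}
    # field name -> class, looked up through the field's type attribute
    return {f.get("name"): types.get(f.get("type")) for f in root.iter("field")}


print(field_classes(SCHEMA))
```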
In contrast, if the brand field was configured as text_general
<field name="brand" type="text_general" indexed="true" stored="true" multiValued="false" />
The fieldType definition includes an analyser which will break up the text into individual words, lowercase them, and ignore stopwords:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer name="standard"/>
    <filter name="stop" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter name="synonymGraph" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    <filter name="flattenGraph"/>
    -->
    <filter name="lowercase"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer name="standard"/>
    <filter name="stop" ignoreCase="true" words="stopwords.txt" />
    <filter name="synonymGraph" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter name="lowercase"/>
  </analyzer>
</fieldType>
This will break up the facet values into individual, lower-cased words, return the analysed facet values for brand, and display the following facet list (Note: this configuration is not included in the sample configuration):
Brand (b)
- i (11)
- koh (11)
- noor (11)
- arc (4)
- caran (4)
- castell (4)
- d'ache (4)
- faber (4)
- pacific (4)
- alvin (3)
- kaweco (3)
- rotring (3)
- boshi (2)
- hightide (2)
- kita (2)
- küelox (2)
- mitsubishi (2)
- ohto (2)
- penco (2)
- scrikks (2)
- staedtler (2)
- bic (1)
- dededepraise (1)
- ito (1)
- mr (1)
- muji (1)
- pen (1)
- redcircle (1)
- unbranded (1)
- wsd (1)
- ya (1)
- ystudio (1)
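The effect of analysis on facet values can be approximated outside of Solr. The Python sketch below is only a rough stand-in for the standard tokenizer and lowercase filter (it is not Lucene's implementation, and the stop filter is left out); the regular expression and function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Rough approximation: runs of letters/digits, keeping inner apostrophes
TOKEN = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def analyse(value):
    """Split a value into lower-cased tokens."""
    return [token.lower() for token in TOKEN.findall(value)]


def facet_counts(values):
    """Count, per token, the number of documents containing that token."""
    counts = Counter()
    for value in values:
        counts.update(set(analyse(value)))
    return counts


print(analyse("Koh-i-Noor"))
print(analyse("Caran d'Ache"))
```

Each brand contributes one count per distinct token, which is why the single brand Koh-i-Noor shows up as three separate facet values with the same count.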
Tips: In general, the only data that you want analysed are those fields that will be used as a keyword search.
Additionally, throughout this book, all Solr fields will use a copy field for text searching, rather than having Solr search on individual fields. This will allow values of non-analysed fields to be copied to this field, analysed, and queried.
Testing The Impact Of Indexed / Stored / Analysed
A testing configuration for Panl and Solr is included in the source code so that you can see the impact of indexed/analysed/stored on the searching, facets, and fields - see the src/test/config directory, or
https://github.com/synapticloop/panl/tree/main/src/test/config/
The managed schema file defines Solr fields for every permutation of analysed/unanalysed, indexed, and stored.
If you add the collection to Solr, index the data, generate the Panl configuration, and start up the Panl server with this configuration you will be able to browse the results with the following URLs:
http://localhost:8181/panl-results-viewer/testing-all-facets/default
Which configures all fields to be facets, and
http://localhost:8181/panl-results-viewer/testing-all-fields/default
Which configures all fields to be just regular fields.
Additionally, both of the Panl configurations set all fields to be Specifically searchable and highlighting is enabled.
Note: ALL fields have the word 'synapticloop' in the data so a search on this keyword will return highlighting information for the fields that it was found in.
All Fields
This configuration has all Panl fields configured to be regular fields and searchable fields, and has highlighting turned on. The URL below searches for the word 'synapticloop', which exists in every field:
http://localhost:8181/panl-results-viewer/testing-all-fields/default/synapticloop/q(ISbOaAhN)/
Will show the following page, note:
- There are no facets (as expected),
- Not all fields are returned in the results
- Not all fields are highlighted, despite the fact that every field was set as a specific search field.
Image: The All Fields Testing configuration Panl Results Viewer
All Facets
This configuration has all Panl fields configured to be regular facets and searchable fields, and has highlighting turned on. Searching for the keyword 'synapticloop', the URL:
http://localhost:8181/panl-results-viewer/testing-all-facets/default/synapticloop/q(ISbOaAhN)/
Will show the following page:
Image: The All Facets Testing configuration Panl Results Viewer
Despite the Panl configuration defining all fields as facets and specific search fields, they will not perform as required unless the underlying Solr fields have the correct definitions.
In the above image, the Solr fields (with their names) and where they will appear are:
- A facet (see the Available Filters section) - all indexed fields, whether analysed or not, can be facets. Note that analysed facets are broken into their individual words.
- A field (see the Results section) - all stored fields can be returned within the results.
- A specific search field (see the Highlighted fields, which are the only ones that respond to the specific search fields) - all fields that are both analysed AND stored[26] can be specific search fields.
Summary
The Solr managed schema file has this advice in an XML comment:
PERFORMANCE NOTE: this schema includes many optional features and should not be used for benchmarking. To improve performance one could
- set stored="false" for all fields possible (esp large fields) when you only need to search on the field but don't need to return the original value.
- set indexed="false" if you don't need to search on the field, but only return the field as a result of searching on other indexed fields.
- remove all unneeded copyField statements
- for best index size and searching performance, set "index" to false for all general text fields, use copyField to copy them to the catchall "text" field, and use that for searching.
The rules are:
- If Indexed the field can be used for a facet
- If Stored the field can be returned in the results
- If Analysed then the field can be used for the default search
- If Stored AND Analysed then the field can be used as a specific search field and highlighted
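The four rules above translate directly into a small function. This is a sketch that applies the rules literally as stated (for a schema version 1.6 style configuration); the function name and the capability labels are chosen for illustration only.

```python
def capabilities(indexed, stored, analysed):
    """Return what a Solr field can be used for, following the four rules."""
    caps = []
    if indexed:
        caps.append("facet")
    if stored:
        caps.append("results field")
    if analysed:
        caps.append("default search")
    if stored and analysed:
        caps.append("specific search and highlighting")
    return caps


# A field that is indexed and stored, but not analysed
print(capabilities(indexed=True, stored=True, analysed=False))
```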
The below table summarises the indexed / stored / analysed fields in Solr and the impact as to what you can do with them.
Solr Field Name | Index | Store | Analyse | Notes
indexed_field | Yes | No | No | Facet
stored_field | No | Yes | No | Results field
both_indexed_and_stored_field | Yes | Yes | No | Facet and results field
none_field | No | No | No | Not returned
analysed_indexed_field | Yes | No | Yes | Facet (see notes)
analysed_stored_field | No | Yes | Yes | Results field, specific search and highlighting
analysed_both_field | Yes | Yes | Yes | Facet (see notes), results field, specific search and highlighting
analysed_none_field | No | No | Yes | No results
Notes:
- Whilst you may set the Panl configuration to be a facet, or a field, or a specific search field, it DOES NOT mean that Solr will be able to use the field as it is configured, and consequently will not return the expected results and facets.
- It is not recommended to use analysed fields as Facets as the field value is tokenised and each individual word becomes a facet value.
- If you want to define a field to be a Specific Solr Search Field and also a facet, then you will need to define two fields, one analysed and one not, and then copy the unanalysed field into the analysed field (there is an example in the Bookstore walkthrough of how to configure this).
The Impact Of docValues (Schema Version 1.7+)
In this example, the schema version is set to 1.6. There is a version of the schema in the src/test/config/solr-schema-1.7/ directory which has specific fields set to docValues="false" for the fields that are neither indexed nor stored.
For schema version 1.7 (which shipped with Solr 9.7.0 upwards), the docValues XML attribute on the fields was automatically set to true for the following Solr primitive field types:
- Numeric (not DenseVectorField)
- Boolean
- String
- Date
- UUID
- Enum
The Solr documentation states:
When using an earlier schemaVersion (<= 1.6), you only need to enable docValues for a field that you will use it with. As with all schema design, you need to define a field type and then define fields of that type with docValues enabled. All of these actions are done in the schema.
The impact of this change is:
- All fields will be able to be returned with the document results.
  To stop a field being returned with the documents, add an attribute of docValues="false" on the <field /> definition XML element in the managed-schema.xml file.
- Any Analysed field will not be able to be faceted on.
  Whilst this is a good thing for the majority of use-cases (as faceting on an analysed field splits the field value into individual tokens), in some instances you may still wish to have this split done.
  To enable this, add or set an attribute of uninvertible="true" on the <field /> definition XML element in the managed-schema.xml file. Note: the field MUST be indexed (as is the normal requirement).
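Setting docValues="false" on the fields that are neither indexed nor stored can be done by hand, or with a small script. The following Python sketch uses the standard library XML parser; the function name is hypothetical, a missing indexed or stored attribute is assumed to mean "true", and fields that already declare docValues (such as _version_) are deliberately left untouched.

```python
import xml.etree.ElementTree as ET


def disable_docvalues_for_hidden_fields(schema_xml):
    """Add docValues="false" to fields that are neither indexed nor stored."""
    root = ET.fromstring(schema_xml)
    for field in root.iter("field"):
        # Assumption: a missing attribute is treated as "true"
        indexed = field.get("indexed", "true") == "true"
        stored = field.get("stored", "true") == "true"
        # Leave any explicit docValues setting (e.g. on _version_) alone
        if not indexed and not stored and "docValues" not in field.attrib:
            field.set("docValues", "false")
    return ET.tostring(root, encoding="unicode")


EXAMPLE = """<schema name="testing" version="1.7">
  <field name="_version_" type="plong" indexed="false" stored="false" docValues="true" />
  <field name="none_field" type="string" indexed="false" stored="false" />
  <field name="indexed_field" type="string" indexed="true" stored="false" />
</schema>"""

print(disable_docvalues_for_hidden_fields(EXAMPLE))
```

Note that the serialised output will not preserve XML comments or the original formatting, so this is best treated as a way to find the affected fields rather than as a replacement for editing the schema file.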
All Fields
For the above examples with the version 1.7 schema without the above changes, for the all fields Panl collection with a specific search for 'synapticloop' with the URL:
http://localhost:8181/panl-results-viewer/testing-all-fields/default/synapticloop/q(ISbOaAhN)/
All Facets
For the above examples with the version 1.7 schema without the above changes, for the all facets Panl collection with a specific search for 'synapticloop' with the URL:
http://localhost:8181/panl-results-viewer/testing-all-facets/default/synapticloop/q(ISbOaAhN)/
In the above image, the Solr fields (with their names) and where they will appear are
- A facet (see the Available Filters section) - all indexed fields, whether analysed or not, can be facets. Note that in this example, no fields that are analysed will appear in the facets AND the none_field (i.e. neither indexed nor stored) appears due to the docValues=true being automatically applied.
- A field (see the Results section) - all stored fields can be returned within the results AND the none_field (i.e. neither indexed nor stored) appears due to the docValues=true being automatically applied.
- A specific search field (see the Highlighted fields which are the only ones that respond to the specific search fields) - all fields that are both analysed AND stored fields can be specific search fields.
The Solr Configuration File
The Solr configuration file (solrconfig.xml), with comments, is over 1,000 lines long. There are two parts of the configuration file that may be of interest, namely the Query Request Handler and the Highlighting.
Query Request Handler
This handler defines the default field that will be used for the keyword query.
<!-- A request handler that returns indented JSON by default -->
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="df">text</str>
  </lst>
</requestHandler>
The df parameter is the default field that Solr will search against if no search field is set. Panl does not set the search field and relies on this default.
In the managed-schema.xml file for the mechanical pencils collection, the following fields are copied to this text field, which can then be searched upon.
<copyField source="brand" dest="text" />
<copyField source="name" dest="text" />
<copyField source="mechanism_type" dest="text" />
<copyField source="nib_shape" dest="text" />
<copyField source="body_shape" dest="text" />
<copyField source="grip_type" dest="text" />
<copyField source="grip_shape" dest="text" />
<copyField source="cap_shape" dest="text" />
<copyField source="category" dest="text" />
<copyField source="nib_material" dest="text" />
<copyField source="mechanism_material" dest="text" />
<copyField source="grip_material" dest="text" />
<copyField source="body_material" dest="text" />
<copyField source="tubing_material" dest="text" />
<copyField source="clip_material" dest="text" />
<copyField source="cap_material" dest="text" />
<copyField source="colours" dest="text" />
<copyField source="variants" dest="text" />
<copyField source="description" dest="text" />
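When many fields need to be copied to the catch-all text field, the copyField elements can be generated rather than typed by hand. A minimal Python sketch is below; the function name is hypothetical and the output is intended to be pasted into the managed-schema.xml file.

```python
from xml.sax.saxutils import quoteattr


def copy_field_elements(sources, dest="text"):
    """Return one <copyField /> element per source field name."""
    # quoteattr adds the surrounding quotes and escapes special characters
    return [
        "<copyField source=" + quoteattr(source) + " dest=" + quoteattr(dest) + " />"
        for source in sources
    ]


for element in copy_field_elements(["brand", "name", "description"]):
    print(element)
```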
IMPORTANT: Panl does not include a way to set the search fields for a phrase query and relies on the default Solr field to search upon.
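To make the role of the default field concrete, the following is a minimal sketch of two query URLs sent directly to Solr: one with an unqualified keyword, which falls back to the df default, and one that names a field explicitly. The host, port, and the helper function names are assumptions for illustration only; they are not part of Panl, which only ever relies on the default field.

```python
from urllib.parse import urlencode

# Assumed values for illustration only: adjust to your own Solr host.
SOLR_BASE_URL = "http://localhost:8983/solr"
COLLECTION = "mechanical-pencils"


def build_keyword_query_url(keyword: str) -> str:
    """Build a /query URL with an unqualified keyword (hypothetical helper).

    No field prefix and no 'df' parameter are sent, so Solr falls back to
    the 'df' default configured on the request handler.
    """
    params = urlencode({"q": keyword})
    return f"{SOLR_BASE_URL}/{COLLECTION}/query?{params}"


def build_field_query_url(field: str, keyword: str) -> str:
    """Build a /query URL that names the field to search (hypothetical helper).

    Shown for contrast only: this bypasses the default field.
    """
    params = urlencode({"q": f"{field}:{keyword}"})
    return f"{SOLR_BASE_URL}/{COLLECTION}/query?{params}"


print(build_keyword_query_url("Goldfaber"))
print(build_field_query_url("name", "Goldfaber"))
```

Because the first URL carries no field name, which text it matches depends entirely on the copyField rules that populate the default field.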
Highlighting
Highlighting results is not suited to every search implementation, and is generally better implemented when searching through large volumes of multi-page documents where you just want to return the highlighted text, and a link to the appropriate document, rather than a large number of result fields.
Web searches are a good example: you don't want to return all of the information, just the relevant part of it. In the image below, a DuckDuckGo search for the keywords "getting started with synapticloop" returns the following (first) result:
Image: The DuckDuckGo search for "getting started with synapticloop" which has the highlighted words in bold.
This section of the Solr configuration file (solrconfig.xml) starts around line 1060 and runs for just over 100 lines; only some parts are included in this book for brevity. The highlighting component controls what text any highlighted words will be surrounded with when they are returned with the Solr results. This doesn't impact the Panl server in any way; however, if you are going to use highlighting, there are some important considerations in order to enable it.
If highlighting is enabled in the <panl_collection_url>.panl.properties file via setting the solr.highlight=true property, then the following rules are applied:
- Highlighting requires that you have a uniqueKey defined in your schema. In the mechanical-pencils collection, the unique key is id (mapped to the LPSE code of i).
- Panl will pass through the hl=true Solr query parameter and the hl.fl=* parameter.
- Panl will use the unified highlighter
- In the Panl Results Viewer web app, the functionality for utilising the returned highlighted results has only a lightweight implementation.
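The rules above can be sketched as a small function that adds the highlighting parameters to a Solr query. This is an illustration under stated assumptions, not Panl's actual implementation: the function name is hypothetical, and using hl.method=unified to select the unified highlighter is an assumption about how the choice would be expressed as a Solr request parameter.

```python
from urllib.parse import urlencode


def add_highlight_params(params: dict, highlight_enabled: bool) -> dict:
    """Return a copy of params, adding highlighting parameters when enabled.

    Hypothetical helper mirroring the rules listed above; Panl's real
    implementation may differ.
    """
    merged = dict(params)
    if highlight_enabled:
        merged["hl"] = "true"            # turn highlighting on
        merged["hl.fl"] = "*"            # consider all fields for highlighting
        merged["hl.method"] = "unified"  # assumption: select the unified highlighter
    return merged


# With highlighting enabled, the extra parameters are appended to the query.
print(urlencode(add_highlight_params({"q": "Mary"}, True)))

# With highlighting disabled, the query is passed through unchanged.
print(urlencode(add_highlight_params({"q": "Mary"}, False)))
```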
The amount of text returned is determined by the fragmenter. By default Panl uses the gap fragmenter; should you wish to configure another fragmenter, edit the solrconfig.xml file to set the default.
<fragmenter name="gap"
            default="true"
            class="solr.highlight.GapFragmenter">
  <lst name="defaults">
    <int name="hl.fragsize">100</int>
  </lst>
</fragmenter>
The configuration for how the highlighted text is marked up is defined by the formatter, which surrounds the highlighted text with <em> tags (Lines 5-6 below). For example, with a search for the keyword 'Mary', one of the highlighted results returned is:
"Frankenstein," written by <em>Mary</em> Shelley, is a groundbreaking novel that explores the themes of creation, ambition, and the consequences of playing God.
01  <formatter name="html"
02             default="true"
03             class="solr.highlight.HtmlFormatter">
04    <lst name="defaults">
05      <str name="hl.simple.pre"><![CDATA[<em>]]></str>
06      <str name="hl.simple.post"><![CDATA[</em>]]></str>
07    </lst>
08  </formatter>
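Solr returns highlighted snippets in a separate highlighting section of the JSON response, keyed by each document's uniqueKey value, which is why a uniqueKey is required. The sketch below shows one way a client could join the snippets back onto the documents. The sample response is hand-written for illustration (it is not real server output), and the field name description, the id value, and the function name are assumptions.

```python
# Hand-written sample in the general shape of a Solr JSON response.
# The id value and the 'description' field name are illustrative only.
SAMPLE_RESPONSE = {
    "response": {
        "numFound": 1,
        "docs": [{"id": "1", "title": "Frankenstein"}],
    },
    "highlighting": {
        "1": {
            "description": [
                '"Frankenstein," written by <em>Mary</em> Shelley, is a '
                "groundbreaking novel"
            ]
        }
    },
}


def attach_highlights(response: dict, unique_key: str = "id") -> list:
    """Pair each returned document with its highlighted snippets.

    Snippets are looked up by the document's uniqueKey value; documents
    without any highlighted text get an empty dictionary.
    """
    highlighting = response.get("highlighting", {})
    results = []
    for doc in response["response"]["docs"]:
        doc_id = str(doc[unique_key])
        results.append({"id": doc_id, "snippets": highlighting.get(doc_id, {})})
    return results


for result in attach_highlights(SAMPLE_RESPONSE):
    for field, snippets in result["snippets"].items():
        for snippet in snippets:
            print(f"{result['id']} [{field}]: {snippet}")
```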
See the Solr documentation for in-depth information:
https://solr.apache.org/guide/solr/latest/query-guide/highlighting.html
IMPORTANT: Solr will only return highlighted text for fields which are analysed, i.e. fields in the managed schema file that have a type that is analysed.
~ ~ ~ * ~ ~ ~