Appendices
Definitions
#
42: The meaning of life
404: An HTTP status code returned when a resource could not be located by the server.
500: An HTTP status code returned by the server when an error occurred that stopped the processing of a request.
A
Apache: Refers to the Apache Software Foundation, the producers of an HTTP server and a fine collection of software and libraries, including the Solr search server.
B
Boolean facet: A Solr field which can only be one of two values, 'true' or 'false'.
Boolean value replacement: The 'true' or 'false' value from a Solr field can be replaced with more meaningful text; the Panl server performs the conversion.
C
CaFUPs: An acronym for Collections and FieldSet URL Paths. This is the URL path that the Panl server will handle to return results from the Solr collection with the configured field names.
Collection: A collection is either a single logical search index that uses a single Solr configuration file (solrconfig.xml), or a collection of properties and field sets for the Panl server. (See also: Solr collection, and Panl collection).
D
DATE range facet: A Solr facet of type solr.DatePointField that can be used to filter results from the time period NOW to some arbitrary number of hours, days, months, or years, or from some arbitrary number of hours, days, months, years to NOW.
Document: A single result returned from the Solr search server.
E
F
FieldSet(s): A list of Solr field names that will be returned by the Panl server in the result documents.
Facet: Also referred to as a regular facet, a defined field in the Solr and Panl configuration that allows a user to filter the results by selecting a particular value for this field that the document must have. The returned facets have a Solr field name, a Panl field name, a value, and a count. (See also: Regular facets, OR facets, RANGE facets, DATE range facets, and Boolean facets)
Facet count: The number of documents which have a particular facet value assigned to them.
Facet index: Solr nomenclature for the value of a facet.
Facet value: The value of the Solr facet field as indexed by the Solr search server.
G
H
Hierarchical facet: A facet that will only appear if another facet or parameter has already been selected. This is defined by the panl.when.<lpse_code> property.
Highlighting: When configured, highlighting will return the matching text surrounded by markup (e.g. HTML) so that it may be rendered in a different fashion to bring attention to the text.
I
Infix: Text that is placed between the range values. Note: this is only available for RANGE facets.
J
JSON: JavaScript Object Notation, an open standard file and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute-value pairs.
K
L
LPSE code: The Last Path Segment Encoded string that the Panl server uses to parse the preceding URL path.
M
Mechanical Pencil: (or a clutch pencil) is a pencil with a replaceable and mechanically extendable graphite core (or "lead"). The lead is not attached to the outer casing, and can be mechanically extended as its point is worn away from use.
N
Number per page parameter: Defines the number of results to return per page from the Solr server. This may be set to any positive integer value greater than 0 (zero).
O
OR facet: A configured facet that allows the user to select multiple values for a facet that would normally allow only one value to be selected.
P
Page parameter: A parameter which defines the page number of results that the user will be shown.
Panl collection: A collection of properties and field sets that control how the facets are configured and how the search results are returned.
Panl collection URL: Unlike a Solr collection of documents, a Panl collection is a collection of field sets and configuration. There can be multiple field sets per configuration file, and multiple Collection URLs can be connected to a Solr search collection. The Panl collection URL is made up of the selected Panl collection and selected FieldSet.
Panl field: A one-to-one mapping from a Solr field to a Panl field which is referenced throughout the configuration.
Panl generator: A utility function within the Panl server that will generate a starting panl.properties file and a <panl_collection_url>.panl.properties file from a managed-schema.xml Solr collection configuration file.
Panl server: The HTTP server that acts as the proxy between LPSE encoded URL parts and the underlying Solr server.
Panl release package: A binary release as a .zip or .tgz file that can be downloaded from the GitHub releases page for the Synapticloop Panl project.
Parameters: Parameters are specialised forms of LPSE codes that do not directly filter the results, but alter the way in which the results are returned.
Prefix: Text that is prepended to a facet value for the URL path, and optionally for the display of the value.
Properties: A text file format where each line defines a parameter which is stored as a pair of strings, one storing the name of the parameter (called the key), and the other storing the value.
Q
q: The query URL parameter that the Solr server responds to when searching text within the collection.
q.op: The query operator URL parameter that the Solr server uses to determine whether any one of the search terms must be found (OR), or all of the search terms must be found (AND).
Query term: Text that is either a word or a phrase that is used to search across the Solr collection index. Unlike facets, this is not limited to the values of the Solr facets.
Query parameter: The parameter that is passed through as a URL parameter from a text input field by an HTML form.
R
RANGE facet: A specialised form of facet which allows a search to be performed on a range of values. If the Solr document value matches between the minimum and maximum range values, then it will be included with the results.
Replacement value: A value that replaces another value, used in BOOLEAN facets to replace a "true" or "false" value with another more human-readable value, and in RANGE facets for minimum and maximum values.
Regular facet: The default type of facet that allows filtering on a particular value.
S
SEO: An acronym for Search Engine Optimisation which is a set of practices designed to improve the appearance, positioning, and usefulness of content in the search results for search engines.
Solr collection: A collection of documents that Solr has indexed and is searched upon to return results. (See also: Panl collection)
Solr field: The name of a field that is defined within the managed-schema.xml file, which may be indexed, stored, and/or analysed.
Solr server: The Apache Solr search server, which indexes the documents and returns the search results.
Solr schema: The XML file that defines the fields, their types, what
Sort order: The field, or fields, of a document by which the results will be ordered, either ascending or descending.
Suffix: Text that is appended to a facet value for the URL path, and optionally for the display of the value. This also applies to a RANGE facet when appending the text to the end of a range query.
Synapticloop: The people behind the Panl server and generator.
T
.tar: A bundle of files placed together into the Tape ARchive format.
Token: Panl parses and decodes the incoming request URL path and turns each of the path parts into a token. If there was an error in either the parsing or decoding, then this token will be marked as invalid and will not be passed through to the Solr server.
U
URI: Uniform Resource Identifier which is a unique sequence of characters that identifies an abstract or physical resource.
URL: Uniform Resource Locator, a subset of URI which specifies where a resource (such as a webpage) can be found.
URL path part: A part of the path of a URL. In the examples in this book, the path part is taken as everything after the hostname. A URL is of the form:
scheme ":" ["//" authority] path ["?" query] ["#" fragment]
V
W
Wildcard: Designates that a value that is used will match all values - used by the Panl server within a range query to indicate that it should be any number either below or above the selected minimum/maximum value.
X
x: x marks the spot.
XML: eXtensible Markup Language, which is the format that Solr uses for the managed schema and configuration files.
Y
y: Why not?
Z
Zip: A file compression format allowing multiple files and directories to be easily packaged into a single file.
Command Line Options
The Panl release package has two modes:
- Panl server - serving up the search results, and
- Panl generator - to generate configuration files for an existing collection.
For both modes, the usage describes the invocation through the Java JRE rather than the supplied executable files (i.e. bin/panl and bin\panl.bat); however, the command line options are the same.
IMPORTANT: Throughout this section, the filesystem paths are described using the *NIX nomenclature of a forward slash '/' between the directories, rather than the backslash '\' of Microsoft Windows systems, so please take care when copying the commands.
Panl Server
The Panl server may be started by:
bin/panl server
This will start the server with the default values:
- Looking for a panl.properties file in the directory from which the command was executed
- Binding to the default port of 8181.
To set the panl.properties file that will be referenced, use the -properties command line option.
To set the port that the server will bind to, use the -port command line option.
Panl Generator
The Panl generator can be invoked by:
bin/panl generate -schema path/to/solr/managed-schema.xml
You MUST, at a minimum, pass through the path to the Solr managed schema file with the -schema command line option.
The default output directory is the directory where this command was invoked, and the default filename is panl.properties. Should you wish to change this, use the -properties command line option with the filename. If the option points to a directory, the default filename of panl.properties will be used. Note: This is also the directory that the <panl_collection_url>.panl.properties files will be written to.
The Panl generator will fail if the files already exist; however, you may pass through the -overwrite command line option with a value of true to overwrite the files (the default is false).
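The output rules described above can be sketched in a few lines. This is only an illustration of the behaviour as described in this section, written in Python, and is not the Panl generator's actual code; the function names resolve_properties_path and check_can_write are hypothetical.

```python
import os
import tempfile

DEFAULT_FILENAME = "panl.properties"


def resolve_properties_path(option=None, cwd="."):
    """Return the file the generator would write to, following the rules above."""
    if option is None:
        # No -properties option: default filename in the invoking directory
        return os.path.join(cwd, DEFAULT_FILENAME)
    if os.path.isdir(option):
        # -properties points at a directory: default filename inside it
        return os.path.join(option, DEFAULT_FILENAME)
    # -properties names a file: use it as given
    return option


def check_can_write(path, overwrite=False):
    """Raise if the target exists and overwriting was not requested."""
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} exists; pass -overwrite true to replace it")


with tempfile.TemporaryDirectory() as out_dir:
    target = resolve_properties_path(out_dir)
    print(os.path.basename(target))
    check_can_write(target)                  # passes: nothing written yet
    open(target, "w").close()
    check_can_write(target, overwrite=True)  # passes: overwrite requested
```

Calling check_can_write on an existing file without overwrite=True raises an error, mirroring the generator terminating when the files already exist.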
Usage Text
You can see the complete (and possibly updated) usage instructions on the GitHub repository:
https://github.com/synapticloop/panl/blob/main/src/main/resources/usage.txt
~ ~ ~ * ~ ~ ~
Usage:
  java -jar panl.jar \
    [command] \
    [-properties properties_file_location] \
    [-overwrite true_or_false] \
    [-config solr_config_location] \
    [-port port_number]

Where:
  [command] is one of:
    server - start the server, or
    generate - generate an example panl.properties file from an existing Solr managed-schema.xml file

To start the Panl server
  java -jar panl.jar \
    server \
    [-properties properties_file_location] \
    [-port port_number]

To generate the panl.properties file for each collection:
  java -jar panl.jar \
    generate \
    [-properties properties_file_location] \
    [-overwrite true_or_false] \
    -schema solr_schema_location

If you choose the 'server' command, the following command line options are available:

  [-properties properties_file_location]
    (optional - default panl.properties) the properties file to load, if this property is not included, the default panl.properties file will be used which __MUST__ reside in the same directory as the server start command

  [-port port_number]
    (optional) the port number to start the server on. The default port number is 8181

If you choose the 'generate' command, the following command line options are available:

  [-properties properties_file_location]
    (optional - default panl.properties) the base properties file to write the generated configuration out to. If this property is not included, the default panl.properties filename will be used with each collection file that is generated named: <panl_collection_url>.panl.properties
    NOTE: If the files exist, the generation will TERMINATE, you will need to remove those files before the generation - unless you have the -overwrite true command line option present

  [-overwrite true_or_false]
    (optional - default false)

  -schema solr_schema_location(s)
    (mandatory) the managed-schema.xml configuration file(s) to read and generate the panl.properties file from.
    NOTE: For multiple files, use comma separated values
Sample .properties Files
The Sample panl.properties Files
There are multiple sample properties files that are included in the download package and can be viewed online:
Mechanical pencils
The default collection for running the examples throughout the book.
All
This file will start the Panl server with all included examples - however, for it to display results correctly, all of the datasets must be indexed.
https://github.com/synapticloop/panl/blob/main/src/dist/sample/panl/all/panl.properties
Book store
Just the book store demo dataset, with hierarchical facets and facet ordering.
https://github.com/synapticloop/panl/blob/main/src/dist/sample/panl/book-store/panl.properties
Mechanical pencils OR
The mechanical pencils collection with the manufacturer as an OR facet.
Simple date
The simple date collection with DATE range facets.
https://github.com/synapticloop/panl/blob/main/src/dist/sample/panl/simple-date/panl.properties
The contents of these files are unlikely to change between publishing this book and updates to the codebase. To see the most up-to-date comments and instructions for usage, the template file that the Panl generator utility uses as a base can be found here:
https://github.com/synapticloop/panl/blob/main/src/main/resources/panl.properties.template
The Sample <panl_collection_url>.panl.properties Files
The sample file for the mechanical pencils example used in this book can be viewed online:
Mechanical pencils
Additional properties files are also included in the download package:
Book store (with Hierarchical facets and facet ordering)
Mechanical pencils OR faceting
Simple date (with DATE range facets)
The contents of these files are unlikely to change between publishing this book and updates to the codebase. To see the most up-to-date comments and instructions for usage, the template file that the Panl generator utility uses as a base can be found here:
Solr Version 8 & 7 Integration Notes
SolrJ
The SolrJ connectors have changed. The available SolrJ clients for version 8 are:
- HttpSolrClient
- Http2SolrClient
- LBHttpSolrClient
- LBHttp2SolrClient
- CloudSolrClient
- CloudHttp2SolrClient
And for version 7:
- HttpSolrClient
- LBHttpSolrClient
- CloudSolrClient
Solr Configuration files
In Solr versions 7 and 8, the managed-schema.xml file is named simply managed-schema.
The Solr configuration file, solrconfig.xml, also changes between versions.
JSON Response Object
The JSON response object has also changed.
IMPORTANT: The returned Solr JSON object has changed, however the returned Panl response JSON object has not. This will only ever affect your integration when dealing with the Solr JSON. The Panl Results Viewer web app has taken this into account to support all versions.
In Solr versions 7 and 8:
{
  "facet_counts": { ... },
  "response": [ ... ],
  "responseHeader": { ... }
}
Note: The response key is a JSON array of results (line 3)
Whilst in Solr 9:
{
  "facet_counts": { ... },
  "response": {
    "docs": [ ... ],
    "numFound": 55,
    "start": 10,
    "maxScore": 1,
    "numFoundExact": true
  },
  "responseHeader": { ... }
}
Note: The response key is a JSON object of results (lines 3 to 9 above) with additional details.
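Integration code that reads the Solr JSON directly (rather than the Panl response) can hide this difference behind a small helper. The following is a minimal sketch in Python, assuming only the two shapes shown above; the function name extract_docs is hypothetical and is not part of Panl or SolrJ.

```python
import json


def extract_docs(solr_json):
    """Return the list of result documents from either response shape.

    Assumes the two layouts described in this section: a JSON array under
    "response" (Solr 7 and 8), or a JSON object with a "docs" array (Solr 9).
    """
    response = solr_json.get("response", [])
    if isinstance(response, list):
        # Solr 7/8 layout: the array itself is the list of documents
        return response
    # Solr 9 layout: the documents sit under the "docs" key
    return response.get("docs", [])


solr_8 = json.loads('{"response": [{"id": "1"}, {"id": "2"}]}')
solr_9 = json.loads('{"response": {"docs": [{"id": "1"}], "numFound": 55, "start": 10}}')

print(len(extract_docs(solr_8)))
print(len(extract_docs(solr_9)))
```

Code that consumes the Panl response object does not need a helper like this, as the Panl response is the same across versions.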
Setting up a Solr 7 or 8 server
The main difference between setting up Solr 7 or 8 and setting up Solr 9 is that the configuration MUST be uploaded to the ZooKeeper instance first, before the collection is created.
Additionally, there is no bin\post command for Windows machines, so the data is indexed through a Java command.
Windows Commands
IMPORTANT: Each of the commands - either Windows or *NIX - must be run on a single line - watch out for ↩ continuations.
- Create an example cloud instance
This requires no interaction, will use the default setup, two replicas, and two shards under the 'example' cloud node.
Command(s):
cd SOLR_INSTALL_DIRECTORY
bin\solr start -e cloud -noprompt
- Create the configuration for the mechanical pencils
This will create and set up the mechanical pencil schema so that a collection can be created and the data can be indexed.
Command(s):
cd SOLR_INSTALL_DIRECTORY
bin\solr zk upconfig -d ↩ PANL_INSTALL_DIRECTORY\sample\solr\mechanical-pencils -n mechanical-pencils ↩ -z localhost:9983
- Create the mechanical pencils collection
This will create and set up the mechanical pencil collection and schema so that the data can be indexed.
Command(s):
cd SOLR_INSTALL_DIRECTORY
bin\solr create -c mechanical-pencils -n mechanical-pencils ↩ -s 2 -rf 2
- Index the mechanical pencils data
This will index the included sample mechanical pencil data into the Solr instance in the mechanical-pencils collection.
Command(s):
cd SOLR_INSTALL_DIRECTORY
java -Dc=mechanical-pencils -Dtype=application/json -jar example\exampledocs\post.jar ↩ PANL_INSTALL_DIRECTORY\sample\data\mechanical-pencils.json
- Start the Panl Server
This will start the server and be ready to accept requests.
Command(s):
cd PANL_INSTALL_DIRECTORY
bin\panl.bat -properties ↩
- Start searching and faceting
Open the link http://localhost:8181/panl-results-viewer/ in your favourite browser, choose a collection/FieldSet and search, facet, sort, paginate and view the results.
For the simple-date dataset, the commands are almost identical, and, assuming that the Solr cloud is set up:
Command(s):
cd SOLR_INSTALL_DIRECTORY
bin\solr zk upconfig -d ↩ PANL_INSTALL_DIRECTORY\sample\solr\simple-date -n simple-date ↩ -z localhost:9983
bin\solr create -c simple-date -n simple-date -s 2 -rf 2
java -Dc=simple-date -Dtype=application/json -jar example\exampledocs\post.jar ↩ PANL_INSTALL_DIRECTORY\sample\data\simple-date.json
*NIX Commands
The *NIX commands are as per the Windows section above, with the file path delimiter changed from '\' to '/'.
IMPORTANT: Each of the commands - either Windows or *NIX - must be run on a single line - watch out for ↩ continuations.
- Create an example cloud instance
This requires no interaction, will use the default setup, two replicas, and two shards under the 'example' cloud node.
Command(s):
cd SOLR_INSTALL_DIRECTORY
bin/solr start -e cloud -noprompt
- Create the configuration for the mechanical pencils
This will create and set up the mechanical pencil schema so that a collection can be created and the data can be indexed.
Command(s):
cd SOLR_INSTALL_DIRECTORY
bin/solr zk upconfig -d ↩ PANL_INSTALL_DIRECTORY/sample/solr/mechanical-pencils -n mechanical-pencils ↩ -z localhost:9983
- Create the mechanical pencils collection
This will create and set up the mechanical pencil collection and schema so that the data can be indexed.
Command(s):
cd SOLR_INSTALL_DIRECTORY
bin/solr create -c mechanical-pencils -n mechanical-pencils ↩ -s 2 -rf 2
- Index the mechanical pencils data
This will index the included sample mechanical pencil data into the Solr instance in the mechanical-pencils collection.
Command(s):
cd SOLR_INSTALL_DIRECTORY
java -Dc=mechanical-pencils -Dtype=application/json -jar example/exampledocs/post.jar ↩ PANL_INSTALL_DIRECTORY/sample/data/mechanical-pencils.json
- Start the Panl Server
This will start the server and be ready to accept requests.
Command(s):
cd PANL_INSTALL_DIRECTORY
bin/panl -properties ↩
- Start searching and faceting
Open the link http://localhost:8181/panl-results-viewer/ in your favourite browser, choose a collection/FieldSet and search, facet, sort, paginate and view the results.
For the simple-date dataset, the commands are almost identical, and, assuming that the Solr cloud is set up:
Command(s):
cd SOLR_INSTALL_DIRECTORY
bin/solr zk upconfig -d PANL_INSTALL_DIRECTORY/sample/solr/simple-date -n simple-date ↩ -z localhost:9983
bin/solr create -c simple-date -n simple-date -s 2 -rf 2
java -Dc=simple-date -Dtype=application/json -jar example/exampledocs/post.jar ↩ PANL_INSTALL_DIRECTORY/sample/data/simple-date.json
Additional Solr Version 7 Integration Notes
In the Panl response object, the num_results_exact value (line 3 below) will ALWAYS be true, as this version of Solr does not have this data available.
{
  "num_pages": 6,
  "num_results_exact": true,
  "page_uris": {
    "next": "/page-2/p/",
    "before": "/page-",
    "after": "/p/"
  },
  "page_num": 1,
  "num_results": 55,
  "num_per_page": 10,
  "num_per_page_uris": {
    "before": "/",
    "after": "-per-page/n/"
  }
}
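As an illustration of how an integration might consume this object, the following Python sketch labels the result count and builds a page link. The function names are hypothetical, and the way the page link is built (the before value, then the page number, then the after value) is an assumption inferred from the next value in the example above.

```python
pagination = {
    "num_pages": 6,
    "num_results_exact": True,
    "page_uris": {"next": "/page-2/p/", "before": "/page-", "after": "/p/"},
    "page_num": 1,
    "num_results": 55,
    "num_per_page": 10,
}


def results_label(pagination):
    """Describe the result count, hedging only when the count is not exact."""
    count = pagination["num_results"]
    if pagination.get("num_results_exact", True):
        return f"{count} results"
    return f"About {count} results"


def page_uri(pagination, page_num):
    """Build the URI path for a page (assumed: before + page number + after)."""
    uris = pagination["page_uris"]
    return f"{uris['before']}{page_num}{uris['after']}"


print(results_label(pagination))
print(page_uri(pagination, 2))
```

With Solr 7, the "About" branch is never taken, because num_results_exact is always true.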
Solr Versions 6 and below
Unfortunately, there are no release packages for Solr version 6 and earlier; the only way to support these versions is to edit and compile the code from source.
https://github.com/synapticloop/panl/
Starting with the solr-panl-7 branch is probably the wisest choice. These are the integration points that will need to be looked at:
- Create a new branch with your version of Panl - e.g. Version 6 would be branched as solr-panl-6
- Edit the /build.gradle file
- change the distributionBaseName - e.g. solr-panl-6
distributions {
  main {
    distributionBaseName = 'solr-panl-6'
  }
}
- Edit the SolrJ dependencies. The dependency for SolrJ can be found on any Maven repository:
https://mvnrepository.com/artifact/org.apache.solr/solr-solrj
dependencies {
  ...
  // solrj
  implementation 'org.apache.solr:solr-solrj:?.?.?'
  ...
}
- Edit the Solr schema configuration and managed-schema
- Copy over the schema from the Solr version that you are using (this can be found in the SOLR_INSTALL_DIRECTORY/example/files/conf directory - or equivalent for your Solr version).
- Edit the file to include the relevant fields that you want to index and search on
- Look at the com.synapticloop.panl.server.client package, adding in the Solr Clients that are applicable to your version
- Update the com.synapticloop.panl.server.client.PanlClient class to reference the correct clients.
- Update the com.synapticloop.panl.server.handler.helper.CollectionHelper#getPanlClient() factory method to return the correct client
IMPORTANT: The returned JSON object may have changed between versions, which may cause problems with generating and returning the Panl response.
There may be other integration points for version 6.x.x and lower; however, the instructions above were the changed integration points from version 9 to versions 8 and 7.
~ ~ ~ * ~ ~ ~
Getting Started With Synapticloop Panl
A rather pleasing companion to the Apache® Solr® Faceted Search Engine.
[1] 'Sensibly' is a bit of a vague term... Panl strips out any unexpected characters and ensures that the value is valid. For example, if Solr (and therefore Panl) is expecting an integer parameter and the value 5gs6 is passed through, Panl will remove any non-numeric characters and parse the number - returning 56. Values that cannot be converted will be ignored and not passed through to the Solr server.
[2] Thanks for checking out this footnote.
[3] An LPSE length of 3 with the five mandatory codes would provide 185,193 facets, a length of 4 would provide 10,556,001.
[4] An LPSE length of 3 with the five mandatory codes and one optional code would provide 175,616 facets, a length of 4 would provide 9,834,496.
[5] Specific dates are notoriously hard to put into examples, as by the time you read this book the example dates will be well out of range. There is an example data set (simple-date) included within the release package, which has random dates spanning 2014 to 2032 and can be used to test out the features; however, you will need to index the data set with separate commands.
[6] This is probably not the fairest of comparisons, as a lot of the underlying Solr query implementation could be hidden behind the scenes anyhow. However, what Panl can do is automatically have CaFUPs for multiple FieldSets, facets, and queries which will automatically build the query, the returned facets, the fields, and more.
[7] The exception to this rule are any defined OR facets, which will increase the number of results that are returned.
[8] The example data 'techproducts' included with the Apache Solr instance is a reasonable test dataset, however, the way the schema and collections are designed places an emphasis more on testing ingestion and searching, rather than on a functional search set.
[9] I am using an Apple Mac system, but it is the same for Linux.
[10] Commands weren't included in this as a recursive force (i.e. rm -rf or rmdir /S /Q) deletion of directories can be a very dangerous thing.
[11] Historically, Java based examples for servers seem to have been based on the ubiquitous Pet Store, time for something new.
[12] Or, the mistakes that were made with the implementation.
[13] At the time of writing no useful results were returned by any of the large search engines for 'Solr Panl'.
[14] In some instances, the properties file layout would have been better suited to JSON, however, comments are not allowed in JSON files, which makes explaining the file a lot harder. Admittedly, HJSON could have been used, and parsed on the way into Panl, but this would reduce portability - sigh - these are the decisions which can reverberate through time and code.
[15] This started off as a simple way to test the Panl configuration and how it interacted with the Solr search server; over time, it became a little more complex. It also became an incredibly useful tool when adding features to the returned JSON object, so that integration and implementation became a more developer-friendly experience.
[16] Once again, a simple explainer turned into a more complex application as time and requirements became more involved.
[17] Admittedly, having to know the value ranges ahead of time is rather annoying; however, there are some niceties built into Panl to use the minimum and maximum values.
[18] Once again, annoying to have to know the values.
[19] Although German users should be fine with the default implementation from Panl.