README.1ST
Welcome To Synapticloop Panl
Synapticloop Panl is a light-weight application server that is designed to sit between your web application and Solr search server instance(s), seamlessly converting human-readable, SEO friendly URLs into complex Solr search queries and returning an enhanced JSON object for ease of integration and implementation.
It abstracts away the complexities of the Solr search parameters and the building/translating of URLs so that you get the benefit of human-readable (and SEO friendlier) URLs without needing a deep understanding of the mechanics behind them.
Some examples contained in this book also contain the conversions that Panl performs, and the Solr query that is executed, should you wish to delve deeper into the inner-workings of the Panl server's integration with Solr.
Additional Panl Niceties
- MULTIPLE ways to 'SLICE and DICE' - From one Solr collection, the Panl server can present the results and facets in multiple different ways, providing individual use cases for specific needs.
- PREFIXES and SUFFIXES - Panl can be configured to add prefixes and suffixes to the values within the URL path to increase readability, for example,
The LPSE URL path of /Caran+d'Ache/true/Black/bDW/ could also have the brand Solr field prefixed with 'Manufactured By ' and suffixed by ' Company' to produce the URL path
/Manufactured+By+Caran+d'Ache+Company/true/Black/bDW/
- BOOLEAN value translations - For any Solr field that is defined as a solr.BoolField, an additional translation can be performed. The 'true' and 'false' values can be replaced with arbitrary text, which will be transparently converted between Panl and Solr.
For the LPSE URL path of /Caran+d'Ache/true/Black/bDW/ the true value (which indicates whether the mechanical pencil can be disassembled) could be replaced with 'Able to be disassembled' for true values, and 'Cannot be disassembled' for false values. The above URL path would then become
/Caran+d'Ache/Able+to+be+disassembled/Black/bDW/
- FIELD VALUE validation - By default, Solr will return an error when an invalid value is passed through. For example, if Solr is expecting a numeric value and cannot parse the passed in value, it will throw an exception. Panl protects against this by attempting to parse the value as best it can, and silently dropping the parameter if it cannot be sensibly[1] parsed. This is only done for numeric types (integer, long, float, and double).
- HIERARCHICAL facets - Only show facets if a separate facet is currently selected, allowing you to lead users through the search journey, only displaying facets that help the user narrow down their results.
For example, you may be presenting a search page for Cars and you only want to show the car models once the brand of cars has been selected first.
- SORTED facets - Each individual facet can be sorted by either the facet count (which is the default), or the facet value (e.g. alphabetic/numeric).
- MORE facets - Solr (and Panl) configures a limit for the maximum number of facet values that are returned. This functionality enables you to load more facet values if they are available but weren't returned with the results by default.
- RESULTS SORTING options - Sort by any of the Solr fields, either ascending, or descending and with multiple sub-sorting available - e.g. sorting by a brand name, then the model number.
- PAGINATION - Panl will return all of the variables required to easily generate pagination URL paths giving you options and control over your own implementation.
The returned variables are the number of pages of results, the number of results per page, the total number of results, the current page number and whether the returned results are an exact number.
- STATIC SITE GENERATION - With the exception of a query parameter, all available links for every conceivable URL path can be statically generated ahead of time, with canonical URLs.
Be warned that the number of possible pages that can be generated can quickly become incredibly large.
- STATELESS - No state is stored in the Panl server, all of the state is from the URL path part that is requested. No sessions, no memory, nothing to backup, easy to update and quick to start and restart.
- TEXT CONFIGURATION - All configuration for Panl is based on text files (Java .properties files) so they can be stored in a source code management system. Additionally, upgrades to the Panl server are easy and, with quick startup times, any configuration changes will be seen instantly.
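As a rough illustration of the PREFIXES and SUFFIXES and BOOLEAN value translations items above, the following Python sketch shows how such a round trip between a Solr value and a URL path part could work. It is not Panl's implementation: all function names are hypothetical, and it assumes that spaces are the only characters that need encoding (as '+').

```python
def encode_value(value, prefix="", suffix=""):
    """Wrap a Solr field value in display text and encode spaces as '+'."""
    return (prefix + value + suffix).replace(" ", "+")


def decode_value(path_part, prefix="", suffix=""):
    """Reverse encode_value; return None when the prefix/suffix do not match."""
    text = path_part.replace("+", " ")
    if not (text.startswith(prefix) and text.endswith(suffix)):
        return None
    if len(text) < len(prefix) + len(suffix):
        return None
    return text[len(prefix):len(text) - len(suffix)]


# Replacement text for a boolean field, taken from the 'disassemble' example.
BOOLEAN_TEXT = {
    True: "Able to be disassembled",
    False: "Cannot be disassembled",
}


def encode_boolean(flag):
    """Replace a true/false value with its configured text."""
    return BOOLEAN_TEXT[flag].replace(" ", "+")


def decode_boolean(path_part):
    """Map the replacement text back to True/False, or None if unknown."""
    text = path_part.replace("+", " ")
    for flag, label in BOOLEAN_TEXT.items():
        if text == label:
            return flag
    return None
```

With a prefix of 'Manufactured by ' and a suffix of ' Company', encode_value("Koh-i-Noor", ...) produces Manufactured+by+Koh-i-Noor+Company, and decode_value recovers Koh-i-Noor from it. A path part that does not carry the expected prefix and suffix decodes to None, which a caller could treat as an invalid value.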
About This Book
This book describes and explains the functionality of the Panl server, how to configure the server, and how this impacts the generated URL paths.
To start with, this book will take you through setting up and running a new Solr instance in cloud mode, creating a new collection, indexing the included sample data, and seeing the results and facets through the in-built Panl Results Viewer web application.
The book then continues to explain the configuration and integration of the Panl server, with the assumption that there is already a running Solr instance behind it.
This book is not designed to be an introduction into Solr configuration, administration, or schema design best-practices; however, there are some hints and tips throughout the book. These hints and tips relate to items that will affect the results that you retrieve from the Panl server, Solr configuration, and the integration and implementation.
Nomenclature Used Throughout This Book
When implementing any faceted search interface the following terms are the foundation and are used throughout the book:
Documents
Solr nomenclature for the results that the Solr search server returns. These are a subset of the data that is indexed in the Solr search server. You can think of these as the rows of results that will be returned.
Facets
Facets are specific categories/attributes/values that are extracted from the data and are attached to the index. Each of these attributes can then be used to filter the results such that only the documents that contain those attributes are returned. The image below shows the different parts of the facets.
Image: A screenshot of the Panl Simple Results Viewer showing the 'Mechanism Type' facet and describing the parts of the returned Solr facet information
In the above image:
- The Solr field name is mechanism_type, and what is rendered to the page is the Panl name, 'Mechanism Type'. The (m) after the name is the Panl LPSE code, which is output for reference in the Panl Results Viewer web app.
- The facet values represent the indexed attribute values that are attached to the document, they are:
- Clutch
- Click
- Magnetic
- None
- The facet counts represent the number of documents that contain this value, respectively they are:
- 30
- 20
- 1
- 1
- The [add] link is generated by the Panl Simple Results Viewer web app from the returned JSON results object. This link is in the Panl LPSE form.
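For background on where the facet values and counts come from, a raw Solr JSON response (in its default format) lists each facet field as a flat, alternating list of value and count. The sketch below pairs them up using the numbers from the image; it illustrates the Solr response only, and the structure of Panl's enhanced JSON object may differ. The function name is hypothetical.

```python
def pair_facet_counts(flat):
    """Turn [value, count, value, count, ...] into (value, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


# The mechanism_type facet as Solr would list it in facet_fields.
mechanism_type = ["Clutch", 30, "Click", 20, "Magnetic", 1, "None", 1]

for value, count in pair_facet_counts(mechanism_type):
    print(f"{value} ({count})")
```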
Search term
The text (either a word or phrase) that is submitted from a form on the web page through to the Panl server, which is passed to the Solr search server to query against the collections' indices. It is also known as 'search query', 'keyword search', 'search phrase', or some other combination of words.
Additional introductions to common words and phrases used throughout the book are below. Terms and names are defined where they first appear. For a full list, see the Appendices - Definitions at the end of the book.
CaFUPs
An acronym for Collection and FieldSets URL Paths - Panl allows many different groups of fields (the FieldSet) to be bound to a specific Collection which is a unique URL served by Panl.
CaFUPs allow you to configure multiple ways in which the search results and fields are returned for any specific Solr Collection.
Collection(s)
Solr collections are an index of documents that can be filtered or searched upon.
Panl collections are collections of URL paths and FieldSets.
LPSE codes
The foundation of how the Panl server decodes and parses a URL to convert it to a form that the underlying Solr server can understand. A LPSE code is either a number, or an uppercase or lowercase letter of the alphabet (i.e. a-z, A-Z, 0-9). These codes are placed in the last path part of the URL.
LPSE path
The LPSE path is a string of URL path values that, in conjunction with the LPSE codes above, is how the Panl server decodes the URL into a Solr server query.
Panl field
This is the field definition that contains the configuration of what parsing should be done on the incoming value. Additionally, it contains the configuration information as to how to pass this value through to the Solr search server.
Panl generator
A stand-alone utility to quickly generate a panl.properties file and <panl_collection_url>.panl.properties file from an existing Solr managed schema file that can be used as a starting point for configuring the Panl server.
NOTE: The generator does not interact or interfere with the Panl server and the generator codebase is not used when serving production content.
Panl server
The server that handles the URLs, builds the Solr request object, connects to the Solr search server, executes the query, parses the results, builds the JSON response and passes it back to the caller.
Solr field
The definition of the field in the managed-schema.xml Solr configuration file which determines what features can be used by the Panl server.
Solr query
The query string that is sent to the Solr search server, an example of this is:
q=*:*&q.op=OR&facet.limit=100&fl=brand,name&facet.mincount=1&rows=10&facet.field=lead_size_indicator&facet.field=colours&facet.field=brand&facet.field=mechanism_type&facet.field=id&facet.field=hardness_indicator&facet.field=lead_grade_indicator&facet.field=in_built_sharpener&facet.field=disassemble&facet.field=category&facet.field=lead_length&facet.field=in_built_eraser&facet.field=grip_shape&facet.field=weight&facet=true&fq=id:"53"&start=0
Solr search server
The Apache Solr search server that is queried for results.
Tokens
The incoming LPSE code and any associated URL path values for each of the codes. Tokens will be parsed, prefixes and suffixes removed, and validation performed on the incoming value. If any parsing or validation fails, then the token will be marked as invalid, ignored, and not passed through to the Solr server.
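The validation step for numeric tokens could look something like the Python sketch below. This is an assumption-laden illustration rather than Panl's code: the function name and type labels are hypothetical, and it uses a strict parse, whereas Panl is described as attempting to parse the value "as best it can", which may be more lenient.

```python
def parse_numeric(raw, solr_type):
    """Return the parsed number, or None when the token should be dropped.

    solr_type is one of 'integer', 'long', 'float' or 'double'
    (hypothetical labels for the numeric types that are validated).
    """
    try:
        if solr_type in ("integer", "long"):
            return int(raw.strip())
        if solr_type in ("float", "double"):
            return float(raw.strip())
    except ValueError:
        # Invalid token: ignore it instead of letting Solr throw an exception.
        return None
    # Other types are not validated in this sketch.
    return raw
```

A caller would skip any token for which this returns None, so the remaining valid tokens can still be passed through to the Solr server.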
Book Format Conventions
Normal Text
Normal paragraph text is Libre Baskerville, 11pt. Other formatting conventions are detailed below.
Footnotes[2]
Footnotes aren't used very often, but when they appear they can be safely ignored - these are more background thoughts as to why things were implemented the way they were. This will not impact the running of the Panl Server.
Sidebars
|
IMPORTANT: Important notes are within a red side-bordered box, with an exclamation icon, and red background. Careful note should be made of the information contained within these boxes as this will affect the running of the Panl server, and there may be non-obvious side-effects. |
|
Notes: Notes are within a black side-bordered box, with a pencil icon, and grey background. This is something to look out for when you are reading the book, executing a task, or looking at an image or URL. |
|
Tips: Tips are within a black bordered box, with a lightbulb icon, and white background. This is something which is handy to know for the functioning or configuration of the Panl or Solr servers. |
Code Related Snippets
Inline code, or text related snippets are in monospaced text (Inconsolata 11pt), and highlighted in grey, for example: /Caran+d'Ache/true/Black/bDW/. This indicates that the text is exact and should be used as a reference.
For multi-line code related snippets the text appears in a black-bordered grey box prefixed by a line number so that they can be referenced within the description text.
Note that within any line, there may be a line continuation character (↩) which should not be included in the command. Unfortunately, for electronic viewers this means that it is a little more difficult to simply copy and paste the text. My apologies, but I chose readability and explanation of the text over cut-and-paste-ability.
01  # The text file that may be included, with some information or processing ↩
02  # a line of commented text
03  # this is another line of text
Commands
Any commands that should be run in your terminal or command line prompt will appear in a formatted table. Note the '↩' character which means a line continuation and should not be included in the command. (As per above with the copy and paste-ability of the lines).
Command(s) |
\the\command\that\needs\to\be\run -with "a parameter" -and-another ↩ |
Links
Links are designated by underlining the text of the link. If the text is underlined, it is either a link to another section of this book, or to an external website.
External links to websites (either local or remote) are in the standard blue underline and will ALWAYS match the URL that will open in your browser (i.e. the URLs are never truncated, even if they take multiple lines):
Links to other sections or chapters within this documentation are bold, underlined, and in black text:
Integrating An Existing Solr Schema
About Panl Server
The Panl server is an interface into the Solr search server converting human-readable, SEO friendly URL paths into complex Solr search queries. Rather than adding query parameters to the URL, Panl automatically generates and returns complete URL path links that can be rendered by your web application.
The Panl server uses last path segment encoding (LPSE) to parse and decode the full URL path, converting a URL path from
/Manufactured+by+Koh-i-Noor+Company/Clutch/Green/bmWsb-sN-/
|----------------LPSE PATH----------------------|LPSE code|
To a search query that will return a list of mechanical pencils that
- Are manufactured by Koh-i-Noor,
- Have a Clutch mechanism, and
- Are Green in colour
And, of the 8 results that are returned, the results will be sorted
- By brand name descending (sb-), then by
- Pencil Model descending (sN-)
Which is then passed through to the Solr search server as the following query:
q=*:*&q.op=OR&facet.limit=100&fl=brand,name&facet.mincount=1&rows=10&facet.field=lead_size_indicator&facet.field=colours&facet.field=brand&facet.field=mechanism_type&facet.field=id&facet.field=hardness_indicator&facet.field=lead_grade_indicator&facet.field=in_built_sharpener&facet.field=disassemble&facet.field=category&facet.field=lead_length&facet.field=in_built_eraser&facet.field=grip_shape&facet.field=weight&facet=true&fq=brand:"Koh-i-Noor"&fq=mechanism_type:"Clutch"&fq=colours:"Green"&sort=brand+desc,name+desc&start=0
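To make the decoding above more concrete, here is a deliberately simplified Python sketch that turns the example URL path into the filter and sort parts of the Solr query. It is not the Panl parser. The code-to-field mappings are inferred from this one example (b for brand, m for mechanism_type, W for colours, N for name), the prefix and suffix are taken from the URL shown, and treating '+' as the ascending marker is an assumption. Real requests also carry other token types (query, page number, and so on) that this sketch ignores.

```python
# Hypothetical configuration: facet LPSE code -> (Solr field, prefix, suffix)
FACETS = {
    "b": ("brand", "Manufactured by ", " Company"),
    "m": ("mechanism_type", "", ""),
    "W": ("colours", "", ""),
}
# Hypothetical configuration: codes that may follow the sort code
SORT_FIELDS = {"b": "brand", "N": "name"}
SORT_CODE = "s"


def decode(url_path):
    """Return (filters, sorts) for an LPSE URL path (simplified)."""
    parts = [p for p in url_path.split("/") if p]
    values, codes = parts[:-1], parts[-1]  # the last path part holds the codes
    filters, sorts = [], []
    i = 0
    while i < len(codes):
        code = codes[i]
        if code == SORT_CODE:
            # Sort token: 's', then a field code, then '-' (desc) or '+' (asc).
            field = SORT_FIELDS[codes[i + 1]]
            direction = "desc" if codes[i + 2] == "-" else "asc"
            sorts.append(f"{field} {direction}")
            i += 3
        else:
            # Facet token: consumes the next value from the LPSE path.
            field, prefix, suffix = FACETS[code]
            text = values.pop(0).replace("+", " ")
            text = text[len(prefix):len(text) - len(suffix)]
            filters.append(f'{field}:"{text}"')
            i += 1
    return filters, sorts


filters, sorts = decode(
    "/Manufactured+by+Koh-i-Noor+Company/Clutch/Green/bmWsb-sN-/")
print(filters)
print(",".join(sorts))
```

Each entry in filters corresponds to one fq parameter in the query above, and the joined sorts correspond to the sort parameter (where the space is URL-encoded as '+').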
|
Tip: See the section on Panl Server for full details on the startup command line options available. |
How Many Facets Does Panl Support?
The number of supported facets depends on the LPSE code length (which by default is 1). A LPSE code is a letter or number which maps to a parameter, operand, field, or filter. There are five mandatory (and one optional) LPSE codes:
- The query parameter,
- The page number,
- The number of results per page,
- The query operand,
- The sort order, and
- (optionally) The pass through parameter
The above configured LPSE codes cannot be registered as facet LPSE codes.
With a LPSE length of 1:
- With the five mandatory codes, Panl will support up to 57 facets.
- With the five mandatory codes and the one optional code, Panl will support up to 56 facets.
With a LPSE length of 2:
- With the five mandatory codes, Panl will support up to 3,249 facets.
- With the five mandatory codes and the one optional code, Panl will support up to 3,136 facets.
The maximum number of supported facets is the number of available LPSE codes raised to the power of the LPSE length:
- With the five mandatory codes[3]:
(62 - 5)^lpse_length = 57^lpse_length
- With the five mandatory codes and the one optional code[4]:
(62 - 6)^lpse_length = 56^lpse_length
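The same arithmetic can be checked with a few lines of Python. The function and parameter names are hypothetical and used for illustration only; the 62 available characters are a-z, A-Z and 0-9.

```python
def max_facets(lpse_length, use_pass_through=False):
    """Maximum number of facet LPSE codes for a given LPSE length."""
    available = 26 + 26 + 10  # a-z, A-Z, 0-9
    reserved = 6 if use_pass_through else 5  # mandatory (+ optional) codes
    return (available - reserved) ** lpse_length


for length in (1, 2):
    print(length, max_facets(length), max_facets(length, use_pass_through=True))
```

This reproduces the figures quoted above: 57 and 56 for a LPSE length of 1, and 3,249 and 3,136 for a LPSE length of 2.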
A LPSE length of 2 should provide more than enough facets for the majority of implementations. Once the LPSE length goes above 2, the LPSE URL path grows longer much more quickly, which undermines the aim of keeping the encoded URL compact and readable.
Remember that you can define multiple Panl collections with CaFUPs for a Solr collection, and each of the CaFUPs can have different LPSE codes. You may have over 56 fields in your indexed Solr collection, but you may wish to have a LPSE length of 1 and just use a subset of the fields for each of the CaFUPs.
Panl URL Structure
Collection Request Handlers
The collection request handler responds with a JSON object that contains the results of a search query with all available facets and documents with the defined FieldSets.
Within the Panl server the CaFUPs are defined within the configuration in the form
/<panl_collection>/<fieldset>/<lpse_path>/<lpse_codes>/
And will uniquely return facets and documents with the configured FieldSets. These URLs are configured through the <panl_collection_url>.panl.properties files and will serve as many URLs as configured.
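As an illustration of the URL form above, the following Python sketch splits a request path into its four parts. The collection name (mechanical-pencils) and FieldSet name (brandandname) in the usage line are hypothetical, as are the function and class names, and allowing a request with no LPSE parts at all is an assumption.

```python
from typing import NamedTuple


class PanlRequest(NamedTuple):
    collection: str
    fieldset: str
    lpse_values: list  # the <lpse_path> values, still URL-encoded
    lpse_codes: str    # the last path part


def split_request_path(path):
    """Split /<panl_collection>/<fieldset>/<lpse_path>/<lpse_codes>/."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected at least /<panl_collection>/<fieldset>/")
    collection, fieldset, rest = parts[0], parts[1], parts[2:]
    if not rest:
        # Assumption: no LPSE path or codes means an unfiltered request.
        return PanlRequest(collection, fieldset, [], "")
    return PanlRequest(collection, fieldset, rest[:-1], rest[-1])


print(split_request_path("/mechanical-pencils/brandandname/Clutch/Green/mW/"))
```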
Additionally, there are in-built Panl server URLs which provide additional functionality, the predefined URLs are as follows:
Panl Single Page Handler URLs
The Panl single page handler URLs are designed to return a JSON object that will allow building a single page search interface. They are bound to the following URLs:
/panl-single-page/<panl_collection>/
Where <panl_collection> is the Panl Collection that the single page search UI should be built from.
|
Note: Do not confuse this handler and URL with the in-built Panl Single Page Search UI example bound to the /panl-single-page-search/<panl_collection>/ URL. |
Panl More Facets Handler URLs
The Panl more facets handler URLs are designed to return a JSON object that will provide more facet values for a specific facet and are bound to the following URLs:
/panl-more-facets/<panl_collection>/<fieldset>/<lpse_path>/<lpse_codes>/?code=<lpse_code>&limit=<limit>
Where:
- panl-more-facets is the start of the URL that the Panl server has bound the 'More Facets' handler to
- <panl_collection> is the Panl collection
- <fieldset> is the FieldSet for the fields that will be returned with the Solr documents. Note: This value is ignored and replaced by the More Facets handler with the 'empty' FieldSet as there is no need to return any other documents.
- <lpse_path> is the encoded values for the facets
- <lpse_codes> are the LPSE codes for the <lpse_path> above
- code=<lpse_code> is the query parameter LPSE code for the facet for which the additional facet values are requested
- limit=<limit> is the query parameter for the maximum number of facet values to return. Note: If this is set to -1, then all facets will be returned.
This will perform the search and only return the additional facet values for the Solr facet designated by the LPSE code <lpse_code> up to the limit designated by <limit> (or all if the limit is set to -1).
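A client could assemble a More Facets URL along the following lines. This is a minimal Python sketch that only follows the URL form described above: the function name is hypothetical, the collection and FieldSet names in the usage line are made up, and the LPSE path values are assumed to be already URL-encoded.

```python
from urllib.parse import urlencode


def more_facets_url(collection, fieldset, lpse_values, lpse_codes,
                    code, limit=-1):
    """Build a /panl-more-facets/ URL; limit=-1 requests all facet values."""
    segments = ["panl-more-facets", collection, fieldset, *lpse_values]
    if lpse_codes:
        segments.append(lpse_codes)
    query = urlencode({"code": code, "limit": limit})
    return "/" + "/".join(segments) + "/?" + query


print(more_facets_url("mechanical-pencils", "brandandname",
                      ["Clutch"], "m", code="b", limit=20))
```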
|
IMPORTANT: You CANNOT have a Panl collection start with the prefix 'panl', as it is reserved for internal use. If an attempt is made to register one, it will be rejected and the server WILL NOT start. |
In-Built Web Apps (Viewer / Explainer / Single Search Page)
For testing and debugging of the configured properties, the Panl Results Viewer, Panl Results Explainer, and Panl Single Search Page web apps are included in the Panl release package. These surface all Panl functionality and allow integrators and implementers to understand and test the Panl configuration without having to integrate with a separate web application.
|
Tips: The recommendation is to either turn off the Panl Results Viewer / Explainer / Single Page Search, or to not allow public access to these URLs. This can be done by setting the panl.results.testing.urls=false property in the panl.properties file. |
'Simple' Panl Results Viewer Web App
What started as a relatively simple page for testing and debugging turned into a page that was a fully functional faceted search interface, able to highlight all of the functionality of the Panl server and surface most of the Solr search server functionality. It still remains an excellent way to test configuration options.
Below is a screenshot of the in-built Panl Results Viewer web app with all the features and functionality that you would expect from a search page implementation along with some additional features to make searching easier for you and your user.
When the Solr and Panl configuration is set up, the server is up and running, and the testing web apps enabled, it is accessible at:
http://localhost:8181/panl-results-viewer/
Image: The In-Built Panl Results Viewer web app
- A list of available Collections and FieldSet URL Paths (CaFUPs) that Panl is configured to serve. CaFUPs enable different Solr fields and facets to be returned from the same Solr collection.
- A textual representation of the CaFUP that the Panl Results Viewer web app currently is using.
- The canonical URL path (which is returned with the Panl results JSON object) - An important part for search engines to de-duplicate URLs that return exactly the same information. Multiple Panl LPSE URL paths WILL return exactly the same results. You SHOULD use this link as either
- The rel="canonical" link element in the HTML, or
- The rel="canonical" link HTTP header
There is also an [explain] link that will take you to the Panl Results Explainer for this particular canonical URL.
- The search query box. By default, Panl responds to the same URL parameter name as the Solr server - i.e. 'q'. This can be configured to be a different value through the Panl properties file.
- Active filters, either queries, selected facets, or sorting options that are currently limiting the results - the [Remove] link is the URL path that will remove this query, facet, or sorting option from the results. If it is an active sorting filter, the [Change to DESC] or [Change to ASC] links will invert the sorting order without affecting any further sub-ordering.
- RANGE filters, for facets that are defined as ranges - allowing users to select a range of values - the values are inclusive (i.e. include the minimum and maximum values). RANGE filters also include dynamic maximum and minimum values so that the range that is rendered can be automatically updated.
DATE Range filters (not shown[5]), enabling searching over a range of dates (but not a specific date) in the form of:
<next/previous> <any_integer> <hours/days/months/years>.
For example:
Last 30 days
Previous 24 hours
Next 3 years
- Available filters, additional facets that can further refine and limit the Solr search results. This may also display a link to load more facets if the returned number of facets is not the complete set.
- Number of results found, and whether this is an exact match.
- Query operand - whether the Solr search term query is OR, or AND - this affects the search query, not the faceting - i.e. the Solr server q.op parameter.
- Page information, the number of pages, how many results are shown per page, and how many results are shown on this page.
- Sorting options, whether to sort by relevance (the default) or by other configured sorting options with ascending and descending options available. Any Solr field can be configured to be used as a sorting option. And multi-sort orders are available, allowing progressive sorting on more than one field.
- Pagination options - the Panl server returns all information needed to build a pagination system, number of results, number of results shown per page and the current page number.
- Number of results per page. Note: The values 3, 5, and 10 are just examples that are hard-coded into the Panl Results Viewer and can be implemented with any positive integer number.
- Timing information about how long the Panl server took to build and return the results (including how much time the Solr server took to find and return the results).
- The results - the fields that are returned with the documents and are shown in the results sections which are configured by the CaFUPs. Multiple field sets can be configured for the collection, allowing different groups of fields to be returned for different URL paths. In the image, only two fields are configured for this CaFUP, namely Brand, and Pencil Model.
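As an example of what can be built from the pagination values listed above, the Python sketch below derives the page numbers to show around the current page. The parameter names are hypothetical and are not the keys of the Panl JSON object; 1-based page numbering is also an assumption. Panl already returns the number of pages, so the calculation is included only to make the example self-contained.

```python
import math


def page_numbers(total_results, results_per_page, current_page, window=2):
    """Return (number_of_pages, page numbers to link around the current page)."""
    num_pages = max(1, math.ceil(total_results / results_per_page))
    current = min(max(1, current_page), num_pages)  # clamp to a valid page
    first = max(1, current - window)
    last = min(num_pages, current + window)
    return num_pages, list(range(first, last + 1))


print(page_numbers(total_results=8, results_per_page=3, current_page=1))
```

Each page number would then be rendered with the corresponding pagination URL path returned by the Panl server.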
'Simple' Panl Results Explainer
Again, this started as a relatively simple page for testing and debugging of the startup configuration options, rather than trawling through properties files and logs.
Below is a (cut-down) screenshot of the in-built Panl Results Explainer web app with explanations for canonical URL paths, the configuration of the Panl collection URL, and the individually configured properties for each of the fields and how this alters the Solr query. It is a useful page for seeing at a glance everything that a CaFUP is configured to do.
If enabled, available at http://localhost:8181/panl-results-explainer/
Image: The In-Built Panl Results Explainer web app
- A list of available Collections and FieldSet URL Paths (CaFUPs) that Panl is configured to serve. CaFUPs enable different Solr fields to be returned in the documents with the same search parameters. Clicking on these links will populate the 'Configuration Parameters' and 'Field Configuration Explainer' sections.
- A textual representation of the CaFUP that the Panl Results Explainer web app is currently using.
- The canonical URL path entry field allows you to enter any canonical URL path and have the parsing and tokenising explained to you, including whether the parsed token was valid, the LPSE code found and the original value that Panl attempted to decode. Note: The CaFUP that the canonical URL path came from MUST match the CaFUP on the results viewer.
- The request token explainer - for any canonical URL entered, this will list the parsing and decoding steps, with the following details
- Whether the token is valid (if it is invalid, it will be ignored and not passed through to the Solr search server),
- The type of token that was found,
- The LPSE code,
- The parsed value,
- The original value, and
- Where pertinent, additional information pertaining to the specific code.
- Configuration parameters - parameters that are not fields or facets with information about the value, a description, and the property that set the value.
- Field configuration explainer - for each of the fields or facets that are configured in the LPSE order an explanation of their configuration including:
- The Java field type,
- The LPSE code,
- The Solr field name,
- The Solr field type, the Panl field name, and
- Additional configuration items which may include
- Prefixes,
- Suffixes,
- Ranges,
- Facet type, or
- Minimum/maximum values
- Any configuration warning messages that were found whilst parsing the properties files.
'Simple' Panl Single Search Page Web App
Panl also binds a URL path to enable the building of a single search page interface, and binds a URL path to view a working example of what the single search page could look like.
When the Solr and Panl configuration is set up, the server is up and running, and the testing web apps enabled, it is accessible at:
http://localhost:8181/panl-single-page-search/
Image: The In-Built Panl Single Search Page interface web app for the mechanical pencils collection
- A list of available example Single Search page interfaces that Panl is configured to serve. CaFUPs enable different Solr fields to be returned in the documents with the same search parameters. Clicking on these links will generate a working sample single search page.
- The generated LPSE path that the selections from the search interface will apply.
- All facets that can be selected, presented for the different types of facets, namely OR, RANGE, DATE Range, BOOLEAN, and regular facets.
- The generated LPSE path that the selections from the search interface will apply.
- The search button that will take you to the in-built Panl Results Viewer web app so that you can view the results instantly.
About Panl Generator
The Panl generator is a quick and interactive command line utility built into the Panl release package that, from a Solr managed schema file, generates default panl.properties and <panl_collection_url>.panl.properties files. This quickly gets things up and running for your existing Solr schema, giving you a starting point from which you can iterate towards a solution.
If you have an existing Solr schema and want to start testing the Panl server integration, then skip to the Integrating An Existing Solr Schema section. If you are skipping ahead and diving straight into the Panl configuration generator, the rest of the sections of the book will explain how to configure the Panl server to suit the requirements of the search page implementation.
|
Tip: See the section on Panl Generator command line options for full details on the options available. |
About Apache Solr
From the Apache Solr website (https://solr.apache.org/):
Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene™
And:
Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites.
The Panl server abstracts away the complex Solr query options in both a developer and user friendly way, generating SEO friendlier URLs.
~ ~ ~ * ~ ~ ~