A Walkthrough Example - The Book Store
At this point, you should
- Have a functional Solr server with indexed data.
- Have a brief understanding how to add data to the Solr server (the Additional Data section is a good place to start).
- Have a Panl server instance available and have looked at the functionality provided.
- Also have a good idea as to how Panl interacts with Solr and the options that are available.
Working through this chapter of the book will provide all of the thinking, design, and steps required to index the required dataset, configure the Panl server, and get it all up and running.
This section runs through the setup and configuration of Panl and Solr for a distinct dataset showing the path from defining and setting up a Solr collection, to configuring the Panl server, to rendering the pages that are required for the user journey. This dataset will be based on a book store[11] with the underlying data containing 3 million records - this dataset is not included with the release package (however a cut-down, cleaned version is included for testing). Running through this example will detail the decisions that are required and made and how the data was then formulated.
The process for pulling all of the components together is going to be the same regardless of the dataset that you are using.
- Understand the dataset - Knowing what data is available, what cleanup process may be required, what data is missing and whether data should be derived or included from a separate source.
- Configure the Solr index - Knowing how the underlying data will be indexed and surfaced to the Panl server. In general one Solr collection, if set up correctly, will be able to drive multiple Panl configurations.
- Configure the Panl server - This applies to both the panl.properties file and <panl_collection_url>.panl.properties files. Remember: you may have multiple CaFUPs, so there may be multiple configuration files. Panl allows you to 'slice and dice' the underlying Solr search collections for specific use cases.
Finally, using all of the above information, and the functionality that each decision will provide:
- Determine any additional web pages to render - Apart from the default (or main) search page which is the first page to be configured.
This, most likely, will be an iterative process, with any of the steps requiring updates to the other steps and configuration. Having a good knowledge of the high-level requirements will enable you to quickly build up the configuration files for both the Solr and Panl servers.
For example when looking at the dataset, there are over 500,000 authors and I wanted to be able to have a page for each letter of the alphabet that would list the authors by their surname. The dataset doesn't directly support this, so derived fields will need to be added (i.e. a field that is computed/derived from the dataset) which can then be indexed by the Solr server.
0. High Level Requirements
From a very high level perspective
- For this particular implementation, the keyword search should act upon the
- Title,
- Description,
- Author, and
- Genre
- From the dataset, there will be hundreds of thousands of Authors, consequently, immediately displaying all of the Authors within a search facet would not be useful.
- Have a unique link for every author being able to see the if the author has any series, with links to books in the series
- Users should be able to facet on specific fields that make sense from a search perspective
Note: As the dataset is discovered and understood and the way in which users will interact with the data, this may lead to other use-cases and requirements. (See the section on The Iterative Implementation Process for more details).
1. Understanding the Dataset
Whether you are starting with an existing Solr managed schema, or you are looking to index a new dataset, understanding what the dataset contains is the MOST important part of the process. This will drive all other decisions - after all, without the correct data, you won't be able index it, search on it, facet upon it, or present it to the user.
The above may seem obvious, however, understanding the data, in conjunction with the desired pages that you want to render, will inform whether you can derive data from the underlying dataset, combine the dataset with other external datasets, or need to use a back-end datastore to generate pages with links.
In this example we will be looking at setting a separate dataset containing just over 3 million records based on fiction books.
The dataset that I have access to has duplicates, mis-spellings, missing information, and information that is just plain wrong. Before the indexing process, the data will need to be cleaned and any additional derived data generated. Ignoring the previous data problems, the dataset contains the following information, with some notes about we would want to search and facet the dataset
Data
- Author
- Search on the author, or authors (the book may have a collaboration of authors)
- Facet on the author, however with around 500,000 authors, we don't want to display all of the facets on the initial search page
- See a page that lists all of the author's works (in order of date of publication) with links to the series that are available
- See the author in the returned documents
- Title
- Search on the title
- Will not be faceted, but will be displayed in the returned documents
- Description
- Search on the description
- Will not be faceted, but will be displayed in the returned documents
- Book Image
- See the image of the book
- Will not be faceted, but will be displayed in the returned documents
- Buy URL
- See the link to the URL to buy the book
- Will not be faceted, but will be displayed in the returned documents
- Genre
- Search on the genre
- Be able to select multiple genres as a facet
- Number of Pages
- See the number of pages in a book in the returned document
- Will not be faceted, but will be displayed in the returned documents
- First published year
- See the first published year of the book
- Will be faceted, but as there are close to 100 years of books, we don't want to display every year on the initial search page
- Language
- Select the language of the book
- Be able to select only a single language
- Paperback/Hardcover
- Select whether users want a hardcover, or paperback book
- Series
- Search on the series name
- Will be faceted, however as there will be many series of books, this should not appear with the initial results, but only when an author has been selected
- Price
- Select a price range for books
- Be able to sort by price range
Derived Data
- ID - derived from the database primary key
- This is the required primary key for the Solr search server and the database primary key has been chosen for this purpose
- Author A-Z index - derived from the first character of the 'Author' surname
- Used to present a list of pages by the first letter of the author's surname
- Decade published - derived from 'Year published'
- Can be used to help narrow down the decade in which the book is published, i.e. the user should be able to select the decade first, and then be able to select the year within that decade
- Will be used as a facet
- Book type - derived from the 'Number of pages' - one of 'Flash Fiction', 'Short Story' 'Novelette', 'Novella', or 'Novel'
- Will be used as a facet
- Will not be returned with the search documents.
- Text - A general purpose field that will have its contents analysed and used for search queries
- Will be used for the search query
- Will not be returned in the search documents
- Solr will generate this, rather than data being input.
Now that the data types are known and how we are going to use them, let's determine how we are going to index them in the Solr search engine.
2. Configure the Solr Index
Whilst we are defining how the data will be indexed, we need to keep in mind the configuration of the Panl server as well. Looking at the data that you have indexed, or would like to index, the configuration of Panl is determined on two items:
- Whether the indexed Solr field should be a facet, or just a field, and whether multiple values are available for the field, and
- The data type of the Solr field, which will determine the configuration options that are available through the Panl server
REMEMBER
- Facets will allow you to filter the results.
- Fields will be returned with the search documents - you cannot filter the results of the field.
Whether you choose a Solr field to be configured in Panl to be a Facet, or a Field, both
- Can be returned with the search documents
- Are able to be used for sorting options
The difference with Facets, is that in addition to the above, Facets
- Are able to be selected to filter the results and, depending on the data type, can be further configured for prefixes, suffixes, ranges and value replacements
|
Note: If in doubt as to whether the Solr field will ever need to be configured as a Facet in Panl, err on the side of yes (i.e. set indexed="true"). Remember that the Panl configuration can present a Solr field as either a Facet or a Field, however if it is not set to indexed in the Solr configuration, it can only ever be a Field for Panl. |
Noting the following rules for Solr configuration:
- If we want to be able to search, sort, or facet on the data, then it must be indexed.
- If we want to see the results in the returned documents, then it must be stored
- If the dataset field can only hold a single value, then multivalued is No, otherwise it is set to Yes.
Using information above and the requirements for the dataset, the Solr field definitions for importing and indexing is as follows:
Solr Field Name |
Data Type |
Analysed |
Multivalued |
Indexed |
Stored |
|
Author |
String |
No |
Yes |
Yes |
Yes |
|
Title |
String |
No |
No |
Yes |
Yes |
|
Description |
String |
No |
No |
Yes |
Yes |
|
Book Image |
String |
No |
No |
No |
Yes |
|
Buy URL |
String |
No |
No |
No |
Yes |
|
Genre |
String |
No |
Yes |
Yes |
Yes |
|
Number of Pages |
Integer |
No |
No |
No |
Yes |
|
First Published Year |
Integer |
No |
No |
Yes |
Yes |
|
Language |
String |
No |
No |
Yes |
Yes |
|
Paperback / Hardcover |
Boolean |
No |
No |
Yes |
Yes |
|
Series |
String |
No |
No |
Yes |
Yes |
|
Price |
Float |
No |
No |
Yes |
Yes |
|
Note: The following fields are derived from the dataset before being indexed by Solr |
||||||
ID |
Integer |
No |
No |
Yes |
Yes |
|
Author A-Z index |
String |
No |
No |
Yes |
No |
|
Decade First Published |
Integer |
No |
No |
Yes |
No |
|
Book Length |
String |
Yes |
No |
Yes |
No |
|
Text |
String |
Yes |
No |
No |
No |
The above table would lead to the following snippet of the Solr managed schema file with the Solr field names and values where set to true highlighted.
|
Remember: The way that the Solr managed schema is configured can span across multiple CaFUPs and that you do not need to configure Panl to include each facet or field for each of the pages that you want to render.
Configure the schema so that it will cover all requirements, and then let the Panl configuration define how the facets and results are returned.
If in doubt, you can always set up a separate Solr collection with a Panl configuration for a specific need or use case. |
01 02 03 04
05 06
07
08
09
10
11
12
13
14
15
16
17
18 19
20
21
22 23
24 25 26 27 28 29 30 31 32 33 34 |
<schema name="book-store" version="1.6"> <field name="_version_" type="plong" indexed="false" stored="false"/>
<field name="id" type="string" stored="true" index="true" required="true" ↩ <field name="author" type="string" indexed="true" stored="true" ↩ multiValued="true" /> <field name="title" type="string" indexed="true" stored="true" ↩ multiValued="false" /> <field name="description" type="pint" indexed="true" stored="true" ↩ multiValued="false" /> <field name="book_image" type="string" indexed="false" stored="true" ↩ multiValued="false" /> <field name="buy_url" type="string" indexed="false" stored="true" ↩ multiValued="false" /> <field name="genre" type="string" indexed="true" stored="true" ↩ multiValued="true" /> <field name="num_pages" type="pint" indexed="false" stored="true" ↩ multiValued="false" /> <field name="first_published_year" type="pint" indexed="true" stored="true" ↩ multiValued="false" /> <field name="language" type="string" indexed="true" stored="true" ↩ multiValued="false" /> <field name="is_paperback" type="boolean" indexed="true" stored="true" ↩ multiValued="false" /> <field name="series" type="string" indexed="true" stored="true" ↩ multiValued="false" /> <field name="price" type="pfloat" indexed="true" stored="true" ↩ multiValued="false" />
<field name="a_to_z_index" type="string" indexed="true" stored="false" ↩ multiValued="false" /> <field name="decade_published" type="pint" indexed="true" stored="false" ↩ multiValued="false" /> <field name="book_length" type="pint" indexed="true" stored="false" ↩ multiValued="false" />
<field name="text" type="text_general" indexed="true" stored="true" ↩ multiValued="true"/>
<uniqueKey>id</uniqueKey>
<copyField source="author" dest="text" /> <copyField source="title" dest="text" /> <copyField source="description" dest="text" /> <copyField source="genre" dest="text" /> <copyField source="series" dest="text" />
...
</schema> |
Working through the schema:
Line 1:
This is the schema name which maps to the Solr collection name and will be used by Panl for the CaFUPs. WARNING: if the schema name starts with the string panl- then the Panl server will fail to start.
Line 2:
The _version_ field is required by a Solr Cloud deployment - this is an internal field, generated automatically by Solr and is used by the partial update procedure, the update log process. You may not need to have this field, however it is mandatory for a Solr Cloud instance
Line 4:
The id field for uniquely identifying a Solr document within the collection
Lines 6 - 17:
The fields that come directly from the book store dataset, note the values of the XML element for the attributes multivalued, indexed, and stored
Lines 19-21:
The fields that are derived from the data.
Line 23:
This is a field that is used as a storage area for every other field that needs to be searched on - see lines 27-31 below.
Line 25:
This element tells Solr what the unique key is for this collection
Lines 27-31:
This copies the values from the required fields so that they can analysed by Solr and searched upon with a keyword or keywords.
Line 33:
For clarity and space, the Solr field definitions and additional XML elements were not included and replaced by ellipses.
Line 35:
The end of the Solr schema definition
|
IMPORTANT: When defining the managed schema for a Solr collection, you need to consider __ALL__ of the use cases of the data and whether each field is going to be indexed and/or stored.
You can then configure the Panl server through the CaFUPs to facet and return just the individual facets and fields that you want. |
3. Configure the Panl Server
Now that we understand the dataset, and the Solr search server is going to index the data, we can extend the Solr field definitions for the initial Panl configuration.
|
Notes: The following configuration is for the default search page, alternate configurations will be defined for further search page implementations. |
Solr Field Name |
Data Type |
Facet or Field |
Facet Type |
Sortable |
Additional information |
Author |
String |
Facet |
Regular |
No |
Hierarchical, Prefix/Suffix |
Title |
String |
Field |
N/A |
No |
|
Description |
String |
Field |
N/A |
No |
|
Book Image |
String |
Field |
N/A |
No |
|
Buy URL |
String |
Field |
N/A |
No |
|
Genre |
String |
Facet |
OR |
No |
Prefix/Suffix |
Number of Pages |
Integer |
Field |
N/A |
No |
|
First Published Year |
Integer |
Facet |
Regular |
Yes |
Hierarchical, Prefix/Suffix |
Language |
String |
Facet |
Regular |
No |
|
Paperback / Hardcover |
Boolean |
Facet |
BOOLEAN |
No |
Value replacement |
Series |
String |
Facet |
Regular |
No |
Hierarchical |
Price |
Float |
Facet |
RANGE |
Yes |
|
Note: The following fields are derived from the dataset before being indexed by Solr |
|||||
ID |
String |
Field |
N/A |
No |
|
Author A-Z index |
String |
Facet |
Regular |
Yes |
Prefix/Suffix |
Decade First Published |
Integer |
Facet |
RANGE |
No |
|
Book length |
String |
Facet |
Regular |
No |
|
Text |
String |
Ignored |
N/A |
No |
|
The above can be considered the default search page configuration, there will be other <panl_collection_url>.panl.properties files defined
The Default Search Page Configuration
To generate the default search page configuration the in-built Panl generator utility is the quickest way to get up and running fast.
For more information on the available options, see the section for the command line options for the Panl Generator in the appendices.
|
IMPORTANT: Be aware that everytime that you use the Panl generator, there is a chance that the generated files will change their LPSE codes. This will happen if the Panl generator has a LPSE code that cannot be assigned from the first character of the Solr field name (either upper or lowercase) as they are already both in use and will then choose a random one. |
*NIX command
Command(s) |
cd PANL_INSTALL_DIRECTORY
bin/panl generate ↩ |
Windows command
Command(s) |
cd PANL_INSTALL_DIRECTORY
bin\panl.bat generate ↩ |
This will generate two files in the src/dist/sample/panl/book-store/ directory named panl.properties and book-store.panl.properties the text of which is not included in this book - however the complete file can be seen in the GitHub repository
The Generated panl.properties File
The Panl generator utility will output the file to the path passed in through the -properties command line option.
For all of the Book Store URL path parts, the panl.properties file will be the same. Below (for clarity) all comments have been stripped from the file. Note the last line panl.collection.book-store=book-store.panl.properties has been automatically added to the file.
01 02 03 04 05 06 07 |
solrj.client=CloudSolrClient solr.search.server.url=http://localhost:8983/solr,http://localhost:7574/solr panl.results.testing.urls=true panl.status.404.verbose=true panl.status.500.verbose=true panl.decimal.point=true panl.collection.book-store=book-store.panl.properties |
|
Note: The above is configured for testing purposes, with verbose error messaging and testing URLs live. |
The Generated <panl_collection_url>.panl.properties File
The second file that the Panl generator utility will write is the <panl_collection_url>.panl.properties file. This file has three major parts (in order):
- The general properties configuration,
- The generated fields and facets configuration, and
- The Panl LPSE order, FieldSets, and sorting
General Properties Configuration
Skipping over the defaults values for the panl.param.* properties (and no prefixes or suffixes were added to either the panl.param.page or panl.param.numrows properties) the following properties were changed:
solr.numrows.default=20
This was changed from the default value of 10 as 20 results seems to be a good starting point for such a large collection
solr.highlight=false
No changes for the default value of 'true', no highlighting will be required on the Book Store collection
panl.lpse.ignore=i
We still want to be able to search on this field and be able to pull out the results, generally with a canonical URL - e.g.
/Michael+Connelly+Harry+Bosh+Series+The+Black+Echo/1678/zi/
And use the id Solr field (LPSE code 'i') as the lookup key with the value 1678, the rest of the URL path part will be ignored.
panl.sort.fields=price,a_to_z_index,first_published_year
These are the fields that are going to be able to be sortable - remember that relevancy is always available as a sort order and is the default sort order if no other sort order is selected. The order of the value of this property is the order that Panl will return in the JSON response object.
The Generated Fields and Facets Configurations
Comments providing information about the settings have been removed from the examples below.
Solr Field 'id'
01
02 03 04 05 06 07 08 |
# <field "indexed"="true" "stored"="true" "name"="id" "type"="string" ↩ panl.facet.i=id panl.name.i=Id panl.type.i=solr.StrField #panl.prefix.i=prefix #panl.suffix.i=suffix #panl.when.i= #panl.facetsort.i=index |
This facet will be left as it is for the moment, but this will be ignored by the Panl server as the property panl.lpse.ignore=i has this LPSE code.
You can still return this as a field in the Panl results, so that if you need the unique id of the book for additional functionality (e.g. adding to a cart, linking to a separate page, looking up further details). See the panl.results.fields.* properties.
Solr Field 'author'
01
02 03 04 05 06 07 08 09 |
# <field "indexed"="true" "stored"="true" "name"="author" "type"="string" ↩ panl.facet.a=author panl.name.a=Author panl.type.a=solr.StrField panl.multivalue.a=true panl.prefix.a=Author #panl.suffix.a=suffix panl.when.a=q,A #panl.facetsort.a=index |
A prefix has been added on line 6 of 'Author ' (note the ending whitespace).
There are too many authors to have this as a facet, and they will be ordered by the number of books that have been published, so this facet is configured to only appear if a search query is set, or if the first letter of the surname is selected. Consequently line 8 has been un-commented so that the panl.when.a property has a value. This facet will only appear if the search query (LPSE code 'q') or the a_to_z_index facet (LPSE code 'A') has been selected.
Solr field 'title'
01
02 03 04 05 06 07 08 |
# <field "indexed"="true" "stored"="true" "name"="title" "type"="string" ↩ "multiValued"="false" /> panl.field.t=title panl.name.t=Title panl.type.t=solr.StrField #panl.prefix.t=prefix #panl.suffix.t=suffix #panl.when.t= #panl.facetsort.t=index |
The generator has configured this Solr field as a Panl facet as it is both indexed and stored in Solr - this has been changed to a field, rather than a facet.
Solr field 'description'
01
02 03 04 05 06 07 08 |
# <field "indexed"="true" "stored"="true" "name"="description" "type"="string" ↩ panl.field.d=description panl.name.d=Description panl.type.d=solr.StrField #panl.prefix.d=prefix #panl.suffix.d=suffix #panl.when.d= #panl.facetsort.d=index |
The generator has configured this Solr field as a Panl facet as it is both indexed and stored in Solr - this has been changed to a field, rather than a facet.
Solr Field 'book_image'
No configuration changes made
01
02 03 04 05 06 07 08 |
# <field "indexed"="false" "stored"="true" "name"="book_image" "type"="string" ↩ panl.field.b=book_image panl.name.b=Book Image panl.type.b=solr.StrField #panl.prefix.b=prefix #panl.suffix.b=suffix #panl.when.b= #panl.facetsort.b=index |
Solr Field 'buy_url'
No configuration changes made
01
02 03 04 05 06 07 08 |
# <field "indexed"="false" "stored"="true" "name"="buy_url" "type"="string" ↩ panl.field.B=buy_url panl.name.B=Buy Url panl.type.B=solr.StrField #panl.prefix.B=prefix #panl.suffix.B=suffix #panl.when.B= #panl.facetsort.B=index |
Solr field 'genre'
01
02 03 04 05 06 07 08 09 10 |
# <field "indexed"="true" "stored"="true" "name"="genre" "type"="string" ↩ "multiValued"="true" /> panl.facet.g=genre panl.or.facet.g=true panl.name.g=Genre panl.type.g=solr.StrField panl.multivalue.g=true #panl.prefix.g=prefix #panl.suffix.g=suffix #panl.when.g= #panl.facetsort.g=index |
This will be an OR facet as it is configured with the property panl.or.facet.g=true, meaning that end users can select one or more of the facet values. Note: This is already a multivalued Solr field, however using this as an OR facet means that you can select books which are 'Sci-Fi' OR 'Horror', rather than a book that is 'Sci-Fi' AND 'Horror'
Solr Fields 'num_pages'
No configuration changes made
01
02 03 04 05 06 07 08 |
# <field "indexed"="false" "stored"="true" "name"="buy_url" "type"="string" ↩ panl.field.B=buy_url panl.name.B=Buy Url panl.type.B=solr.StrField #panl.prefix.B=prefix #panl.suffix.B=suffix #panl.when.B= #panl.facetsort.B=index |
Solr Fields 'first_published_year'
01
02 03 04 05 06 07 08 |
# <field "indexed"="true" "stored"="true" "name"="first_published_year" ↩ "type"="pint" "multiValued"="false" /> panl.facet.f=first_published_year panl.name.f=First Published Year panl.type.f=solr.IntPointField panl.prefix.f=First published in #panl.suffix.f=suffix panl.when.f=D #panl.facetsort.f=index |
Add in a prefix of 'First published in ' - Line 5 - and this will only appear when the decade_published facet has been selected (LPSE code 'D') - Line 7.
Solr Fields 'language'
No configuration changes made
01
02 03 04 05 06 07 08 |
# <field "indexed"="true" "stored"="true" "name"="language" "type"="string" ↩ panl.facet.l=language panl.name.l=Language panl.type.l=solr.StrField #panl.prefix.l=prefix #panl.suffix.l=suffix #panl.when.l= #panl.facetsort.l=index |
Solr field 'is_paperback'
01
02 03 04 05 06 07 08 09 10 |
# <field "indexed"="true" "stored"="true" "name"="is_paperback" "type"="boolean" ↩ panl.facet.I=is_paperback panl.name.I=Book Format panl.type.I=solr.BoolField #panl.prefix.I=prefix #panl.suffix.I=suffix panl.bool.I.true=Paperback panl.bool.I.false=Hardcover #panl.when.I= #panl.facetsort.I=index |
The display name has been changed to be 'Book Format' (Line 3) and a Boolean value replacement for both the true and false values (Lines 7 and 8).
Another way that this could have been index by Solr was to derive the data and store the book format as a string - i.e. type="string" with the values 'Paperback' and 'Hardcover', however keeping this as a boolean value with value replacements means that additional CaFUPs could be configured with different values for true and false if additional URLs were needed to be generated.
Solr field 'series'
01
02 03 04 05 06 07 08 |
# <field "indexed"="true" "stored"="true" "name"="series" "type"="string" ↩ panl.facet.S=series panl.name.S=Series panl.type.S=solr.StrField #panl.prefix.S=prefix #panl.suffix.S=suffix panl.when.S=a #panl.facetsort.S=index |
This facet will only be passed through if an author facet (LPSE code 'a') has been selected (Line 7).
Solr field 'price'
01
02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 |
# <field "indexed"="true" "stored"="true" "name"="price" "type"="pfloat" ↩ "multiValued"="false" /> panl.facet.P=price panl.name.P=Price panl.type.P=solr.FloatPointField #panl.prefix.P=prefix #panl.suffix.P=suffix #panl.when.P= panl.range.facet.P=true panl.range.min.P=5 panl.range.max.P=100 panl.range.prefix.P=From panl.range.infix.P=\ to panl.range.suffix.P=\ dollars panl.range.min.wildcard.P=true panl.range.max.wildcard.P=true #panl.facetsort.P=index |
This facet is a RANGE facet (configured with the panl.range.facet.P=true property) - Lines 8 to 15. As an example, the configuration will generate the URL path part.
/From+5+to+100+dollars/P/
Additionally, with the wildcard properties set, it will generate a Solr query when the minimum or maximum values are passed through to use less than or greater than, respectively. I.e. if the URL path part was used, as they are both a minimum and maximum value, the query would prices between 5 or below and 100 and greater.
For a URL path part of
/From+20+to+100+dollars/P/
It would return books greater than 20 (even if they are greater than 100)
For the URL path part of
/From+45+to+50+dollars/P/
It will only return values between 45 and 50 (inclusive)
Solr field 'a_to_z_index'
01
02 03 04 05 06 07 08 |
# <field "indexed"="true" "stored"="false" "name"="a_to_z_index" "type"="string" ↩ "multiValued"="false" /> panl.facet.A=a_to_z_index panl.name.A=Authors (A-Z) panl.type.A=solr.StrField #panl.prefix.A=prefix #panl.suffix.A=suffix #panl.when.A= panl.facetsort.A=index |
The Panl name of the facet has been changed to 'Authors (A-Z)' - (Line 3).
Normally Solr will return facets in order of decreasing count, however by setting Line 8 to index, this will sort the facets by index order, i.e. by the alphabetical order, rather than the number of returned documents for the specific facet. (Line 8)
Solr field 'decade_published'
No configuration changes made
01
02 03 04 05 06 07 08 |
# <field "indexed"="true" "stored"="false" "name"="decade_published" ↩ "type"="pint" "multiValued"="false" /> panl.facet.D=decade_published panl.name.D=Decade Published panl.type.D=solr.IntPointField #panl.prefix.D=prefix #panl.suffix.D=suffix #panl.when.D= #panl.facetsort.D=index |
Solr field 'text'
01
02 03 04 05 06 07 08 09 |
# <field "indexed"="true" "stored"="false" "name"="text" "type"="text_general" ↩ "multiValued"="true" /> #panl.facet.T=text #panl.name.T=Text #panl.type.T=solr.TextField # The following two properties are optional and the values should be changed #panl.prefix.T=prefix #panl.suffix.T=suffix #panl.when.T= #panl.facetsort.T=index |
This is an internal Solr field that is used as a multi valued text field to store all fields that need to be searched against. As such, it is not going to be used as a facet, or a field, so the entire entry has been commented out.
|
IMPORTANT: Ensure that you remove the 'text' field from all Panl configured FieldSets and LPSE orders, as the Panl server will error on startup if it finds a field that it is not defined - see the following properties:
|
Solr field 'book_length'
No configuration changes made
01
02 03 04 05 06 07 08 |
# <field "indexed"="true" "stored"="false" "name"="book_length" "type"="string" ↩ "multiValued"="false" /> panl.facet.L=book_length panl.name.L=Book Length panl.type.L=solr.StrField #panl.prefix.L=prefix #panl.suffix.L=suffix #panl.when.L= #panl.facetsort.L=index |
Panl LPSE Orders, FieldSets, and Sorting
The final part of the Panl configuration is the LPSE order, the FieldSets, and the available fields/facets to sort the results documents.
The Panl LPSE Order
Going through the field and facet configuration items, two of the facets were changed to be fields, and one of the facets was removed (commented out), they were:
- The Solr field title (LPSE code 't') is now a field, not a facet
- The Solr field description (LPSE code 'd') is now a field, not a facet
- The Solr field text (LPSE code 'T') was commented out
So the original LPSE order (strikethrough text below)
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
panl.lpse.order=z,\ i,\ a,\ t,\ d,\ b,\ B,\ g,\ N,\ f,\ l,\ I,\ S,\ P,\ A,\ D,\ T,\ L,\ q,\ p,\ n,\ s,\ o |
The final version becomes
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 |
panl.lpse.order=z,\ i,\ a,\ b,\ B,\ g,\ N,\ f,\ l,\ I,\ S,\ P,\ A,\ D,\ L,\ q,\ p,\ n,\ s,\ o |
The Panl FieldSets
There are always going to be at least two FieldSets defined for any Panl collection, namely:
- default - this is ALWAYS available, and if not set then it will return ALL fields in the Solr collection, whether you have defined them in the Panl configuration file or not. The recommendation is to either ignore this in your implementation, or edit this FieldSet to your purposes, as has been done below.
- empty - this is ALWAYS available, and if it appears in the properties file, a warning will be printed and it will be ignored. This will return no fields for the document (i.e. no documents at all).
Here, the only configured FieldSets is going to be the default, with no other FieldSets defined. I.e. the panl.results.fields.firstfive property has been removed.
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 |
panl.results.fields.default=id,\ author,\ title,\ description,\ book_image,\ buy_url,\ genre,\ num_pages,\ first_published_year,\ language,\ is_paperback,\ series,\ price,\ a_to_z_index,\ decade_published,\ book_length,\ text |
Lines 14 to 17 have been removed. The id field (Line 1) was kept in as it may be useful to link to the database for other purposes. The final property looks thusly:
01 02 03 04 05 06 07 08 09 10 11 12 13 |
panl.results.fields.default=id,\ author,\ title,\ description,\ book_image,\ buy_url,\ genre,\ num_pages,\ first_published_year,\ language,\ is_paperback,\ series,\ price |
The Panl Sort Fields
To define the sort fields, use the panl.sort.fields property with a list of comma separated values. Each of the sort fields must match the Solr field name, NOT the Panl LPSE code as these are passed directly through to the Solr server.
panl.sort.fields=price,a_to_z_index,first_published_year
|
Note: There is only one sorting fields property for the file and spans across all FieldSets defined in this file. You may add as many sorting fields as you would like, you do not need to make the options available to the end user. |
Testing the Configuration
At this point (assuming that the data has been correctly added and indexed to the Solr Search server) you will be able to start the Panl server and view your single CaFUP on the Panl Results Viewer - http://localhost:8181/panl-results-viewer/book-store/default/.
Configuration Change Summary
Panl Field Name |
LPSE code |
Changes |
|||
Author |
a |
Added hierarchy by setting panl.when.a=q,A Added prefix by setting panl.prefix.a=Author . |
|||
Title |
t |
Changed from a facet to a field - i.e. panl.facet.t=title to panl.field.t=title Remove field from panl.lpse.order |
|||
Description |
d |
Changed from a facet to a field - i.e. panl.facet.d=description to panl.field.d=description Remove field from panl.lpse.order |
|||
Book Image |
b |
No changes to Panl field/facet configuration |
|||
Buy URL |
B |
No changes to Panl field/facet configuration |
|||
Genre |
g |
Made this an OR facet by setting panl.or.facet.g=true |
|||
Number of Pages |
N |
No changes to Panl field/facet configuration |
|||
First Published Year |
f |
Added a prefix by setting panl.prefix.f=First published in . Added to sort fields by adding to the property panl.sort.fields |
|||
Language |
l |
No changes to Panl field/facet configuration |
|||
Paperback / Hardcover |
I |
Changed the Panl name by setting panl.name.I=Book Format Added BOOLEAN value replacement values by setting panl.bool.I.true=Paperback panl.bool.I.false=Hardcover |
|||
Series |
S |
Added hierarchy by setting panl.when.S=a |
|||
Price |
P |
Made this a RANGE facet with value replacement by setting the following panl.range.facet.P=true panl.range.min.P=5 panl.range.max.P=100 panl.range.prefix.P=From . panl.range.infix.P=\ to . panl.range.suffix.P=\ dollars panl.range.min.wildcard.P=true panl.range.max.wildcard.P=true Added to sort fields by adding to the property panl.sort.fields |
|||
Note: The following fields are derived from the dataset before being indexed by Solr |
|||||
id |
i |
Added to ignored facet by setting panl.lpse.ignore=i |
|||
Author A-Z index |
A |
Change the Panl field name by setting panl.name.A=Authors (A-Z) Removed field from Panl.results.fields.default Added to sort fields by adding to the property panl.sort.fields Added sorting of the facet values by index, rather than count panl.facetsort.A=index |
|||
Decade First Published |
D |
Removed field from panl.results.fields.default |
|||
Book length |
L |
Removed field from panl.results.fields.default |
|||
Text |
T |
Commented out all properties for this field Remove field from panl.lpse.order Removed field from panl.results.fields.default |
4. Determine the Web Pages to Render
In addition to the default (i.e. main) search page with its functionality, additional page requirements are as follows.
- SEO friendly URLs that list of all books published by author (in order of publication) along with the ability to facet by any series that the author has written.
- SEO friendly URLs that lists all book series for Authors (in order of publication) that exist
- A list of all Authors with their associated books and their series
Author and Author Series
Both of these pages can be generated with a single Panl Configuration (included as the dist/sample/panl/book-store/author-alphabetical.panl.properties file) For each of the links to the authors, a link was generated in the format of /Author+<author_name/a/ and then the Panl server was left to do its work.
|
Tips: The majority of these pages could also be directly generated through a database query, however you would also need to implement sorting, pagination, and any additional faceting as well, all of which Panl has built-in and ready to use. |
Author Listing
The indexing of the author listing page - i.e. a complete list of all authors within the dataset could not sufficiently be satisfied by the current dataset, so a new dataset was created and indexed by Solr (not included in the release package). I was then able to produce the pages that were required by passing it through the Panl server to utilise the searching, sorting, pagination, and hierarchical facets.
A single Panl solution may not fit all use cases so you may need to look at additional datasets, or simply by using pages generated from a database.
The Iterative Implementation Process[12]
When testing the configurations, the original implementation didn't quite make sense, so
- The way the dataset was index by the Solr collection didn't suit all of the needs, so a separate collection was created to hold only an individual author with a multifield list of titles attached to it.
- Searching a book by decade didn't really make sense (or even having the hierarchical facet for first_published_in). They were removed as facets and made to be fields. The data was left in the Solr collection index, as they may be of use later.
- The site that was generated was a joining of the web application server, the database, and then the Panl configuration. Some pages were generated by the database and served up by the web application which were then linked to the Panl implementation.
- Any Solr fields that are of type float, when returned with the documents there may be storage errors, for example, each of the books are priced as a float as 19.99, when returned with the document, it comes back as 19.989999771118164 - which rounds to 19.99. Instead I derived another field to have it as an integer for price in cents, then on the front end, I just formatted it to the correct decimal place.
The changes made and implementation details have not been provided in the included sample dataset.
The Panl server runs purely on configuration, so any changes that are made to either of the configurations will be utilised at runtime. Provided that the Solr collection is set up to allow the broadest array of functionality, this becomes a very short iterative process.
~ ~ ~ * ~ ~ ~