Indexing WebWorks Reverb - Client-Side Search for Source Documents, Baggage Files and External URLs
Reverb can index baggage files for use in Reverb’s client-side search. In this context, a baggage file is any file that is linked from a source document and included in the generated output so that it can contribute useful search results. For more detailed information on baggage files, see “Understanding Baggage Files”.
Baggage files are indexed in the same way as source documents, as long as Client-side Search is turned on (see “Client-side Search”). External URLs are indexed as long as the Index external links setting is Enabled.
Using Tidy for Indexing HTML Pages
To index baggage files, Reverb uses Tidy (a tool for cleaning up HTML files) to create an XHTML copy of each file, producing valid XML that ePublisher can read. In some cases the conversion fails, and you have to teach Tidy how to handle the file.
One thing you might need to teach Tidy about is new tags. You need to do this if the log contains a message like the following:
line 33 column 3 - Error: <not_recognized_tag> is not recognized!
When Tidy reports this message, it was not able to generate an XHTML copy, so ePublisher does not index that baggage file. Fortunately, you can fix this.
To Teach Tidy About New Tags
1. Go to the Tidy directory under the ePublisher installation directory on your local computer: ...\WebWorks\ePublisher\<VERSION>\Helpers\tidy\
2. Create an override of the entire tidy folder by placing a copy of it in a sub-folder of your project called Formats.
3. In the newly created tidy folder, open the config.txt file.
4. Depending on the kind of tag you want to add, uncomment line 8, line 10, or both in the config.txt file.
5. Replace the placeholder after the colon with your new tag name (for example: not_recognized_tag).
6. Save and close the file.
To learn more about customizing Tidy, go to https://www.w3.org/People/Raggett/tidy/.
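When a log contains many of these messages, collecting the tag names by hand is tedious. The following Python sketch pulls the tag names out of log text that uses the message format quoted above and formats them as a Tidy option line. The option names new-blocklevel-tags and new-inline-tags are standard HTML Tidy options; whether they are the options found on lines 8 and 10 of the config.txt shipped with ePublisher is an assumption, and the function names are hypothetical.

```python
import re

# Matches the Tidy message quoted in this section, for example:
# "line 33 column 3 - Error: <not_recognized_tag> is not recognized!"
UNRECOGNIZED = re.compile(
    r"line (\d+) column (\d+) - Error: <([^>\s]+)> is not recognized!"
)


def unrecognized_tags(log_text):
    """Return the unique tag names Tidy rejected, in first-seen order."""
    seen = []
    for match in UNRECOGNIZED.finditer(log_text):
        tag = match.group(3)
        if tag not in seen:
            seen.append(tag)
    return seen


def tidy_config_line(tags, block_level=True):
    """Format tags as one Tidy option line (assumed option names, see lead-in)."""
    option = "new-blocklevel-tags" if block_level else "new-inline-tags"
    return f"{option}: {', '.join(tags)}"


# Hypothetical log excerpt; only the first line comes from this section.
sample_log = "\n".join([
    "line 33 column 3 - Error: <not_recognized_tag> is not recognized!",
    "line 40 column 1 - Error: <my_inline> is not recognized!",
    "line 52 column 3 - Error: <not_recognized_tag> is not recognized!",
])

tags = unrecognized_tags(sample_log)
print(tidy_config_line(tags))
```

The printed line is only a suggestion to compare against the placeholder lines in your own config.txt; decide per tag whether it is block-level or inline before adding it.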
Assigning Relevance Weight to Your Source Document Styles
Search results are displayed in the Search tab when a user types a word to search for and clicks Go. The results are sorted by relevancy ranking, which, in the case of source documents, is calculated from the Search relevance weight option defined in your Paragraph and Marker Styles. By default, WebWorks Reverb assigns a relevance weight of 1 to all styles.
To Modify the Relevancy Ranking in Source Documents for Search Results
1. Open your project with ePublisher Designer.
2. Scan the document to pull all styles into the Style Designer.
3. Open the Style Designer (F10 or View > Style Designer).
4. Select the style you want to assign a weight to (either in Paragraph Styles or Marker Styles).
5. Open the Options window.
6. Change the Value of the Search relevance weight option to an integer of your choice. A value of 0 means that content in that style is ignored and does not appear in your search results.
Assigning Relevance Weight to Your HTML Baggage Files
The search results are sorted by relevancy ranking, which, in the case of HTML baggage files, is calculated from the scoring preferences defined for HTML tags in the search_settings.xml file. By default, WebWorks Reverb assigns relevancy rankings based on where in a topic a particular item is found.
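To make the idea of tag-based weighting concrete, here is a small Python sketch. Reverb's actual scoring formula is not described in this section, so the sketch assumes a simple additive model: each occurrence of the search term contributes the weight of the tag it appears in. The weights are borrowed from the search_settings.xml listing in this section; the page data and function name are hypothetical.

```python
# Tag weights taken from the search_settings.xml listing in this section.
WEIGHTS = {"title": 20, "h1": 15, "h2": 10, "div": 2}


def score(term, fragments):
    """Assumed additive model: sum of tag weight times term occurrences."""
    term = term.lower()
    total = 0
    for tag, text in fragments:
        hits = text.lower().split().count(term)
        total += WEIGHTS.get(tag, 0) * hits
    return total


# Hypothetical baggage pages, each a list of (tag, text) fragments.
pages = {
    "faq.html": [("h2", "Common questions"), ("div", "How do I install it")],
    "install.html": [("title", "Install guide"), ("div", "Run the install wizard")],
}

# Highest score first, as a ranked result list would be.
ranked = sorted(pages, key=lambda name: score("install", pages[name]), reverse=True)
print(ranked)
```

Under this assumed model, a match in a title outweighs several matches in body text, which is the effect the weight attributes are meant to control.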
To Modify the Relevancy Ranking in Baggage Files for Search Results
1. Open your project with ePublisher Designer.
2. If you want to override the relevancy ranking for all WebWorks Reverb targets, create the Formats\WebWorks Reverb\Transforms folder in your projectname folder, where projectname is the name of your ePublisher project.
3. If you want to override the relevancy ranking for one WebWorks Reverb target, create the Targets\targetname\Transforms folder in your projectname folder, where targetname is the name of that target and projectname is the name of your ePublisher project.
4. Create a customization of your search_settings.xml file.
5. You’ll see the following block of code:
<Settings version="1.0" xmlns="urn:WebWorks-Settings-Schema">
  <ScoringPrefs>
    <meta name="keywords" weight="100"/>
    <meta name="description" weight="50"/>
    <meta name="summary" weight="60"/>
    <title weight="20"/>
    <div class="myclass" weight="10"/>
    <div weight="2"/>
    <h1 weight="15"/>
    <h2 weight="10"/>
    <caption weight="10"/>
    <h3 weight="7"/>
    <th weight="5"/>
    <h4 weight="5"/>
    <h5 weight="4"/>
    <h6 weight="3"/>
    <h7 weight="2"/>
  </ScoringPrefs>
</Settings>
6. Modify the weight attribute of any tag you want to change, such as h1 or h2.
7. Save and close the search_settings.xml file.
8. Regenerate your project to review the changes.
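If you maintain several overrides, the weight change in step 6 can also be scripted. The following Python sketch uses the standard library to update a weight in XML shaped like the listing above. It works on an in-memory excerpt rather than your real file, and the set_weight helper is hypothetical. Note that it changes every entry with the given tag name, so calling it for div would affect both div entries in the full listing.

```python
import xml.etree.ElementTree as ET

NS = "urn:WebWorks-Settings-Schema"

# Shortened excerpt of the search_settings.xml listing in this section.
SAMPLE = """<Settings version="1.0" xmlns="urn:WebWorks-Settings-Schema">
  <ScoringPrefs>
    <title weight="20"/>
    <h1 weight="15"/>
    <h2 weight="10"/>
  </ScoringPrefs>
</Settings>"""


def set_weight(root, tag, weight):
    """Set weight on every <tag> entry under ScoringPrefs; return how many changed."""
    prefs = root.find(f"{{{NS}}}ScoringPrefs")
    changed = 0
    for entry in prefs.findall(f"{{{NS}}}{tag}"):
        entry.set("weight", str(weight))
        changed += 1
    return changed


# Keep the default namespace unprefixed when the tree is serialized.
ET.register_namespace("", NS)
root = ET.fromstring(SAMPLE)
changed = set_weight(root, "h1", 30)

# Collect the resulting weights by local tag name for a quick review.
weights = {
    entry.tag.split("}")[1]: int(entry.get("weight"))
    for entry in root.find(f"{{{NS}}}ScoringPrefs")
}
print(weights)
print(ET.tostring(root, encoding="unicode"))
```

To apply this to a real override, you would read and write the file with ET.parse and ElementTree.write instead of the in-memory string, then regenerate the project as in step 8.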