Site Meta
By default, Valu Search reads the sitemap from /sitemap.xml and also
follows any links found on the pages.
Fields
The following fields can be defined in the JSON document.
siteName
- Type:
string
Name of the site. Used in some search interfaces.
maxPages
- Type:
number
Maximum number of pages to index.
sitemap
- Type:
boolean
Whether to read the sitemap file.
walkLinks
- Type:
boolean
Whether to walk links on pages.
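As a small illustration of how sitemap and walkLinks combine, the following fragment would index only the pages listed in the sitemap and skip link walking. It is a sketch using only the two fields described above.

```json
{
  "sitemap": true,
  "walkLinks": false
}
```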
startPaths
- Type:
string[]
Paths to start walking from.
Example
{
"startPaths": ["/", "/sub-site"]
}
concurrency
- Type:
number
How many pages to process concurrently. Defaults to 5. Lower this value if the crawler puts too much load on your site.
crawlDelay
- Type:
number
To reduce the crawler's impact further, use this setting to make the crawler sleep for the given number of milliseconds after each crawled page. When using this setting, you should set concurrency to 1.
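A gentle crawl could be configured along these lines. The 500 millisecond delay is an arbitrary illustrative value, not a recommendation from Valu Search; pick a delay that suits your server.

```json
{
  "concurrency": 1,
  "crawlDelay": 500
}
```

With these values the crawler would fetch one page at a time and pause for half a second after each page.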
denyPatterns
- Type:
string[]
Path prefixes to ignore. For example, with /secrets the crawler does not enter
paths such as /secrets/foo.html at all.
You may also define regular expressions with a re: prefix. For example, with
re:.*nope.* the crawler won't enter any path containing the string nope.
Example
{
"denyPatterns": ["/secrets", "re:.*nope.*"]
}
Tip: Valu Search also respects Disallow rules from the robots.txt.
robots
- Type:
boolean
Whether to respect the site's robots.txt file. Defaults to true.
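For orientation, a complete document combining the fields on this page might look like the following. The site name, page limit, and delay are made-up values chosen for illustration only.

```json
{
  "siteName": "Example Site",
  "maxPages": 2000,
  "sitemap": true,
  "walkLinks": true,
  "startPaths": ["/", "/sub-site"],
  "concurrency": 1,
  "crawlDelay": 250,
  "denyPatterns": ["/secrets", "re:.*nope.*"],
  "robots": true
}
```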