Site Meta
By default, Valu Search reads the sitemap from /sitemap.xml and also walks any links found on the pages.
Fields
The following fields can be defined in the JSON document.
siteName
- Type: string
Name of the site. Used in some search interfaces.
maxPages
- Type: number
Maximum number of pages to index.
sitemap
- Type: boolean
Whether to read the sitemap file.
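A sitemap file is a standard XML document that lists page URLs in `<loc>` elements. As an illustration only, a crawler could collect page paths from it roughly like this. `extractSitemapPaths` is a hypothetical name and the regex-based parsing is a deliberate simplification; it is not a description of how Valu Search reads the file.

```typescript
// Hypothetical helper, a minimal sketch: collect page paths from a sitemap.xml
// document. Not Valu Search's implementation; a real parser would also handle
// sitemap index files, CDATA sections and XML entities.
function extractSitemapPaths(xml: string): string[] {
  const paths: string[] = [];
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    // Keep only the path so it can be compared with startPaths, denyPatterns etc.
    paths.push(new URL(match[1]).pathname);
  }
  return paths;
}

const sitemapXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>`;

const paths = extractSitemapPaths(sitemapXml); // ["/", "/about"]
console.log(paths);
```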
walkLinks
- Type: boolean
Whether to walk links on pages.
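Walking links means resolving each link on a page against the page URL and following only those that stay on the same site. The sketch below shows that idea under stated assumptions: `extractLinkPaths` is a hypothetical helper, links are found with a simple regex instead of a real HTML parser, and "same site" is assumed to mean the same URL origin.

```typescript
// Hypothetical helper, a simplified sketch: collect same-site link paths
// from a page's HTML. A real crawler would use an HTML parser.
function extractLinkPaths(html: string, pageUrl: string): string[] {
  const base = new URL(pageUrl);
  const found = new Set<string>();
  const hrefPattern = /<a\s[^>]*href="([^"]*)"/gi;
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    let target: URL;
    try {
      // Resolve relative links against the page URL.
      target = new URL(match[1], base);
    } catch {
      continue; // skip malformed hrefs
    }
    // Assumption: only links on the same origin are followed.
    if (target.origin === base.origin) {
      found.add(target.pathname); // fragments like #team are dropped
    }
  }
  return Array.from(found);
}

const html = `<a href="/about">About</a> <a href="news/latest">News</a>
<a href="https://other.example/">Elsewhere</a> <a href="/about#team">Team</a>`;

const links = extractLinkPaths(html, "https://example.com/blog/");
console.log(links); // ["/about", "/blog/news/latest"]
```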
startPaths
- Type: string[]
Paths to start walking from.
Example
{
"startPaths": ["/", "/sub-site"]
}
concurrency
- Type: number
How many pages to process at once. Defaults to 5. If the crawler puts too much load on your site, you may lower this.
crawlDelay
- Type: number
To reduce the crawler's impact further, this setting makes the crawler sleep for the given number of milliseconds after each crawled page. When using this, you should set concurrency to 1.
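To illustrate how concurrency, crawlDelay and maxPages could interact, here is a minimal sketch of a concurrency-limited crawl loop. It is not Valu Search's crawler: `crawl`, `CrawlOptions` and the `fetchPage` callback are hypothetical names, and real page fetching is replaced with an in-memory fake site.

```typescript
// Hypothetical sketch of a crawl loop limited by concurrency, crawlDelay and
// maxPages. Not Valu Search's implementation.
interface CrawlOptions {
  concurrency: number; // pages processed at once (documented default: 5)
  crawlDelay: number; // milliseconds each worker sleeps after a crawled page
  maxPages: number; // stop after this many pages
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function crawl(
  startPaths: string[],
  fetchPage: (path: string) => Promise<string[]>, // returns paths found on the page
  options: CrawlOptions,
): Promise<string[]> {
  const queue = [...startPaths];
  const seen = new Set(startPaths);
  const crawled: string[] = [];

  // Each worker takes paths from the shared queue until the queue is empty
  // or maxPages is reached. A worker that finds the queue empty stops.
  async function worker(): Promise<void> {
    while (queue.length > 0 && crawled.length < options.maxPages) {
      const path = queue.shift()!;
      crawled.push(path);
      for (const next of await fetchPage(path)) {
        if (!seen.has(next)) {
          seen.add(next);
          queue.push(next);
        }
      }
      await sleep(options.crawlDelay);
    }
  }

  // Start `concurrency` workers, so at most that many pages are in flight.
  await Promise.all(Array.from({ length: options.concurrency }, () => worker()));
  return crawled;
}

// Example run against a hypothetical in-memory site.
const fakeSite: Record<string, string[]> = {
  "/": ["/a", "/b"],
  "/a": ["/b", "/c"],
  "/b": [],
  "/c": [],
};

crawl(["/"], async (path) => fakeSite[path] ?? [], {
  concurrency: 1, // as recommended when crawlDelay is used
  crawlDelay: 100,
  maxPages: 3,
}).then((pages) => console.log(pages)); // logs the paths /, /a and /b
```

With concurrency set to 1, each crawlDelay pause happens before the next page is requested, so the site sees at most one request at a time. With higher concurrency every worker sleeps independently and the site still receives parallel requests, which is presumably why the two settings are recommended together. In this sketch idle workers simply stop; a production crawler would wait for in-flight pages before giving up.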
denyPatterns
- Type: string[]
Path prefixes to ignore. For example, with /secrets the crawler does not enter paths like /secrets/foo.html at all.
You may also define regular expressions with a re: prefix. For example, with re:.*nope.* the crawler won't enter any path containing the string nope.
Example
{
"denyPatterns": ["/secrets", "re:.*nope.*"]
}
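The two kinds of denyPatterns entries described above can be sketched as a single check. `isDenied` is a hypothetical name, and testing the regular expression against the path only (not the full URL) is an assumption; this is not Valu Search's actual matching code.

```typescript
// Hypothetical sketch: plain entries are path prefixes, "re:" entries are
// regular expressions. Not Valu Search's implementation.
function isDenied(path: string, denyPatterns: string[]): boolean {
  return denyPatterns.some((pattern) => {
    if (pattern.startsWith("re:")) {
      // Assumption: the expression is tested against the path only.
      return new RegExp(pattern.slice(3)).test(path);
    }
    return path.startsWith(pattern);
  });
}

const denyPatterns = ["/secrets", "re:.*nope.*"];
console.log(isDenied("/secrets/foo.html", denyPatterns)); // true
console.log(isDenied("/blog/nope-not-here", denyPatterns)); // true
console.log(isDenied("/blog/post", denyPatterns)); // false
```

Under this plain-prefix reading, /secrets would also match a path such as /secretsanta; whether Valu Search requires a path-segment boundary is not stated here, so a trailing slash (/secrets/) is the safer way to express a directory.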
Tip: Valu Search also respects Disallow rules from robots.txt.
robots
- Type: boolean
Whether to respect the site's robots.txt file. Defaults to true.
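As a rough illustration of what respecting Disallow rules means, the sketch below collects Disallow prefixes from the `User-agent: *` group of a robots.txt file and checks a path against them, skipping the check when robots is false. It is heavily simplified and hypothetical: `parseDisallowRules` and `isAllowedByRobots` are illustrative names, and Allow rules, wildcards and crawler-specific groups are ignored, so it does not reflect how Valu Search parses the file.

```typescript
// Hypothetical, simplified robots.txt check: only Disallow rules from
// "User-agent: *" groups are used; Allow rules and wildcards are ignored.
function parseDisallowRules(robotsTxt: string): string[] {
  const rules: string[] = [];
  let appliesToUs = false;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines belong to the same group.
      appliesToUs = (lastWasAgent && appliesToUs) || value === "*";
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      // An empty Disallow value means "allow everything", so it adds no rule.
      if (field === "disallow" && appliesToUs && value !== "") {
        rules.push(value);
      }
    }
  }
  return rules;
}

// `respectRobots` mirrors the robots field, which defaults to true.
function isAllowedByRobots(path: string, robotsTxt: string, respectRobots = true): boolean {
  if (!respectRobots) return true;
  return !parseDisallowRules(robotsTxt).some((rule) => path.startsWith(rule));
}

const robotsTxt = `User-agent: *
Disallow: /admin
Disallow: /tmp/ # temporary files

User-agent: SomeOtherBot
Disallow: /`;

console.log(isAllowedByRobots("/admin/login", robotsTxt)); // false
console.log(isAllowedByRobots("/blog", robotsTxt)); // true
console.log(isAllowedByRobots("/admin/login", robotsTxt, false)); // true
```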