Site Meta

By default, Valu Search reads the sitemap from /sitemap.xml and also follows any links it finds on the pages.

Fields

The following fields can be defined in the JSON document.

siteName

  • Type: string

Name of the site. Used in some search interfaces.

maxPages

  • Type: number

The maximum number of pages to index.

sitemap

  • Type: boolean

Whether to read the sitemap file.

  • Type: boolean

Whether to walk links on pages.

startPaths

  • Type: string[]

Paths to start walking from.

Example

{
  "startPaths": ["/", "/sub-site"]
}

concurrency

  • Type: number

How many pages to process concurrently. Defaults to 5. Lower this if the crawler puts too much load on your site.

crawlDelay

  • Type: number

To reduce the crawler's impact further, use this setting to make the crawler sleep for the given number of milliseconds after each crawled page. When using this, you should set concurrency to 1.
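
As a rough illustration of the advice above, a gentle crawl configuration might combine the two fields as follows. The 500 millisecond delay is an arbitrary value chosen for this sketch, not a recommended or default setting.

```json
{
  "concurrency": 1,
  "crawlDelay": 500
}
```

With these values the crawler would process one page at a time and pause for half a second after each page.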

denyPatterns

  • Type: string[]

Path prefixes to ignore. For example, with /secrets the crawler does not enter paths such as /secrets/foo.html at all.

You may also define regular expressions with a re: prefix. For example, with re:.*nope.* the crawler won't enter any path containing the string nope.

Example

{
  "denyPatterns": ["/secrets", "re:.*nope.*"]
}

Tip: Valu Search also respects Disallow rules from robots.txt.

robots

  • Type: boolean

Whether to respect the site's robots.txt file. Defaults to true.
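
To show how several of the fields described on this page fit together, here is a sketch of a complete JSON document. The site name, page limit, and concurrency value are hypothetical and chosen only for illustration. Only the robots value reflects a documented default.

```json
{
  "siteName": "Example Site",
  "maxPages": 1000,
  "sitemap": true,
  "concurrency": 2,
  "denyPatterns": ["/secrets", "re:.*nope.*"],
  "robots": true
}
```

Fields that are left out fall back to their defaults where one is documented, so a document like this only needs to list the settings you want to change.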