public final class WebCrawlerConstants
extends java.lang.Object
| Modifier and Type | Class and Description |
|---|---|
| static class | WebCrawlerConstants.ErrorHandling: what to do on IO errors when fetching links. |
| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | ATTACHMENT_CONTENT: name of the attachment containing the content of a web resource. |
| static java.lang.String | ATTRIBUTE_CHARSET: name of the attribute containing the charset of the web resource reported by the web server (if any). |
| static java.lang.String | ATTRIBUTE_CONTENTTYPE: name of the attribute containing the content-type of the web resource reported by the web server (if any). |
| static java.lang.String | ATTRIBUTE_CRAWL_DEPTH: internal attribute used to apply the maximum crawl depth. |
| static java.lang.String | ATTRIBUTE_LASTMODIFIED: name of the attribute containing the last-modified header reported by the web server (if any). |
| static java.lang.String | ATTRIBUTE_MIMETYPE: name of the attribute containing the mimetype of the web resource reported by the web server. |
| static java.lang.String | ATTRIBUTE_SIZE: name of the attribute containing the content-length of the web resource reported by the web server (if any). |
| static java.lang.String | ATTRIBUTE_URL: name of the attribute containing the URL of the web resource. |
| static int | DEFAULT_LINKS_PER_BULK: default value for the 'linksPerBulk' parameter. |
| static java.lang.String | DEFAULT_USERAGENT: default user agent, used if nothing valid is defined in webcrawler.properties. |
| static java.util.Set<java.lang.String> | PROPERTY_NAMES: the property names the web ETL workers should support for mapping. |
| static java.lang.String | TASK_PARAM_LINK_ERROR_HANDLING: name of the task parameter that tells how to handle links that cannot be fetched. |
| static java.lang.String | TASK_PARAM_LINKS_PER_BULK: name of the task parameter that contains the number of links to write to one bulk object. |
| static java.lang.String | TASK_PARAM_START_URL: name of the task parameter that contains the start URL for crawling. |
| static java.lang.String | TASK_PARAM_WAIT_BETWEEN_REQUESTS: name of the task parameter that contains the time to wait between HTTP requests, as a long value in milliseconds. |
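
The task parameter fields above are typically read from a parameter map and combined with the documented defaults. The following is a minimal sketch of that pattern, not the crawler's actual code. It uses stand-in constants instead of the real `WebCrawlerConstants` fields, because this page does not list their values: the strings `"startUrl"` and `"waitBetweenRequests"`, the value `10` for `DEFAULT_LINKS_PER_BULK`, the fallback wait of `0` ms, and the class name `CrawlTaskParamsSketch` are all assumptions made for illustration. Only the parameter name `linksPerBulk` is taken from the field description.

```java
import java.util.HashMap;
import java.util.Map;

final class CrawlTaskParamsSketch {
    // Stand-ins for WebCrawlerConstants fields. The real values are not shown
    // on this page, so everything except "linksPerBulk" is a placeholder.
    static final String TASK_PARAM_START_URL = "startUrl";                        // assumed value
    static final String TASK_PARAM_WAIT_BETWEEN_REQUESTS = "waitBetweenRequests"; // assumed value
    static final String TASK_PARAM_LINKS_PER_BULK = "linksPerBulk";
    static final int DEFAULT_LINKS_PER_BULK = 10;                                 // assumed value

    /** Returns the configured links-per-bulk value, or the default if it is missing or not a positive integer. */
    static int linksPerBulk(Map<String, String> params) {
        String raw = params.get(TASK_PARAM_LINKS_PER_BULK);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_LINKS_PER_BULK;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_LINKS_PER_BULK;
        } catch (NumberFormatException e) {
            return DEFAULT_LINKS_PER_BULK;
        }
    }

    /** Returns the wait time in milliseconds; falls back to 0 (an assumption) if missing, invalid, or negative. */
    static long waitBetweenRequestsMillis(Map<String, String> params) {
        String raw = params.get(TASK_PARAM_WAIT_BETWEEN_REQUESTS);
        if (raw == null || raw.trim().isEmpty()) {
            return 0L;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value >= 0 ? value : 0L;
        } catch (NumberFormatException e) {
            return 0L;
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(TASK_PARAM_START_URL, "http://example.org/");
        params.put(TASK_PARAM_WAIT_BETWEEN_REQUESTS, "500");

        System.out.println("start URL:      " + params.get(TASK_PARAM_START_URL));
        System.out.println("links per bulk: " + linksPerBulk(params));
        System.out.println("wait (ms):      " + waitBetweenRequestsMillis(params));
    }
}
```

In real code the stand-in constants would be replaced by references such as `WebCrawlerConstants.TASK_PARAM_LINKS_PER_BULK`, so that a change to a parameter name is picked up in one place.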
public static final java.lang.String ATTRIBUTE_URL
public static final java.lang.String ATTRIBUTE_LASTMODIFIED
public static final java.lang.String ATTRIBUTE_CONTENTTYPE
public static final java.lang.String ATTRIBUTE_MIMETYPE
public static final java.lang.String ATTRIBUTE_CHARSET
public static final java.lang.String ATTRIBUTE_SIZE
public static final java.lang.String ATTACHMENT_CONTENT
public static final java.lang.String ATTRIBUTE_CRAWL_DEPTH
public static final java.lang.String TASK_PARAM_START_URL
public static final java.lang.String TASK_PARAM_WAIT_BETWEEN_REQUESTS
public static final java.lang.String TASK_PARAM_LINKS_PER_BULK
public static final int DEFAULT_LINKS_PER_BULK
public static final java.lang.String DEFAULT_USERAGENT
public static final java.lang.String TASK_PARAM_LINK_ERROR_HANDLING
public static final java.util.Set<java.lang.String> PROPERTY_NAMES
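
The description of `DEFAULT_USERAGENT` implies a fallback: use the configured user agent if webcrawler.properties defines a valid one, otherwise use the default. One way this could look is sketched below. The property key `"userAgent"`, the placeholder default `"ExampleCrawler/1.0"`, the class name `UserAgentSketch`, and the rule that "valid" means "not blank" are all assumptions; this page does not specify any of them.

```java
import java.util.Properties;

final class UserAgentSketch {
    // Placeholder for WebCrawlerConstants.DEFAULT_USERAGENT; the real value is not shown on this page.
    static final String DEFAULT_USERAGENT = "ExampleCrawler/1.0";
    // Hypothetical property key; the actual key used in webcrawler.properties is not documented here.
    static final String USERAGENT_KEY = "userAgent";

    /** Returns the configured user agent, or the default if none is set or it is blank. */
    static String resolveUserAgent(Properties props) {
        String configured = props.getProperty(USERAGENT_KEY);
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_USERAGENT;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolveUserAgent(props)); // nothing configured, so the default is used

        props.setProperty(USERAGENT_KEY, "MyCrawler/2.0");
        System.out.println(resolveUserAgent(props)); // configured value is used
    }
}
```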