Crawled Properties

Crawled properties are metadata that is extracted from content sources to make the data available for searching. Crawled properties are typically reported by the Content SSA or other FAST Search Server 2010 for SharePoint connectors, but can also be created during item processing by an IFilter or a property extractor.

A crawled property is uniquely defined by the parameters of Name, Propset, and VariantType.
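
As a minimal sketch of this uniqueness rule (in Python, purely for illustration and not part of the FAST Search Server 2010 for SharePoint API), the three parameters can be modeled as a composite key. The class name CrawledPropertyKey, the placeholder Propset GUID, and the variant type numbers used here are assumptions made for the example.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class CrawledPropertyKey:
    """Hypothetical model: identity is the (Name, Propset, VariantType) triple."""
    name: str
    propset: UUID
    variant_type: int


# Placeholder GUID for illustration only; not a real property set.
PROPSET = UUID("00000000-0000-0000-0000-000000000001")

a = CrawledPropertyKey("author", PROPSET, 31)
b = CrawledPropertyKey("author", PROPSET, 31)   # same triple as a
c = CrawledPropertyKey("author", PROPSET, 64)   # same name, different variant type

# A frozen dataclass is hashable, so the triple can act as a dictionary key.
registry = {a: "first definition"}
registry[b] = "same triple, so this replaces the first entry"
registry[c] = "different variant type, so this is a separate crawled property"
```

Under this model, two properties that share a name but differ in Propset or VariantType are treated as distinct crawled properties.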

Two specific managed properties are populated with the crawled property names and values discovered for the given item, as follows:

crawledpropertynames   Holds discovered crawled properties that have a value for a specified item.

crawledpropertiescontent   Holds the value of every crawled property in crawledpropertynames.

Not every discovered crawled property is mapped into these managed properties, because not all content is relevant for searching. For example, a crawled property might expose sensitive information or contain data that adversely affects relevance or recall. A crawled property maps to crawledpropertiescontent only if the following conditions are met:

The crawled property has a variant type that maps to a string or a list of strings.

The IsMappedToContents property of the crawled property is True. Crawled properties that are known to provide unwanted content in the search index are excluded by setting their IsMappedToContents property to False.

Every crawled property belongs to a category (determined by its Propset), and the Boolean MapToContents property of that category sets the default value of the IsMappedToContents property for new crawled properties.

So, if the crawled property is a string and its IsMappedToContents property is True, the content of the crawled property should be searchable in crawledpropertiescontent.
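
The rule above can be sketched as a small Python model. This is an illustrative approximation, not the product's implementation: the class and function names are hypothetical, and the set of variant types treated as "string" (VT_BSTR, VT_LPSTR, VT_LPWSTR, optionally combined with the VT_VECTOR flag) is an assumption about which types qualify.

```python
from dataclasses import dataclass
from typing import Optional

# Standard OLE variant type codes. Which of them FAST Search Server
# treats as "string" is an assumption made for this sketch.
VT_BSTR, VT_LPSTR, VT_LPWSTR = 8, 30, 31
VT_VECTOR = 0x1000  # flag marking a list (vector) of the base type
STRING_TYPES = {VT_BSTR, VT_LPSTR, VT_LPWSTR}


def is_string_like(variant_type: int) -> bool:
    """True for a string type or a vector of a string type."""
    return (variant_type & ~VT_VECTOR) in STRING_TYPES


@dataclass
class Category:
    name: str
    map_to_contents: bool  # default for new crawled properties in this category


@dataclass
class CrawledProperty:
    name: str
    variant_type: int
    category: Category
    is_mapped_to_contents: Optional[bool] = None

    def __post_init__(self) -> None:
        # A new crawled property inherits the category default unless overridden.
        if self.is_mapped_to_contents is None:
            self.is_mapped_to_contents = self.category.map_to_contents


def is_searchable_in_contents(prop: CrawledProperty) -> bool:
    """Both conditions must hold: string-like type and IsMappedToContents is True."""
    return is_string_like(prop.variant_type) and bool(prop.is_mapped_to_contents)


office = Category("Office", map_to_contents=True)
title = CrawledProperty("Title", VT_LPWSTR, office)
keywords = CrawledProperty("Keywords", VT_VECTOR | VT_LPWSTR, office)
size = CrawledProperty("Size", 20, office)  # 20 = VT_I8, an integer type
secret = CrawledProperty("Secret", VT_LPWSTR, office, is_mapped_to_contents=False)
```

In this sketch, title and keywords would be searchable through crawledpropertiescontent, while size (not a string) and secret (explicitly excluded) would not.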

Each crawled property belongs to a crawled property category, which is a high-level grouping of crawled properties based on the IFilter and content source that is used to extract the metadata from the content.

The following are examples of categories:

Business Data  Metadata that is associated with content retrieved by using the Business Data Connectivity (BDC) service.

Mail  Metadata that is associated with Microsoft Exchange Server.

Office  Metadata that is contained in Microsoft Office documents such as Microsoft Word, Microsoft Excel, and Microsoft PowerPoint.

People  Metadata that is associated with the people profiles in SharePoint Server 2010. Most of these properties, which come from Active Directory and SharePoint information, are also mapped to managed properties.

Web  HTML metadata that is associated with web pages.

A crawled property category may contain multiple property sets. Table 1 describes the interfaces that are related to crawled properties.

Interface   Description

CrawledProperty   Specifies a crawled property.

Category   You can use the Category interface to specify default mapping behavior that is common to all crawled properties within the category. You can use the AllCategories property of the Schema interface to retrieve a collection of property categories. You can retrieve a collection of CrawledProperty objects for a given category by using the Category.GetAllCrawledProperties method. You can create a crawled property by using the Category.CreateCrawledProperty method.

ManagedProperty   Managed properties are metadata that can be searched or retrieved in query results. You can retrieve a collection of CrawledProperty objects that represent the crawled properties mapped to a specific managed property by using the ManagedProperty.GetMappedCrawledProperties method. You can configure crawled property mappings by using the ManagedProperty.SetCrawledPropertyMappings method.
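
To make the relationship between these interfaces more concrete, here is a toy in-memory model in Python. It is not the actual .NET administration API: the classes and the snake_case method names only mirror the methods named in the table, and the propset string and variant type are placeholders chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CrawledProperty:
    name: str
    propset: str       # placeholder; a real Propset is a GUID
    variant_type: int


@dataclass
class Category:
    """Toy stand-in for a crawled property category."""
    name: str
    properties: List[CrawledProperty] = field(default_factory=list)

    def create_crawled_property(self, name: str, propset: str,
                                variant_type: int) -> CrawledProperty:
        # Mirrors the idea of Category.CreateCrawledProperty.
        prop = CrawledProperty(name, propset, variant_type)
        if prop not in self.properties:
            self.properties.append(prop)
        return prop

    def get_all_crawled_properties(self) -> List[CrawledProperty]:
        # Mirrors the idea of Category.GetAllCrawledProperties.
        return list(self.properties)


@dataclass
class ManagedProperty:
    """Toy stand-in for a managed property and its mappings."""
    name: str
    mappings: List[CrawledProperty] = field(default_factory=list)

    def set_crawled_property_mappings(self, props: List[CrawledProperty]) -> None:
        # Replaces the full set of mappings with the given list.
        self.mappings = list(props)

    def get_mapped_crawled_properties(self) -> List[CrawledProperty]:
        return list(self.mappings)


web = Category("Web")
title = web.create_crawled_property("title", "hypothetical-propset", 31)
description = web.create_crawled_property("description", "hypothetical-propset", 31)

managed_title = ManagedProperty("Title")
managed_title.set_crawled_property_mappings([title])
```

The design point this illustrates is that crawled properties are created and listed through their category, while the mapping to searchable metadata is configured on the managed property.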
