April 27, 2020

Search&Promote – Crawling(IndexConnector)

IndexConnector:

The IndexConnector lets you define additional input sources for indexing XML pages or any kind of feed.

The IndexConnector can be used to index product data from e-commerce systems that hold a large number of products. Reading the records from a feed reduces crawling and indexing time and gives better crawling/indexing performance.

An XML data source consists of XML records that contain information corresponding to individual documents that can be added to the index.

A text data feed contains individual new-line-delimited records that correspond to individual documents that can be added to the index

A mapping can be defined that determines how each record's items are used to populate the metadata fields in the resulting index.

Multiple protocols can be used to connect to the input sources from the IndexConnector: HTTP(S), FTP, SFTP, and FILE.
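To make the text data feed option concrete, here is a minimal Python sketch that reads new-line-delimited records before they are handed to the IndexConnector. The tab delimiter, the header row, and the column names are assumptions made for illustration; the post does not specify the field layout of a text feed.

```python
import csv
import io

# Hypothetical text feed: a header row, then one record per line.
# The tab delimiter and column names are assumptions, not taken from S&P.
SAMPLE_TEXT_FEED = (
    "ProductId\ttitle\tlink\n"
    "123\tproduct-title\thttps://www.example.com/product-title/p/123\n"
    "1234\tproduct-title\thttps://www.example.com/product-title/p/1234\n"
)


def read_text_feed(text, delimiter="\t"):
    """Return one dict per new-line-delimited record, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


records = read_text_feed(SAMPLE_TEXT_FEED)
for record in records:
    # Each record corresponds to one document that could be added to the index.
    print(record["ProductId"], record["link"])
```

A quick local check like this can catch missing columns or broken lines before the feed is published at the location the IndexConnector reads from.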



The IndexConnector is not enabled by default in an S&P account; it must be enabled by the Adobe S&P account team.

DEFINING INDEXCONNECTOR:





Sample product feed file (XML)

<feed
    xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">
    <channel>
        <title>Product Feed</title>
        <Item>
            <link>https://www.example.com/product-title/p/123</link>
            <title>
                <![CDATA[product-title]]>
            </title>
            <pubDate>05/09/2011</pubDate>
            <pubYear>2011</pubYear>
            <description>
                <![CDATA[<p>product description</p>]]>
            </description>
            <productType>Research</productType>
            <category>
                <![CDATA[Financial Planning|Financial Planners|Research]]>
            </category>
            <ProductId>123</ProductId>
            <imageUrl>/content/dam/Images/product/123.jpg</imageUrl>
        </Item>
        <Item>
            <link>https://www.example.com/product-title/p/1234</link>
            <title>
                <![CDATA[product-title]]>
            </title>
            <pubDate>05/09/2011</pubDate>
            <pubYear>2011</pubYear>
            <description>
                <![CDATA[<p>product description</p>]]>
            </description>
            <productType>Research</productType>
            <category>
                <![CDATA[Financial Planning|Financial Planners|Research]]>
            </category>
            <ProductId>1234</ProductId>
            <imageUrl>/content/dam/Images/product/1234.jpg</imageUrl>
        </Item>
        <Item>
            <link>https://www.example.com/product-title/p/12345</link>
            <title>
                <![CDATA[product-title]]>
            </title>
            <pubDate>05/09/2011</pubDate>
            <pubYear>2011</pubYear>
            <description>
                <![CDATA[<p>product description</p>]]>
            </description>
            <productType>Research</productType>
            <category>
                <![CDATA[Financial Planning|Financial Planners|Research]]>
            </category>
            <ProductId>12345</ProductId>
            <imageUrl>/content/dam/Images/product/12345.jpg</imageUrl>
        </Item>
    </channel>
</feed>
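Before pointing the IndexConnector at a feed like this, it can help to confirm locally that the file is well-formed and that every Item carries the expected fields. The following is a rough Python sketch using the standard library; the trimmed feed string, the function name, and the idea of splitting the category on "|" are illustrative assumptions, not part of S&P.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample feed, kept inline so the sketch is self-contained.
FEED_XML = """<feed xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">
    <channel>
        <title>Product Feed</title>
        <Item>
            <link>https://www.example.com/product-title/p/123</link>
            <title>
                <![CDATA[product-title]]>
            </title>
            <category>
                <![CDATA[Financial Planning|Financial Planners|Research]]>
            </category>
            <ProductId>123</ProductId>
        </Item>
    </channel>
</feed>"""


def parse_items(xml_text, item_tag="Item"):
    """Return one dict per record element, mapping child tag to trimmed text."""
    root = ET.fromstring(xml_text)  # raises ParseError if the feed is malformed
    items = []
    for item in root.iter(item_tag):
        # CDATA content is returned as plain text; strip the surrounding whitespace.
        items.append({child.tag: (child.text or "").strip() for child in item})
    return items


items = parse_items(FEED_XML)
# Assumption: the pipe character separates multiple category values.
categories = items[0]["category"].split("|")
print(items[0]["ProductId"], items[0]["title"], categories)
```

The `item_tag` argument plays the same role as the Item tag configured in the next step: it names the element that delimits one record.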

Configure the feed file location and the Item tag



Map the fields from the feed file to the defined metadata fields, and define a primary key that uniquely identifies each record.
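The mapping itself is configured in the S&P interface, but the idea can be sketched in Python: rename feed tags to metadata fields and check that the chosen primary key really is unique. The metadata field names (`product_id`, `url`, `product_type`) and the helper functions below are hypothetical; the third record deliberately reuses an ID to show what the check reports.

```python
from collections import Counter

# Hypothetical mapping from feed tag to metadata field name.
FIELD_MAP = {
    "ProductId": "product_id",
    "title": "title",
    "link": "url",
    "productType": "product_type",
}
PRIMARY_KEY = "product_id"  # assumed primary key, taken from ProductId


def map_record(record, field_map=FIELD_MAP):
    """Rename feed tags to metadata fields; missing tags become empty strings."""
    return {meta: record.get(tag, "") for tag, meta in field_map.items()}


def duplicate_keys(mapped, key=PRIMARY_KEY):
    """Return the primary-key values that occur more than once, sorted."""
    counts = Counter(row[key] for row in mapped)
    return sorted(value for value, n in counts.items() if n > 1)


feed_records = [
    {"ProductId": "123", "title": "product-title",
     "link": "https://www.example.com/product-title/p/123", "productType": "Research"},
    {"ProductId": "1234", "title": "product-title",
     "link": "https://www.example.com/product-title/p/1234", "productType": "Research"},
    # Deliberate duplicate of ProductId 123 to exercise the check.
    {"ProductId": "123", "title": "product-title",
     "link": "https://www.example.com/product-title/p/12345", "productType": "Research"},
]

mapped = [map_record(r) for r in feed_records]
duplicates = duplicate_keys(mapped)
print("Duplicate primary keys:", duplicates)
```

A field such as ProductId is a natural primary key candidate in the sample feed because the title is identical across all three records.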



Preview the configuration



Define the IndexConnector as a URL entry point for crawling





Now run a full live index. The new records will appear in the search results once indexing completes.


By aem4beginner
