org.openpipeline.pipeline.connector.linkqueue
Interface LinkQueue

All Known Implementing Classes:
DerbyLinkQueue, MySQLLinkQueue

public interface LinkQueue

A LinkQueue maintains a list of the ids of items that have been crawled, along with a signature and the timestamp of the last crawl for each item. It is useful when re-crawling a data source: items whose timestamp predates the current crawl were not visited again and may have been deleted from the source.


Method Summary
 void close()
          Close this queue.
 String fetchNextUncrawled(long beforeTimestamp)
          Fetch the id of an uncrawled item.
 String getDescription()
          Return a description of the LinkQueue for display in the Admin UI, for example, "LinkQueue based on XYZ database".
 String getName()
          Return the name of this LinkQueue implementation, for example, "MyLinkQueue".
 long getSignature(String id)
          Get the signature of the item with the given id, or -1 if the id is not found.
 void remove(String id)
          Remove the specified id from the queue.
 void setParams(XMLConfig params)
          Set parameters to configure this LinkQueue.
 void update(String id, long signature, long lastCrawl)
          Update the item with the given id and set the signature and lastCrawl timestamp.
 

Method Detail

getSignature

long getSignature(String id)
Get the signature of the item with the given id, or -1 if the id is not found. A signature is an arbitrary value that indicates whether a document has changed. It can be a timestamp, a checksum, a version number, or any other value that changes when the document changes. This value should only be used to detect changed documents. The exact value depends on the implementation of the connector.

Parameters:
id - the id of the item
Returns:
a signature
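
As an illustration of the checksum option mentioned above, the following is a minimal sketch of how a connector might derive a signature and compare it with the stored one. The class SignatureSketch and its methods are hypothetical and not part of OpenPipeline; CRC32 is only one possible choice. Its values are never negative, so they cannot collide with the -1 "not found" result.

```java
import java.util.zip.CRC32;

// Hypothetical helper, not part of OpenPipeline.
class SignatureSketch {

    /** Computes a checksum of the document bytes; CRC32 values are never negative. */
    static long signatureOf(byte[] content) {
        CRC32 crc = new CRC32();
        crc.update(content);
        return crc.getValue();
    }

    /**
     * Returns true when the item is new (stored signature is -1)
     * or its current signature differs from the stored one.
     */
    static boolean hasChanged(long storedSignature, long currentSignature) {
        return storedSignature != currentSignature;
    }
}
```

A connector would pass the result of getSignature(id) as storedSignature and reprocess the item only when hasChanged returns true.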

update

void update(String id,
            long signature,
            long lastCrawl)
Update the item with the given id and set the signature and lastCrawl timestamp. If the id does not exist in the queue, it will be added.

Parameters:
id - the id of the item
signature - see the definition of signature in getSignature()
lastCrawl - timestamp of the last crawl of the item
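
The sketch below shows one way a crawl pass might combine getSignature() and update(). It uses a hypothetical in-memory stand-in (CrawlPassSketch) instead of a real LinkQueue implementation, and the visit() method and processed counter are illustrative assumptions. Note that update() is called even for unchanged items, so that their lastCrawl timestamp is refreshed and they are not later mistaken for uncrawled items.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a LinkQueue; not part of OpenPipeline.
class CrawlPassSketch {
    // id -> { signature, lastCrawl }
    private final Map<String, long[]> rows = new HashMap<>();

    // Counts items that would be sent on for processing (illustrative only).
    int processed = 0;

    long getSignature(String id) {
        long[] row = rows.get(id);
        return row == null ? -1 : row[0];
    }

    // Adds the id if it is missing, otherwise overwrites its values.
    void update(String id, long signature, long lastCrawl) {
        rows.put(id, new long[] { signature, lastCrawl });
    }

    // Inspection helper for this sketch; returns -1 if the id is unknown.
    long lastCrawlOf(String id) {
        long[] row = rows.get(id);
        return row == null ? -1 : row[1];
    }

    /** Handles one item found during a crawl that started at crawlStart. */
    void visit(String id, long signature, long crawlStart) {
        if (getSignature(id) != signature) {
            processed++; // new or changed item: this is where it would be processed
        }
        // Always refresh the timestamp, even when the item is unchanged.
        update(id, signature, crawlStart);
    }
}
```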

fetchNextUncrawled

String fetchNextUncrawled(long beforeTimestamp)
Fetch the id of an uncrawled item. An uncrawled item is any item with a lastCrawl timestamp before the specified beforeTimestamp.

Parameters:
beforeTimestamp - a timestamp in milliseconds, typically matching the start time of the current crawl
Returns:
an id, or null if there are no more uncrawled items
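
A typical use is a sweep at the end of a crawl that drains all items not visited since the crawl started. The following is a sketch only, using a hypothetical in-memory stand-in (SweepSketch) that tracks just the lastCrawl timestamp; the sweep() method is an assumption about how a connector might use the interface. Each fetched id must be removed or updated before the next fetch, otherwise the same id would be returned again.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a LinkQueue; not part of OpenPipeline.
class SweepSketch {
    // id -> lastCrawl timestamp (the signature is ignored in this sketch)
    private final Map<String, Long> lastCrawlById = new LinkedHashMap<>();

    void update(String id, long signature, long lastCrawl) {
        lastCrawlById.put(id, lastCrawl);
    }

    // Returns the first id whose lastCrawl is before the given timestamp, or null.
    String fetchNextUncrawled(long beforeTimestamp) {
        for (Map.Entry<String, Long> e : lastCrawlById.entrySet()) {
            if (e.getValue() < beforeTimestamp) {
                return e.getKey();
            }
        }
        return null;
    }

    void remove(String id) {
        lastCrawlById.remove(id);
    }

    /** Removes every item not visited since crawlStart and returns the removed ids. */
    List<String> sweep(long crawlStart) {
        List<String> removed = new ArrayList<>();
        String id;
        while ((id = fetchNextUncrawled(crawlStart)) != null) {
            // A real connector might first confirm the item is gone from the repository.
            remove(id);
            removed.add(id);
        }
        return removed;
    }
}
```

Passing the crawl start time as beforeTimestamp means that items updated during the current crawl, whose lastCrawl equals the start time, are left in place.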

remove

void remove(String id)
Remove the specified id from the queue. Normally called if the item cannot be found in the repository.

Parameters:
id - the id of an item

close

void close()
Close this queue.


getName

String getName()
Return the name of this LinkQueue implementation, for example, "MyLinkQueue".


getDescription

String getDescription()
Return a description of the LinkQueue for display in the Admin UI, for example, "LinkQueue based on XYZ database".


setParams

void setParams(XMLConfig params)
Set parameters to configure this LinkQueue.

Parameters:
params - the params to use