org.openpipeline.pipeline.connector
Class GenericScanner

java.lang.Object
  extended by org.openpipeline.pipeline.connector.GenericScanner

public class GenericScanner
extends Object

Crawls any data source that implements the FileSystem interface. This is a helper class intended to be embedded in Connector classes.
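As a rough illustration of how a connector might drive an embedded scanner, here is a minimal sketch. It does not use the real GenericScanner: `ScannerOps` is a hypothetical stand-in that copies a few method signatures from this page, `RecordingScanner` is a fake used only to show the calls, and the `String` root replaces the real `FileSystem` argument. The call order (set the start time, scan, then look for deletes) is an assumption inferred from the method descriptions, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

class CrawlLifecycleSketch {

    // Hypothetical stand-in mirroring a few signatures documented on this page.
    // The real scan() takes a FileSystem; a String is used here to stay self-contained.
    interface ScannerOps {
        void setStartOfCrawl(long startOfCrawl);
        void scan(String root) throws Exception;
        void lookForDeletes() throws Exception;
        int getDocsProcessed();
    }

    // Assumed call order for one crawl: timestamp, scan, then delete detection.
    static int runCrawl(ScannerOps scanner, String root) throws Exception {
        scanner.setStartOfCrawl(System.currentTimeMillis());
        scanner.scan(root);
        scanner.lookForDeletes();
        return scanner.getDocsProcessed();
    }

    // Fake scanner that only records which methods were called, in order.
    static class RecordingScanner implements ScannerOps {
        final List<String> calls = new ArrayList<>();
        private int docs = 0;

        public void setStartOfCrawl(long startOfCrawl) { calls.add("setStartOfCrawl"); }
        public void scan(String root) { calls.add("scan:" + root); docs = 3; }
        public void lookForDeletes() { calls.add("lookForDeletes"); }
        public int getDocsProcessed() { return docs; }
    }

    public static void main(String[] args) throws Exception {
        RecordingScanner scanner = new RecordingScanner();
        int processed = runCrawl(scanner, "root");
        System.out.println(scanner.calls + " processed=" + processed);
    }
}
```

In a real connector the scanner would also be configured through its setters (logger, stage list, doc filter factory, and so on) before `scan` is called; those steps are omitted here because this page does not describe them.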


Constructor Summary
GenericScanner()
           
 
Method Summary
 int getDocsProcessed()
           
 long getElapsed()
          Return the elapsed execution time in milliseconds.
 void interrupt()
           
 void lookForDeletes()
          Revisit all items that were not touched during the crawl, and remove any that are no longer found.
 void scan(FileSystem file)
          Scan the file system, looking for files to process.
 void setAddMetadata(boolean addMetadata)
           
 void setDebug(boolean debug)
           
 void setDocFilterFactory(DocFilterFactory docFilterFactory)
           
 void setDocLoggingCount(int docLoggingCount)
           
 void setLinkQueue(LinkQueue linkQueue)
           
 void setLogger(Logger logger)
           
 void setScanCompressedFiles(boolean scanCompressedFiles)
           
 void setScanSubDirs(boolean scanSubDirs)
           
 void setStageList(StageList stageList)
           
 void setStartOfCrawl(long startOfCrawl)
          Set the timestamp, in milliseconds, at which this crawl started.
 void setWildcardMatcher(WildcardMatcher wildcardMatcher)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GenericScanner

public GenericScanner()

Method Detail

setStartOfCrawl

public void setStartOfCrawl(long startOfCrawl)
Set the timestamp, in milliseconds, at which this crawl started.

Parameters:
startOfCrawl - usually set to System.currentTimeMillis()

scan

public void scan(FileSystem file)
          throws Exception
Scan the file system, looking for files to process.

Exception handling: DocFilters trap exceptions internally, so an error while parsing a document is logged and the connector continues. Any other exception should probably abort the connector.

Throws:
Exception
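The exception policy described above can be sketched as follows. This is an illustration only, not the GenericScanner source: `SketchFilter`, `processDoc`, the in-memory `logged` list, and the use of a null entry to simulate a non-parsing failure are all hypothetical. The sketch's `scan` throws an unchecked exception to keep the example short, whereas the real method declares `throws Exception`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScanErrorPolicySketch {

    // Hypothetical stand-in for a document filter; not the OpenPipeline DocFilter API.
    interface SketchFilter {
        void parse(String docName) throws Exception;
    }

    final List<String> logged = new ArrayList<>();
    int docsProcessed = 0;

    // Parse one document. A parse failure is trapped here and recorded,
    // so the caller's loop keeps going.
    void processDoc(SketchFilter filter, String docName) {
        try {
            filter.parse(docName);
            docsProcessed++;
        } catch (Exception e) {
            logged.add(docName + ": " + e.getMessage());
        }
    }

    // Walk the documents. A failure outside parsing (simulated by a null entry)
    // is not caught, so it propagates and aborts the scan.
    void scan(List<String> docNames, SketchFilter filter) {
        for (String name : docNames) {
            if (name == null) {
                throw new IllegalStateException("source returned a null entry");
            }
            processDoc(filter, name);
        }
    }

    public static void main(String[] args) {
        ScanErrorPolicySketch sketch = new ScanErrorPolicySketch();
        SketchFilter filter = name -> {
            if (name.endsWith(".bad")) throw new Exception("cannot parse");
        };
        sketch.scan(Arrays.asList("a.txt", "b.bad", "c.txt"), filter);
        System.out.println("processed=" + sketch.docsProcessed + " logged=" + sketch.logged);
    }
}
```

Keeping the try/catch at the single-document level is what lets one unreadable file be skipped without hiding failures of the data source itself.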

lookForDeletes

public void lookForDeletes()
                    throws Exception
Revisit all items that were not touched during the crawl, and remove any that are no longer found.

Throws:
Exception
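One plausible way to implement this kind of delete detection is to compare each item's last-touched time with the crawl start time. The sketch below assumes that approach; the page does not say how GenericScanner tracks touched items, and `lastTouched`, `touch`, `existingInSource`, and the explicit `startOfCrawl` parameter are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

class DeleteDetectionSketch {

    // Hypothetical state: item id -> time in milliseconds when a crawl last touched it.
    final Map<String, Long> lastTouched = new HashMap<>();
    final Set<String> removed = new HashSet<>();

    void touch(String id, long when) {
        lastTouched.put(id, when);
    }

    // Items last touched before startOfCrawl were not seen by the current crawl.
    // Each such item is checked against the source and removed only if it is missing.
    void lookForDeletes(long startOfCrawl, Set<String> existingInSource) {
        Iterator<Map.Entry<String, Long>> it = lastTouched.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            boolean untouched = entry.getValue() < startOfCrawl;
            if (untouched && !existingInSource.contains(entry.getKey())) {
                removed.add(entry.getKey());
                it.remove(); // safe removal while iterating
            }
        }
    }

    public static void main(String[] args) {
        DeleteDetectionSketch sketch = new DeleteDetectionSketch();
        sketch.touch("a", 100L); // seen only by an earlier crawl, now gone from the source
        sketch.touch("b", 100L); // seen only by an earlier crawl, still in the source
        sketch.touch("c", 250L); // seen by the current crawl
        sketch.lookForDeletes(200L, new HashSet<>(Arrays.asList("b", "c")));
        System.out.println("removed=" + sketch.removed);
    }
}
```

Under this assumption the value passed to setStartOfCrawl matters: if it is set later than the actual start of the scan, items touched early in the crawl would be rechecked unnecessarily.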

interrupt

public void interrupt()

setWildcardMatcher

public void setWildcardMatcher(WildcardMatcher wildcardMatcher)

setScanSubDirs

public void setScanSubDirs(boolean scanSubDirs)

setScanCompressedFiles

public void setScanCompressedFiles(boolean scanCompressedFiles)

setDebug

public void setDebug(boolean debug)

setLogger

public void setLogger(Logger logger)

setDocLoggingCount

public void setDocLoggingCount(int docLoggingCount)

setLinkQueue

public void setLinkQueue(LinkQueue linkQueue)

setDocFilterFactory

public void setDocFilterFactory(DocFilterFactory docFilterFactory)

getDocsProcessed

public int getDocsProcessed()

getElapsed

public long getElapsed()
Return the elapsed execution time in milliseconds.

Returns:
the elapsed time, in milliseconds

setStageList

public void setStageList(StageList stageList)

setAddMetadata

public void setAddMetadata(boolean addMetadata)