Say Hello to OpenPipeline
OpenPipeline is new open-source software for crawling, parsing, analyzing and routing documents. It ties together otherwise incomplete solutions for enterprise search and document processing. OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules that distribute documents across a network. It includes a job scheduler and a point-and-click user interface.
It comes fully functional with prebuilt components, and it can also integrate third-party modules. Plugins are available for crawling content management systems, parsing special file formats, and performing text analytics.
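
To make the idea of a common architecture more concrete, here is a minimal conceptual sketch of documents flowing from a connector through a file filter and a text analyzer to a routing step. It is not OpenPipeline's actual API: every name in it (`Document`, `list_connector`, `strip_markup`, `count_words`, `run_pipeline`) is hypothetical and chosen only for illustration, and the in-memory connector stands in for a real crawler.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, Mapping


@dataclass
class Document:
    """A unit of content moving through the pipeline (hypothetical type)."""
    source: str
    raw: str
    text: str = ""
    metadata: dict = field(default_factory=dict)


# A stage takes a document and returns it, possibly modified.
Stage = Callable[[Document], Document]


def list_connector(items: Mapping[str, str]) -> Iterator[Document]:
    """Stand-in connector: yields documents from an in-memory mapping."""
    for source, raw in items.items():
        yield Document(source=source, raw=raw)


def strip_markup(doc: Document) -> Document:
    """Stand-in file filter: drops anything between angle brackets."""
    doc.text = re.sub(r"<[^>]+>", "", doc.raw).strip()
    return doc


def count_words(doc: Document) -> Document:
    """Stand-in text analyzer: records a whitespace-based word count."""
    doc.metadata["word_count"] = len(doc.text.split())
    return doc


def run_pipeline(
    docs: Iterable[Document],
    stages: Iterable[Stage],
    sink: Callable[[Document], None],
) -> None:
    """Pass each document through every stage in order, then route it to the sink."""
    stages = list(stages)  # allow the same stages to be reused for every document
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
        sink(doc)


if __name__ == "__main__":
    routed = []
    run_pipeline(
        list_connector({"a.html": "<p>Hello open pipeline</p>"}),
        [strip_markup, count_words],
        routed.append,  # a real sink might send the document to a search index
    )
    print(routed[0].text, routed[0].metadata)
```

Because every stage shares one signature in this sketch, a third-party component can be added by appending another function to the stage list, which mirrors the plugin idea described above.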