OpenPipeline is new open source software for crawling, parsing, analyzing and routing documents. It ties together otherwise incomplete solutions for enterprise search and document processing. OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules to distribute documents across a network. It includes a job scheduler and a full UI with a point-and-click interface.
It comes fully functional with prebuilt components, but also integrates third-party modules. Plugins for crawling content management systems, parsing special file formats, and performing text analytics are available.
Posted on January 8, 2009, 10:11 am, by admin, under News.
We’re getting there: 0.8 is now out. Big changes in this release:
1. The Item class now carries a binary version of the document, which can be useful for transmitting and saving it in a pipeline stage. It’s also useful for the second big change.
2. DocFilters are now a stage in the pipeline. The DocFilter interface has been refactored to look like a stage. This makes it much easier to handle documents that should generate multiple items (imagine an XML file with multiple subitems, or a large document that should have one item per chapter). It’s also a much cleaner design, because now connectors don’t need to know anything about DocFilters.
Plenty of bug fixes and small niceties added. Check it out.
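To picture the DocFilter change described above, here is a minimal Java sketch of a filter written as a pipeline stage that turns one document into one item per chapter. Every name in it (`DemoItem`, `DemoStage`, `ChapterSplitFilter`, the `CHAPTER` heading convention) is a hypothetical stand-in invented for illustration. It is not OpenPipeline’s actual API, so check the wiki documentation for the real Item and stage interfaces.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative only: hypothetical types, not the OpenPipeline classes.
class FilterStageSketch {

    // Hypothetical item: an id, the raw bytes of the document, and extracted text.
    static final class DemoItem {
        final String id;
        final byte[] binary;
        final String text;

        DemoItem(String id, byte[] binary, String text) {
            this.id = id;
            this.binary = binary;
            this.text = text;
        }
    }

    // Hypothetical stage: takes one item and passes zero or more items downstream.
    interface DemoStage {
        void process(DemoItem item, Consumer<DemoItem> next);
    }

    // A filter expressed as a stage: decodes the binary and emits one item per chapter.
    // Assumes each chapter starts with a line beginning with the word CHAPTER.
    static final class ChapterSplitFilter implements DemoStage {
        @Override
        public void process(DemoItem item, Consumer<DemoItem> next) {
            String body = new String(item.binary, StandardCharsets.UTF_8);
            int count = 0;
            for (String chapter : body.split("(?m)^CHAPTER\\b.*$")) {
                String text = chapter.trim();
                if (text.isEmpty()) {
                    continue; // skip whatever precedes the first heading
                }
                count++;
                byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
                next.accept(new DemoItem(item.id + "#" + count, bytes, text));
            }
        }
    }

    // Plays the connector's role: it only supplies raw bytes and knows nothing about filters.
    static List<DemoItem> run(String id, String rawDocument) {
        DemoItem fromConnector =
                new DemoItem(id, rawDocument.getBytes(StandardCharsets.UTF_8), null);
        List<DemoItem> out = new ArrayList<>();
        new ChapterSplitFilter().process(fromConnector, out::add);
        return out;
    }

    public static void main(String[] args) {
        for (DemoItem item : run("doc", "CHAPTER 1\nAlpha\nCHAPTER 2\nBeta\n")) {
            System.out.println(item.id + ": " + item.text);
        }
    }
}
```

The point of the sketch is the shape of the design: because the item carries its own bytes, the filter can sit anywhere in the chain, and the code that fetched the document never has to call it directly.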
We’ve added a wiki to the site. We’ve moved the documentation there and published a roadmap for OpenPipeline’s future. Take a look.
Version 0.7 is available on the download page. This is a minor bug-fix release. We’ve also added HTMLFilter and a feature or two to the FileScanner and the StageSelection modules. See the changelog for more.
A couple of improvements in this new release: doc filters are now configurable, an OpenCalais stage has been added, there’s a beta ItemSender stage and ItemReceiver connector, and there are many small bug fixes.
We’ve posted a minor bug-fix release to the download page. Build 1678 fixes a few NPEs and handles versions slightly differently.
Version 0.5 is finally out. What’s new?
– Internally, we’ve done a complete refactoring. All the major objects have a better design. Connectors, Stages, and Items have stabilized.
– The UI has many new features and functions.
– There are new Connectors and Stages.
– It’s been in production use for a couple of months now and we’ve shaken out several bugs.
Get it on the download page.
Raritan Technologies has completed development of a Documentum connector for OpenPipeline. For sales or technical information, contact Raritan Technologies.
We’ve been busy working on the next release of OpenPipeline. Expect to see a cleaner design, more connectors, and more stages. Subscribe to our RSS feed to get notified when it happens.
Forrester Research discusses OpenPipeline in a teleconference looking at open source enterprise information access software.
Download the recording here.
OpenPipeline is getting press from more tech publications:
PCWorld Business Center – “OpenPipeline Seeks to Ease Document Prep for Search,” Chris Kanaracus, IDG News Service, Wednesday, April 30, 2008
On May 5th, OpenPipeline will be available for download! The site will be open to all visitors, and registration will not be required.
Download the OpenPipeline presentation slides below.
Introducing OpenPipeline
Boston, MA, April 29 – Dieselpoint, a leader in enterprise search and navigation, will unveil its new open source product, OpenPipeline, at the Infonortics Search Engine Meeting (SEM) 2008 in Boston on Tuesday, April 29. OpenPipeline is new open source software for crawling, parsing, analyzing and routing documents. Chris Cleveland, Dieselpoint’s founder and CEO, will provide an introduction to the software, including a live demo and a look at the implementation and the underlying code.
OpenPipeline is designed to tie together otherwise incomplete solutions for enterprise search and document processing. It provides a common architecture for connectors to data sources, file filters, text analyzers and modules to distribute documents across a network. The software includes a job scheduler and a full UI with a point-and-click interface.
It comes fully functional with prebuilt components, but also integrates third-party modules. Plugins for crawling content management systems, parsing special file formats, and performing text analytics are available.
Anyone who is interested in document processing, text analytics, web services, or enterprise search will be interested in learning about OpenPipeline. Infonortics Search Engine Meeting takes place April 28 and 29, 2008 at the Fairmont Copley Plaza Hotel in Boston, Massachusetts.
About Infonortics Search Engine Meeting
This annual meeting, now in its 13th year, provides a forum and point of reference for all those interested in the domain of Search and Retrieval. The Meeting draws together those with a professional interest in search engines, such as search engine designers and developers, and those interested in applying search engines in their own professional environments. Search is at the heart of information retrieval, and the Search Engine Meeting provides an annual point of reference as to what is happening in this fast-moving and exciting field.
Just go to the download page and get it. Let us know what you think — go to the forums and give us your impressions.