Google Search Appliance (GSA) Crawl Proxy

Readme

The Google Search Appliance can natively crawl secure content from multiple sources using, for instance, the following main methods:


These built-in methods cover most cases, but some deployments need extended functionality. This usually happens when some content sources cannot handle any of the above security mechanisms, or when there is more than one forms-based authentication solution that is not protected by a central SSO server.

GSA Crawl Proxy, an open source project, extends the appliance's crawling possibilities by taking advantage of the connectivity provided by the AuthN/AuthZ modules of the GSA Valve Security Framework. These modules implement the integration logic needed to securely access the content sources where the documents reside. The GSA Crawl Proxy authenticates the crawler user coming from the search appliance using HTTP Basic and passes those credentials to any of these modules, which convert them to whatever security mechanism the content servers understand.
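
As an illustration of the HTTP Basic step described above, the following minimal Python sketch decodes the credentials carried in a Basic authorization header value. It is not taken from the project's source code; the function name parse_basic_credentials is hypothetical, and the sketch assumes the credentials are UTF-8 encoded.

```python
import base64
from typing import Optional, Tuple


def parse_basic_credentials(header_value: str) -> Optional[Tuple[str, str]]:
    """Return (username, password) from a Basic header value, or None if malformed.

    Hypothetical helper for illustration only; not the project's actual code.
    """
    scheme, _, encoded = header_value.strip().partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        # Assumption: credentials are UTF-8 encoded before base64 encoding.
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers invalid base64 (binascii.Error) and invalid UTF-8.
        return None
    # The username ends at the first colon; the password may contain colons.
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password
```

The resulting username and password pair is what a proxy of this kind would hand over to an AuthN/AuthZ module for conversion into the mechanism the content server expects.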

The GSA is integrated with this crawling tool by configuring proxy access in the appliance. The application acts as a proxy that receives crawling requests from the appliance and returns the result: either the document's content if the request was successful, or an HTTP error code if it was not.
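
The request flow described above can be sketched in Python as follows. This is only an outline under stated assumptions, not the project's implementation: handle_crawl_request and the fetch callable (standing in for an AuthN/AuthZ module that retrieves the document) are hypothetical names, and answering 401 when no credentials are present is an assumption made for the example.

```python
from typing import Callable, Optional, Tuple

Credentials = Tuple[str, str]   # (username, password)
FetchResult = Tuple[int, bytes]  # (HTTP status code, response body)


def handle_crawl_request(
    url: str,
    credentials: Optional[Credentials],
    fetch: Callable[[str, Credentials], FetchResult],
) -> FetchResult:
    """Return the document content on success, or only an HTTP error code.

    `fetch` is a hypothetical stand-in for a module that accesses the
    content source on behalf of the crawler user.
    """
    if credentials is None:
        # Assumption: a request without crawler credentials is rejected with 401.
        return 401, b""
    status, body = fetch(url, credentials)
    if 200 <= status < 300:
        # Successful request: pass the document's content back to the appliance.
        return status, body
    # Failed request: report the error code without any content.
    return status, b""
```

Keeping the content retrieval behind a callable mirrors the idea that the proxy itself stays generic while the modules deal with each content source's security mechanism.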

 

Latest Changes

Release 1.0 - June 18 2008


More Information

Check the documentation available at the open source project site at code.google.com for more information and for instructions on how to deploy this application.