Original Foreign-Language Text

Lucene in Action
Otis Gospodnetic, Erik Hatcher

Understanding Lucene

Different people are fighting the same problem, information overload, using different approaches. Some have been working on novel user interfaces, some on intelligent agents, and others on developing sophisticated search tools like Lucene. Before we jump into action with code samples later in this chapter, we’ll give you a high-level picture of what Lucene is, what it is not, and how it came to be.

What Lucene is

Lucene is a high-performance, scalable Information Retrieval (IR) library. It lets you add indexing and searching capabilities to your applications. Lucene is a mature, free, open-source project implemented in Java; it’s a member of the popular Apache Jakarta family of projects, licensed under the liberal Apache Software License. As such, Lucene is currently, and has been for a few years, the most popular free Java IR library.

As you’ll soon discover, Lucene provides a simple yet powerful core API that requires minimal understanding of full-text indexing and searching. You need to learn about only a handful of its classes in order to start integrating Lucene into an application. Because Lucene is a Java library, it doesn’t make assumptions about what it indexes and searches, which gives it an advantage over a number of other search applications.

People new to Lucene often mistake it for a ready-to-use application like a file-search program, a web crawler, or a web site search engine. That isn’t what Lucene is: Lucene is a software library, a toolkit if you will, not a full-featured search application. It concerns itself with text indexing and searching, and it does those things very well. Lucene lets your application deal with business rules specific to its problem domain while hiding the complexity of indexing and searching implementation behind a simple-to-use API.
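To make the "library, not application" idea concrete, here is a minimal sketch of the kind of service a search library provides: the application hands over plain text, and the library builds an inverted index and answers term lookups. This is a toy written for illustration only and is not Lucene's API; the class name TinyIndex and its methods add, search, and get are hypothetical, and real Lucene adds analysis, scoring, query parsing, and persistent storage on top of this basic idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy class for illustration; NOT part of Lucene.
class TinyIndex {
    // Inverted index: term -> ids of the documents that contain the term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // Original texts, addressed by the id returned from add().
    private final List<String> documents = new ArrayList<>();

    // Indexes one piece of text and returns the id assigned to it.
    int add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Returns the ids of all documents containing the term, in ascending order.
    List<Integer> search(String term) {
        Set<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(ids);
    }

    // Returns the original text stored under the given id.
    String get(int id) {
        return documents.get(id);
    }

    // Lowercases the text and splits it on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("Lucene is an information retrieval library");
        index.add("A web crawler fetches pages");
        index.add("Applications sit on top of the library");
        System.out.println(index.search("library")); // prints [0, 2]
    }
}
```

The application decides what to feed in and what to do with the returned ids; the index knows nothing about files, web pages, or business rules. That separation is the point the text makes about Lucene.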
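You can think of Lucene as a layer that applications sit on top of, as depicted in the figure.

A number of full-featured search applications have been built on top of Lucene. If you’re looking for something prebuilt or a framework for crawling, document handling, and searching, consult the Lucene Wiki “powered by” page for many options: Zilverline, SearchBlox, Nutch, LARM, and jSearch, to name a few. Case studies of both Nutch and SearchBlox are included in chapter 10.

What Lucene can do for you

Lucene allows you to add indexing and searching capabilities to your applications (these functions are described in a separate section). Lucene can index and make searchable any data that can be converted to a textual format, as the figure shows.

Figure: A typical application integration with Lucene

Translation of the Foreign-Language Text

Understanding Lucene

People are tackling the same problem, information overload, in different ways. Some are devoted to research on new user interfaces, others to intelligent agents, and still others to search tools such as Lucene. Before working through the code samples later in this chapter, we will describe at a high level what Lucene is, what it is not, and how it came about.

What Lucene is

Lucene is a high-performance, scalable information retrieval (IR) library. It adds indexing and searching capabilities to applications. Lucene is a mature, free, open-source project written in Java; it belongs to the popular Apache Jakarta family of projects and is released under the liberal Apache Software License. As a result, Lucene is, and has been for several years, the most popular free Java IR library.

You will soon find that Lucene offers a simple yet powerful core API that demands only a minimal understanding of full-text indexing and searching. You need to learn only a few of its classes to begin integrating Lucene into an application. Because Lucene is a Java library, it makes no assumptions about what it indexes and searches, which gives it an edge over a number of other search applications.

Newcomers to Lucene often mistake it for a ready-to-use application such as a file-search program, a web crawler, or a web site search engine. That is not what Lucene is: Lucene is a software library, a toolkit, and not a full-featured search application. It is concerned with text indexing and searching. Lucene lets your application handle the business rules of its own problem domain, while the complexity of the indexing and searching implementation stays hidden behind an easy-to-use API. You can think of Lucene as a layer on which applications sit.

A number of full-featured search applications have been built on top of Lucene. If you are looking for something prebuilt, or for a framework for crawling, document handling, and searching, consult the “powered by” page of the Lucene Wiki for many options: Zilverline, SearchBlox, Nutch, LARM, and jSearch, to name a few. Case studies of both Nutch and SearchBlox are included in chapter 10.

What Lucene can do for you

Lucene lets you add indexing and searching capabilities to your applications (these functions are described in a separate section). Lucene can index and make searchable any data that can be converted to a textual format, as the figure shows.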