Structured Data
Data Query Methods
Index Types
Forward index:
 "Document 1" ID > [ {word1: count, position list}, {word2: count, position list} … ]
 "Document 2" ID > [ {word1: count, position list}, {word2: count, position list} … ]
Inverted index:
 "keyword1": [ {"Document 1" ID, count, position list}, {"Document 2" ID, count, position list} … ]
 "keyword2": [ {"Document 1" ID, count, position list}, {"Document 2" ID, count, position list} … ]
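The two layouts above can be sketched in plain Java (a toy model for illustration, not Lucene's actual data structures) — the inverted index maps each term to a posting list of {docId, positions}:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InvertedIndexSketch {
    // A posting records, for one document, where a term occurs (count = positions.size()).
    public record Posting(int docId, List<Integer> positions) {}

    // term -> posting list: exactly the "keyword -> [{docId, count, positions}]" shape above
    public static Map<String, List<Posting>> build(Map<Integer, String> docs) {
        Map<String, List<Posting>> index = new HashMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            // First build the forward view for this document: term -> positions
            Map<String, List<Integer>> positionsByTerm = new HashMap<>();
            String[] words = doc.getValue().toLowerCase().split("\\s+");
            for (int pos = 0; pos < words.length; pos++) {
                positionsByTerm.computeIfAbsent(words[pos], k -> new ArrayList<>()).add(pos);
            }
            // Then flip it into the inverted view: term -> [{docId, positions}]
            positionsByTerm.forEach((term, positions) ->
                index.computeIfAbsent(term, k -> new ArrayList<>())
                     .add(new Posting(doc.getKey(), positions)));
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new LinkedHashMap<>();
        docs.put(1, "lucene is a search library");
        docs.put(2, "search engines use an inverted index");
        Map<String, List<Posting>> index = build(docs);
        // "search" has postings in both documents
        System.out.println(index.get("search"));
    }
}
```

Looking up a keyword is then a single map access, which is why the inverted layout dominates full-text search.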

Steps to create an index: raw document > acquire the document content > build a Document object > Lucene analyzes the Document > create the index
Steps to search documents: obtain the query input > create a Query > execute the query > render the results
Creating the Document object

 A Document is made up of fields (Field). Each Document can hold multiple Fields, different Documents can hold different Fields, and a single Document may even hold duplicate Fields (same field name and same value). Every document has a unique ID.
The analysis process
 Each field of the document is analyzed: the text is split into words, letters are lower-cased, punctuation is stripped, and stop words are removed, yielding the final tokens.
Original: Lucene is a Java full-text search engine
Analyzed: lucene java full text search engine
 Each resulting word is a Term. The same word extracted from different fields produces different Terms: a Term consists of two parts, the field name and the word itself.
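The normalization steps above can be sketched like this (a simplified stand-in for StandardAnalyzer; the stop-word list here is illustrative, not Lucene's actual list):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SimpleAnalyzerSketch {
    // Illustrative stop words; real analyzers ship a larger list
    private static final Set<String> STOP_WORDS = Set.of("is", "a", "an", "the", "and", "or");

    // Mimics the steps above: split, lowercase, strip punctuation, drop stop words
    public static List<String> analyze(String text) {
        return Arrays.stream(text.split("[\\s\\-]+"))        // split on whitespace and hyphens
                .map(String::toLowerCase)                     // letters to lower case
                .map(w -> w.replaceAll("\\p{Punct}", ""))     // remove punctuation
                .filter(w -> !w.isEmpty())
                .filter(w -> !STOP_WORDS.contains(w))         // remove stop words
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(analyze("Lucene is a Java full-text search engine"));
        // [lucene, java, full, text, search, engine]
    }
}
```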
@Test
public void testCreateIndex() throws Exception {
  // Create the analyzer; StandardAnalyzer handles English well but Chinese poorly
  Analyzer analyzer = new StandardAnalyzer();
  // Index directory; FSDirectory.open picks a suitable implementation for the
  // current platform, e.g. NIOFSDirectory or MMapDirectory
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  // Configure the IndexWriter
  IndexWriterConfig writerConfig = new IndexWriterConfig(analyzer);
  // Create the IndexWriter
  IndexWriter indexWriter = new IndexWriter(directory, writerConfig);
  // Build the Document objects
  List<Document> documents = new ArrayList<>();
  File files = new File("E:/searchsource");
  for (File file : files.listFiles()) {
    Document document = new Document();
    document.add(new TextField("fileName", file.getName(), Store.YES));
    document.add(new TextField("filePath", file.getPath(), Store.YES));
    // LongPoint indexes the value for range queries; StoredField keeps it retrievable
    document.add(new LongPoint("fileSize", file.length()));
    document.add(new StoredField("fileSize", file.length()));
    // FileUtils is from Apache commons-io
    document.add(new TextField("fileContent", FileUtils.readFileToString(file, StandardCharsets.UTF_8), Store.NO));
    documents.add(document);
  }
  // Write the documents to the index
  indexWriter.addDocuments(documents);
  // Close the IndexWriter
  indexWriter.close();
}
Note:
DocValues is a forward index that Lucene builds alongside the inverted index: an ordered, column-oriented document => field value mapping. It comes in several variants: byte[] (BinaryDocValues), SortedSet<byte[]> (SortedSetDocValues), long (NumericDocValues), and sorted sets of longs (SortedNumericDocValues).
The forward index serves sorting, aggregation, grouping, highlighting, and similar operations.
A plain DocValues field allows only one value per field name per document (a scalar, e.g. price: 80); the SortedSet/SortedNumeric variants allow several values under the same name (an array, e.g. price: [100, 80]).
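As a sketch of how these are indexed (field names other than fileSize/fileName are illustrative; assumes a recent Lucene version), DocValues fields are simply added to the Document alongside the regular fields:

```java
// Hypothetical sketch: add DocValues alongside the indexed/stored fields
Document document = new Document();
// single-valued numeric forward index (one value per document)
document.add(new NumericDocValuesField("fileSize", file.length()));
// multi-valued numeric forward index (several values under the same name)
document.add(new SortedNumericDocValuesField("price", 100L));
document.add(new SortedNumericDocValuesField("price", 80L));
// single-valued byte[]/string forward index
document.add(new SortedDocValuesField("fileName", new BytesRef(file.getName())));
// multi-valued byte[] forward index
document.add(new SortedSetDocValuesField("tag", new BytesRef("java")));
document.add(new SortedSetDocValuesField("tag", new BytesRef("lucene")));
```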
@Test
public void testTermSearch() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  // Create the IndexReader
  DirectoryReader indexReader = DirectoryReader.open(directory);
  IndexSearcher indexSearcher = new IndexSearcher(indexReader);
  // Exact term query
  Query query = new TermQuery(new Term("fileContent", "apache"));
  // Fetch the top five hits
  TopDocs topDocs = indexSearcher.search(query, 5);
  // Total hit count
  System.out.println("totalHits:"+topDocs.totalHits);
  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // Score of this hit
    System.out.println("score:"+scoreDoc.score);
    // Fetch the stored document
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("fileName:"+doc.get("fileName"));
    System.out.println("filePath:"+doc.get("filePath"));
    System.out.println("fileSize:"+doc.get("fileSize"));
    System.out.println("fileContent:"+doc.get("fileContent"));
  }
  indexReader.close();
}
@Test
public void testTermRangeSearch() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  // Create the IndexReader
  DirectoryReader indexReader = DirectoryReader.open(directory);
  IndexSearcher indexSearcher = new IndexSearcher(indexReader);
  // Term range query (terms from "b" to "e", both bounds inclusive); not suitable for numeric fields
  Query query = new TermRangeQuery("fileName", new BytesRef("b"), new BytesRef("e"), true, true);
  // Fetch the top twenty hits
  TopDocs topDocs = indexSearcher.search(query, 20);
  // Total hit count
  System.out.println("totalHits:"+topDocs.totalHits);
  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // Score of this hit
    System.out.println("score:"+scoreDoc.score);
    // Fetch the stored document
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("fileName:"+doc.get("fileName"));
    System.out.println("filePath:"+doc.get("filePath"));
    System.out.println("fileSize:"+doc.get("fileSize"));
    System.out.println("fileContent:"+doc.get("fileContent"));
  }
  indexReader.close();
}
@Test
public void testNumericalRangeSearch() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  // Create the IndexReader
  DirectoryReader indexReader = DirectoryReader.open(directory);
  IndexSearcher indexSearcher = new IndexSearcher(indexReader);
  // Numeric range query; newRangeQuery is inclusive on both ends.
  // For an exclusive bound, adjust with Math.addExact(lowerValue[i], 1) or Math.addExact(upperValue[i], -1)
  Query query = LongPoint.newRangeQuery("fileSize", 100L, 800L);
  // Fetch the top twenty hits
  TopDocs topDocs = indexSearcher.search(query, 20);
  // Total hit count
  System.out.println("totalHits:"+topDocs.totalHits);
  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // Score of this hit
    System.out.println("score:"+scoreDoc.score);
    // Fetch the stored document
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("fileName:"+doc.get("fileName"));
    System.out.println("filePath:"+doc.get("filePath"));
    System.out.println("fileSize:"+doc.get("fileSize"));
    System.out.println("fileContent:"+doc.get("fileContent"));
  }
  indexReader.close();
}
@Test
public void testPrefixSearch() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  // Create the IndexReader
  DirectoryReader indexReader = DirectoryReader.open(directory);
  IndexSearcher indexSearcher = new IndexSearcher(indexReader);
  // Prefix query: terms beginning with "apa"
  Query query = new PrefixQuery(new Term("fileName", "apa"));
  // Fetch the top twenty hits
  TopDocs topDocs = indexSearcher.search(query, 20);
  // Total hit count
  System.out.println("totalHits:"+topDocs.totalHits);
  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // Score of this hit
    System.out.println("score:"+scoreDoc.score);
    // Fetch the stored document
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("fileName:"+doc.get("fileName"));
    System.out.println("filePath:"+doc.get("filePath"));
    System.out.println("fileSize:"+doc.get("fileSize"));
    System.out.println("fileContent:"+doc.get("fileContent"));
  }
  indexReader.close();
}
@Test
public void testWildcardSearch() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  // Create the IndexReader
  DirectoryReader indexReader = DirectoryReader.open(directory);
  IndexSearcher indexSearcher = new IndexSearcher(indexReader);
  // Wildcard query: * matches any number of characters, ? matches exactly one
  Query query = new WildcardQuery(new Term("fileName", "?a*"));
  // Fetch the top twenty hits
  TopDocs topDocs = indexSearcher.search(query, 20);
  // Total hit count
  System.out.println("totalHits:"+topDocs.totalHits);
  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // Score of this hit
    System.out.println("score:"+scoreDoc.score);
    // Fetch the stored document
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("fileName:"+doc.get("fileName"));
    System.out.println("filePath:"+doc.get("filePath"));
    System.out.println("fileSize:"+doc.get("fileSize"));
    System.out.println("fileContent:"+doc.get("fileContent"));
  }
  indexReader.close();
}
@Test
public void testBooleanSearch() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  // Create the IndexReader
  DirectoryReader indexReader = DirectoryReader.open(directory);
  IndexSearcher indexSearcher = new IndexSearcher(indexReader);
  // Occur.MUST     - must match;      like AND
  // Occur.MUST_NOT - must not match;  like NOT (!)
  // Occur.SHOULD   - should match;    like OR
  // Occur.FILTER   - same as MUST but does not contribute to scoring;
  //                  used for filtering (replaces the pre-6.0 Filter API)
  Builder queryBuilder = new BooleanQuery.Builder();
  Query fileNameQuery = new TermQuery(new Term("fileName", "apache"));
  Query fileContentQuery = new TermQuery(new Term("fileContent", "java"));
  queryBuilder.add(fileNameQuery, Occur.MUST);
  queryBuilder.add(fileContentQuery, Occur.SHOULD);
  // Fetch the top twenty hits
  TopDocs topDocs = indexSearcher.search(queryBuilder.build(), 20);
  // Total hit count
  System.out.println("totalHits:"+topDocs.totalHits);
  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // Score of this hit
    System.out.println("score:"+scoreDoc.score);
    // Fetch the stored document
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("fileName:"+doc.get("fileName"));
    System.out.println("filePath:"+doc.get("filePath"));
    System.out.println("fileSize:"+doc.get("fileSize"));
    System.out.println("fileContent:"+doc.get("fileContent"));
  }
  indexReader.close();
}
@Test
public void testPhraseSearch() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  // Create the IndexReader
  DirectoryReader indexReader = DirectoryReader.open(directory);
  IndexSearcher indexSearcher = new IndexSearcher(indexReader);
  // apache and solr within ten positions of each other (slop = 10)
  Query query = new PhraseQuery(10, "fileName", "apache", "solr");
  // Fetch the top twenty hits
  TopDocs topDocs = indexSearcher.search(query, 20);
  // Total hit count
  System.out.println("totalHits:"+topDocs.totalHits);
  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // Score of this hit
    System.out.println("score:"+scoreDoc.score);
    // Fetch the stored document
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("fileName:"+doc.get("fileName"));
    System.out.println("filePath:"+doc.get("filePath"));
    System.out.println("fileSize:"+doc.get("fileSize"));
    System.out.println("fileContent:"+doc.get("fileContent"));
  }
  indexReader.close();
}
@Test
public void testFuzzySearch() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  // Create the IndexReader
  DirectoryReader indexReader = DirectoryReader.open(directory);
  IndexSearcher indexSearcher = new IndexSearcher(indexReader);
  // Allows up to two edits from "aahe" (insertions, deletions, substitutions),
  // so it can match "apache"
  Query query = new FuzzyQuery(new Term("fileName", "aahe"), 2);
  // Fetch the top twenty hits
  TopDocs topDocs = indexSearcher.search(query, 20);
  // Total hit count
  System.out.println("totalHits:"+topDocs.totalHits);
  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // Score of this hit
    System.out.println("score:"+scoreDoc.score);
    // Fetch the stored document
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("fileName:"+doc.get("fileName"));
    System.out.println("filePath:"+doc.get("filePath"));
    System.out.println("fileSize:"+doc.get("fileSize"));
    System.out.println("fileContent:"+doc.get("fileContent"));
  }
  indexReader.close();
}
@Test
public void testQueryParserSearch() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  // Create the IndexReader
  DirectoryReader indexReader = DirectoryReader.open(directory);
  IndexSearcher indexSearcher = new IndexSearcher(indexReader);
  // QueryParser needs an analyzer; the first argument is the default field
  QueryParser queryParser = new QueryParser("fileName", new StandardAnalyzer());
  Query query = queryParser.parse("fileName:apache AND fileContent:java");
  // Fetch the top twenty hits
  TopDocs topDocs = indexSearcher.search(query, 20);
  // Total hit count
  System.out.println("totalHits:"+topDocs.totalHits);
  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // Score of this hit
    System.out.println("score:"+scoreDoc.score);
    // Fetch the stored document
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("fileName:"+doc.get("fileName"));
    System.out.println("filePath:"+doc.get("filePath"));
    System.out.println("fileSize:"+doc.get("fileSize"));
    System.out.println("fileContent:"+doc.get("fileContent"));
  }
  indexReader.close();
}
Note: queryParser.setAllowLeadingWildcard(true) must be set before a parsed query may begin with a wildcard (* or ?); prefix and trailing-wildcard queries work without it.
QueryParser query syntax:
| Symbol | Meaning | Example | Notes |
|---|---|---|---|
| + | MUST: the clause must match | +fileName:apache AND fileName:solr | The first clause must match; the second may fail to match (in this combination the second clause has no effect, whether AND or OR is used) |
| # | FILTER: must match, not scored | #fileName:apache AND fileName:solr | Same matching behavior as +, but the clause does not contribute to the score |
| - | MUST_NOT: the clause must not match | -fileName:apache AND fileName:solr | The first clause must not match; the second clause must match (regardless of AND/OR) |
| AND | Both sides must match | fileName:apache AND fileName:solr | Both clauses must match |
| OR | Either side may match | fileName:apache OR fileName:solr | A space between clauses has the same effect (with the default operator) |
| "phrase" | Exact phrase match | fileName:"apache lucene.txt" | Must match the phrase "apache lucene" exactly |
| * / ? | Wildcard match | fileName:apa* | Terms starting with apa (* = any number of characters, ? = one character) |
| ~N (after a phrase) | Proximity match | fileName:"apache solr"~10 | apache and solr within ten positions of each other |
| ~ (after a term) | Fuzzy match | fileName:apecha~0.4 | Fuzzy matching; default similarity 0.5 (in recent Lucene versions the value is an edit distance, e.g. ~2, default 2) |
| [x TO y] | Term range match | fileName:[a TO b] | Plain string comparison, character by character from the first letter; use {} for an exclusive bound |
| ^ | Boost factor | fileName:apache^4 | Raises the score of documents matching apache in fileName |
| field:(+a +b) | Field grouping | fileName:(+apache +solr) | fileName must contain both apache and solr |
Note: a reader obtained from DirectoryReader.open(Directory directory) does not see deletions (or other changes) made later through an IndexWriter. DirectoryReader.openIfChanged(DirectoryReader oldReader) returns a new IndexReader if the index files have changed, or null if they have not, which lets you detect and pick up changes.
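The refresh check described above typically looks like this (a minimal sketch; the variable names are illustrative):

```java
// If the index has changed since oldReader was opened, openIfChanged returns
// a new reader; otherwise it returns null and the old reader stays valid.
DirectoryReader newReader = DirectoryReader.openIfChanged(oldReader);
if (newReader != null) {
    oldReader.close();                            // release the stale reader
    oldReader = newReader;
    indexSearcher = new IndexSearcher(oldReader); // search against the fresh view
}
```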
@Test
public void testSortSearch() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  DirectoryReader indexReader = DirectoryReader.open(directory);
  IndexSearcher indexSearcher = new IndexSearcher(indexReader);
  QueryParser queryParser = new QueryParser("fileName", new StandardAnalyzer());
  Query query = queryParser.parse("*:* fileName:apache");
  // Sort by score; the second-to-last parameter controls whether scores are
  // computed (false leaves score as NaN), the last whether the max score is computed
  TopDocs topDocs = indexSearcher.search(query, 20, Sort.RELEVANCE, true, false);
  // Total hit count
  System.out.println("totalHits:"+topDocs.totalHits);
  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // Score of this hit
    System.out.println("score:"+scoreDoc.score);
    // Fetch the stored document
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("id:"+doc.get("id"));
    System.out.println("fileName:"+doc.get("fileName"));
    System.out.println("filePath:"+doc.get("filePath"));
    System.out.println("fileSize:"+doc.get("fileSize"));
    System.out.println("fileContent:"+doc.get("fileContent"));
  }
  indexReader.close();
}
Sort provides two built-in instances: Sort.RELEVANCE (by score) and Sort.INDEXORDER (by document index order).
Custom Sort implementation
// Compare values as longs; the third parameter reverses the sort order when true
// A custom sort requires the field to carry a matching DocValuesField — the
// forward index is what makes sorting efficient
SortField sortField = new SortField("fileSize", Type.LONG, false);
Sort sort = new Sort(sortField);
TopDocs topDocs = indexSearcher.search(query, 20, sort, true, false);
Using a custom comparison rule
public class MyCoustomFieldComparatorSource extends FieldComparatorSource {
  /**
   * Creates the custom comparator:
   * fieldname - the field to compare on
   * numHits   - number of hits the queue tracks
   * reversed  - whether the order is reversed
   */
  @Override
  public FieldComparator<?> newComparator(String fieldname, int numHits, int sortPos, boolean reversed) {
    return new MyCoustomFieldComparator(fieldname, numHits);
  }

  // Compares documents by the length of the given string field's value
  private class MyCoustomFieldComparator extends SimpleFieldComparator<String> {

    private String values[];
    private String fieldName;
    private String top;
    private String bottom;
    private LeafReaderContext leafReaderContext;

    public MyCoustomFieldComparator(String fieldname, int numHits) {
      this.fieldName = fieldname;
      this.values = new String[numHits];
    }

    // Called when advancing to the next segment; remember its LeafReaderContext
    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException {
      this.leafReaderContext = context;
    }

    // Store the field value for a hit: slot is the queue slot, doc the document ID
    @Override
    public void copy(int slot, int doc) throws IOException {
      values[slot] = leafReaderContext.reader().document(doc).getField(fieldName).stringValue();
    }

    // Compare the values stored in two slots
    @Override
    public int compare(int slot1, int slot2) {
      return Integer.compare(values[slot1].length(), values[slot2].length());
    }

    // Return the value stored in a slot
    @Override
    public String value(int slot) {
      return values[slot];
    }

    // Record the weakest ("bottom") entry currently in the queue
    @Override
    public void setBottom(int slot) throws IOException {
      bottom = values[slot];
    }

    // Record the top value (used by searchAfter-style paging)
    @Override
    public void setTopValue(String value) {
      this.top = value;
    }


    // Compare the bottom entry against document doc
    @Override
    public int compareBottom(int doc) throws IOException {
      String currentFileName = leafReaderContext.reader().document(doc).getField(fieldName).stringValue();
      return Integer.compare(bottom.length(), currentFileName.length());
    }

    // Compare the top value against document doc
    @Override
    public int compareTop(int doc) throws IOException {
      String currentFileName = leafReaderContext.reader().document(doc).getField(fieldName).stringValue();
      return Integer.compare(top.length(), currentFileName.length());
    }
  }
}
@Test
public void testSortSearch2() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  DirectoryReader indexReader = DirectoryReader.open(directory);
  IndexSearcher indexSearcher = new IndexSearcher(indexReader);
  QueryParser queryParser = new QueryParser("", new StandardAnalyzer());
  Query query = queryParser.parse("*:*");
  // Create the custom comparator source
  MyCoustomFieldComparatorSource myCoustomFieldComparatorSource = new MyCoustomFieldComparatorSource();
  // Create a SortField that uses the custom comparator
  SortField sortField = new SortField("fileName", myCoustomFieldComparatorSource);
  Sort sort = new Sort(sortField);
  TopDocs topDocs = indexSearcher.search(query, 20, sort, true, false);
  // Total hit count
  System.out.println("totalHits:"+topDocs.totalHits);
  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // Score of this hit
    System.out.println("score:"+scoreDoc.score);
    // Fetch the stored document
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("id:"+doc.get("id"));
    System.out.println("fileName:"+doc.get("fileName"));
    System.out.println("filePath:"+doc.get("filePath"));
    System.out.println("fileSize:"+doc.get("fileSize"));
    System.out.println("fileContent:"+doc.get("fileContent"));
  }
  indexReader.close();
}
@Test
public void testHighlighterSearch() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  DirectoryReader indexReader = DirectoryReader.open(directory);
  IndexSearcher indexSearcher = new IndexSearcher(indexReader);
  Analyzer analyzer = new StandardAnalyzer();
  QueryParser queryParser = new QueryParser("fileName", analyzer);
  Query query = queryParser.parse("fileName:apache");
  // Build the highlighting components
  QueryScorer queryScorer = new QueryScorer(query);
  // Simple fragmenter
  Fragmenter fragmenter = new SimpleFragmenter();
  // Markup placed around matched terms
  SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<font color='red'>", "</font>");
  // The highlighter itself
  Highlighter highlighter = new Highlighter(simpleHTMLFormatter, queryScorer);
  // Set the fragmenter: long content is reduced to fragments containing the keywords
  highlighter.setTextFragmenter(fragmenter);
  TopDocs topDocs = indexSearcher.search(query, 20);
  // Total hit count
  System.out.println("totalHits:"+topDocs.totalHits);
  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // Score of this hit
    System.out.println("score:"+scoreDoc.score);
    // Fetch the stored document
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("id:"+doc.get("id"));
    String fileName = doc.get("fileName");
    System.out.println("fileName:"+ fileName);
    if(fileName != null) {
      // Get a TokenStream over the field value
      TokenStream tokenStream = analyzer.tokenStream("fileName", fileName);
      // Extract the best-scoring fragment
      String highLightText = highlighter.getBestFragment(tokenStream, fileName);
      System.out.println(highLightText);
    }
    System.out.println("filePath:"+doc.get("filePath"));
    System.out.println("fileSize:"+doc.get("fileSize"));
    String fileContent = doc.get("fileContent");
    if(fileContent != null) {
      // Get a TokenStream over the field value
      TokenStream tokenStream = analyzer.tokenStream("fileContent", fileContent);
      // Extract the best-scoring fragment
      String highLightText = highlighter.getBestFragment(tokenStream, fileContent);
      System.out.println(highLightText);
    }
  }
  indexReader.close();
}
/**
 * FastVectorHighlighter requires the field to be indexed with term vectors,
 * trading index size for highlighting speed:
 * FieldType fileNameFieldType = new FieldType();
 * fileNameFieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
 * fileNameFieldType.setTokenized(true);
 * fileNameFieldType.setStored(true);
 * fileNameFieldType.setStoreTermVectorOffsets(true);    // store character offsets
 * fileNameFieldType.setStoreTermVectorPositions(true);  // store position info
 * fileNameFieldType.setStoreTermVectors(true);          // store term vectors
 * fileNameFieldType.freeze();                           // prevent further changes
 * Field fileNameField = new Field("fileName", file.getName(), fileNameFieldType);
 * document.add(fileNameField);
 */
@Test
public void testFastVectorHighlighterSearch() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  DirectoryReader indexReader = DirectoryReader.open(directory);
  IndexSearcher indexSearcher = new IndexSearcher(indexReader);
  Analyzer analyzer = new StandardAnalyzer();
  QueryParser queryParser = new QueryParser("fileName", analyzer);
  Query query = queryParser.parse("fileName:apache");
  TopDocs topDocs = indexSearcher.search(query, 20);
  // Build the highlighting components
  FragListBuilder fragListBuilder = new SimpleFragListBuilder();
  FragmentsBuilder fragmentsBuilder = new SimpleFragmentsBuilder(
                               BaseFragmentsBuilder.COLORED_PRE_TAGS,
                               BaseFragmentsBuilder.COLORED_POST_TAGS);
  FastVectorHighlighter fastVectorHighlighter = new FastVectorHighlighter(
                               true, true, fragListBuilder, fragmentsBuilder);
  FieldQuery fieldquery = fastVectorHighlighter.getFieldQuery(query);
  // Total hit count
  System.out.println("totalHits:"+topDocs.totalHits);
  for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // Score of this hit
    System.out.println("score:"+scoreDoc.score);
    // Fetch the stored document
    Document doc = indexSearcher.doc(scoreDoc.doc);
    System.out.println("id:"+doc.get("id"));
    String fileName = doc.get("fileName");
    System.out.println("fileName:"+ fileName);
    if(fileName != null) {
      // Extract the best fragment (up to 100 characters) from the term vectors
      String highLightText = fastVectorHighlighter.getBestFragment(
        fieldquery, indexSearcher.getIndexReader(),
        scoreDoc.doc, "fileName", 100);
      System.out.println(highLightText);
    }
    System.out.println("filePath:"+doc.get("filePath"));
    System.out.println("fileSize:"+doc.get("fileSize"));
    String fileContent = doc.get("fileContent");
    if(fileContent != null) {
      // Extract the best fragment (up to 100 characters) from the term vectors
      String highLightText = fastVectorHighlighter.getBestFragment(
        fieldquery, indexSearcher.getIndexReader(),
        scoreDoc.doc, "fileContent", 100);
      System.out.println(highLightText);
    }
  }
  indexReader.close();
}
Paging, approach one: fetch enough hits up front and slice out the page.
TopDocs topDocs = indexSearcher.search(query, 200);
// Take records 10 through 15
int startPos = 10;
int endPos = 15;
// Total hit count
System.out.println("totalHits:"+topDocs.totalHits);
for (int i = startPos; i < endPos; i++) {
  ScoreDoc scoreDoc = topDocs.scoreDocs[i];
  // Score of this hit
  System.out.println("score:"+scoreDoc.score);
  // Fetch the stored document
  Document doc = indexSearcher.doc(scoreDoc.doc);
  System.out.println("fileName:"+doc.get("fileName"));
  System.out.println("filePath:"+doc.get("filePath"));
  System.out.println("fileSize:"+doc.get("fileSize"));
  System.out.println("fileContent:"+doc.get("fileContent"));
}
Paging, approach two: searchAfter.
ScoreDoc currentScoreDoc = null;
// Page size
int pageSize = 5;
// Page number to fetch
int nextPage = 3;
if(nextPage != 1) {
  // Position of the last hit on the previous page
  int num = pageSize*(nextPage-1);
  TopDocs td = indexSearcher.search(query, num);
  currentScoreDoc = td.scoreDocs[num-1];
}
// Fetch the next pageSize hits after the last hit of the previous page
TopDocs topDocs = indexSearcher.searchAfter(currentScoreDoc, query, pageSize);
System.out.println("totalHits:"+topDocs.totalHits);
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
  // Score of this hit
  System.out.println("score:"+scoreDoc.score);
  // Fetch the stored document
  Document doc = indexSearcher.doc(scoreDoc.doc);
  System.out.println("fileName:"+doc.get("fileName"));
  System.out.println("filePath:"+doc.get("filePath"));
  System.out.println("fileSize:"+doc.get("fileSize"));
  System.out.println("fileContent:"+doc.get("fileContent"));
}
@Test
public void testDeleteIndex() throws Exception {
  Analyzer analyzer = new StandardAnalyzer();
  // Index directory
  Directory directory = NIOFSDirectory.open(Paths.get("E:/luceneIndex"));
  // Configure the IndexWriter
  IndexWriterConfig writerConfig = new IndexWriterConfig(analyzer);
  // Create the IndexWriter
  IndexWriter indexWriter = new IndexWriter(directory, writerConfig);
  // Delete the entire index
  indexWriter.deleteAll();
  // Commit the deletion; this clears everything
  indexWriter.commit();
  indexWriter.close();
}
@Test
public void testRangeDeleteIndex() throws Exception {
  Analyzer analyzer = new StandardAnalyzer();
  // Index directory
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  // Configure the IndexWriter
  IndexWriterConfig writerConfig = new IndexWriterConfig(analyzer);
  // Create the IndexWriter
  IndexWriter indexWriter = new IndexWriter(directory, writerConfig);
  Query query = LongPoint.newRangeQuery("fileSize", 1000L, 100000L);
  // Delete the matching documents
  indexWriter.deleteDocuments(query);
  // Commit; deleted documents are kept in a "recycle bin" until segments are merged
  indexWriter.commit();
  // While the IndexWriter is still open, the total, live, and deleted document
  // counts can be inspected
  Thread.sleep(100000L);
  indexWriter.close();
}
System.out.println("maxDoc:"+indexReader.maxDoc());
System.out.println("numDocs:"+indexReader.numDocs());
System.out.println("numDeletedDocs:"+indexReader.numDeletedDocs());
Updating an index is, in essence, deleting the documents that match a term and adding new documents in their place.
@Test
public void testModifyIndex() throws Exception {
  Analyzer analyzer = new StandardAnalyzer();
  // Index directory
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  // Configure the IndexWriter
  IndexWriterConfig writerConfig = new IndexWriterConfig(analyzer);
  // Create the IndexWriter
  IndexWriter indexWriter = new IndexWriter(directory, writerConfig);
  // Term identifying the documents to replace
  Term termOld = new Term("fileName", "apache");
  // New Document to add in their place
  Document newDoc = new Document();
  newDoc.add(new TextField("fileName", "GOGOGO", Store.YES));
  newDoc.add(new TextField("fileContent", "GOGOGO WOWOWO", Store.YES));
  // Update: delete matches of termOld, then add newDoc
  indexWriter.updateDocument(termOld, newDoc);
  // Commit
  indexWriter.commit();
  indexWriter.close();
}

The main attributes a TokenStream exposes per token: CharTermAttribute (the token text), OffsetAttribute (start/end character offsets), PositionIncrementAttribute (the position increment), and TypeAttribute (the token type).
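A minimal sketch of reading those attributes from a TokenStream (assuming StandardAnalyzer and the example sentence from earlier):

```java
Analyzer analyzer = new StandardAnalyzer();
try (TokenStream tokenStream = analyzer.tokenStream("fileName", "Lucene is a Java full-text search engine")) {
    CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);       // token text
    OffsetAttribute offset = tokenStream.addAttribute(OffsetAttribute.class);         // character offsets
    PositionIncrementAttribute posIncr = tokenStream.addAttribute(PositionIncrementAttribute.class); // position gap
    TypeAttribute type = tokenStream.addAttribute(TypeAttribute.class);               // token type
    tokenStream.reset();
    while (tokenStream.incrementToken()) {
        System.out.println(term + " [" + offset.startOffset() + "," + offset.endOffset()
            + "] +" + posIncr.getPositionIncrement() + " " + type.type());
    }
    tokenStream.end();
}
```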
@Test
public void testNearRealTimeSearching() throws Exception {
  Directory directory = FSDirectory.open(Paths.get("E:/luceneIndex"));
  StandardAnalyzer standardAnalyzer = new StandardAnalyzer();
  IndexWriterConfig indexWriterConfig = new IndexWriterConfig(standardAnalyzer);
  IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
  // Create a SearcherManager for near-real-time search over the IndexWriter
  SearcherManager searcherManager = new SearcherManager(indexWriter, true, true, new SearcherFactory());
  // Hand the SearcherManager to a ControlledRealTimeReopenThread, which reopens
  // the index view automatically (here between 0.025s and 5s after a change)
  ControlledRealTimeReopenThread<IndexSearcher> CRTReopenThread =
    new ControlledRealTimeReopenThread<IndexSearcher>(indexWriter, searcherManager, 5.0, 0.025);
  // Run as a daemon thread
  CRTReopenThread.setDaemon(true);
  CRTReopenThread.setName("lucene-nrt-reopen");
  CRTReopenThread.start();
  // Obtain an IndexSearcher from the SearcherManager
  // (return it with searcherManager.release(indexSearcher) when done)
  IndexSearcher indexSearcher = searcherManager.acquire();
}
These are IndexWriter housekeeping operations; do not invoke them manually — Lucene calls them itself at the appropriate time to optimize the index structure.
Reading different document formats normally requires a format-specific parser. Tika can extract text and metadata from many formats; the official site also offers tika-app-<version>.jar, a GUI application for inspecting what Tika extracts from a document.
Tika usage, option one:
static private Tika tika = new Tika();
static public String fileToTxt(File file, Metadata metadata) throws IOException, TikaException {
  // metadata receives the document's extracted metadata
  return tika.parseToString(new FileInputStream(file), metadata);
}
Tika usage, option two:
static private Parser parser;
static private ParseContext parseContext;
static {
  // Auto-detecting parser: picks the right parser for each file format
  parser = new AutoDetectParser();
  // Parse context
  parseContext = new ParseContext();
  parseContext.set(Parser.class, parser);
}
public static String fileToTxt(File file, Metadata metadata) throws IOException, SAXException, TikaException {
  // BodyContentHandler collects only the body content, as plain text;
  // create a fresh handler per call so output does not accumulate across files
  ContentHandler contentHandler = new BodyContentHandler();
  parser.parse(new FileInputStream(file), contentHandler, metadata, parseContext);
  return contentHandler.toString();
}