The idea is this:
- Search for a word in the text
- If the word is found, I want to get its position in the text (not in the index)
My code:
public void methodFromStack() throws Exception {
    Directory directory = new RAMDirectory();
    IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new StandardAnalyzer());
    IndexWriter writer = new IndexWriter(directory, indexWriterConfig);
    Document doc = new Document();
    FieldType type = new FieldType();
    type.setStoreTermVectors(true);
    type.setStoreTermVectorPositions(true);
    type.setStoreTermVectorOffsets(true);
    type.setStored(true);
    type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
    Field fieldStore = new Field("tags", "Kite good world.", type);
    doc.add(fieldStore);
    writer.addDocument(doc);
    writer.close();
    DirectoryReader reader = DirectoryReader.open(directory);
    IndexSearcher searcher = new IndexSearcher(reader);
    // Phrase search with slop (allowed distance between the words)
    QueryParser queryParser = new QueryParser("tags", new StandardAnalyzer());
    Query query = queryParser.parse("\"Kite World\"~1");
    TopDocs results = searcher.search(query, 1);
    for (ScoreDoc scoreDoc : results.scoreDocs) {
        Fields termVs = reader.getTermVectors(scoreDoc.doc);
        Terms f = termVs.terms("tags");
        TermsEnum te = f.iterator();
        PostingsEnum docsAndPosEnum = null;
        BytesRef bytesRef;
        // Walks every term in the document's term vector
        while ((bytesRef = te.next()) != null) {
            docsAndPosEnum = te.postings(docsAndPosEnum, PostingsEnum.ALL);
            int nextDoc = docsAndPosEnum.nextDoc();
            assert nextDoc != DocIdSetIterator.NO_MORE_DOCS;
            final int fr = docsAndPosEnum.freq();
            final int p = docsAndPosEnum.nextPosition();
            final int o = docsAndPosEnum.startOffset();
            System.out.println("Word: " + bytesRef.utf8ToString());
            System.out.println("Position: " + p + ", startOffset: " + o + " length: " + bytesRef.length + " Freq: " + fr);
            if (fr > 1) {
                for (int iter = 1; iter <= fr - 1; iter++) {
                    System.out.println("Position: " + docsAndPosEnum.nextPosition());
                }
            }
        }
    }
}
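I know that the API changed between Lucene 3 and Lucene 4, and that the classes TermFreqVector and TermPositionVector were removed. I have been looking for another way to get at my word or term, but every approach I find tells me to use an iterator and walk through all the terms in the index.
How can I replace the iterator? Is there some way to find the term from my result without going through every element?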
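To jump directly to the term you want, use seekExact on the TermsEnum instead of looping with next(). It returns true if the term exists in that document's term vector and leaves the enum positioned on it.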
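Here is a minimal sketch of that approach, assuming the same Lucene line as the code in the question (6.x to 8.x, where RAMDirectory and IndexReader.getTermVector are still available). The class TermOffsets and its two methods indexSingleDoc and offsetsOf are hypothetical names made up for this illustration, not part of Lucene. Note that the term passed to seekExact has to be in its indexed form; StandardAnalyzer lowercases, so you look up "world", not "World".

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper class for illustration only.
class TermOffsets {

    // Indexes one document with term vectors (positions and offsets),
    // using the same field settings as the question. Assumes Lucene 6.x-8.x.
    static DirectoryReader indexSingleDoc(String field, String text) throws IOException {
        Directory directory = new RAMDirectory();
        FieldType type = new FieldType();
        type.setStored(true);
        type.setStoreTermVectors(true);
        type.setStoreTermVectorPositions(true);
        type.setStoreTermVectorOffsets(true);
        type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        try (IndexWriter writer =
                new IndexWriter(directory, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new Field(field, text, type));
            writer.addDocument(doc);
        }
        return DirectoryReader.open(directory);
    }

    // Returns {startOffset, endOffset} pairs for every occurrence of `term`
    // in the term vector of `docId`, without iterating over the other terms.
    static List<int[]> offsetsOf(IndexReader reader, int docId, String field, String term)
            throws IOException {
        List<int[]> result = new ArrayList<>();
        Terms vector = reader.getTermVector(docId, field);
        if (vector == null) {
            return result; // no term vector stored for this field
        }
        TermsEnum te = vector.iterator();
        // Jump straight to the term; false means the document does not contain it.
        if (!te.seekExact(new BytesRef(term))) {
            return result;
        }
        PostingsEnum postings = te.postings(null, PostingsEnum.ALL);
        postings.nextDoc(); // a term vector covers exactly one document
        int freq = postings.freq();
        for (int i = 0; i < freq; i++) {
            postings.nextPosition(); // must be called before reading offsets
            result.add(new int[] {postings.startOffset(), postings.endOffset()});
        }
        return result;
    }
}
```

With the document from the question ("Kite good world."), offsetsOf(reader, 0, "tags", "world") yields a single pair with start offset 10 and end offset 15, which are character offsets in the original text. In the loop over results.scoreDocs you would pass scoreDoc.doc as docId. On Lucene versions where RAMDirectory is no longer available, ByteBuffersDirectory should be a drop-in replacement for the indexing part.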