[C#] Lucene.net: A Numeric Range Search Problem and a Temporary Workaround
2012-10-05
The previous post covered date searches using TermRangeQuery.
The same approach can also be applied to ordinary numeric fields, such as age or quantity…
Data:
{"Id":"9",
"Memo":"當麻左手凌空劈出,右掌跟著迅捷之極的劈出,左手掌力先發後到,右手掌力後發先到,兩股力道交錯而前,詭異之極",
"Birthday":"1900-01-10T00:00:00",
"Age":9}
The data set contains:
Id 1~1200, so the Age field is also 1~1200
Id 11001~12000, so the Age field is also 11001~12000
Now I build the index..
Stopwatch sw = new Stopwatch();
// Read all source data
var di = new DirectoryInfo(AppDomain.CurrentDomain.BaseDirectory + "\\Source\\");
sw.Start();
var allObjects = di.GetFiles().Select(
    x => JObject.Parse((File.ReadAllText(x.FullName)))).ToArray();
// Path where the index is stored
string indexPath = AppDomain.CurrentDomain.BaseDirectory + "\\Index1\\";
FSDirectory dir = FSDirectory.Open(new DirectoryInfo(indexPath));
// IndexWriter
IndexWriter indexWriter = new IndexWriter(dir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29), true, IndexWriter.MaxFieldLength.UNLIMITED);
// Rebuild each record and add the fields that need to be indexed
foreach (JObject ds in allObjects)
{
    Document doc = new Document();
    // Index every field
    Field f_Id = new Field("Id", ds["Id"].ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    Field f_Age = new Field("Age", ds["Age"].ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    Field f_Memo = new Field("Memo", ds["Memo"].ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    Field f_BirthDay = new Field("BirthDay", DateTime.Parse(ds["Birthday"].ToString()).ToString("yyyyMMdd"), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    doc.Add(f_Id); doc.Add(f_Age); doc.Add(f_Memo); doc.Add(f_BirthDay);
    indexWriter.AddDocument(doc);
}
indexWriter.Optimize();
indexWriter.Commit();
indexWriter.Close();
sw.Stop();
Response.Write("Indexed " + allObjects.Length + " records, elapsed time: " + sw.Elapsed);
After that I search with TermRangeQuery…
// Start the stopwatch
Stopwatch sw = new Stopwatch();
sw.Start();
// Open the index
string indexPath = AppDomain.CurrentDomain.BaseDirectory.ToString() + "\\Index1\\";
DirectoryInfo dirInfo = new DirectoryInfo(indexPath);
FSDirectory dir = FSDirectory.Open(dirInfo);
IndexSearcher search = new IndexSearcher(dir, true);
// Parser for the Age field (not used below; the range query is built directly)
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Age", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
// Search an Age range
Query query = new TermRangeQuery("Age", "11", "20", true, true);
// Sort by Id (type code 4 corresponds to SortField.INT)
Sort sort = new Sort(new SortField("Id", 4));
// Run the search
var hits = search.Search(query, null, search.MaxDoc(),sort).ScoreDocs;
sw.Stop();
Response.Write("Elapsed time:" + sw.Elapsed + "<br /><hr />");
Response.Write("Record count:" + hits.Length + "<br /><hr />");
Response.Write("Result:<br />");
foreach (var res in hits)
{
    Response.Write("Id:" + search.Doc(res.doc).Get("Id") + " BirthDay=" + search.Doc(res.doc).Get("BirthDay") + " Memo=" + search.Doc(res.doc).Get("Memo").ToString().Replace(txtKeyword.Text, "<span style='color:red'>" + txtKeyword.Text + "</span>") + "<br />");
}
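Nice and cozy, just like the date search post..
But when I searched for 11~20, this is what came back…
Whoa… what the…? The way it matches is not quite what we had in mind…
TermRangeQuery compares terms as strings, not as numbers, so the range "11"~"20" also matches values such as 110, 1100 and 11001, which sort between "11" and "20" character by character.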
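To see why the extra hits appear, here is a minimal sketch that reproduces the string comparison outside of Lucene. It assumes the range check behaves like an ordinal string comparison; the class `LexicographicRangeDemo`, the helpers `InTermRange` and `MatchingAges`, and the sample values are made up for illustration and are not part of Lucene.net.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LexicographicRangeDemo
{
    // Hypothetical helper: true if `term` lies in [lower, upper] when the
    // three values are compared as strings in ordinal character order.
    public static bool InTermRange(string term, string lower, string upper)
    {
        return string.CompareOrdinal(term, lower) >= 0
            && string.CompareOrdinal(term, upper) <= 0;
    }

    // Returns the ages whose string form falls inside the string range.
    public static List<int> MatchingAges(IEnumerable<int> ages, string lower, string upper)
    {
        return ages.Where(a => InTermRange(a.ToString(), lower, upper)).ToList();
    }

    public static void Main()
    {
        int[] sample = { 2, 9, 11, 15, 20, 21, 110, 200, 1100, 11001 };
        // Prints: 2, 11, 15, 20, 110, 1100, 11001
        Console.WriteLine(string.Join(", ", MatchingAges(sample, "11", "20")));
    }
}
```

Note that 2 matches as well ("2" sorts after "11" and before "20"), while 200 does not, because "200" sorts after "20".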
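So what now…
I searched the web for a while and found almost nothing that did what I needed…
If the mountain won't move, take another road: first, satisfy the boss's requirement..
I wrote a small function: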
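// Shift the number by a large offset and convert it with NumericUtils, so that the
// values in this data set become same-length strings that sort in numeric order.
public string ConvertSearchNumver(string num)
{
    return NumericUtils.DoubleToSortableLong(double.Parse(num) + 100000000000).ToString();
}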
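The following sketch shows why this trick can work, without depending on Lucene.net. It assumes that, for non-negative doubles, NumericUtils.DoubleToSortableLong returns the raw IEEE 754 bit pattern, which `BitConverter.DoubleToInt64Bits` stands in for here; the class `SortableNumberDemo` and its helpers are hypothetical names. With the offset applied, every term in this data set has the same number of digits, so string order and numeric order agree.

```csharp
using System;

public static class SortableNumberDemo
{
    // Assumption: stands in for NumericUtils.DoubleToSortableLong on
    // non-negative input, where the raw IEEE 754 bits already sort correctly.
    public static string ToSortableTerm(double value)
    {
        double shifted = value + 100000000000; // same offset as in the post
        return BitConverter.DoubleToInt64Bits(shifted).ToString();
    }

    // Hypothetical helper: string range check on the converted terms.
    public static bool InTermRange(double value, double lower, double upper)
    {
        string term = ToSortableTerm(value);
        return string.CompareOrdinal(term, ToSortableTerm(lower)) >= 0
            && string.CompareOrdinal(term, ToSortableTerm(upper)) <= 0;
    }

    public static void Main()
    {
        foreach (double age in new double[] { 2, 11, 15, 20, 110, 11001 })
        {
            // Shows the converted term and whether it falls inside 11~20.
            Console.WriteLine(age + " -> " + ToSortableTerm(age)
                + " in range: " + InTermRange(age, 11, 20));
        }
    }
}
```

In this sketch only 11, 15 and 20 report `True`; 2, 110 and 11001 fall outside the range, which is the numeric behaviour we wanted. Negative or very large inputs are not covered here and would need separate checking.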
And at indexing time I apply it to the Age field..
Field f_Age = new Field("Age",ConvertSearchNumver(ds["Age"].ToString()), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
The search changes accordingly: the range bounds go through the same conversion before being passed in…
// Start the stopwatch
Stopwatch sw = new Stopwatch();
sw.Start();
// Open the index
string indexPath = AppDomain.CurrentDomain.BaseDirectory.ToString() + "\\Index4\\";
DirectoryInfo dirInfo = new DirectoryInfo(indexPath);
FSDirectory dir = FSDirectory.Open(dirInfo);
IndexSearcher search = new IndexSearcher(dir, true);
// Parser for the Age field (not used below; the range query is built directly)
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Age", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
// Search an Age range, converting both bounds the same way as at indexing time
Query query = new TermRangeQuery("Age", ConvertSearchNumver("11"), ConvertSearchNumver("20"), true, true);
// Sort by Id (type code 4 corresponds to SortField.INT)
Sort sort = new Sort(new SortField("Id", 4));
// Run the search
var hits = search.Search(query, null, search.MaxDoc(), sort).ScoreDocs;
sw.Stop();
Response.Write("Elapsed time:" + sw.Elapsed + "<br /><hr />");
Response.Write("Record count:" + hits.Length + "<br /><hr />");
Response.Write("Result:<br />");
foreach (var res in hits)
{
    Response.Write("Id:" + search.Doc(res.doc).Get("Id") + " BirthDay=" + search.Doc(res.doc).Get("BirthDay") + " Memo=" + search.Doc(res.doc).Get("Memo").ToString().Replace(txtKeyword.Text, "<span style='color:red'>" + txtKeyword.Text + "</span>") + "<br />");
}
Result:
OK~ the results are correct…
If anyone has found a better solution, please share it with me…
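One simpler variant of the same idea, offered only as a sketch and not as the method used in this post, is to zero-pad integers to a fixed width before indexing and before building the range bounds. The class `ZeroPadDemo`, the helpers `ToPaddedTerm` and `MatchingAges`, and the width of 10 are assumptions chosen for illustration; the check again assumes ordinal string comparison and non-negative values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZeroPadDemo
{
    // Hypothetical helper: left-pads a non-negative integer with zeros
    // so that every term has the same width.
    public static string ToPaddedTerm(int value, int width)
    {
        if (value < 0) throw new ArgumentOutOfRangeException("value");
        return value.ToString().PadLeft(width, '0');
    }

    // Returns the ages whose padded term falls inside the padded string range.
    public static List<int> MatchingAges(IEnumerable<int> ages, int lower, int upper, int width)
    {
        string lo = ToPaddedTerm(lower, width);
        string hi = ToPaddedTerm(upper, width);
        return ages.Where(a =>
        {
            string term = ToPaddedTerm(a, width);
            return string.CompareOrdinal(term, lo) >= 0
                && string.CompareOrdinal(term, hi) <= 0;
        }).ToList();
    }

    public static void Main()
    {
        int[] sample = { 2, 9, 11, 15, 20, 21, 110, 200, 1100, 11001 };
        // Width 10 is enough for any non-negative int. Prints: 11, 15, 20
        Console.WriteLine(string.Join(", ", MatchingAges(sample, 11, 20, 10)));
    }
}
```

The padded terms stay human-readable in the index, at the cost of handling only non-negative integers. Lucene 2.9 also introduced NumericField and NumericRangeQuery for numeric ranges, which may be worth checking in the Lucene.net version you use.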
Tags: C#, Lucene.net