2012-10-05

[C#] Lucene.net: the numeric range search problem and a temporary workaround

 

The previous post covered date searching: "Date search with TermRangeQuery"

In the same way, we can also search ordinary numeric fields such as age or quantity…

Data:

{"Id":"9",
"Memo":"當麻左手凌空劈出,右掌跟著迅捷之極的劈出,左手掌力先發後到,右手掌力後發先到,兩股力道交錯而前,詭異之極",
"Birthday":"1900-01-10T00:00:00",
"Age":9}


The data set contains:


Records 1~1200, whose Id and Age fields are also 1~1200


Records 11001~12000, whose Id and Age fields are also 11001~12000



Next, I build the index:



// Namespaces used: System.Diagnostics, System.IO, System.Linq, Newtonsoft.Json.Linq,
// Lucene.Net.Analysis.Standard, Lucene.Net.Documents, Lucene.Net.Index, Lucene.Net.Store
Stopwatch sw = new Stopwatch();
// Read all source data (each file is parsed as one JSON object)
var di = new DirectoryInfo(AppDomain.CurrentDomain.BaseDirectory + "\\Source\\");
sw.Start();
var allObjects = di.GetFiles().Select(
    x => JObject.Parse(File.ReadAllText(x.FullName))).ToArray();
// Path where the index is stored
string indexPath = AppDomain.CurrentDomain.BaseDirectory + "\\Index1\\";
FSDirectory dir = FSDirectory.Open(new DirectoryInfo(indexPath));
// IndexWriter (true = create a new index, replacing any existing one)
IndexWriter indexWriter = new IndexWriter(dir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29), true, IndexWriter.MaxFieldLength.UNLIMITED);
// Restore each record and add the fields that need to be indexed
foreach (JObject ds in allObjects)
{
    Document doc = new Document();
    // Index every field
    Field f_Id = new Field("Id", ds["Id"].ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    Field f_Age = new Field("Age", ds["Age"].ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    Field f_Memo = new Field("Memo", ds["Memo"].ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    // Store the birthday as yyyyMMdd so that it sorts correctly as a string
    Field f_BirthDay = new Field("BirthDay", DateTime.Parse(ds["Birthday"].ToString()).ToString("yyyyMMdd"), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    doc.Add(f_Id); doc.Add(f_Age); doc.Add(f_Memo); doc.Add(f_BirthDay);
    indexWriter.AddDocument(doc);
}
indexWriter.Optimize();
indexWriter.Commit();
indexWriter.Close();
sw.Stop();
Response.Write("Indexed " + allObjects.Length + " records, elapsed time: " + sw.Elapsed);


Then I search with TermRangeQuery…



 

// Start timing
Stopwatch sw = new Stopwatch();
sw.Start();

// Open the index
string indexPath = AppDomain.CurrentDomain.BaseDirectory.ToString() + "\\Index1\\";
DirectoryInfo dirInfo = new DirectoryInfo(indexPath);
FSDirectory dir = FSDirectory.Open(dirInfo);
IndexSearcher search = new IndexSearcher(dir, true);
// Parser for the Age field (not used by the range query below)
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Age", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));

// Search a range of Age values
Query query = new TermRangeQuery("Age", "11", "20", true, true);

// Sort by Id as an integer (4 is the value of SortField.INT)
Sort sort = new Sort(new SortField("Id", 4));

// Run the search
var hits = search.Search(query, null, search.MaxDoc(), sort).ScoreDocs;

sw.Stop();
Response.Write("Elapsed time: " + sw.Elapsed + "<br /><hr />");
Response.Write("Record count: " + hits.Length + "<br /><hr />");
Response.Write("Result:<br />");

foreach (var res in hits)
{
    // Print each hit, highlighting the keyword inside Memo
    Response.Write("Id:" + search.Doc(res.doc).Get("Id") + "  BirthDay=" + search.Doc(res.doc).Get("BirthDay") + "  Memo=" + search.Doc(res.doc).Get("Memo").ToString().Replace(txtKeyword.Text, "<span style='color:red'>" + txtKeyword.Text + "</span>") + "<br />");
}


 



Nice and simple, just like the date search in that earlier post..



But when I searched for 11~20, what came back was…



 



[Screenshot: 2012-10-05_104814]



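Good grief… what the… The way it searches is a bit different from the definition we want: TermRangeQuery compares terms as strings, character by character, not as numbers…


So the range 11~20 also takes in values such as 110~199 and 11xxx, because as strings they sort between "11" and "20".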
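
To make the string ordering concrete, here is a minimal sketch in plain C#, without Lucene. It assumes that ordinal string comparison is a fair stand-in for how the terms are ordered; `LexicographicRangeDemo` and its methods are names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LexicographicRangeDemo
{
    // True when the term sorts between lower and upper (inclusive) as a string.
    // Assumption: ordinal comparison approximates the term order used by the index.
    public static bool InStringRange(string term, string lower, string upper)
    {
        return string.CompareOrdinal(term, lower) >= 0
            && string.CompareOrdinal(term, upper) <= 0;
    }

    // Ages from the data set (1~1200 and 11001~12000) that fall inside
    // the string range "11".."20".
    public static List<int> MatchingAges()
    {
        IEnumerable<int> ages = Enumerable.Range(1, 1200)
            .Concat(Enumerable.Range(11001, 1000));
        return ages.Where(a => InStringRange(a.ToString(), "11", "20")).ToList();
    }
}
```

With this sketch, `InStringRange("15", "11", "20")` and `InStringRange("11500", "11", "20")` are both true, while `InStringRange("200", "11", "20")` is false because "200" sorts after "20".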


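So what now…



I searched the web for a while and found almost nothing that did what I needed…


If the mountain won't move, take another road. First things first: meet the boss's requirement..


I wrote a function: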



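// Encode a number so that string order matches numeric order: after the large
// offset is added, the encoded values all have the same number of digits,
// so comparing them character by character gives the numeric order.
public string ConvertSearchNumver(string num)
{
    return NumericUtils.DoubleToSortableLong(double.Parse(num) + 100000000000).ToString();
}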
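
Why this helps can be checked without Lucene. The sketch below is an approximation: it swaps `NumericUtils.DoubleToSortableLong` for `BitConverter.DoubleToInt64Bits`, on the assumption that the two agree for non-negative doubles. `SortableEncodingDemo`, `Encode` and `InEncodedRange` are hypothetical names used only here.

```csharp
using System;

public static class SortableEncodingDemo
{
    // Stand-in for ConvertSearchNumver. BitConverter.DoubleToInt64Bits replaces
    // NumericUtils.DoubleToSortableLong (assumed equivalent for non-negative doubles).
    public static string Encode(double value)
    {
        return BitConverter.DoubleToInt64Bits(value + 100000000000).ToString();
    }

    // True when the encoded value sorts between the encoded bounds as a string.
    public static bool InEncodedRange(double value, double lower, double upper)
    {
        string term = Encode(value);
        return string.CompareOrdinal(term, Encode(lower)) >= 0
            && string.CompareOrdinal(term, Encode(upper)) <= 0;
    }
}
```

Under that assumption every encoded age in the data set has the same length, so `InEncodedRange(15, 11, 20)` is true while `InEncodedRange(11500, 11, 20)` and `InEncodedRange(110, 11, 20)` are false.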

I also doctor the Age field at index time..



 

Field f_Age = new Field("Age",ConvertSearchNumver(ds["Age"].ToString()), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);


Naturally the search changes as well: the range bounds go through the same function before they are passed in…



 

// Start timing
Stopwatch sw = new Stopwatch();
sw.Start();

// Open the index
string indexPath = AppDomain.CurrentDomain.BaseDirectory.ToString() + "\\Index4\\";
DirectoryInfo dirInfo = new DirectoryInfo(indexPath);
FSDirectory dir = FSDirectory.Open(dirInfo);
IndexSearcher search = new IndexSearcher(dir, true);
// Parser for the Age field (not used by the range query below)
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Age", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));

// Search a range of Age values, encoding both bounds the same way as the indexed values
Query query = new TermRangeQuery("Age", ConvertSearchNumver("11"), ConvertSearchNumver("20"), true, true);

// Sort by Id as an integer (4 is the value of SortField.INT)
Sort sort = new Sort(new SortField("Id", 4));

// Run the search
var hits = search.Search(query, null, search.MaxDoc(), sort).ScoreDocs;

sw.Stop();
Response.Write("Elapsed time: " + sw.Elapsed + "<br /><hr />");
Response.Write("Record count: " + hits.Length + "<br /><hr />");
Response.Write("Result:<br />");

foreach (var res in hits)
{
    // Print each hit, highlighting the keyword inside Memo
    Response.Write("Id:" + search.Doc(res.doc).Get("Id") + "  BirthDay=" + search.Doc(res.doc).Get("BirthDay") + "  Memo=" + search.Doc(res.doc).Get("Memo").ToString().Replace(txtKeyword.Text, "<span style='color:red'>" + txtKeyword.Text + "</span>") + "<br />");
}

 



Result:


[Screenshot: 2012-10-05_104835]


OK~ the result is correct…
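
What makes this work is that every encoded value has the same width, so string order and numeric order agree. A simpler encoding with the same property is zero-padding. The sketch below is not what I used above; it is a plain C# illustration under the assumption that ages are non-negative integers, and `ZeroPadDemo`, `Pad` and `InPaddedRange` are made-up names.

```csharp
using System;
using System.Globalization;

public static class ZeroPadDemo
{
    // Hypothetical helper: pad a non-negative integer to a fixed width of 10 digits,
    // which is wide enough for any non-negative int.
    public static string Pad(int value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException("value");
        return value.ToString("D10", CultureInfo.InvariantCulture);
    }

    // True when the padded value sorts between the padded bounds as a string.
    public static bool InPaddedRange(int value, int lower, int upper)
    {
        string term = Pad(value);
        return string.CompareOrdinal(term, Pad(lower)) >= 0
            && string.CompareOrdinal(term, Pad(upper)) <= 0;
    }
}
```

Here `Pad(11)` gives "0000000011", and `InPaddedRange(11500, 11, 20)` is false. To use it with the index, the padded string would have to be written to the Age field and both range bounds padded the same way, just as with ConvertSearchNumver.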


If anyone finds a better solution, please share it with me…