2012-10-04

[C#] Lucene.net: Building a Numeric Range Index and Searching Numeric Ranges with NumericField and NumericRangeQuery


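After my last post, "Lucene.net: the numeric range search problem and a temporary workaround",
a more experienced developer, sholfen, gave me a keyword hint: NumericField.
I looked up the documentation, and it turned out to be exactly what I needed. No more small tricks to work around the numeric problem. Oh yeah!
Thanks, sholfen. It turns out writing a blog is also a way to learn things.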
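
As background on why a text-indexed number is awkward to range-search: Lucene compares text terms lexicographically, so "9" sorts after "11" and "1200" sorts between "11" and "20". The sketch below does not call Lucene at all. `InTextRange` and `InNumericRange` are hypothetical helpers that only model the two orderings in plain C#, and the sample ages are made up for illustration.

```csharp
using System;

public static class LexicalVsNumericOrder
{
    // Hypothetical helper: true when value falls in [lower, upper]
    // under ordinal (character by character) string comparison.
    public static bool InTextRange(string value, string lower, string upper)
    {
        return string.CompareOrdinal(value, lower) >= 0
            && string.CompareOrdinal(value, upper) <= 0;
    }

    // Hypothetical helper: true when value falls in [lower, upper]
    // under ordinary integer comparison.
    public static bool InNumericRange(int value, int lower, int upper)
    {
        return value >= lower && value <= upper;
    }

    public static void Main()
    {
        int[] ages = { 2, 9, 11, 15, 20, 100, 1200 };
        foreach (int age in ages)
        {
            // Compare the same range, 11 to 20, under both orderings.
            Console.WriteLine("{0}: text={1}, numeric={2}",
                age,
                InTextRange(age.ToString(), "11", "20"),
                InNumericRange(age, 11, 20));
        }
    }
}
```

Under string ordering, 2 and 1200 land inside the "11" to "20" range even though they are outside it numerically, which is the kind of mismatch a numeric field type avoids.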

Data overview

Ids 1 to 1200: the Age field is likewise 1 to 1200
Ids 11001 to 12000: the Age field is likewise 11001 to 12000
Each record has this structure:
{
    "Id":"9",
    "Memo":"當麻左手凌空劈出,右掌跟著迅捷之極的劈出,左手掌力先發後到,右手掌力後發先到,兩股力道交錯而前,詭異之極",
    "Birthday":"1900-01-10T00:00:00",
    "Age":9
}



Creating a numeric index field


Previously, the Age field was indexed as text, like this:


Field f_Age = new Field("Age", ds["Age"].ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);

Now we build it with NumericField instead:


NumericField f_Age = new NumericField("Age", Field.Store.YES, true);

f_Age.SetIntValue(int.Parse(ds["Age"].ToString()));


C# Code:


// Requires: using System.Diagnostics; using System.IO; using System.Linq;
// using Lucene.Net.Analysis.Standard; using Lucene.Net.Documents;
// using Lucene.Net.Index; using Lucene.Net.Store; using Newtonsoft.Json.Linq;
Stopwatch sw = new Stopwatch();

// Read all source data
var di = new DirectoryInfo(AppDomain.CurrentDomain.BaseDirectory + "\\Source\\");
sw.Start();
var allObjects = di.GetFiles().Select(
    x => JObject.Parse(File.ReadAllText(x.FullName))).ToArray();

// Path where the index is stored
string indexPath = AppDomain.CurrentDomain.BaseDirectory + "\\Index5\\";
FSDirectory dir = FSDirectory.Open(new DirectoryInfo(indexPath));

// IndexWriter (create = true overwrites any existing index)
IndexWriter indexWriter = new IndexWriter(dir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29), true, IndexWriter.MaxFieldLength.UNLIMITED);

// Rebuild each record and add the fields that need indexing
foreach (JObject ds in allObjects)
{
    Document doc = new Document();

    // Index every field
    Field f_Id = new Field("Id", ds["Id"].ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    Field f_Memo = new Field("Memo", ds["Memo"].ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    Field f_BirthDay = new Field("BirthDay", DateTime.Parse(ds["Birthday"].ToString()).ToString("yyyyMMdd"), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);

    // Numeric index field
    NumericField f_Age = new NumericField("Age", Field.Store.YES, true);
    f_Age.SetIntValue(int.Parse(ds["Age"].ToString()));

    doc.Add(f_Id);
    doc.Add(f_Age);
    doc.Add(f_Memo);
    doc.Add(f_BirthDay);
    indexWriter.AddDocument(doc);
}

indexWriter.Optimize();
indexWriter.Commit();
indexWriter.Close();
sw.Stop();

Response.Write("Indexed " + allObjects.Length + " documents in: " + sw.Elapsed);



Searching a numeric range


For the search, use NumericRangeQuery:


// Search the Age range 11 to 20 (both bounds inclusive)

NumericRangeQuery nquery = NumericRangeQuery.NewIntRange("Age", 11, 20, true, true); 
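
The last two arguments of `NewIntRange` control whether the lower and upper bounds themselves count as matches. The sketch below only models that rule in plain C# to make the boundary behavior concrete. `Matches` is a hypothetical helper, not part of Lucene.net, and the sample ages are made up.

```csharp
using System;

public static class RangeBounds
{
    // Hypothetical model of an inclusive/exclusive integer range check.
    // This is not Lucene code; it only mirrors the meaning of the flags.
    public static bool Matches(int value, int min, int max, bool minInclusive, bool maxInclusive)
    {
        bool aboveMin = minInclusive ? value >= min : value > min;
        bool belowMax = maxInclusive ? value <= max : value < max;
        return aboveMin && belowMax;
    }

    public static void Main()
    {
        // Check the values around both boundaries of the 11 to 20 range.
        foreach (int age in new[] { 10, 11, 20, 21 })
        {
            Console.WriteLine("{0}: inclusive={1}, exclusive={2}",
                age,
                Matches(age, 11, 20, true, true),
                Matches(age, 11, 20, false, false));
        }
    }
}
```

With both flags set to true, as in the query above, ages 11 and 20 are part of the result. With both set to false they would be left out.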

C# Code:


// Requires: using Lucene.Net.Search; using Lucene.Net.QueryParsers;
// plus the namespaces used by the indexing code
// Start timing
Stopwatch sw = new Stopwatch();
sw.Start();

// Open the index
string indexPath = AppDomain.CurrentDomain.BaseDirectory + "\\Index5\\";
DirectoryInfo dirInfo = new DirectoryInfo(indexPath);
FSDirectory dir = FSDirectory.Open(dirInfo);
IndexSearcher search = new IndexSearcher(dir, true);

// Note: this QueryParser is not used by the NumericRangeQuery below
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Age", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));

// Search the Age range 11 to 20 (both bounds inclusive)
NumericRangeQuery nquery = NumericRangeQuery.NewIntRange("Age", 11, 20, true, true);

// Sort by Id as an integer (4 is the numeric value of SortField.INT)
Sort sort = new Sort(new SortField("Id", 4));

// Run the search
var hits = search.Search(nquery, null, search.MaxDoc(), sort).ScoreDocs;
sw.Stop();

Response.Write("Elapsed time: " + sw.Elapsed + "<br /><hr />");
Response.Write("Number of hits: " + hits.Length + "<br /><hr />");
Response.Write("Result:<br />");

foreach (var res in hits)
{
    // Load the stored document once per hit instead of once per field
    Document d = search.Doc(res.doc);
    Response.Write("Id:" + d.Get("Id") + "  BirthDay=" + d.Get("BirthDay") + "  Memo=" + d.Get("Memo") + "<br />");
}


Result


[Screenshot of the search results: 2012-10-04_154348]



Oh yeah!

Finally, I no longer need that conversion trick to get this done. ^^


References:

http://sholfen.pixnet.net/blog/post/42417709

http://stackoverflow.com/questions/7866376/lucene-searching-for-a-numeric-value-field