[C#] Lucene.net: Sorting Search Results
2012-10-05
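In an earlier post, "How to Find Keywords in a Large Number of JSON Files (Lucene.net Edition: Keyword Search)", a friend asked why the search results looked odd and why they differed from those in the earlier posts, such as "How to Find Keywords in a Large Number of JSON Files (JSON.net Deserialization Edition)".
The reason is that the Lucene.net results were not sorted by Id. Without an explicit sort, Lucene returns hits in relevance-score order. In this post we look at how to sort them.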
First, here is the original search:
C# code:
using System;
using System.Diagnostics;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Inside the page's search event handler:
// Start timing
Stopwatch sw = new Stopwatch();
sw.Start();
// Open the index
string indexPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Index1");
DirectoryInfo dirInfo = new DirectoryInfo(indexPath);
FSDirectory dir = FSDirectory.Open(dirInfo);
IndexSearcher search = new IndexSearcher(dir, true); // true = open read-only
// Search the Memo field
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Memo", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
// The keyword to search for
Query query = parser.Parse(txtKeyword.Text);
// Run the search (no Sort, so hits come back in relevance order)
var hits = search.Search(query, null, search.MaxDoc()).ScoreDocs;
sw.Stop();
Response.Write("Elapsed: " + sw.Elapsed + "<br /><hr />");
Response.Write("Hit count: " + hits.Length + "<br /><hr />");
Response.Write("Result:<br />");
foreach (var res in hits)
{
    // Load each stored document once, then show Id and Memo with the keyword in red
    var doc = search.Doc(res.doc);
    Response.Write("Id:" + doc.Get("Id") + " Memo=" + doc.Get("Memo").Replace(txtKeyword.Text, "<span style='color:red'>" + txtKeyword.Text + "</span>") + "<br />");
}
search.Close();
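[Screenshot: search results]
About the test data:
To make testing easier, I condensed the data down to 2,200 files. Ids 1 to 1200 hold passages picked at random from Demi-Gods and Semi-Devils (天龍八部), and Ids 11001 to 12000 hold random passages from The Legend of the Condor Heroes (射鵰英雄傳). The string 當麻 was added to Ids 9, 1199, 11009 and 11999 so there are known hits to test with.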
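Why do the Ids in the first output look scrambled? As far as I understand Lucene.net 2.9's default collector, a search without a Sort orders hits by relevance score (highest first) and breaks ties by the lower internal document number, so the stored Id plays no part. The plain C# sketch below (not Lucene code) contrasts that ordering with the Id order the reader expected; the `Hit` class and `HitOrderSketch` are hypothetical stand-ins for a `ScoreDoc` plus its stored Id.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one hit: internal doc number, relevance score, stored Id field.
public sealed class Hit
{
    public Hit(int doc, float score, int id) { Doc = doc; Score = score; Id = id; }
    public int Doc { get; }
    public float Score { get; }
    public int Id { get; }
}

public static class HitOrderSketch
{
    // Default order (no Sort): highest score first, ties go to the lower doc number.
    public static List<Hit> ByRelevance(IEnumerable<Hit> hits) =>
        hits.OrderByDescending(h => h.Score).ThenBy(h => h.Doc).ToList();

    // The order the reader expected: ascending by the numeric Id field.
    public static List<Hit> ByIdAscending(IEnumerable<Hit> hits) =>
        hits.OrderBy(h => h.Id).ToList();
}
```

For hits (doc 0, score 0.5, Id 11999), (doc 1, score 0.9, Id 9) and (doc 2, score 0.5, Id 1199), `ByRelevance` yields Ids 9, 11999, 1199, while `ByIdAscending` yields 9, 1199, 11999.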
What we need here is to sort numerically by Id, which means adding a Sort:
Sort sort = new Sort(new SortField("Id", 4));
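Why sort the Id as a number (type 4) instead of as text? Compared as text, Ids are ordered character by character, so "11009" lands before "9". The plain C# sketch below is not Lucene.net code; it only approximates the two comparison modes, and `IdSortSketch` with its methods is hypothetical: `SortAsString` uses ordinal string comparison, and `SortAsInt` parses each Id before comparing.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helpers that approximate Lucene's STRING (3) and INT (4) sort types.
public static class IdSortSketch
{
    // STRING-like: compare the Id text character by character (ordinal), so "11009" < "9".
    public static List<string> SortAsString(IEnumerable<string> ids) =>
        ids.OrderBy(id => id, StringComparer.Ordinal).ToList();

    // INT-like: parse each Id as an integer and compare the numbers, so 9 < 11009.
    public static List<string> SortAsInt(IEnumerable<string> ids) =>
        ids.OrderBy(id => int.Parse(id, CultureInfo.InvariantCulture)).ToList();
}
```

With the test Ids 9, 1199, 11009 and 11999, `SortAsString` returns 11009, 1199, 11999, 9, whereas `SortAsInt` returns 9, 1199, 11009, 11999.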
C# code:
// Start timing
Stopwatch sw = new Stopwatch();
sw.Start();
// Open the index
string indexPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Index1");
DirectoryInfo dirInfo = new DirectoryInfo(indexPath);
FSDirectory dir = FSDirectory.Open(dirInfo);
IndexSearcher search = new IndexSearcher(dir, true); // true = open read-only
// Search the Memo field
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Memo", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
// The keyword to search for
Query query = parser.Parse(txtKeyword.Text);
// Run the search sorted by Id; 4 is the sort type (explained below)
Sort sort = new Sort(new SortField("Id", 4));
var hits = search.Search(query, null, search.MaxDoc(), sort).ScoreDocs;
sw.Stop();
Response.Write("Elapsed: " + sw.Elapsed + "<br /><hr />");
Response.Write("Hit count: " + hits.Length + "<br /><hr />");
Response.Write("Result:<br />");
foreach (var res in hits)
{
    // Display each hit once: Id and Memo, with the keyword in red
    var doc = search.Doc(res.doc);
    Response.Write("Id:" + doc.Get("Id") + " Memo=" + doc.Get("Memo").Replace(txtKeyword.Text, "<span style='color:red'>" + txtKeyword.Text + "</span>") + "<br />");
}
search.Close();
[Screenshot: results sorted by Id in ascending order]
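To sort in descending order instead, just pass a third argument, reverse, set to true:
Sort sort = new Sort(new SortField("Id", 4, true));
[Screenshot: results sorted by Id in descending order]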
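The reverse flag just inverts the comparison. A minimal plain C# sketch of an integer sort with an optional reverse flag; `ReverseSortSketch.SortIds` is a hypothetical helper, not part of Lucene.net.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper mimicking SortField("Id", 4, reverse) on plain integers.
public static class ReverseSortSketch
{
    public static List<int> SortIds(IEnumerable<int> ids, bool reverse)
    {
        List<int> list = ids.ToList();
        // Ascending comparison by default; swapping the operands gives descending order.
        list.Sort((a, b) => reverse ? b.CompareTo(a) : a.CompareTo(b));
        return list;
    }
}
```

Calling `SortIds` on the test Ids with `reverse: true` returns 11999, 11009, 1199, 9; with `false` it returns 9, 1199, 11009, 11999.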
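So where does that 4 come from?
See the reference source here: http://www.cpbcw.com/codetree311/Search/SortField.cs.html
Here is a brief summary of the sort-type constants. 4 is SortField.INT, which sorts the term values as integers: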
/// <summary>Sort by document score (relevancy). Sort values are Float and higher
/// values are at the front. (By relevance)</summary>
public const int SCORE = 0;
/// <summary>Sort by document number (index order). Sort values are Integer and lower
/// values are at the front. (By document number)</summary>
public const int DOC = 1;
/// <summary>Guess type of sort based on field contents. A regular expression is used
/// to look at the first term indexed for the field and determine if it
/// represents an integer number, a floating point number, or just arbitrary
/// string characters. (Automatic)</summary>
public const int AUTO = 2;
/// <summary>Sort using term values as Strings. Sort values are String and lower
/// values are at the front. (Text)</summary>
public const int STRING = 3;
/// <summary>Sort using term values as encoded Integers. Sort values are Integer and
/// lower values are at the front. (Integer)</summary>
public const int INT = 4;
/// <summary>Sort using term values as encoded Floats. Sort values are Float and
/// lower values are at the front. (Floating point)</summary>
public const int FLOAT = 5;
/// <summary>Sort using a custom Comparator. Sort values are any Comparable and
/// sorting is done according to natural order. (Custom)</summary>
public const int CUSTOM = 9;
// IMPLEMENTATION NOTE: the FieldCache.STRING_INDEX is in the same "namespace"
// as the above static int values. Any new values must not have the same value
// as FieldCache.STRING_INDEX.
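The AUTO entry shows why stating the type explicitly is more predictable: only the first indexed term of the field is inspected. The sketch below approximates that guess in plain C#. It swaps in `int.TryParse` and `float.TryParse` for the regular expression the source describes, and the `AutoSortGuess` class is hypothetical; it returns the matching constant from the list above.

```csharp
using System.Globalization;

// Hypothetical approximation of SortField.AUTO (2): look at the first indexed term
// of a field and pick INT, FLOAT or STRING. Not Lucene's implementation.
public static class AutoSortGuess
{
    // Values copied from the constant list above.
    public const int STRING = 3;
    public const int INT = 4;
    public const int FLOAT = 5;

    public static int Guess(string firstTerm)
    {
        // Whole numbers (optionally signed) are treated as integers.
        if (int.TryParse(firstTerm, NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture, out _))
            return INT;
        // Anything else that parses as a number is treated as a float.
        if (float.TryParse(firstTerm, NumberStyles.Float, CultureInfo.InvariantCulture, out _))
            return FLOAT;
        // Everything else falls back to string comparison.
        return STRING;
    }
}
```

For example, `Guess("1199")` returns 4 (INT), `Guess("3.14")` returns 5 (FLOAT) and `Guess("當麻")` returns 3 (STRING). Because a single term decides the type for the whole field, passing the type explicitly, as with 4 above, avoids surprises.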
Tags: ASP.net, C#, Lucene.net
