2012-10-05

[C#] Lucene.net: Sorting Search Results

 

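In an earlier post, "How to find keywords in a large number of JSON files (Lucene.net: keyword search)",

a friend pointed out that the search results looked odd and did not match those from the previous post,

"How to find keywords in a large number of JSON files (JSON.net restore part)".

The reason is that the results were not sorted by Id (when no sort is given, Lucene.net orders hits by relevance score). This post looks at how to sort them.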

First, a look at the original search:

C# code:

// Requires: System.Diagnostics, System.IO, Lucene.Net.Analysis.Standard,
// Lucene.Net.QueryParsers, Lucene.Net.Search, Lucene.Net.Store
// Start the stopwatch
Stopwatch sw = new Stopwatch();
sw.Start();
// Open the index
string indexPath = AppDomain.CurrentDomain.BaseDirectory.ToString() + "\\Index1\\";
DirectoryInfo dirInfo = new DirectoryInfo(indexPath);
FSDirectory dir = FSDirectory.Open(dirInfo);
IndexSearcher search = new IndexSearcher(dir, true);
// Search the Memo field
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Memo", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
// The keyword to search for
Query query = parser.Parse(txtKeyword.Text);
// Run the search (no Sort is given, so hits come back in relevance order)
var hits = search.Search(query, null, search.MaxDoc()).ScoreDocs;
sw.Stop();
Response.Write("Elapsed time: " + sw.Elapsed + "<br /><hr />");
Response.Write("Record count: " + hits.Length + "<br /><hr />");
Response.Write("Result:<br />");
foreach (var res in hits)
{
    // Load the document once, then highlight the keyword in the Memo text
    var doc = search.Doc(res.doc);
    Response.Write("Id:" + doc.Get("Id") + "  Memo=" + doc.Get("Memo").Replace(txtKeyword.Text, "<span style='color:red'>" + txtKeyword.Text + "</span>") + "<br />");
}

 


Search results:

[Screenshot: search results without sorting]


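About the test data:

To make testing easier, I condensed the original 10 files into 2,200 files. Ids 1 to 1200 are passages taken at random from 天龍八部 (Demi-Gods and Semi-Devils), and Ids 11001 to 12000 are passages taken at random from 射鵰英雄傳 (The Legend of the Condor Heroes). The documents with Ids 9, 1199, 11009, and 11999 also contain the string 當麻, so it can be used as a test keyword.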


 


Now we want the results ordered numerically by Id, which means adding a Sort:


   Sort sort = new Sort(new SortField("Id", 4));
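A quick sketch, independent of Lucene.net, of why the sort has to treat Id as a number rather than as text. It uses the four Ids mentioned above as containing the test keyword; the class and method names (`SortOrderDemo`, `SortAsText`, `SortAsNumber`) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Linq;

public static class SortOrderDemo
{
    // The four Ids that contain the test keyword in the sample data.
    public static readonly string[] Ids = { "11009", "9", "1199", "11999" };

    // Text order: compares the Ids character by character.
    public static string[] SortAsText()
    {
        return Ids.OrderBy(id => id, StringComparer.Ordinal).ToArray();
    }

    // Numeric order: parses each Id as an integer before comparing.
    public static string[] SortAsNumber()
    {
        return Ids.OrderBy(id => int.Parse(id)).ToArray();
    }
}
```

SortAsText() returns 11009, 1199, 11999, 9, while SortAsNumber() returns 9, 1199, 11009, 11999. Only the second order matches what a reader expects from "sorted by Id".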


C# code:



// Requires: System.Diagnostics, System.IO, Lucene.Net.Analysis.Standard,
// Lucene.Net.QueryParsers, Lucene.Net.Search, Lucene.Net.Store
// Start the stopwatch
Stopwatch sw = new Stopwatch();
sw.Start();
// Open the index
string indexPath = AppDomain.CurrentDomain.BaseDirectory.ToString() + "\\Index1\\";
DirectoryInfo dirInfo = new DirectoryInfo(indexPath);
FSDirectory dir = FSDirectory.Open(dirInfo);
IndexSearcher search = new IndexSearcher(dir, true);
// Search the Memo field
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Memo", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
// The keyword to search for
Query query = parser.Parse(txtKeyword.Text);
// Run the search
// Sort by the Id field; the 4 is the sort type (integer), explained further down
Sort sort = new Sort(new SortField("Id", 4));
var hits = search.Search(query, null, search.MaxDoc(), sort).ScoreDocs;
sw.Stop();
Response.Write("Elapsed time: " + sw.Elapsed + "<br /><hr />");
Response.Write("Record count: " + hits.Length + "<br /><hr />");
Response.Write("Result:<br />");
foreach (var res in hits)
{
    // Load the document once, then display it with the keyword highlighted
    var doc = search.Doc(res.doc);
    Response.Write("Id:" + doc.Get("Id") + "  Memo=" + doc.Get("Memo").Replace(txtKeyword.Text, "<span style='color:red'>" + txtKeyword.Text + "</span>") + "<br />");
}

 


Here is the result:

[Screenshot: search results sorted by Id in ascending order]


To sort in descending order instead, just add a third argument, reverse, and set it to true:



Sort sort = new Sort(new SortField("Id", 4,true));

Result:

[Screenshot: search results sorted by Id in descending order]


 


So where does that 4 come from?!

See this reference:  http://www.cpbcw.com/codetree311/Search/SortField.cs.html

Here is a brief summary:



/// <summary>Sort by document score (relevancy).  Sort values are Float and higher
/// values are at the front.  (By relevance.)
/// </summary>
public const int SCORE = 0;
/// <summary>Sort by document number (index order).  Sort values are Integer and lower
/// values are at the front. (By document number.)
/// </summary>
public const int DOC = 1;
/// <summary>Guess type of sort based on field contents.  A regular expression is used
/// to look at the first term indexed for the field and determine if it
/// represents an integer number, a floating point number, or just arbitrary
/// string characters. (Automatic.)
/// </summary>
public const int AUTO = 2;
/// <summary>Sort using term values as Strings.  Sort values are String and lower
/// values are at the front. (Text.)
/// </summary>
public const int STRING = 3;
/// <summary>Sort using term values as encoded Integers.  Sort values are Integer and
/// lower values are at the front. (Integer.)
/// </summary>
public const int INT = 4;
/// <summary>Sort using term values as encoded Floats.  Sort values are Float and
/// lower values are at the front. (Floating point.)
/// </summary>
public const int FLOAT = 5;
/// <summary>Sort using a custom Comparator.  Sort values are any Comparable and
/// sorting is done according to natural order. (Custom.)
/// </summary>
public const int CUSTOM = 9;
// IMPLEMENTATION NOTE: the FieldCache.STRING_INDEX is in the same "namespace"
// as the above static int values.  Any new values must not have the same value
// as FieldCache.STRING_INDEX.
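
Since 4 is simply the value of the INT constant, the literal can be replaced with the named constant, which reads more clearly. A minimal sketch, assuming the Lucene.Net 2.9.x API that the code above appears to target; the helper class `IdSort` is hypothetical and exists only for this example.

```csharp
using Lucene.Net.Search;

public static class IdSort
{
    // Hypothetical helper (not part of Lucene.net): builds an integer sort
    // on the "Id" field. SortField.INT has the value 4, per the list above.
    // Pass descending = true for the reversed order.
    public static Sort Create(bool descending)
    {
        return new Sort(new SortField("Id", SortField.INT, descending));
    }
}
```

It would be used in place of the hand-built Sort, for example search.Search(query, null, search.MaxDoc(), IdSort.Create(true)) for descending order.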
