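In a previous post, How to Find Keywords in a Large Number of JSON Files (Lucene.net: Keyword Search),
a reader asked why the search results looked odd and why they differed from those in the earlier post
How to Find Keywords in a Large Number of JSON Files (JSON.net Deserialization).
The reason is that the hits were not sorted by Id: without a Sort, Lucene returns them ordered by relevance score. In this post we look at how to sort them.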
First, here is the original search:
C# code:
// Start timing
Stopwatch sw = new Stopwatch();
sw.Start();
// Open the index (the Index1 folder under the application directory)
string indexPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Index1");
DirectoryInfo dirInfo = new DirectoryInfo(indexPath);
FSDirectory dir = FSDirectory.Open(dirInfo);
IndexSearcher search = new IndexSearcher(dir, true); // true = open read-only
// Search the Memo field
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Memo", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
// Parse the keyword typed by the user
Query query = parser.Parse(txtKeyword.Text);
// Run the search (no Sort, so hits come back ordered by relevance score)
var hits = search.Search(query, null, search.MaxDoc()).ScoreDocs;
sw.Stop();
Response.Write(" Elapsed time: " + sw.Elapsed + "<br /><hr />");
Response.Write(" Hit count: " + hits.Length + "<br /><hr />");
Response.Write("Result:<br />");
foreach (var res in hits)
{
    // Load the stored document once, then print Id and Memo with the keyword highlighted in red
    var doc = search.Doc(res.doc);
    Response.Write("Id:" + doc.Get("Id") + " Memo=" + doc.Get("Memo").Replace(txtKeyword.Text, "<span style='color:red'>" + txtKeyword.Text + "</span>") + "<br />");
}
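The search results:
About the test data:
To make testing easier, I condensed the 10 files into 2,200 files: files 1 to 1200 contain text taken at random from Demi-Gods and Semi-Devils (天龍八部), and files 11001 to 12000 contain text taken at random from The Legend of the Condor Heroes (射鵰英雄傳). Files 9, 1199, 11009 and 11999 each have the word 當麻 added so they show up as hits.
What we want is the hits sorted numerically by Id, which means passing a Sort to the search:
Sort sort = new Sort(new SortField("Id", 4));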
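Why does the sort type matter? As a rough illustration, here is a plain C# sketch (top-level statements, C# 9 or later) that does not call Lucene.Net at all. It takes the four test Ids containing 當麻 and orders them two ways: by ordinal text comparison, which I use as a simplified stand-in for a STRING sort, and by integer value, my stand-in for an INT sort. The mapping to Lucene's sort types is an assumption for illustration, not Lucene's actual implementation.

```csharp
using System;
using System.Linq;

// The four test Ids that contain 當麻, in an arbitrary order
// (standing in for an unsorted hit list).
string[] ids = { "11999", "9", "11009", "1199" };

// Simplified stand-in for a STRING (type 3) sort:
// compare the Id text character by character (ordinal comparison).
string[] asText = ids.OrderBy(id => id, StringComparer.Ordinal).ToArray();

// Simplified stand-in for an INT (type 4) sort:
// compare the Ids as integer values.
string[] asNumber = ids.OrderBy(id => int.Parse(id)).ToArray();

Console.WriteLine("STRING-style: " + string.Join(", ", asText));   // 11009, 1199, 11999, 9
Console.WriteLine("INT-style:    " + string.Join(", ", asNumber)); // 9, 1199, 11009, 11999
```

Compared as text, "11009" lands before "9" because the character '1' sorts before '9', so only the integer comparison gives the 9, 1199, 11009, 11999 order we want.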
C# code:
// Start timing
Stopwatch sw = new Stopwatch();
sw.Start();
// Open the index (the Index1 folder under the application directory)
string indexPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Index1");
DirectoryInfo dirInfo = new DirectoryInfo(indexPath);
FSDirectory dir = FSDirectory.Open(dirInfo);
IndexSearcher search = new IndexSearcher(dir, true); // true = open read-only
// Search the Memo field
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "Memo", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
// Parse the keyword typed by the user
Query query = parser.Parse(txtKeyword.Text);
// Run the search
// Sort by the Id field; 4 is the sort type (integer), explained below
Sort sort = new Sort(new SortField("Id", 4));
var hits = search.Search(query, null, search.MaxDoc(), sort).ScoreDocs;
sw.Stop();
Response.Write(" Elapsed time: " + sw.Elapsed + "<br /><hr />");
Response.Write(" Hit count: " + hits.Length + "<br /><hr />");
Response.Write("Result:<br />");
foreach (var res in hits)
{
    // Load the stored document once, then print Id and Memo with the keyword highlighted in red
    var doc = search.Doc(res.doc);
    Response.Write("Id:" + doc.Get("Id") + " Memo=" + doc.Get("Memo").Replace(txtKeyword.Text, "<span style='color:red'>" + txtKeyword.Text + "</span>") + "<br />");
}
The results:
For descending order, just pass a third argument, reverse, set to true:
Sort sort = new Sort(new SortField("Id", 4, true));
Result:
So where does that 4 come from?
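Conceptually, reverse just flips the comparison. The following plain C# sketch makes no Lucene.Net calls; Reversed is a hypothetical helper written only for this illustration. It sorts the same four Ids with an integer comparison, then with the comparison's arguments swapped to get descending order.

```csharp
using System;
using System.Collections.Generic;

// Ascending comparison of Ids as integers (the idea behind sort type 4).
Comparison<string> byIntId = (a, b) => int.Parse(a).CompareTo(int.Parse(b));

// Hypothetical helper: "reverse = true" conceptually swaps the arguments,
// which flips the sign of every comparison result.
Comparison<string> Reversed(Comparison<string> cmp) => (x, y) => cmp(y, x);

var ids = new List<string> { "11999", "9", "11009", "1199" };

var sortedAsc = new List<string>(ids);
sortedAsc.Sort(byIntId);

var sortedDesc = new List<string>(ids);
sortedDesc.Sort(Reversed(byIntId));

Console.WriteLine(string.Join(", ", sortedAsc));  // 9, 1199, 11009, 11999
Console.WriteLine(string.Join(", ", sortedDesc)); // 11999, 11009, 1199, 9
```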
The answer is in the SortField source: http://www.cpbcw.com/codetree311/Search/SortField.cs.html
Here is a trimmed excerpt, with my short notes appended to each summary:
/// <summary>Sort by document score (relevancy). Sort values are Float and higher
/// values are at the front. (by relevance)
/// </summary>
public const int SCORE = 0;
/// <summary>Sort by document number (index order). Sort values are Integer and lower
/// values are at the front. (by document number)
/// </summary>
public const int DOC = 1;
/// <summary>Guess type of sort based on field contents. A regular expression is used
/// to look at the first term indexed for the field and determine if it
/// represents an integer number, a floating point number, or just arbitrary
/// string characters. (automatic)
/// </summary>
public const int AUTO = 2;
/// <summary>Sort using term values as Strings. Sort values are String and lower
/// values are at the front. (string)
/// </summary>
public const int STRING = 3;
/// <summary>Sort using term values as encoded Integers. Sort values are Integer and
/// lower values are at the front. (integer)
/// </summary>
public const int INT = 4;
/// <summary>Sort using term values as encoded Floats. Sort values are Float and
/// lower values are at the front. (float)
/// </summary>
public const int FLOAT = 5;
/// <summary>Sort using a custom Comparator. Sort values are any Comparable and
/// sorting is done according to natural order. (custom)
/// </summary>
public const int CUSTOM = 9;
// IMPLEMENTATION NOTE: the FieldCache.STRING_INDEX is in the same "namespace"
// as the above static int values. Any new values must not have the same value
// as FieldCache.STRING_INDEX.
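So the 4 is simply the value of SortField.INT. Because these are public constants, new SortField("Id", SortField.INT) should be equivalent to new SortField("Id", 4) and is easier to read. As a small self-contained illustration in plain C#, the table above can be mirrored to turn a raw type number back into its name; the dictionary below is a hypothetical lookup written for this post, not part of Lucene.Net.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mirror of the SortField type constants listed above
// (values copied from the excerpt); not part of Lucene.Net itself.
var sortTypeNames = new Dictionary<int, string>
{
    [0] = "SCORE",  // by relevance
    [1] = "DOC",    // by document number (index order)
    [2] = "AUTO",   // guessed from the field's first indexed term
    [3] = "STRING", // term text
    [4] = "INT",    // integer
    [5] = "FLOAT",  // floating point
    [9] = "CUSTOM", // custom comparator
};

// The magic number used in this post: new SortField("Id", 4)
int typeUsedInPost = 4;
Console.WriteLine(sortTypeNames[typeUsedInPost]); // INT
```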