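In the earlier posts "How to Search for Keywords in a Large Number of JSON Files (JSON.net Deserialization Edition)" and "How to Search for Keywords in a Large Number of JSON Files (Regular Expression Edition)",
I ran tests against 100,000 individual JSON files, and the results were very slow.
Each search took roughly 25 to 30 seconds.
How to make it faster is the question I kept coming back to.
If we want real speed, the only option is to build an index over the files.
That is when I found a solution online called Lucene.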
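To make the idea of "indexing" concrete, here is a minimal sketch of an inverted index in plain C#. It is not how Lucene is implemented; the class `TinyInvertedIndex` and its methods are hypothetical names made up for illustration. The point is only that once each word maps to the ids of the documents containing it, a lookup no longer has to scan every file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy inverted index (hypothetical, for illustration only):
// maps each lower-cased word to the ids of the documents that contain it.
public static class TinyInvertedIndex
{
    public static Dictionary<string, HashSet<int>> Build(IDictionary<int, string> docs)
    {
        var index = new Dictionary<string, HashSet<int>>();
        foreach (var pair in docs)
        {
            // Very naive tokenizer: lower-case, then split on spaces and basic punctuation.
            var words = pair.Value.ToLowerInvariant()
                .Split(new[] { ' ', ',', '.', ';' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (var word in words)
            {
                HashSet<int> ids;
                if (!index.TryGetValue(word, out ids))
                {
                    ids = new HashSet<int>();
                    index[word] = ids;
                }
                ids.Add(pair.Key);
            }
        }
        return index;
    }

    // Returns the sorted ids of the documents containing the word (empty if none).
    public static int[] Lookup(Dictionary<string, HashSet<int>> index, string word)
    {
        HashSet<int> ids;
        return index.TryGetValue(word.ToLowerInvariant(), out ids)
            ? ids.OrderBy(id => id).ToArray()
            : new int[0];
    }

    public static void Main()
    {
        var docs = new Dictionary<int, string>
        {
            { 1, "Lucene builds an index" },
            { 2, "Scanning every file is slow" },
            { 3, "An index makes lookup fast" }
        };
        var index = Build(docs);
        var hits = Lookup(index, "index").Select(id => id.ToString()).ToArray();
        Console.WriteLine(string.Join(",", hits)); // prints 1,3
    }
}
```

The cost of building the dictionary is paid once up front; after that, each keyword lookup is a single dictionary access instead of a pass over all the documents.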
Apache Lucene is a free/open source information retrieval software library, originally created in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License.
Lucene has been ported to other programming languages including Delphi, Perl, C#, C++, Python, Ruby, and PHP[1].
A version for the .NET Framework is also available: http://incubator.apache.org/lucene.net/
Of course, we can also install it through NuGet.
NuGet command:
Install-Package Lucene.net
After installation you will see some new entries in your project's References.
Next, let's look at how to build an index over the 100,000 records.
C# Code:
// Requires: using System; using System.Diagnostics; using System.IO; using System.Linq;
//           using Lucene.Net.Analysis.Standard; using Lucene.Net.Documents;
//           using Lucene.Net.Index; using Lucene.Net.Store; using Newtonsoft.Json.Linq;
Stopwatch sw = new Stopwatch();
// Read all the source data
var di = new DirectoryInfo(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Source"));
sw.Start();
var allObjects = di.GetFiles().Select(
    x => JObject.Parse(File.ReadAllText(x.FullName))).ToArray();
// Path where the index is stored
string indexPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Index1");
FSDirectory dir = FSDirectory.Open(new DirectoryInfo(indexPath));
// IndexWriter (the "true" argument creates a new index, overwriting any existing one)
IndexWriter indexWriter = new IndexWriter(dir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29), true, IndexWriter.MaxFieldLength.UNLIMITED);
// Take each parsed JSON object and add the fields that need to be indexed
foreach (JObject ds in allObjects)
{
    Document doc = new Document();
    // Index every field, and store its value so it can be read back from the index
    Field f_Id = new Field("Id", ds["Id"].ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    Field f_Age = new Field("Age", ds["Age"].ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    Field f_Memo = new Field("Memo", ds["Memo"].ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    // The date is normalized to a fixed-width yyyyMMdd string before indexing
    Field f_BirthDay = new Field("BirthDay", DateTime.Parse(ds["Birthday"].ToString()).ToString("yyyyMMdd"), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
    doc.Add(f_Id);
    doc.Add(f_Age);
    doc.Add(f_Memo);
    doc.Add(f_BirthDay);
    indexWriter.AddDocument(doc);
}
// Merge index segments, flush the changes to disk, then release the writer
indexWriter.Optimize();
indexWriter.Commit();
indexWriter.Close();
sw.Stop();
Response.Write("Elapsed time: " + sw.Elapsed);
Sample download: MakeIndex.aspx
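A note on the BirthDay field in the code above, which is written to the index as a yyyyMMdd string. The post does not say why, so the following is an assumption: a fixed-width, zero-padded date string has the handy property that plain string ordering matches chronological ordering. The small sketch below only demonstrates that property; `DateKeyDemo` and `ToKey` are hypothetical names and are not part of Lucene.Net.

```csharp
using System;
using System.Globalization;

public static class DateKeyDemo
{
    // Formats a date as an 8-character, zero-padded key such as 19851231.
    public static string ToKey(DateTime date)
    {
        return date.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        string earlier = ToKey(new DateTime(1985, 12, 31));
        string later = ToKey(new DateTime(1986, 1, 2));

        Console.WriteLine(earlier + " " + later); // prints 19851231 19860102

        // Ordinal (character-by-character) comparison agrees with date order.
        Console.WriteLine(string.CompareOrdinal(earlier, later) < 0); // prints True
    }
}
```

A format such as d/M/yyyy would not have this property, because "2/1/1986" sorts before "31/12/1985" as text even though it is the later date.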