[C#] 使用 Jieba.net 進行中文分詞

2020-12-28

在很久很久以前大概八年前我分享過一篇文章利用盤古分詞來做分詞 ( https://blog.no2don.com/2012/10/c_5.html ) 結果我最近又被問到這需求，不過物換星移，我查了一下，現在使用結巴的人很多，稍微玩了一下，就把測試後的結果留下來，給需要的朋友參考一下…

先說明一下，我是使用 .net core 3.1 console 撰寫的

1. 首先 nuget 一下 https://www.nuget.org/packages/jieba.NET/

2. 然後你必須要準備一個資料夾放入一些設定檔們，在範例專案我放在 jiebanet_config ，並且放入那些設定檔我是在這裡下載的 https://gitlab.jpqhome.com:6/docs/doc-template/tree/0c2b2ea20261660d78b5e9fdf2c24d88afaf1f1e/templates/default/Plugins/Resources ，當然你也可以在我的 github 找到 https://github.com/donma/JiebaPrac/tree/master/JiebaPrac/jiebanet_config

3. 在來就是程式碼的部分

using System; using System.Linq; namespace JiebaPrac { class Program { static void Main(string[] args) { Console.WriteLine("Hello Jiqba!"); var sourceString = "許當麻緩緩的唱著:那樣的回憶那麼足夠足夠我天天都品嚐著寂寞"; JiebaNet.Segmenter.ConfigManager.ConfigFileBaseDir = AppDomain.CurrentDomain.BaseDirectory + @"jiebanet_config"; var segmenter = new JiebaNet.Segmenter.JiebaSegmenter(); Console.WriteLine("Cut -\r\n"); var segmentsCut = segmenter.Cut(sourceString); Console.WriteLine(string.Join("/", segmentsCut)); Console.WriteLine("\r\n--------------------------------\r\n"); var posSeg = new JiebaNet.Segmenter.PosSeg.PosSegmenter(segmenter); var tokens = posSeg.Cut(sourceString); foreach (var token in tokens) { Console.ForegroundColor = ConsoleColor.Yellow; Console.Write(token.Word); Console.ForegroundColor = ConsoleColor.White; Console.Write("/"); Console.ForegroundColor = ConsoleColor.DarkGray; Console.Write(token.Flag); } Console.ReadLine(); } } }
result: