[Azure] C#: A small problem with numbers in Translator / Bing Translator
2021-02-03
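Recently a colleague has been working on something that uses Azure Translator. In our scenario numbers are bound to come up, and in the translation features we build for customers, one side of the conversation is always someone who is not very comfortable with English. My colleague reported a problem that had me completely baffled, so here I'll try to work around it. The workaround isn't perfect, but it will do for now. Here is what you might see:
You can see what's odd above, right? (Ignore my poor English and look at the translation.) To jump to the conclusion: from my testing, the service cares a lot about formatting. If you type a proper, complete English sentence, including punctuation, it should translate correctly. If you don't, the numbers come out in all sorts of strange ways.
So my "half" solution is to append a . (dot) after each number so the output comes out a bit more normal. I do the replacement with a Regular Expression.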
```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // "Half" fix: put a "." after bare numbers so the translator handles them more normally
    static string ReplaceNumbers(string orgValue)
    {
        // Digits followed by whitespace or the end of the string;
        // a number already followed by "." (e.g. "5000.") is not matched
        string rxPattern = @"\d+(?!\.)($|\s)";
        var rx = new Regex(rxPattern);
        var res = rx.Replace(orgValue, match => match.Value.Trim() + ". ");

        // A "." glued directly to a letter (e.g. "5000.Please") gets a space after it
        rxPattern = @"(\.)[A-Za-z_]";
        rx = new Regex(rxPattern);
        res = rx.Replace(res, match => match.Value.Replace(".", ". "));
        return res;
    }

    static void Main(string[] args)
    {
        Console.WriteLine(ReplaceNumbers("I Give you 5000"));
        Console.WriteLine(ReplaceNumbers("I Give you 5000."));
        Console.WriteLine(ReplaceNumbers("I cant give you 5000 please dont bother me."));
        Console.WriteLine(ReplaceNumbers("I cant give you 5,000 and 1000 unit foods please dont bother me."));
        Console.WriteLine(ReplaceNumbers("I cant give you 5000 , please dont bother me."));
        Console.WriteLine(ReplaceNumbers("I cant give you 5000.Please dont bother me."));
        //Result
        //I Give you 5000.
        //I Give you 5000.
        //I cant give you 5000. please dont bother me.
        //I cant give you 5,000. and 1000. unit foods please dont bother me.
        //I cant give you 5000. , please dont bother me.
        //I cant give you 5000. Please dont bother me.
    }
}
```
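To run the same experiment against Azure Translator from code rather than through the Bing web page, a minimal sketch might look like the following. It assumes the Translator v3 REST `/translate` endpoint, a key and optional region read from hypothetical environment variables `TRANSLATOR_KEY` and `TRANSLATOR_REGION`, and Traditional Chinese (`zh-Hant`) as an assumed target language; the class and method names are hypothetical as well.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

static class TranslatorSketch
{
    // Assumed: the global Translator v3 endpoint; your resource may use a different one
    const string Endpoint = "https://api.cognitive.microsofttranslator.com";

    // Builds (but does not send) a v3 /translate request for one piece of preprocessed text
    public static HttpRequestMessage BuildRequest(string text, string fromLang, string toLang, string key, string region)
    {
        var uri = $"{Endpoint}/translate?api-version=3.0&from={fromLang}&to={toLang}";
        // The request body is a JSON array of objects with a "Text" property
        var body = JsonSerializer.Serialize(new[] { new { Text = text } });
        var request = new HttpRequestMessage(HttpMethod.Post, uri)
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("Ocp-Apim-Subscription-Key", key);
        // Region header: required for regional or multi-service resources, optional for a global one
        if (!string.IsNullOrEmpty(region))
            request.Headers.Add("Ocp-Apim-Subscription-Region", region);
        return request;
    }

    static async Task Main()
    {
        // Hypothetical environment variable names
        var key = Environment.GetEnvironmentVariable("TRANSLATOR_KEY");
        var region = Environment.GetEnvironmentVariable("TRANSLATOR_REGION");
        if (string.IsNullOrEmpty(key))
        {
            Console.WriteLine("Set TRANSLATOR_KEY (and TRANSLATOR_REGION if needed) to call the service.");
            return;
        }

        using var client = new HttpClient();
        // Text already passed through ReplaceNumbers; zh-Hant is an assumed target language
        using var request = BuildRequest("I cant give you 5000. please dont bother me.", "en", "zh-Hant", key, region);
        using var response = await client.SendAsync(request);
        // Prints the raw JSON response body
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```

`BuildRequest` is kept separate from `Main` so the request can be inspected without calling the service or spending quota.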
Then I ran the results through Bing Translator to try it out (I'm assuming Bing Translator and Azure Translator behave the same).
Result:
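Of course this isn't a perfect solution: in the middle sentence, for example, the punctuation is wrong, although a human can still understand it. It's certainly not a great fix, but it will have to do for now.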
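One punctuation problem is visible in the code's own output: "5000 ," becomes "5000. ,". A possible variant, sketched below as a hypothetical `NumberFix.ReplaceNumbersV2` (not the author's code), first removes whitespace between a number and the punctuation that follows it, so that number already ends in punctuation and is left alone. Whether Translator actually handles the result better has not been verified; the idea only follows the observation that well-formed sentences translate correctly.

```csharp
using System;
using System.Text.RegularExpressions;

static class NumberFix
{
    // Hypothetical variant of ReplaceNumbers (not the author's code)
    public static string ReplaceNumbersV2(string input)
    {
        // 1) "5000 ," -> "5000,": drop whitespace between a digit and following punctuation
        var res = Regex.Replace(input, @"(\d)\s+([,.;:!?])", "$1$2");
        // 2) Bare number followed by whitespace or end of input -> append "."
        //    Numbers now followed by punctuation (e.g. "5000,") are left alone
        res = Regex.Replace(res, @"\d+(?=\s|$)", "$0.");
        // 3) "5000.Please" -> "5000. Please": add a space after a dot glued to a letter
        res = Regex.Replace(res, @"\.(?=[A-Za-z_])", ". ");
        return res;
    }

    static void Main()
    {
        Console.WriteLine(ReplaceNumbersV2("I cant give you 5000 , please dont bother me."));
        // I cant give you 5000, please dont bother me.
        Console.WriteLine(ReplaceNumbersV2("I cant give you 5,000 and 1000 unit foods please dont bother me."));
        // I cant give you 5,000. and 1000. unit foods please dont bother me.
    }
}
```

Like the original second regex, step 3 also splits anything else that has a dot directly followed by a letter, such as "e.g." or a file name like "file.txt".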
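I think the proper fix is to find some time and see whether I can report this to Microsoft.
But... but...
I bet you're going to ask me: what's wrong with just using Google Translate?!..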