[C#] Regex Notes: Extracting YouTube Video Information from the Web Page

2012-11-01


I wrote about this before, but someone asked, so I recently tidied it up again.
I figured I would write it down.

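Basically, instead of calling the API, this fetches the web page and parses the relevant information out of it.
Please note that this article is for teaching purposes only. Do not use it for anything illegal; if you do, the legal responsibility is yours.

Also, this approach stops working as soon as YouTube changes its page markup.

So using the API is still the right way to go.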
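The information I want to retrieve is as follows:

YoutubeURL: the video's URL

Id: the video's Id

Title: the video's title

Intro: the video's description

ImageLarge: the video's large thumbnail

ImageSmall: the video's small thumbnail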

Let's go straight to the class that fetches these values with Regex:

using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

namespace FatchYoutueInfo
{
    public class FatchU2BUtility
    {
        // Values parsed from the video page; all are set once in the constructor.
        public string YoutubeURL { get; private set; }
        public string Id { get; private set; }
        public string Title { get; private set; }
        public string Intro { get; private set; }
        public string ImageLarge { get; private set; }
        public string ImageSmall { get; private set; }

        public FatchU2BUtility(string youtubeURL)
        {
            var src = GetSourceFromUrl(youtubeURL);

            // Description: <p id="eow-description" >...</p>
            var regexIntro = new Regex(
                @"(p id=""eow-description"" >)(?<INTRO>.*?)(</p>)",
                RegexOptions.IgnoreCase);
            MatchCollection mcIntro = regexIntro.Matches(src);

            // Title: <meta name="title" content="...">
            var regexTitle = new Regex(
                @"(<meta name=""title"" content="")(?<TITLE>.*?)("">)",
                RegexOptions.IgnoreCase);
            MatchCollection mcTitle = regexTitle.Matches(src);

            // Video id: the data-video-id attribute that follows this menu attribute
            var regexId = new Regex(
                @"(data-button-menu-id=""some-nonexistent-menu"" data-video-id="")(?<ID>.*?)("")",
                RegexOptions.IgnoreCase);
            MatchCollection mcId = regexId.Matches(src);

            // Take the first match of each pattern; fail loudly when the markup is not found.
            if (mcIntro.Count != 0)
                Intro = mcIntro[0].Groups["INTRO"].Value;
            else
                throw new Exception("Can't find Intro");

            if (mcTitle.Count != 0)
                Title = mcTitle[0].Groups["TITLE"].Value;
            else
                throw new Exception("Can't find Title");

            if (mcId.Count != 0)
                Id = mcId[0].Groups["ID"].Value;
            else
                throw new Exception("Can't find Id");

            // Thumbnail URLs and the canonical watch URL are all derived from the id.
            ImageSmall = "http://img.youtube.com/vi/" + Id + "/2.jpg";
            ImageLarge = "http://img.youtube.com/vi/" + Id + "/0.jpg";

            YoutubeURL = "http://www.youtube.com/watch?v=" + Id;
        }

        /// <summary>
        /// Downloads the HTML source of a page.
        /// </summary>
        /// <param name="url">URL of the page to download</param>
        /// <returns>The page source as a UTF-8 string</returns>
        private string GetSourceFromUrl(string url)
        {
            // WebClient is IDisposable, so release it when the download is done.
            using (var client = new WebClient())
            {
                // Just in case, present ourselves as a browser.
                client.Headers.Add("User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.56 Safari/536.5");
                client.Headers.Add("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
                client.Headers.Add("Accept-Encoding: identity");
                client.Headers.Add("Accept-Language: zh-TW,en;q=0.8");
                client.Headers.Add("Accept-Charset: utf-8;q=0.7,*;q=0.3");
                // A GET request has no body, so no Content-Type header is sent.
                client.Encoding = Encoding.UTF8;
                return client.DownloadString(url);
            }
        }
    }
}
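
To see the named-group technique on its own, without any network call, here is a minimal sketch that runs the same title pattern against an HTML string. The class name MetaTitleDemo and the method ExtractTitle are made up for this illustration and are not part of the class above. Returning null instead of throwing is just one possible choice.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of FatchU2BUtility.
public static class MetaTitleDemo
{
    // Same pattern as regexTitle in the class above.
    private static readonly Regex TitleRegex = new Regex(
        @"(<meta name=""title"" content="")(?<TITLE>.*?)("">)",
        RegexOptions.IgnoreCase);

    // Returns the text captured by the TITLE group,
    // or null when the markup does not contain the expected tag.
    public static string ExtractTitle(string html)
    {
        if (string.IsNullOrEmpty(html))
            return null;

        Match m = TitleRegex.Match(html);
        return m.Success ? m.Groups["TITLE"].Value : null;
    }
}
```

Calling MetaTitleDemo.ExtractTitle with a page that lacks the meta tag simply gives null, which makes it easy to notice when the page markup has changed.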

Here is how to use it:

try
{
    FatchU2BUtility util = new FatchU2BUtility(txtURL.Text);
    ltlResult.Text += "Title:" + util.Title + "<br />";
    ltlResult.Text += "Intro:" + util.Intro + "<br />";
    ltlResult.Text += "URL:" + util.YoutubeURL + "<br />";
    ltlResult.Text += "Id:" + util.Id + "<br />";
    ltlResult.Text += "Image Small:" + "<img src='" + util.ImageSmall + "' />" + "<br />";
    ltlResult.Text += "Image Large:" + "<img src='" + util.ImageLarge + "' />" + "<br />";
}
catch
{
    ltlResult.Text = "Sorry, I couldn't fetch it";
}
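
The usage above writes the captured values straight into the page. Since they come out of HTML markup, they may still contain tags or entities such as &amp;amp;. The following is a rough sketch of cleaning them up before display. DisplayText, ToPlainText and ToSafeHtml are names invented here, and the tag-stripping regex is deliberately simple, so treat it as an assumption rather than a complete HTML sanitizer.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class DisplayText
{
    // Turns an HTML fragment into plain text:
    // <br> becomes a newline, other tags are dropped, entities are decoded.
    public static string ToPlainText(string fragment)
    {
        if (string.IsNullOrEmpty(fragment))
            return string.Empty;

        string withBreaks = Regex.Replace(
            fragment, @"<br\s*/?>", "\n", RegexOptions.IgnoreCase);
        string noTags = Regex.Replace(withBreaks, @"<[^>]+>", string.Empty);
        return WebUtility.HtmlDecode(noTags);
    }

    // Plain text re-encoded so it can be written into an HTML page safely.
    public static string ToSafeHtml(string fragment)
    {
        return WebUtility.HtmlEncode(ToPlainText(fragment));
    }
}
```

With this in place, a line in the usage could become, for example, ltlResult.Text += "Title:" + DisplayText.ToSafeHtml(util.Title) + "<br />";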

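You are probably wondering why the constructor takes a URL and the class then hands a URL back again at the end.

The reason is that not everyone supplies a YouTube URL in the canonical form, such as http://www.youtube.com/watch?v=0cay2dnuhcs

So after the URL has gone through the class, I still want to get back a single, canonical URL.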
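
The class above gets the id from the downloaded page. As an alternative sketch, the id can sometimes be read from the URL itself without downloading anything. This is not what the class does. The URL shapes handled here (watch?v=ID with extra query parameters, youtu.be/ID, /embed/ID) and the 11-character id length are my assumptions, and YoutubeUrlNormalizer is a made-up name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class YoutubeUrlNormalizer
{
    // Assumed URL shapes: ...?v=ID or ...&v=ID, youtu.be/ID, /embed/ID.
    // Assumed id format: 11 characters from letters, digits, '_' and '-'.
    private static readonly Regex IdRegex = new Regex(
        @"(?:[?&]v=|youtu\.be/|/embed/)(?<ID>[A-Za-z0-9_-]{11})",
        RegexOptions.IgnoreCase);

    // Returns the video id found in the URL, or null when none is recognized.
    public static string TryGetId(string url)
    {
        if (string.IsNullOrEmpty(url))
            return null;

        Match m = IdRegex.Match(url);
        return m.Success ? m.Groups["ID"].Value : null;
    }

    // Rebuilds the same canonical form the class above produces.
    public static string ToCanonicalUrl(string url)
    {
        string id = TryGetId(url);
        return id == null ? null : "http://www.youtube.com/watch?v=" + id;
    }
}
```

This only normalizes the address. The title and description still have to come from the page or from the API.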

Result:

[Screenshot: 2012-09-25_131330]
For anyone who needs it.


當麻許的超技八 2014 | Donma Hsu Design.