Twitter only lets you retrieve a user's 3200 most recent Tweets, so many people rely on online services to archive their Tweets. If you would rather do it yourself in .NET, this tutorial is a good starting point.
Requirements
This tutorial assumes basic knowledge of:
- .NET programming with VB.NET or C#
- JSON (JavaScript Object Notation)
You will also need:
- Microsoft Visual Studio 2008, or an alternative, to work with the source code.
- Microsoft .NET Framework 3.5 Service Pack 1 to run the executable.
Twitter API
We will use two resources of the Twitter REST API in this application, with JSON as the response format. Check out both resources to understand the response objects:
1- users/show
This resource returns extended information about a given user, specified by screen name. It tells us whether the requested Twitter account exists, whether it has any Tweets, and whether it is public or protected.
** We can't read the Tweets of a protected account without authentication, which is outside the scope of this tutorial.
2- statuses/user_timeline
This resource returns up to the 3200 most recent statuses of a user, specified by the screen_name parameter.
We also use the following parameters:
- count: the number of Tweets to retrieve per request. We set this to 200 to fetch the user's Tweets in as few API requests as possible.
- max_id: we pass the ID of the last Tweet returned by the previous request to page back through the user timeline and get older Tweets.
- trim_user: set to 1 to omit the extra user info from each Tweet and keep the responses small.
- include_rts: set to return native retweets in addition to the standard stream of Tweets.
We will make at most 16 API requests to this resource, at 200 Tweets per request, to retrieve the maximum of 3200 Tweets.
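The paging arithmetic and the max_id handling can be sketched as follows. This is a minimal sketch, not the code from BackupHelper.vb: the module name TimelinePaging, the helpers MaxRequests, NextMaxId and BuildTimelineUrl, and the endpoint URL are all assumptions made for illustration. Because max_id is inclusive (the API returns Tweets with an ID less than or equal to it), the sketch subtracts 1 from the last ID so the same Tweet is not fetched twice.

```vbnet
Imports System

Module TimelinePaging
    ' Values taken from the text above: 200 Tweets per request, 3200 at most.
    Public Const TweetsPerRequest As Integer = 200
    Public Const MaxTweets As Integer = 3200
    ' Assumed endpoint, for illustration only; verify it against the API documentation.
    Public Const TimelineUrl As String = "https://api.twitter.com/1/statuses/user_timeline.json"

    ' Number of requests needed for an account, capped at 3200 / 200 = 16.
    Public Function MaxRequests(ByVal statusesCount As Integer) As Integer
        Dim capped As Integer = Math.Min(statusesCount, MaxTweets)
        Return CInt(Math.Ceiling(capped / TweetsPerRequest))
    End Function

    ' max_id is inclusive, so ask for IDs strictly below the oldest Tweet already saved.
    Public Function NextMaxId(ByVal lastIdStr As String) As String
        Return (Long.Parse(lastIdStr) - 1).ToString()
    End Function

    ' Builds the request URL; max_id is left out on the first request.
    Public Function BuildTimelineUrl(ByVal screenName As String, ByVal maxId As String) As String
        Dim url As String = TimelineUrl & _
            "?screen_name=" & Uri.EscapeDataString(screenName) & _
            "&count=" & TweetsPerRequest & _
            "&trim_user=1&include_rts=1"
        If Not String.IsNullOrEmpty(maxId) Then url &= "&max_id=" & maxId
        Return url
    End Function

    Sub Main()
        ' First page, then the page that follows a hypothetical last Tweet ID.
        Console.WriteLine(BuildTimelineUrl("example_user", ""))
        Console.WriteLine(BuildTimelineUrl("example_user", NextMaxId("1000")))
    End Sub
End Module
```

An account with 450 Tweets needs only 3 requests under this arithmetic, which is why the loop later in the tutorial can finish in 16 requests or fewer.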
Requesting API Resources
To get a response from the Twitter API, we use HttpWebRequest and retry the request in case of HTTP errors.
* The methods below are in BackupHelper.vb, a class that encapsulates all the functions needed for the backup operation.
* Note that we set the decompression type to gzip or deflate, so the Twitter API sends a compressed response and we save bandwidth.
* When a WebException occurs, we record the error response and the HTTP status code, because the API returns error messages with HTTP status codes other than 200 (OK). For example, the first API resource returns 404 (Not Found) when the username does not exist.
Private Function DownloadString(ByVal url As String) As String
    Dim Str As String = ""
    RetryCount = 0
    While RetryCount < RetryMax
        Str = _DownloadString(url)
        RetryCount += 1
        If LastError Is Nothing Then Exit While
    End While
    Return Str
End Function

Private Function _DownloadString(ByVal url As String) As String
    LastError = Nothing
    WebErrorResponse = ""
    WebErrorStatus = 0
    Dim str As String = ""
    Dim request As HttpWebRequest = Nothing
    Dim response As HttpWebResponse = Nothing
    Dim reader As StreamReader = Nothing
    Try
        request = TryCast(WebRequest.Create(url), HttpWebRequest)
        request.UserAgent = "Twitter Backup Application"
        ' Accept gzip/deflate compression
        request.AutomaticDecompression = DecompressionMethods.Deflate Or DecompressionMethods.GZip
        response = CType(request.GetResponse, HttpWebResponse)
        ' Read text from response stream
        reader = New StreamReader(response.GetResponseStream(), Encoding.GetEncoding(response.CharacterSet))
        str = reader.ReadToEnd()
    Catch e As WebException
        LastError = e
        If e.Response IsNot Nothing Then
            ' Record API response on HTTP errors
            reader = New StreamReader(e.Response.GetResponseStream())
            WebErrorResponse = reader.ReadToEnd()
        End If
        If e.Status = WebExceptionStatus.ProtocolError Then
            WebErrorStatus = CType(e.Response, HttpWebResponse).StatusCode
        End If
    Catch e As Exception
        LastError = e
    Finally
        If request IsNot Nothing Then request = Nothing
        If response IsNot Nothing Then response.Close()
        If reader IsNot Nothing Then reader.Close()
    End Try
    Return str
End Function
Parsing JSON Response
It is very easy to convert JSON into the corresponding .NET type in a single line using the JavaScriptSerializer class.
* Note that JavaScriptSerializer has been available since .NET Framework 3.5, and you need to add a reference to "System.Web.Extensions" in the properties of the Visual Studio project.
1- The following code parses the JSON string returned from the first API resource, "users/show":
Private JSS As New JavaScriptSerializer
Public User As APIUser
User = JSS.Deserialize(Of APIUser)(JSON)
** You can see that "Deserialize" takes APIUser as its type parameter. APIUser is a very simple class whose public fields mirror the properties of the returned JSON object.
Public Class APIUser
    Public screen_name As String = ""
    Public name As String = ""
    Public url As String = ""
    Public statuses_count As Integer = 0
    Public [protected] As Boolean = False
End Class
** Note that we only need to define the properties we care about; the other JSON properties are ignored.
2- The second API resource, "statuses/user_timeline", returns a JSON array of Tweet objects, so we need to convert it to a list of APITweet objects.
Public Tweets As List(Of APITweet)
Tweets = JSS.Deserialize(Of List(Of APITweet))(JSON)
And here is the corresponding class, "APITweet":
Public Class APITweet
    Public id_str As String = ""
    Public created_at As String = ""
    Public text As String = ""
    Public source As String = ""
End Class
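To see the deserialization working outside the application, here is a small self-contained sketch. The JSON string is made-up sample data, not a real API response, and the module name DeserializeDemo and the function ParseTweets are hypothetical. The sample includes an extra "retweet_count" property and omits "source" and "created_at" to illustrate that unknown JSON properties are skipped and missing ones keep the field defaults.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Web.Script.Serialization   ' needs a reference to System.Web.Extensions

' Repeated from the tutorial so that this sample compiles on its own.
Public Class APITweet
    Public id_str As String = ""
    Public created_at As String = ""
    Public text As String = ""
    Public source As String = ""
End Class

Module DeserializeDemo
    ' Hypothetical helper: converts a JSON array of Tweets into a typed list.
    Public Function ParseTweets(ByVal json As String) As List(Of APITweet)
        Dim jss As New JavaScriptSerializer()
        Return jss.Deserialize(Of List(Of APITweet))(json)
    End Function

    Sub Main()
        ' Made-up sample data; doubled quotes are how VB escapes a quote inside a string.
        Dim json As String = _
            "[{""id_str"":""101"",""text"":""first"",""retweet_count"":3}," & _
            "{""id_str"":""100"",""text"":""second""}]"

        Dim tweets As List(Of APITweet) = ParseTweets(json)
        For Each t As APITweet In tweets
            Console.WriteLine(t.id_str & ": " & t.text)
        Next
    End Sub
End Module
```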
Saving Tweets to File
Every time we get 200 Tweets from an API request, we simply write them to an HTML file. HTML is a very easy format to create, and the end user can open the resulting HTML table in MS Excel.
Private Sub WriteTweets()
    For i As Integer = 0 To Tweets.Count - 1
        Dim Tweet As APITweet = Tweets(i)
        ' Double quotes inside a VB string literal must be doubled ("")
        Writer.Write( _
            "<tr>" & _
            "<td>" & Tweet.id_str & "</td>" & _
            "<td>" & _
            "<a target=""_blank"" href='https://twitter.com/" & User.screen_name & "/status/" & Tweet.id_str & "'>" & _
            Tweet.created_at & _
            "</a>" & _
            "</td>" & _
            "<td>" & LinkifyTweet(Tweet.text) & "</td>" & _
            "<td>" & Tweet.source & "</td>" & _
            "</tr>")
        Tweet = Nothing
    Next
    Writer.Flush()
End Sub
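WriteTweets calls LinkifyTweet, which is not listed in this section. The following is a hypothetical sketch of what such a helper might look like, assuming its job is to wrap URLs and @mentions in anchor tags; the module name TweetHtml and both regular expressions are assumptions, and the real helper may behave differently. The sketch does not HTML-encode the text, so whether encoding is needed first depends on how the API escapes the Tweet text.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TweetHtml
    ' http/https URLs, stopping at whitespace, "<" or a single quote.
    Private ReadOnly UrlPattern As New Regex("(https?://[^\s<']+)", RegexOptions.IgnoreCase)
    ' @name that is not preceded by a word character or "/" (skips e-mail addresses and URLs).
    Private ReadOnly MentionPattern As New Regex("(?<![\w/])@(\w+)")

    ' Hypothetical stand-in for the tutorial's LinkifyTweet helper.
    Public Function LinkifyTweet(ByVal text As String) As String
        Dim html As String = UrlPattern.Replace(text, "<a href='$1'>$1</a>")
        html = MentionPattern.Replace(html, "<a href='https://twitter.com/$1'>@$1</a>")
        Return html
    End Function

    Sub Main()
        Console.WriteLine(LinkifyTweet("hi @bob, see http://t.co/abc now"))
    End Sub
End Module
```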
Finally, The Backup Operation
To run the time-consuming operation of requesting API resources and saving Tweets to a file without blocking the user interface (UI), we need to use BackgroundWorker. The BackgroundWorker class allows you to run an operation on a separate thread without having to deal with threads yourself.
To set up the BackgroundWorker, drag it from the Toolbox onto the application form, add an event handler for the DoWork event, and call the backup operation in this event handler.
Inside the DoWork handler, we call ReportProgress to report progress updates. In the ProgressChanged event handler, we update the ProgressBar and display any status text.
Private Sub BGWorker_DoWork(ByVal sender As System.Object, ByVal e As System.ComponentModel.DoWorkEventArgs) _
        Handles BGWorker.DoWork
    Dim worker As BackgroundWorker = CType(sender, BackgroundWorker)
    Dim Progress As Integer = 0

    '1>> init helper
    Helper.init()
    worker.ReportProgress(Progress, "Init..")

    '2>> check user info
    Helper.GetUser()
    If Helper.WebErrorStatus = HttpStatusCode.NotFound Then
        ' username not found
        worker.ReportProgress(0, "Username was not found!")
        Exit Sub
    ElseIf Helper.LastError IsNot Nothing Then
        ' web request error
        worker.ReportProgress(Progress, "Request Error: " & Helper.LastError.Message)
        Exit Sub
    ElseIf Helper.User Is Nothing Then
        ' no user returned
        worker.ReportProgress(0, "Invalid Username")
        Exit Sub
    ElseIf Helper.User.statuses_count = 0 Then
        ' no tweets
        worker.ReportProgress(0, "@" & Helper.User.screen_name & "'s account has zero tweets!")
        Exit Sub
    ElseIf Helper.User.protected Then
        ' protected account
        worker.ReportProgress(0, "@" & Helper.User.screen_name & "'s account is protected!")
        Exit Sub
    Else
        ' show tweets count
        worker.ReportProgress(Progress, "@" & Helper.User.screen_name & " - " & _
            Helper.User.statuses_count & " tweets")
        Thread.Sleep(1000)
    End If

    '3>> loop to get 200 tweets each time
    While True
        worker.ReportProgress(Progress, "API Call " & (Helper.RequestIndex + 1) & " / " & Helper.RequestMax)

        '3.a>> Get 200 Tweets
        Helper.GetTweets()

        ' update progress
        Progress = CInt(Math.Round(Helper.RequestIndex / Helper.RequestMax * 100))
        worker.ReportProgress(Progress, "API Call " & Helper.RequestIndex & " completed")

        If Helper.LastError IsNot Nothing Then
            ' Web Error
            Helper.Finish()
            worker.ReportProgress(Progress, "Request Error: " & Helper.LastError.Message)
            Exit Sub
        ElseIf Helper.Tweets Is Nothing OrElse Helper.Tweets.Count = 0 OrElse Helper.RequestIndex >= Helper.RequestMax Then
            ' No tweets or last request
            Exit While
        End If

        '3.b>> delay
        worker.ReportProgress(Progress, "delay " & Delay & " seconds")
        For i As Integer = 1 To Delay
            ' cancel?
            If worker.CancellationPending Then
                Helper.Finish()
                e.Cancel = True
                worker.ReportProgress(Progress, "cancelled!")
                Exit Sub
            End If
            ' delay 1 sec
            Thread.Sleep(1000)
        Next
    End While

    '4>> close file
    Helper.Finish()
    worker.ReportProgress(100, "Complete")
End Sub
- The operation starts by calling init() to initialize the backup helper for a new backup operation.
- Call GetUser() in BackupHelper to get the user info from the API and check that the requested Twitter account exists, has Tweets and is not protected.
- Report a progress update to show the number of Tweets in that account.
- Request Tweets loop:
  - Call GetTweets() in BackupHelper to get 200 Tweets from the Twitter API as described before.
  - Report a progress update to show the request index.
  - Exit the loop if no Tweets were returned or the maximum number of requests (16 or fewer) has been reached.
  - Delay to avoid being rate-limited by the Twitter API. The delay is implemented as a loop that sleeps for 1 second per iteration and checks for cancellation each time, so the operation stays responsive to user actions such as canceling the backup.
- Call Finish() to close the backup file.
- Report that the operation completed.
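The DoWork handler only works if the BackgroundWorker is configured to report progress and support cancellation, and the ProgressChanged side is not listed in this section. The following is a minimal sketch of that wiring, assuming a form built in code rather than in the designer; the class name ProgressDemoForm and the control names Bar and StatusLabel are hypothetical, and the short loop in DoWork is only a stand-in for the backup loop.

```vbnet
Imports System
Imports System.ComponentModel
Imports System.Threading
Imports System.Windows.Forms

Public Class ProgressDemoForm
    Inherits Form

    ' Names below are assumptions for illustration.
    Private WithEvents BGWorker As New BackgroundWorker()
    Private Bar As New ProgressBar()
    Private StatusLabel As New Label()

    Public Sub New()
        Bar.Dock = DockStyle.Top
        StatusLabel.Dock = DockStyle.Top
        Controls.Add(StatusLabel)
        Controls.Add(Bar)
        ' ReportProgress throws InvalidOperationException unless this is True.
        BGWorker.WorkerReportsProgress = True
        ' Required for CancelAsync and CancellationPending.
        BGWorker.WorkerSupportsCancellation = True
    End Sub

    Protected Overrides Sub OnShown(ByVal e As EventArgs)
        MyBase.OnShown(e)
        BGWorker.RunWorkerAsync()   ' raises DoWork on a background thread
    End Sub

    ' Stand-in for the backup loop: four short steps with a cancellation check.
    Private Sub BGWorker_DoWork(ByVal sender As Object, ByVal e As DoWorkEventArgs) Handles BGWorker.DoWork
        Dim worker As BackgroundWorker = CType(sender, BackgroundWorker)
        For i As Integer = 1 To 4
            If worker.CancellationPending Then
                e.Cancel = True
                Exit Sub
            End If
            Thread.Sleep(500)
            worker.ReportProgress(i * 25, "Step " & i & " of 4")
        Next
    End Sub

    ' Raised on the UI thread, so the controls can be updated directly.
    Private Sub BGWorker_ProgressChanged(ByVal sender As Object, ByVal e As ProgressChangedEventArgs) Handles BGWorker.ProgressChanged
        Bar.Value = e.ProgressPercentage
        StatusLabel.Text = CStr(e.UserState)
    End Sub

    Protected Overrides Sub OnFormClosing(ByVal e As FormClosingEventArgs)
        ' Ask the worker to stop if the user closes the window mid-run.
        If BGWorker.IsBusy Then BGWorker.CancelAsync()
        MyBase.OnFormClosing(e)
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New ProgressDemoForm())
    End Sub
End Class
```

The second argument of ReportProgress arrives in the handler as e.UserState, which is how the status strings in the DoWork handler above can reach a label on the form.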