A tutorial on how to write a console app that stores millions of tweets in Azure.

I’ve always wanted to create a console app that logged into Twitter, searched for a given set of terms, and stored the matching tweets in a database that I could query later. A couple of years ago I built a web app that did just this, using PHP and a local MySQL instance. However, after running it for a week and watching my database grow past 20 GB and become really slow to query, I gave up on the project.

Two weeks ago I came back to this and decided to rebuild it using C#, Azure Tables, Azure Queues and Azure WebJobs. I’ve been playing around with C# in my spare time, and Azure Tables looked like the perfect place to store millions of tweets: costs are very low, and it’s quick and easy to query from a web or desktop app. Azure Tables also takes away the pain of managing MySQL, and since I don’t need anything more than a key-value store, it’s perfect.

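Here’s what I’ll use to build this project:

  • C# and Visual Studio
  • Azure Storage Queues and Azure Tables
  • The Azure WebJobs SDK
  • Tweetinvi, a .NET library for the Twitter API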

## Setting up a Twitter App

Before you get to the actual code, you need to create a Twitter app to get the Access Keys and tokens you need to connect to Twitter. Here’s how:

  1. Go to https://apps.twitter.com
  2. Sign in with your Twitter account
  3. Click on “Create New App”.
  4. Enter the application details. You can put in whatever name you like, and a placeholder URL.
  5. Click on “Keys and Access Tokens” and then generate a new Access Token. It should look like this: [Screenshot: Access Tokens]
  6. Make a note of the Consumer Key, Consumer Secret, Access Token, and Access Token Secret. Copy them to Notepad or wherever you keep notes, because you’ll need all four later.
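Later in this tutorial, the consumer key and secret are pasted directly into Program.cs. One common alternative is to keep all four values out of source control and read them from environment variables at startup. The sketch below shows that approach. The class name TwitterKeys and the variable names such as TWITTER_CONSUMER_KEY are hypothetical conventions chosen for illustration. Neither Twitter nor Azure defines them.

```csharp
using System;

namespace TweetStreamsSketches
{
    // Hypothetical holder for the four values noted down from apps.twitter.com.
    public sealed class TwitterKeys
    {
        public string ConsumerKey { get; private set; }
        public string ConsumerSecret { get; private set; }
        public string AccessToken { get; private set; }
        public string AccessTokenSecret { get; private set; }

        // Reads each key from an environment variable. The variable names are our own
        // (hypothetical) convention. Fails fast if any of them is missing or empty.
        public static TwitterKeys FromEnvironment()
        {
            return new TwitterKeys
            {
                ConsumerKey = Require("TWITTER_CONSUMER_KEY"),
                ConsumerSecret = Require("TWITTER_CONSUMER_SECRET"),
                AccessToken = Require("TWITTER_ACCESS_TOKEN"),
                AccessTokenSecret = Require("TWITTER_ACCESS_TOKEN_SECRET"),
            };
        }

        private static string Require(string name)
        {
            string value = Environment.GetEnvironmentVariable(name);
            if (string.IsNullOrWhiteSpace(value))
                throw new InvalidOperationException("Environment variable " + name + " is not set.");
            return value;
        }
    }
}
```

Failing fast at startup makes a missing key obvious immediately. Otherwise the problem would only show up later, when Twitter rejects the stream connection.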

## Writing to the Queue

To write to the queue, we’ll create a simple console app, though it could just as well be a website or a Universal App. I’m not going to spend much time on this app, since it’s a small modification of the sample in Microsoft’s documentation.

Here’s the app: GitHub repo.

Open this in Visual Studio and replace the $REPLACE_THIS placeholders in the App.config connection string with your Azure Storage account name and key. If you don’t know where to find these, Microsoft’s “Use Azure Queues” documentation shows you.

```xml
<add key="StorageConnectionString" value="DefaultEndpointsProtocol=https;AccountName=$REPLACE_THIS;AccountKey=$REPLACE_THIS" />
<!--<add key="StorageConnectionString" value="UseDevelopmentStorage=true" />-->
```

To test locally against the Azure Storage Emulator, comment out the first line and uncomment the UseDevelopmentStorage key.
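A common first-run failure is forgetting to replace one of the placeholders. CloudStorageAccount.Parse does the real parsing. The sketch below is only a hypothetical pre-flight check: it splits the semicolon-delimited Key=Value pairs and reports any AccountName or AccountKey that is missing or still contains $REPLACE_THIS. The class and method names are illustrative and are not part of the Azure SDK.

```csharp
using System;
using System.Collections.Generic;

namespace TweetStreamsSketches
{
    // Hypothetical pre-flight check for a storage connection string.
    public static class ConnectionStringCheck
    {
        // Splits "Key1=Value1;Key2=Value2" into a dictionary. Only the first '=' in each
        // segment separates key from value, because Base64 account keys often end in '='.
        public static Dictionary<string, string> Parse(string connectionString)
        {
            var parts = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (string segment in connectionString.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
            {
                int eq = segment.IndexOf('=');
                if (eq <= 0) throw new FormatException("Malformed segment: " + segment);
                parts[segment.Substring(0, eq).Trim()] = segment.Substring(eq + 1).Trim();
            }
            return parts;
        }

        // Returns a description of the first problem found, or null if the string looks usable.
        public static string FindProblem(string connectionString)
        {
            var parts = Parse(connectionString);

            string dev;
            if (parts.TryGetValue("UseDevelopmentStorage", out dev) &&
                dev.Equals("true", StringComparison.OrdinalIgnoreCase))
                return null; // Local storage emulator: no account name or key needed.

            foreach (string key in new[] { "AccountName", "AccountKey" })
            {
                string value;
                if (!parts.TryGetValue(key, out value) || value.Length == 0)
                    return key + " is missing.";
                if (value.Contains("$REPLACE_THIS"))
                    return key + " still contains the $REPLACE_THIS placeholder.";
            }
            return null;
        }
    }
}
```

You could call FindProblem on the configured value before creating the CloudStorageAccount and print the message instead of failing deep inside the storage client.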

Run the console app and enter the following information:

  1. Table Name - Pick anything you like, but keep it lowercase, one word, and no special characters. Just use a word like “test” or “sample”. Azure table names must follow a set of naming rules (see “Azure Table Name Restrictions” in Microsoft’s documentation).
  2. Search Term - The app looks for any tweets that contain this word. I haven’t set up handling for phrases or multi-word strings, so just enter one word. Examples are “Azure”, “Microsoft”, “SpaceX”, etc.
  3. Paste Access Token - You saved this, right? If not, go back to apps.twitter.com, click on your app name, go to Keys and Access Tokens, and copy it from there. Do the same for the next step, the Access Token Secret.
  4. Paste Access Token Secret - See above.
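The input rules above can be checked before anything is sent to the queue. The sketch below assumes Azure’s published table-name rules: letters and digits only, must start with a letter, 3 to 63 characters, and the reserved name “tables” is not allowed. It also assumes that a single-word search term simply means no whitespace. InputChecks and its methods are hypothetical names, not part of the InsertIntoQueue app.

```csharp
using System;
using System.Text.RegularExpressions;

namespace TweetStreamsSketches
{
    // Hypothetical validation helpers for the InsertIntoQueue prompts.
    public static class InputChecks
    {
        // Letters and digits only, starting with a letter, 3 to 63 characters in total.
        // \A and \z anchor the whole string (unlike $, which also matches before a final newline).
        private static readonly Regex TableNamePattern = new Regex(@"\A[A-Za-z][A-Za-z0-9]{2,62}\z");

        public static bool IsValidTableName(string name)
        {
            return name != null
                && TableNamePattern.IsMatch(name)
                && !name.Equals("tables", StringComparison.OrdinalIgnoreCase); // Reserved name.
        }

        // The app only handles one word, so reject empty input and anything with inner whitespace.
        public static bool IsSingleWord(string term)
        {
            if (string.IsNullOrWhiteSpace(term)) return false;
            foreach (char c in term.Trim())
                if (char.IsWhiteSpace(c)) return false;
            return true;
        }
    }
}
```

For example, “test” and “tweets2016” pass the table-name check, while “1test”, “ab” and “my-table” do not.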

If all goes well, the app will say “Added to Queue” and exit. You can use the Cloud Explorer in Visual Studio to see if this worked. It’ll look like this: [Screenshot: Queue Message]

## Reading from the Queue, Searching Twitter, and Storing Tweets in a Table

This is the fun part: the actual app! This app will:

  1. Constantly watch the queue.
  2. When it sees a message in the queue:
    1. Read the search term.
    2. Create a table in Azure to store tweets, if it doesn’t already exist.
    3. Open a stream to Twitter and listen for tweets containing the search term. Each matching tweet is written into the table.
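You don’t write the “watch the queue” loop yourself. The WebJobs SDK’s QueueTrigger polls the queue for you, and when the queue is empty it waits progressively longer between checks. As a rough illustration of that idea, here is a hypothetical polling helper with capped exponential back-off. PollingBackoff and WaitForMessageAsync are illustrative names, and this is not the SDK’s actual implementation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

namespace TweetStreamsSketches
{
    // Hypothetical illustration of polling a queue with capped exponential back-off.
    public static class PollingBackoff
    {
        // Delay after `emptyPolls` consecutive empty reads: minDelay * 2^emptyPolls, capped at maxDelay.
        public static TimeSpan NextDelay(int emptyPolls, TimeSpan minDelay, TimeSpan maxDelay)
        {
            if (emptyPolls < 0) throw new ArgumentOutOfRangeException(nameof(emptyPolls));
            double ms = minDelay.TotalMilliseconds * Math.Pow(2, Math.Min(emptyPolls, 30));
            return ms >= maxDelay.TotalMilliseconds ? maxDelay : TimeSpan.FromMilliseconds(ms);
        }

        // Calls tryDequeue until it returns a message (non-null), sleeping longer after each empty read.
        public static async Task<T> WaitForMessageAsync<T>(
            Func<T> tryDequeue, TimeSpan minDelay, TimeSpan maxDelay, CancellationToken token = default)
            where T : class
        {
            int emptyPolls = 0;
            while (true)
            {
                T message = tryDequeue();
                if (message != null) return message;

                await Task.Delay(NextDelay(emptyPolls, minDelay, maxDelay), token);
                if (emptyPolls < 30) emptyPolls++; // Stop growing once the cap is certainly reached.
            }
        }
    }
}
```

Backing off keeps an idle worker from hammering the storage account with requests. Each poll is a billable storage transaction, so the savings add up.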

Depending on the search term, you can end up with millions of tweets in a day. Try something popular (“Apple”, “Android” or whatever is trending), and let it run for a few hours.
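To put “millions of tweets in a day” in perspective, here is the sustained rate for a hypothetical one million tweets per day:

$$
\frac{1\,000\,000\ \text{tweets/day}}{86\,400\ \text{s/day}} \approx 11.6\ \text{tweets/s}
$$

Since the app writes each tweet with its own insert request, that works out to roughly a dozen storage writes per second, around the clock.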

You can download the source code here: GitHub repo.

It’s fairly straightforward. There are three files:

  • QueueInstruction.cs
  • TweetClassforAzure.cs
  • Program.cs

### QueueInstruction.cs

```csharp
namespace TweetStreamsConsoleApp
{
    public class QueueInstruction
    {
        public string TableName { get; set; }
        public string SearchTerm { get; set; }
        public string PartitionKey { get; set; }
        public string AccessToken { get; set; }
        public string AccessTokenSecret { get; set; }
    }
}
```

This is a simple class whose properties hold the values the program reads from the queue message. They correspond to the values the InsertIntoQueue app puts into the queue.
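When a QueueTrigger function takes a class like QueueInstruction as its parameter, the WebJobs SDK expects the queue message body to be JSON and deserializes it into that class. The sketch below shows what such a message looks like, using System.Text.Json as a stand-in for the JSON serializer the SDK itself uses. The field values are made-up examples, and QueueMessageFormat is a hypothetical helper. QueueInstruction is copied locally so the sketch compiles on its own.

```csharp
using System.Text.Json;

namespace TweetStreamsSketches
{
    // Local copy of QueueInstruction so this sketch is self-contained.
    public class QueueInstruction
    {
        public string TableName { get; set; }
        public string SearchTerm { get; set; }
        public string PartitionKey { get; set; }
        public string AccessToken { get; set; }
        public string AccessTokenSecret { get; set; }
    }

    // Hypothetical helper showing the JSON shape of a queue message.
    public static class QueueMessageFormat
    {
        // Produces text such as {"TableName":"test","SearchTerm":"Azure",...}.
        public static string ToMessage(QueueInstruction instruction)
        {
            return JsonSerializer.Serialize(instruction);
        }

        // The reverse step: turn the message text back into a QueueInstruction.
        public static QueueInstruction FromMessage(string json)
        {
            return JsonSerializer.Deserialize<QueueInstruction>(json);
        }
    }
}
```

The property names in the JSON must match the class’s property names, so if you rename a property on one side, rename it on the other as well.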

### TweetClassforAzure.cs


```csharp
using System;                                // DateTime
using Microsoft.WindowsAzure.Storage.Table;  // TableEntity

namespace TweetStreamsConsoleApp
{
    // One row in the Azure table per tweet.
    public class TweetClassforAzure : TableEntity
    {
        public TweetClassforAzure(string partitionKey, long id)
        {
            this.PartitionKey = partitionKey;
            this.RowKey = id.ToString(); // Tweet IDs are unique, so they make good row keys.
        }

        // The storage client needs a parameterless constructor to read entities back.
        public TweetClassforAzure() { }

        public string tweetText { get; set; }
        public string profileImageUrl { get; set; }
        public DateTime Date { get; set; }
        public bool verified { get; set; }
    }
}
```

This class derives from the TableEntity class. We need two using statements: System (for DateTime) and Microsoft.WindowsAzure.Storage.Table. Every entity in an Azure table needs a Partition Key and a Row Key. The Partition Key comes from the queue message, and the Row Key is the Tweet ID, which we get from Twitter. I’m also storing a Boolean for whether the author is verified, so you can filter verified tweets out later. The empty constructor is there because the storage client needs it when it reads entities back out of the table.
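One thing to watch with tweet IDs as row keys: row keys are strings, and Azure Tables sorts rows within a partition by string comparison, so “9” sorts after “10”. A common workaround is to zero-pad the ID to a fixed width. Subtracting it from long.MaxValue first reverses the order, so the newest tweets come first, because Twitter IDs grow roughly over time. The RowKeys helper below is a hypothetical sketch of both options. Switching formats only makes sense before you store data, since existing rows keep their old keys.

```csharp
using System;

namespace TweetStreamsSketches
{
    // Hypothetical helpers that turn tweet IDs into row keys whose string order matches numeric order.
    public static class RowKeys
    {
        // long.MaxValue has 19 digits, so padding to 19 makes string order equal numeric order.
        public static string Ascending(long tweetId)
        {
            if (tweetId < 0) throw new ArgumentOutOfRangeException(nameof(tweetId));
            return tweetId.ToString("D19");
        }

        // Inverting the ID makes the highest (most recent) IDs sort first.
        public static string Descending(long tweetId)
        {
            if (tweetId < 0) throw new ArgumentOutOfRangeException(nameof(tweetId));
            return (long.MaxValue - tweetId).ToString("D19");
        }
    }
}
```

With either variant, a range query on RowKey returns tweets in ID order without any sorting on the client.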

### Program.cs


```csharp
using System;
using System.Configuration;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;
using Tweetinvi;
using Tweetinvi.Core.Credentials;
using Tweetinvi.Core.Enum;

namespace TweetStreamsConsoleApp
{
    public class Program
    {
        public static void Main(string[] args)
        {
            // The WebJobs SDK reads its own storage connection strings
            // (AzureWebJobsStorage and AzureWebJobsDashboard) from App.config,
            // then blocks and waits for queue messages.
            JobHost host = new JobHost();
            host.RunAndBlock();
        }

        // Runs whenever a message lands in the "starttweetstream" queue. This name must match
        // the queue the InsertIntoQueue app writes to. The JSON message body is deserialized
        // into a QueueInstruction.
        public static async Task StartTweetStream([QueueTrigger("starttweetstream")] QueueInstruction instruction)
        {
            ITwitterCredentials userCredentials = new TwitterCredentials();
            userCredentials.AccessToken = instruction.AccessToken;
            userCredentials.AccessTokenSecret = instruction.AccessTokenSecret;

            // The consumer keys are your App keys from Twitter.
            userCredentials.ConsumerKey = "$REPLACE_THIS_WITH_YOUR_APPS_CONSUMER_KEY";
            userCredentials.ConsumerSecret = "$REPLACE_THIS_WITH_YOUR_APPS_CONSUMER_SECRET";

            Auth.SetCredentials(userCredentials); // Set the Twitter credentials.

            CreateAzureTable(instruction.TableName);

            // Filtered stream: English-language tweets that contain the search term.
            var stream = Stream.CreateFilteredStream();
            stream.AddTrack(instruction.SearchTerm);
            stream.AddTweetLanguageFilter(Language.English);
            stream.MatchingTweetReceived += (sender, sargs) =>
            {
                try
                {
                    WriteTweetToAzure(
                        instruction.TableName,
                        instruction.PartitionKey,
                        sargs.Tweet.Text,
                        sargs.Tweet.CreatedBy.ScreenName,
                        sargs.Tweet.Id,
                        sargs.Tweet.CreatedBy.ProfileImageUrlHttps,
                        sargs.Tweet.CreatedAt, // Full timestamp (CreatedAt.Date would keep only the day).
                        sargs.Tweet.CreatedBy.Verified);
                }
                catch (StorageException ex)
                {
                    // Log and keep streaming instead of letting one failed write end the stream.
                    Console.Error.WriteLine("Failed to store tweet {0}: {1}", sargs.Tweet.Id, ex.Message);
                }
            };

            // Keeps running until the stream stops or disconnects.
            await stream.StartStreamMatchingAllConditionsAsync();
        }

        // Gets a reference to the table. The storage connection string is read from
        // the <connectionStrings> section of App.config.
        static CloudTable GetTable(string tableName)
        {
            CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
                ConfigurationManager.ConnectionStrings["StorageConnectionString"].ConnectionString);
            CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
            return tableClient.GetTableReference(tableName);
        }

        static void CreateAzureTable(string tableName)
        {
            // Create the table if it doesn't exist.
            GetTable(tableName).CreateIfNotExists();
        }

        // createdBy (the author's screen name) is passed in but not stored,
        // because TweetClassforAzure has no property for it.
        static void WriteTweetToAzure(string tableName, string partitionKey, string tweet, string createdBy, long id, string profileImageUrl, DateTime date, bool verified)
        {
            TweetClassforAzure tweetAzure = new TweetClassforAzure(partitionKey, id);
            tweetAzure.tweetText = tweet;
            tweetAzure.profileImageUrl = profileImageUrl;
            tweetAzure.Date = date;
            tweetAzure.verified = verified;

            // InsertOrReplace rather than Insert: if the same tweet arrives twice
            // (for example after a reconnect), Insert fails with 409 Conflict.
            TableOperation insertOperation = TableOperation.InsertOrReplace(tweetAzure);
            GetTable(tableName).Execute(insertOperation);
        }
    }
}
```
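WriteTweetToAzure sends one storage request per tweet. With a popular search term, one possible optimization is to buffer tweets and write them in batches. An Azure Table batch (entity group transaction) can hold at most 100 operations, all of which must share the same PartitionKey and table, with a 4 MB payload limit. The hypothetical BatchPlanner below shows only the grouping step, using plain collections: it groups items by partition key and splits each group into chunks of up to 100.

```csharp
using System;
using System.Collections.Generic;

namespace TweetStreamsSketches
{
    // Hypothetical helper that splits buffered items into Azure-Table-sized batches.
    public static class BatchPlanner
    {
        // An entity group transaction allows at most 100 operations, all with one PartitionKey.
        public const int MaxBatchSize = 100;

        // Groups items by partition key (in first-seen order), then chunks each group.
        public static List<List<T>> Plan<T>(IEnumerable<T> items, Func<T, string> partitionKeyOf)
        {
            var groups = new Dictionary<string, List<T>>();
            var order = new List<string>();
            foreach (T item in items)
            {
                string key = partitionKeyOf(item);
                List<T> group;
                if (!groups.TryGetValue(key, out group))
                {
                    group = new List<T>();
                    groups[key] = group;
                    order.Add(key);
                }
                group.Add(item);
            }

            var batches = new List<List<T>>();
            foreach (string key in order)
            {
                List<T> group = groups[key];
                for (int start = 0; start < group.Count; start += MaxBatchSize)
                    batches.Add(group.GetRange(start, Math.Min(MaxBatchSize, group.Count - start)));
            }
            return batches;
        }
    }
}
```

With the storage client library used in Program.cs, each chunk could then be added to a TableBatchOperation (for example with its InsertOrReplace method) and written with CloudTable.ExecuteBatch. Buffering does mean tweets sit in memory until the batch is flushed, so a crash can lose the current buffer.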