— RSyvarth

Developing a Newsfeed: The Basics

In this blog post I will cover the process that I went through while developing the newsfeed for Duxter and how we created the amazing product that you see today (or will see soon)!

Newsfeeds are one of the most common features of social networking sites today, and while they initially seem relatively simple, they can become a bit of a technical nightmare once you aim for instant interactions across multiple platforms with content that is tailored dynamically to each user’s interests. To make sure we ended up with the best product possible, I spent a large amount of time researching the types of feeds used across different sites. There are 2 main approaches to the problem: you either build each user’s newsfeed ahead of time, adding new posts to it as they are created and serving already formatted content when the user loads the page, or you build the feed on demand every time the user loads the page. We opted for the latter, since it allows for greater flexibility and more real-time interactions while avoiding some of the technical issues of the first approach. Overall, the architecture of the feed is inspired by Facebook’s newsfeed (if it works for them!), but it is executed in a simpler way. The feed is separated into 4 major sections: the client, the communication layer, the feed generator, and the member stacks. Each of these sections performs specific tasks, and the entire product is designed with speed and efficiency in mind. Let’s begin with the member stacks.
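To make the two approaches concrete, here is a minimal Python sketch of each, assuming a hypothetical in-memory friend list and feed store (Duxter’s real feed uses MongoDB, MySQL, PHP and Node, none of which appear here). `PushFeed` does the work when something is posted by copying the entry into every friend’s precomputed feed; `PullFeed` stores each entry once per author and assembles the feed only when a member loads it, which is the approach described above.

```python
from collections import defaultdict

class PushFeed:
    """Fan-out on write: copy each new entry into every friend's precomputed feed."""

    def __init__(self, friends):
        self.friends = friends              # member -> list of friends
        self.feeds = defaultdict(list)      # reader -> [(timestamp, entry_id), ...]

    def post(self, author, ts, entry_id):
        # All the work happens here, once per friend of the author.
        for friend in self.friends.get(author, []):
            self.feeds[friend].append((ts, entry_id))

    def load(self, member, limit=10):
        # Loading only reads (and orders) an already built list.
        return [e for _, e in sorted(self.feeds[member], reverse=True)[:limit]]

class PullFeed:
    """Fan-out on read: store each entry once, assemble the feed when it is requested."""

    def __init__(self, friends):
        self.friends = friends
        self.stacks = defaultdict(list)     # author -> [(timestamp, entry_id), ...]

    def post(self, author, ts, entry_id):
        # Posting is a single write to the author's own stack.
        self.stacks[author].append((ts, entry_id))

    def load(self, member, limit=10):
        # All the work happens here: read every friend's stack and merge by time.
        merged = [item for f in self.friends.get(member, []) for item in self.stacks[f]]
        return [e for _, e in sorted(merged, reverse=True)[:limit]]

# Hypothetical sample data, for illustration only.
friends = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": ["alice"]}
push, pull = PushFeed(friends), PullFeed(friends)
for feed in (push, pull):
    feed.post("bob", 1, "b1")
    feed.post("carol", 2, "c1")
    feed.post("bob", 3, "b2")
print(push.load("alice"), pull.load("alice"))  # prints ['b2', 'c1', 'b1'] ['b2', 'c1', 'b1']
```

Both classes return the same feed for the same posts; the difference is where the cost lands. The push model writes once per friend at post time, while the pull model reads once per friend at load time, which is why a change to the friend list or to the ranking rules takes effect on the very next load in the pull model.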

A visual layout of the newsfeed generation process

Member stacks, at their most basic level, are just databases of entry ids plus a few related bits of information used for sorting and filtering. They are used to generate a complete structure of a member’s feed that contains only ids representing the different newsfeed entries, so that we can work with the most condensed version of what the user should see without ever pulling more information than is needed. Each member on the site is represented by their own “stack”, which is a collection in a MongoDB database (redundant, I know). Each stack contains ids for all of the entries that member has generated, either by directly posting content such as a status or through automated content generation, like posts about the member viewing videos or making purchases in the store. When a user goes to view their newsfeed, a certain number of the latest entries from each of their friends are pulled from those friends’ stacks. These ids are then aggregated (through various forms of magic) to prevent your feed from filling up with 100 entries of your friends viewing a cat video. The aggregated entries are then filtered by how important the content is, using the previously discussed friend affinity equation along with a few other factors, such as how old the content is and what type of content was posted (your friend uploading a video is more important than them watching someone else’s video). Together these produce an overall importance rating, and only items that meet a certain threshold are kept; the others are not displayed to the user.* After all of this is done, the member stack section returns an array of ids to the feed generator so that it can grab and format the content.

*This is not yet a functional feature, but it will be implemented as soon as it becomes necessary.
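The member stack step can be sketched in Python as well. Everything specific here is an assumption for illustration: stacks are plain in-memory lists rather than MongoDB collections, `affinity` is a hypothetical per-friend number standing in for the friend affinity equation (which this post does not spell out), and `TYPE_WEIGHT`, `AGGREGATABLE`, `HALF_LIFE` and `threshold` are made-up values. The pipeline follows the text: pull the latest entries from each friend’s stack, collapse same-target entries such as many friends viewing one cat video, score each group, drop groups under the threshold, and return ids.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    entry_id: int
    author: str
    kind: str      # e.g. "status", "video_upload", "video_view"
    target: str    # object the entry is about, e.g. a video id
    ts: float      # seconds since the epoch

@dataclass
class Group:
    entry_ids: list
    authors: list
    kind: str
    target: str
    ts: float      # newest timestamp in the group

# Hypothetical tuning values, not Duxter's real ones.
TYPE_WEIGHT = {"status": 1.0, "video_upload": 1.0, "video_view": 0.3}
AGGREGATABLE = {"video_view"}   # kinds that collapse when they share a target
HALF_LIFE = 6 * 3600            # importance halves every 6 hours

def latest_from_stacks(stacks, friend_names, per_friend):
    """Pull the newest `per_friend` entries from each friend's stack."""
    pulled = []
    for name in friend_names:
        newest_first = sorted(stacks.get(name, []), key=lambda e: e.ts, reverse=True)
        pulled.extend(newest_first[:per_friend])
    return pulled

def aggregate(entries):
    """Collapse entries of an aggregatable kind that point at the same target."""
    groups = {}
    for e in entries:
        key = (e.kind, e.target) if e.kind in AGGREGATABLE else ("single", e.entry_id)
        group = groups.get(key)
        if group is None:
            groups[key] = Group([e.entry_id], [e.author], e.kind, e.target, e.ts)
        else:
            group.entry_ids.append(e.entry_id)
            group.authors.append(e.author)
            group.ts = max(group.ts, e.ts)
    return list(groups.values())

def score(group, affinity, now):
    """Hypothetical importance: best affinity x type weight x exponential age decay."""
    best_affinity = max(affinity.get(a, 0.0) for a in group.authors)
    decay = 0.5 ** ((now - group.ts) / HALF_LIFE)
    return best_affinity * TYPE_WEIGHT.get(group.kind, 0.5) * decay

def build_id_list(stacks, affinity, now, per_friend=20, threshold=0.1):
    """Return id groups for one member's feed, newest first; `affinity` maps friend -> score."""
    pulled = latest_from_stacks(stacks, affinity.keys(), per_friend)
    kept = [g for g in aggregate(pulled) if score(g, affinity, now) >= threshold]
    kept.sort(key=lambda g: g.ts, reverse=True)
    return [g.entry_ids for g in kept]
```

Each returned element is a list of ids, so an aggregated entry (“bob and carol watched a cat video”) travels as one group and a single entry is a one-item list. Multiplying the factors means any one of them near zero, such as a very low affinity, sinks the entry; a weighted sum would be an equally plausible design.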

Once the feed generator receives the ids of the entries that will be displayed on the feed, it pulls the content for those entries from the MySQL database and formats all of the information so that it is ready to be returned to the client. In this step, aggregated entries also go through a few extra processes in which the aggregated information is filtered (depending on the type of entry) so that only the information that is absolutely necessary is passed to the client. The goal is to send the client the least amount of data possible, so that users on mobile devices don’t burn through tons of data loading the feed. For the same reason, we send the client raw information instead of HTML, which saves data transfer by eliminating redundant markup. Once all of the information is formatted and ready, it is pushed to the communication layer so that it can be directed to the correct user.
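The feed generator’s job (hydrate ids, trim aggregated entries, send raw data rather than HTML) might look roughly like this Python sketch. It is not the PHP code Duxter runs: `ROWS` is a hypothetical in-memory stand-in for the MySQL table, and `CLIENT_FIELDS` and `MAX_NAMES` are made-up examples of the per-type filtering the text describes. It accepts id groups shaped like the ones the member stacks return, one list per entry.

```python
import json

# Hypothetical stand-in for the MySQL entries table, keyed by entry id.
ROWS = {
    1: {"id": 1, "type": "video_view", "author": "bob", "video_id": "cat",
        "video_title": "Funny cat", "ip": "10.0.0.1", "ts": 1000},
    2: {"id": 2, "type": "status", "author": "bob", "body": "Hello",
        "author_email": "bob@example.com", "ts": 900},
    3: {"id": 3, "type": "video_view", "author": "carol", "video_id": "cat",
        "video_title": "Funny cat", "ip": "10.0.0.2", "ts": 1030},
}

# Hypothetical per-type whitelist: only what the client needs to render the entry.
CLIENT_FIELDS = {
    "status": ("id", "type", "author", "body", "ts"),
    "video_view": ("id", "type", "author", "video_id", "video_title", "ts"),
}
MAX_NAMES = 3  # hypothetical cap on names sent with an aggregated entry

def fetch_rows(ids):
    """Stand-in for one batched query, e.g. SELECT ... WHERE id IN (...)."""
    return [ROWS[i] for i in ids if i in ROWS]

def trim(row):
    """Keep only the whitelisted fields for this entry's type."""
    return {k: row[k] for k in CLIENT_FIELDS[row["type"]] if k in row}

def format_group(ids):
    rows = fetch_rows(ids)
    if not rows:
        return None  # e.g. the entry was deleted after it was stacked
    entry = trim(rows[0])
    if len(rows) > 1:
        # Aggregated entry: one shared payload plus a short list of who did it.
        entry.pop("author", None)
        entry["authors"] = [r["author"] for r in rows][:MAX_NAMES]
        entry["count"] = len(rows)
        entry["ts"] = max(r["ts"] for r in rows)
    return entry

def build_payload(groups):
    """Raw JSON for the client; compact separators shave bytes for mobile users."""
    entries = [e for e in (format_group(g) for g in groups) if e is not None]
    return json.dumps(entries, separators=(",", ":"))
```

Calling `build_payload([[1, 3], [2]])` yields two entries: one aggregated video view carrying `authors` and `count`, and Bob’s status, with fields like `ip` and `author_email` never leaving the server.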

The communication layer of the product is based on WebSockets, which provide real-time, bidirectional information transfer between the client and the server. This allows for a much better user experience than alternatives such as polling the server with AJAX requests. The communication is implemented with NodeJS and Socket.io. While NodeJS could have been used to generate the whole feed, we have elected to keep this layer purely as an interface between the client and the PHP server. This lets each Node server handle many more concurrent connections, since it only has to move information rather than process it as well. Keeping the Node servers’ load light will also make scaling easier, since running multiple instances of Node brings up a whole new set of technical issues. In the end, all this layer has to do is receive the feed information from the feed generator and pass it to the proper client. The client then takes that information and uses it to generate the markup displayed to the user.
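Because this layer only moves data, its core is little more than a lookup from member to open connections. The sketch below models that in Python with a hypothetical `Connection` class standing in for a WebSocket; Duxter’s actual layer is a custom NodeJS/Socket.io framework, where per-member rooms (`socket.join(...)` and `io.to(...).emit(...)`) are one common way to get the same routing.

```python
from collections import defaultdict

class Connection:
    """Hypothetical stand-in for one open WebSocket (a browser tab or an app)."""

    def __init__(self):
        self.outbox = []

    def send(self, event, data):
        self.outbox.append((event, data))

class Relay:
    """Routes already formatted payloads to a member's open connections; no feed logic here."""

    def __init__(self):
        self.by_member = defaultdict(set)   # member id -> set of open connections

    def connect(self, member_id, conn):
        self.by_member[member_id].add(conn)

    def disconnect(self, member_id, conn):
        conns = self.by_member.get(member_id)
        if conns:
            conns.discard(conn)
            if not conns:
                del self.by_member[member_id]   # forget members with no open connections

    def deliver(self, member_id, event, payload):
        """Forward the payload untouched; return how many connections received it."""
        conns = self.by_member.get(member_id, ())
        for conn in conns:
            conn.send(event, payload)
        return len(conns)
```

A member with two tabs open gets each payload on both, and `deliver` never parses or modifies the payload, which is what keeps a relay like this cheap per connection.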

The client is essentially in charge of creating the HTML for the feed from the data it receives from Node. This is done with jQuery and the wonderful JS templating system, Handlebars. The generated HTML gets various event bindings in JS, so in the end you have an interactive newsfeed with an open connection to the server that can freely send and receive information. The WebSocket connection lets the feed receive new entries as they are posted, as opposed to the polling system we used previously, which only updated the feed every 15 seconds.
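On the client, Handlebars turns each raw entry into markup. As a rough analogue only, here is the same idea in Python using `string.Template`, with `html.escape` playing the role of Handlebars’ default escaping of `{{...}}` expressions. The templates, the entry fields and the `who` helper are hypothetical; they assume entries carrying a `type` plus either an `author` or, for aggregated entries, `authors` and `count`.

```python
import html
import json
from string import Template

# Hypothetical per-type templates; Handlebars templates fill this role in the real client.
TEMPLATES = {
    "status": Template('<li class="entry status"><b>$who</b> $body</li>'),
    "video_view": Template('<li class="entry video"><b>$who</b> watched $video_title</li>'),
}

def who(entry):
    """Name line for an entry; aggregated entries carry `authors` and `count`."""
    if "authors" in entry:
        names = ", ".join(entry["authors"])
        extra = entry["count"] - len(entry["authors"])
        return f"{names} and {extra} others" if extra > 0 else names
    return entry["author"]

def render(payload):
    """Turn the server's raw JSON into an HTML list, escaping every inserted value."""
    items = []
    for entry in json.loads(payload):
        template = TEMPLATES.get(entry.get("type"))
        if template is None:
            continue  # unknown type: skip it rather than break the whole feed
        values = {k: html.escape(str(v)) for k, v in entry.items()}
        values["who"] = html.escape(who(entry))
        items.append(template.substitute(values))
    return "<ul>" + "".join(items) + "</ul>"
```

Rendering on the client is what lets the server send only data and skip repeated markup, at the cost of a little more work in the browser.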

That just about covers the basic architecture behind Duxter’s new newsfeed. I will probably go a bit more in-depth on some of the other aspects soon, so let me know if you have any questions!

  • DreadfulGlory

    Very interesting. Although Facebook naturally hasn’t released very much information about the backend of their newsfeed system, from what I do know of it, this is very similar. 

    Although I have to ask, are all of these things a single framework? Or are they separate and just bridged together? 

    Forgive my lack of standard vocabulary. I think you will understand what I am asking though. 

    • rsyvarth

      Currently there are 2 major “systems” on the server side: all of the PHP/MySQL stuff shown there is done through IPB’s application system (although I am likely to move that out to a separate API soon as we start developing on other platforms), and the communication layer is our own custom NodeJS framework. It handles all of the realtime activity you see on the site (Notifications, Newsfeed, IM). I have written it in a way, though, that lets the “member stack” section of the newsfeed backend be moved out to another environment / language if necessary.