— RSyvarth

Archive
Technical

In this blog post I will cover the process that I went through while developing the newsfeed for Duxter and how we created the amazing product that you see today (or will see soon)!

Newsfeeds are one of the most common features among social networking sites today and while they initially seem relatively simple, as we strive for instant interactions across multiple platforms with content which is generated dynamically to the interests of each user, it can become a bit of a technical nightmare. In order to ensure that we ended up with the best product possible I put in a large amount of time researching different types of feeds used across different sites. There are 2 main approaches to the issue, you either generate a newsfeed for each user as new posts are added to their feed and display already formatted content to the user when they load the page or you build the feed every time the user loads the page as they request it. We opted for the latter of these since it allows for greater flexibility and more real-time interactions while avoiding some of the technical issues of the other approach. Overall the architecture of the feed is inspired by Facebook’s newsfeed (if it works for them!) but it is executed in a more simplified way. The feed is separated into 4 major sections: the client, the communication layer, the feed generator, and member stacks. Each of these areas of the feed perform specific tasks and the entire product is designed with speed and efficiency in mind. Let’s begin with the member stacks.

A visual layout of the newsfeed generation process

Member stacks are at their most basic level are just databases of entry ids and a few related bits of information used for sorting and filtering. They are used to generated a complete structure of a member’s feed that only contains ids representing different newsfeed entries so that we can focus on creating the most condensed version of the information which the user should see without ever pulling more information than is needed. Each member on the site is represented by a different “stack” or a collection in a MongoDB database (redundant, I know). Each stack contains ids for all of the entries which that member has generated either by directly posting content such as a status or automated content generation like posts about the member viewing videos or making purchases in the store. When a user goes to view their newsfeed a certain number of the “latest” entries from each of their friends are pulled from their respective stacks. These ids are then aggregated (through various forms of magic) to prevent your feed from filling up with 100 entries of your friends viewing a cat video. The aggregated entries are then filtered by how important the content is through the use of the previously discussed friend affinity equation and a few other factors such as how old the content is and what type of content was posted (your friend uploading a video is more important than them watching someone else’s video). This is used to come up with an overall rating of how important the content is and only items which meet a certain threshold of importance are kept while the others will not be displayed to the user.* After all of this is done the member stack section returns an array of ids back to the feed generator so that the feed can grab and format the content.

*This is not yet a functional feature, but it will be implemented as soon as it becomes necessary.

Once the feed generator receives the ids of the entries which will be displayed on the feed it pulls the content for the entries from the MySQL database and formats all of the information so that it is ready to be returned to the client. In this step aggregated entries also go through a few extra processes where the aggregated information gets filtered  (depending on the type of entry) so that only the information which is absolutely necessary is passed to the client. The hope with this process is to pass the least amount of information necessary to the client so that when it is used on mobile devices the user doesn’t burn through tons of data trying to load the feed (we also only pass the client raw information instead of html for this reason, save data transfer by eliminating redundant information). Once all of the information is formatted and ready for the client it gets pushed to the communication layer so that it can be directed to the correct user.

The communication layer of the product is based off of websockets to provide real-time, bidirectional, information transfer between the client and the server. This allows for a much better user experience compared to alternative methods such as AJAX. This communication is achieved through the use of NodeJS and Socket.io. While NodeJS could have been used to generate the feed as a whole we have elected to keep this layer as purely an interface between the client and the PHP server. This will let us have many more concurrent connections with each Node server since they only have to handle moving information rather than processing it as well. Keeping the Node server’s load light will make scaling easier since running multiple instances of Node brings up a whole new set of technical issues. In the end all this layer has to do is receive the feed information from the feed generator and pass it to the proper client. The client then takes that information and uses it to generated the markup for display to the user.

The client is essentially in charge of creating the HTML for the feed from the data which is receives from Node. This is done through the use of jQuery and the wonderful JS templating system, Handlebars. The HTML gets various event bindings in JS so that in the end you have an interactive newsfeed that has an open connection with the server so that it can freely send and receive information. The websocket connection really allows for the feed to get new entries are they are posted as opposed to the polling system which we used previously so that the feed only updated every 15 seconds.

That just about covers the basic architecture behind Duxter’s new newsfeed. I will probably go a bit more in-depth on some of the other aspects soon so let me know if you have any questions!

Read More

Whilst redesigning the News Feed solution for Duxter (a topic which I will cover later) we ran into the issue of algorithmically assigning a numerical value to the importance of a post so we can determine whether or not it “deserves” to be displayed on the feed. One of the major factors which we have isolated in our equation is essentially how much you like that person which we call the Friend Affinity Factor (μ). This affinity factor (in theory) will let us decide on how important that person’s posts are to a specific user and how likely they are to be interested in them. The issue with this is that relationships are rather inconsistent between people and it is difficult to distill all of the different factors affecting those interactions down to a single value, we did tried it anyways.

The basic idea behind our ranking system is that every single interaction that two users have on site (and hopefully off site at some point) will be logged and weighted based on how important we deem that reaction to be. For example, liking someone’s status may be worth 1 point while playing a game with them could be worth 5 points. These activity logs will then be summed to form two different values. The first number is a rolling sum of all of the interactions within the last 30 days, the Recent Affinity Sum (Ar), and the second is the total interaction sum divided by the number of months the users have been friends, the Total Affinity Sum (At). The reason to include both values in the calculation of μ is so that both long term and new relationships have a chance of appearing on the feed (although new relationships have a slightly higher weight). The issue with the system of pure summation is that there is a chance that a single user’s Affinity Sum will get so high that only content from that single user will ever be shown on the feed. To combat this issue both the Ar and At values are scaled before they are used to calculate the μ value.

An example of the Arctan parent function

The two main goals of the scaling function are to limit the extreme Affinity Sum values and to maintain the separation between the majority of the users with average Affinity Sums. With these two purposes in mind the Arctan function was selected to be the parent function for this scaling equation. Arctan meets both of these requirements as it has both upper and lower limits which would prevent extreme values from skewing the μ calculation and it has a relatively constant slope on both sides of its inflection point which maintains the separation between most of the inner values for Affinity Sums. From the parent Arctan function several transformations were made in order to make the general tenancies of the function work as we needed for it to scale our summation values. Firstly the entire equation was translated up by π/2 to prevent negative values from being generated and to the right so that the “active” parts of the function are within the first quadrant. From this point various other minor transformations are made so that the resultant value is within the desired range for our applications.

The scaled values of Ar and At are then included in a weighted average to create the final μ value which is used in the rest of the news feed entry rating algorithm. The final equation for this reaction is shown below.

 

Final Friend Affinity Equation

θr = Weight of Recent Affinity Term

θt = Weight of Total Affinity Term

α = Series whose sum is equal to the user’s Affinity Sum

σ = Relative weight of the Affinity Sum (vertical streching)

ω = Horizontal tansformation

ε = Vertical stretch factor, value here will be the max value for the function

 

I hope you enjoyed this post (and that you are not too confused)! There will be more coming soon!

 

Read More