Revealing the Twitter recommendation algorithm and the GitHub release

Twitter has published portions of its source code on GitHub in two repositories that reveal parts of its post recommendation system, including how the “For You” timeline works.

The company is also dealing with a leak on the same platform, where a user posted portions of the social network’s codebase without authorization.

GitHub removed the leaked content after receiving a takedown notice from Twitter alleging copyright infringement. Twitter also asked that the leaker and anyone who downloaded the code be identified.

The release came days after Twitter owner Elon Musk signaled that he intended to make the platform’s algorithm more transparent. The code was made public on Friday, March 31.

In a blog post accompanying the release, Twitter described the published material as the first step in a new era of transparency and said it had excluded any code that could compromise user security and privacy.

The two new GitHub repositories, labeled Main and ML, contain the code that helps determine which Tweets are surfaced to users in the For You timeline.

Twitter also published additional details on its blog about how recommendations are selected and filtered, including how related posts are considered for suggestion.

Although it released part of its source code to the public, the company omitted the section that handles ad recommendations.

Recommendation algorithm

The platform uses information about each user to select the stream of Tweets that populates the For You feed, which drives the personalized experience.

The recommendation system is described as composed of many interconnected services and jobs, with candidate Tweets passing through a three-stage process.

First, the system fetches the best Tweets from different recommendation sources in a process called candidate sourcing; second, each candidate is ranked using a machine learning model; finally, heuristics and filters remove previously seen Tweets and content marked not safe for work (NSFW), among other material.
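The three stages above (sourcing, ranking, filtering) can be sketched as a small pipeline. This is a minimal illustration, not Twitter's code: the `Candidate` fields, the function names, and the toy scoring function are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record for one Tweet under consideration.
    tweet_id: int
    author: str
    score: float = 0.0
    seen: bool = False
    nsfw: bool = False


def source_candidates(pools, limit=1500):
    """Stage 1: merge candidates from several sources, dropping duplicate ids."""
    merged = {}
    for pool in pools:
        for cand in pool:
            merged.setdefault(cand.tweet_id, cand)
    return list(merged.values())[:limit]


def rank(candidates, model):
    """Stage 2: score every candidate with the model, best first."""
    for cand in candidates:
        cand.score = model(cand)
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def apply_filters(candidates):
    """Stage 3: drop Tweets already seen or flagged NSFW."""
    return [c for c in candidates if not c.seen and not c.nsfw]


def build_timeline(pools, model):
    return apply_filters(rank(source_candidates(pools), model))


followed = [Candidate(1, "alice"), Candidate(2, "bob", seen=True)]
discovered = [
    Candidate(2, "bob", seen=True),
    Candidate(3, "carol"),
    Candidate(4, "dave", nsfw=True),
]
toy_model = lambda c: c.tweet_id / 10  # placeholder for a real ML model
timeline = build_timeline([followed, discovered], toy_model)
print([c.tweet_id for c in timeline])  # [3, 1]
```

Keeping each stage as a separate function mirrors the description of the system as many cooperating services, each of which could be replaced independently.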

The service that builds and serves the For You timeline is called Home Mixer. It is built on Product Mixer, a software framework that acts as the backbone for assembling content feeds, connecting the candidate sources with the scoring functions, heuristics and filters that feed into this section.

The first stage, candidate sourcing, extracts roughly 1,500 Tweets from a vast pool, drawing on accounts the user follows (in-network) and accounts the user does not follow (out-of-network).

Today the For You timeline is described as about 50 percent in-network Tweets and 50 percent out-of-network Tweets on average, with variations across users.

For followed accounts, the aim is to surface the most recent and relevant Tweets.

A model called Real Graph predicts the likelihood of engagement between two users. The higher the Real Graph score between a user and a Tweet’s author, the more of that author’s Tweets are included.
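To make the "higher score, more Tweets" idea concrete, here is a minimal sketch that splits a fixed number of timeline slots across followed authors in proportion to a predicted interaction score. The proportional rule, the function name and the example scores are assumptions for illustration; the source only states that a higher score leads to more Tweets from that author.

```python
def allocate_slots(interaction_scores, total_slots):
    """Split total_slots across authors in proportion to their score.

    interaction_scores maps author -> predicted interaction probability
    (hypothetical values). Fractions are rounded down, so a few slots
    may remain unassigned.
    """
    total = sum(interaction_scores.values())
    if total == 0:
        return {author: 0 for author in interaction_scores}
    return {
        author: int(total_slots * score / total)
        for author, score in interaction_scores.items()
    }


scores = {"alice": 0.5, "bob": 0.25, "carol": 0.25}
print(allocate_slots(scores, 8))  # {'alice': 4, 'bob': 2, 'carol': 2}
```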

The platform noted ongoing work in this area after it stopped using Fanout Service, an older service that previously supplied in-network Tweets from a cache kept for each user.

Unfollowed users

Twitter also explains how it incorporates Tweets from unfollowed accounts into For You recommendations, describing two approaches.

The first analyzes the social graph by asking two questions: which Tweets have the people a user follows recently engaged with, and who likes similar Tweets to the user and what else have they liked?

From the answers, a candidate set of Tweets is generated and ranked with a logistic regression model. A graph processing engine named GraphJet is described as carrying out this kind of graph traversal on the platform.
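As a rough illustration of ranking with logistic regression, the sketch below turns a weighted sum of features into a probability with the sigmoid function and sorts candidates by it. The two features, the weights and the bias are hypothetical; nothing here reflects the actual model or its inputs.

```python
import math


def logistic_score(features, weights, bias=0.0):
    """Logistic regression: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical features per Tweet:
# [likes from followed accounts, likes from users with similar taste]
candidates = {"t1": [3, 0], "t2": [0, 1], "t3": [1, 2]}
weights, bias = [0.8, 0.4], -1.0  # made-up parameters

ranked = sorted(
    candidates,
    key=lambda tid: logistic_score(candidates[tid], weights, bias),
    reverse=True,
)
print(ranked)  # ['t1', 't3', 't2']
```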

A second approach, called embedding spaces, seeks to surface posts from unfollowed accounts by matching interests: it works out which Tweets and users are similar to a user’s preferences.
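Matching interests in a shared vector space is commonly done by comparing vectors with cosine similarity. The sketch below assumes users and Tweets are already represented as short numeric vectors; the vectors, the two-dimensional size and the function names are invented for the example, and the source does not say which similarity measure Twitter uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(user_vec, tweet_vecs, k=2):
    """Return the ids of the k Tweets whose vectors are closest to the user's."""
    ranked = sorted(
        tweet_vecs,
        key=lambda tid: cosine_similarity(user_vec, tweet_vecs[tid]),
        reverse=True,
    )
    return ranked[:k]


user = [1.0, 0.0]  # hypothetical interest vector
tweets = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [1.0, 1.0]}
print(most_similar(user, tweets))  # ['t1', 't3']
```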

At the ranking stage, the roughly 1,500 candidates are scored for relevance. Each one receives a score representing the likelihood of engagement.

The scoring relies on a neural network with tens of millions of parameters that is continually trained on Tweet interactions to optimize for positive engagement. It outputs ten labels for each Tweet, each representing the probability of a type of engagement.
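One simple way to turn several per-interaction probabilities into a single ranking score is a weighted sum. The sketch below shows three labels for brevity; the label names, the weights, and the weighted-sum rule itself are assumptions for illustration, not the published model.

```python
def combined_score(probabilities, weights):
    """Weighted sum of predicted interaction probabilities for one Tweet."""
    return sum(weights[label] * p for label, p in probabilities.items())


# Hypothetical weights: how much each kind of interaction counts.
weights = {"like": 1.0, "reply": 2.0, "retweet": 0.5}

# Hypothetical model output for one Tweet.
predicted = {"like": 0.5, "reply": 0.25, "retweet": 0.5}

print(combined_score(predicted, weights))  # 1.25
```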

After ranking, heuristics and filters refine the recommendations to produce a balanced and varied feed. Current measures include removing Tweets from blocked accounts and limiting consecutive Tweets from a single account, among other safeguards.
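The two measures named above can be sketched as simple list filters. This is an illustrative simplification: the dictionary layout, the function names and the rule of dropping (rather than reordering or down-weighting) excess Tweets are assumptions.

```python
def drop_blocked(tweets, blocked):
    """Remove Tweets written by accounts the user has blocked."""
    return [t for t in tweets if t["author"] not in blocked]


def limit_consecutive(tweets, max_run=2):
    """Keep at most max_run Tweets in a row from the same author."""
    result = []
    run_author, run_len = None, 0
    for t in tweets:
        if t["author"] == run_author:
            run_len += 1
        else:
            run_author, run_len = t["author"], 1
        if run_len <= max_run:
            result.append(t)
    return result


ranked = [
    {"id": 1, "author": "alice"},
    {"id": 2, "author": "alice"},
    {"id": 3, "author": "alice"},
    {"id": 4, "author": "spam"},
    {"id": 5, "author": "bob"},
]
filtered = limit_consecutive(drop_blocked(ranked, {"spam"}))
print([t["id"] for t in filtered])  # [1, 2, 5]
```

The blocked-account filter runs first here; otherwise a blocked Tweet sitting between two runs from the same author would hide the fact that they become consecutive once it is removed.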

Once the final Tweets are selected, Home Mixer has a set ready to send to each device. As the last step, the system blends the Tweets with other content, such as ads or suggestions of accounts to follow.
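A minimal sketch of this blending step, assuming a fixed spacing rule: one non-Tweet item is inserted after every few Tweets until the extra items run out. The spacing, the function name and the item labels are hypothetical; the source does not describe how positions are chosen.

```python
def blend(tweets, extras, every=3):
    """Insert one extra item (ad, follow suggestion) after every `every` Tweets."""
    blended = []
    extras_iter = iter(extras)
    for position, tweet in enumerate(tweets, start=1):
        blended.append(tweet)
        if position % every == 0:
            extra = next(extras_iter, None)  # None once extras are used up
            if extra is not None:
                blended.append(extra)
    return blended


tweets = ["t1", "t2", "t3", "t4", "t5", "t6"]
extras = ["ad1", "follow_suggestion"]
print(blend(tweets, extras))
# ['t1', 't2', 't3', 'ad1', 't4', 't5', 't6', 'follow_suggestion']
```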

Following the publication of this portion of Twitter’s recommendation algorithm, Elon Musk indicated on the platform that more details would be released in the coming weeks.

Twitter also said it plans to expand the recommendation system with real-time features and improved user representations as new capabilities are developed.
