Rabbit Holes and Echo Chambers are processes that can be observed on Reddit
This shows that, even though the "suggestion logic" that governs other social media platforms (e.g. YouTube) is less evident on Reddit, the platform is by no means immune to these dynamics.
Communities tend to form around topics and ideologies. This reflects the basic human need to find reassurance and stable truths on which to base one's behavior (e.g. through confirmation bias or selective perception).
Rabbit Holes and Echo Chambers can create more cohesive subcommunities around various topics
Network Analysis can be applied to the network whose nodes are subreddits and whose edges are crossposts. Analyzing the structure of this network should make it possible to highlight overarching communities.
These communities should represent real thematic conglomerates of subreddits. We do not expect them to be tightly interrelated subreddits sharing a single point of interest. Instead, we expect wider, multifaceted aggregations built on social, political and ideological connections. One example is the link between subreddits related to the extreme left and subreddits related to veganism, which in recent years has been associated with strong political actions and movements such as PETA.
Users approach radicalized content through gateway communities that can lead from moderate, socially acceptable content to more radicalized content
Users participate in discussions in subreddit communities and often connect these communities with one another by crossposting content. Through these user interactions, radicalized content can reach new users and connect them to less moderate communities.
Content analysis also makes it possible to observe how narrowly focused (sectorized) subreddits, belonging to smaller communities, connect to wider ones through gateways. In this context, gateways are subreddits that bring related topics together in the same environment.
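The notion of a gateway could be made measurable by flagging subreddits whose crosspost edges reach several communities other than their own. This is a hypothetical sketch and not the project's method. The function `gateway_candidates`, the `min_foreign` threshold and the toy edge list are illustrative assumptions. The community assignment would come from a community-detection step.

```python
from collections import defaultdict


def gateway_candidates(edges, community, min_foreign=2):
    """Return subreddits whose crossposts reach at least `min_foreign`
    communities other than their own (a hypothetical gateway criterion).

    edges: iterable of (subreddit_a, subreddit_b, weight) tuples
    community: dict mapping subreddit -> community id
    """
    foreign = defaultdict(set)  # node -> other communities it is linked to
    for a, b, _weight in edges:
        ca, cb = community[a], community[b]
        if ca != cb:
            foreign[a].add(cb)
            foreign[b].add(ca)
    return sorted(node for node, comms in foreign.items() if len(comms) >= min_foreign)


# Toy example: "hub" (community 0) links out to communities 1 and 2.
edges = [
    ("hub", "a1", 5), ("hub", "b1", 3),
    ("a1", "a2", 8), ("b1", "b2", 4), ("hub", "h2", 6),
]
community = {"hub": 0, "h2": 0, "a1": 1, "a2": 1, "b1": 2, "b2": 2}
print(gateway_candidates(edges, community))  # → ['hub']
```

Lowering `min_foreign` to 1 would also flag the subreddits on the receiving end of each bridge, so the threshold controls how strict the notion of "gateway" is.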
Evidence of radicalization may also emerge from the tone of conversation. As social media allows hatred to be expressed freely, we expect most comments in radicalized communities to be negative.
Identification and extraction of connections between subreddits
You can follow our methods in our Jupyter Notebook.
To start our analysis, we needed to identify a seed subreddit, then expand its connections and examine them. We chose r/conspiracy as our starting point to make sure that at least one conspiracy-oriented subreddit was included in the analysis.
We then took the top 5000 posts of the subreddit and extracted their URLs. Using PRAW, we queried all of Reddit for posts sharing the same URLs.
This gave us the subreddits that r/conspiracy is most connected with, at least in terms of shared content. We saved them in a CSV file that constitutes the first level of our network. For each reshare, the file records the subreddit where it appeared, its link and the number of reshares.
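The crawl described above could look roughly like the following sketch. It assumes an authenticated `praw.Reddit` instance and relies on PRAW's `Subreddit.top()` and `Reddit.info(url=...)`. The function names `crawl_reshares` and `write_reshares_csv`, the CSV column names and the default `limit` are hypothetical choices and are not taken from the project's notebook.

```python
import csv
from collections import Counter


def crawl_reshares(reddit, source, limit=1000):
    """Count how often each URL from the top posts of `source` was reshared elsewhere.

    `reddit` is assumed to be an authenticated praw.Reddit instance. Reddit
    listings generally stop at roughly 1,000 items, so larger limits may
    return fewer posts than requested.
    """
    counts = Counter()  # (resharing subreddit, shared url) -> number of posts
    for post in reddit.subreddit(source).top(time_filter="all", limit=limit):
        if post.is_self:  # text posts have no external URL to match
            continue
        # Reddit.info(url=...) yields the submissions that link to this URL
        for other in reddit.info(url=post.url):
            name = other.subreddit.display_name
            if name.lower() != source.lower():  # ignore the source subreddit itself
                counts[(name, post.url)] += 1
    return counts


def write_reshares_csv(counts, path):
    """Save one row per (subreddit, link) pair, most reshared first."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["subreddit", "url", "count"])
        for (subreddit, url), n in counts.most_common():
            writer.writerow([subreddit, url, n])
```

A typical call would be `write_reshares_csv(crawl_reshares(reddit, "conspiracy"), "conspiracy.csv")`. Keeping one row per (subreddit, link) pair preserves the link information, and per-subreddit totals can still be obtained by summing the `count` column.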
We then expanded this first level into a second one by applying exactly the same logic to each of the subreddits listed in the CSV. This generated a series of CSV files. Each file lists the subreddits that reshared the top 5000 posts of the subreddit it is named after.
To go one level further, we randomly selected 10 of these CSV files and performed the same task, this time considering only the top 500 posts.
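The level-by-level expansion can be sketched as a small snowball crawl. `fetch_neighbors` is a hypothetical callable that stands in for the PRAW crawl and returns `{subreddit: reshare_count}`. Two further assumptions are made here: the sampling seed, and the reading of "the same task" as re-crawling every subreddit listed in each sampled CSV.

```python
import random


def expand(subreddits, fetch_neighbors, limit):
    """Run the same crawl on every subreddit of the previous level.

    Returns {subreddit: {neighbor: reshare_count}}; in the project each
    entry corresponded to one CSV named after its subreddit.
    """
    return {name: fetch_neighbors(name, limit) for name in subreddits}


def build_levels(root, fetch_neighbors, sample_size=10, seed=0):
    """Hypothetical snowball crawl: root -> level 1 -> level 2 -> sampled level 3."""
    level1 = fetch_neighbors(root, 5000)            # subreddits resharing the root's content
    level2 = expand(level1, fetch_neighbors, 5000)  # one neighbor table ("CSV") per level-1 subreddit
    rng = random.Random(seed)                       # fixed seed for a reproducible sample
    sampled = rng.sample(sorted(level2), k=min(sample_size, len(level2)))
    level3 = {}
    for name in sampled:
        # Assumed reading of "the same task": re-crawl every subreddit listed in
        # the sampled table, this time with only the top 500 posts.
        level3.update(expand(level2[name], fetch_neighbors, 500))
    return level1, level2, sampled, level3
```

Passing the crawl in as a callable keeps the expansion logic testable without network access. The sampling step bounds the cost of the third level, which otherwise grows with the number of second-level subreddits.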
Network construction and identification of communities
This data gave us the information needed to construct our network of subreddits. Each subreddit is a node, weighted by the number of connections it has. Edges represent crossposts between nodes and are weighted by the actual number of crossposts, one of the pieces of information previously stored in the CSV files.
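The graph construction can be sketched with networkx, assuming the CSV rows have been flattened into `(source, target, count)` triples. Two parts of this setup are our assumptions. Crossposts are treated as undirected, so counts from A to B and from B to A are summed, and the node degree is used as the node weight.

```python
import networkx as nx


def build_network(rows):
    """Build a weighted, undirected crosspost graph.

    rows: iterable of (source_subreddit, target_subreddit, crosspost_count)
    """
    G = nx.Graph()
    for source, target, count in rows:
        if source == target:  # skip self-loops
            continue
        if G.has_edge(source, target):
            G[source][target]["weight"] += int(count)  # merge both directions
        else:
            G.add_edge(source, target, weight=int(count))
    # Node weight: number of distinct subreddits each node is connected to.
    nx.set_node_attributes(G, dict(G.degree()), "weight")
    return G
```

If the direction of crossposting matters for the analysis, `nx.DiGraph` can be used instead with the same loop.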
The network was then partitioned into communities using modularity-based community detection. The communities were separated before the network was laid out with a ForceAtlas algorithm, which makes it possible to distinguish more central communities from more peripheral ones. This also allowed us to identify groups of communities sharing connections. We could then subdivide the whole network into these aggregated subnetworks, in addition to the single communities highlighted previously.
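The modularity step can be sketched with networkx's Louvain implementation, which is swapped in here as an assumption because the exact tool is not specified. The ForceAtlas layout itself is not reproduced. The sketch also sums crosspost weights between communities, which is one hypothetical way to see which communities share connections and could be grouped into aggregated subnetworks. Function names and the seed are illustrative.

```python
from collections import Counter

import networkx as nx
from networkx.algorithms import community as nx_comm


def detect_communities(G, seed=42):
    """Louvain modularity optimization on the weighted crosspost graph."""
    communities = nx_comm.louvain_communities(G, weight="weight", seed=seed)
    q = nx_comm.modularity(G, communities, weight="weight")
    membership = {node: i for i, comm in enumerate(communities) for node in comm}
    return communities, membership, q


def community_links(G, membership):
    """Total crosspost weight between each pair of distinct communities."""
    links = Counter()
    for u, v, w in G.edges(data="weight", default=1):
        cu, cv = membership[u], membership[v]
        if cu != cv:
            links[tuple(sorted((cu, cv)))] += w
    return links
```

Pairs of communities with high values in `community_links` are natural candidates for merging into a larger thematic subnetwork. Fixing the `seed` keeps the Louvain partition reproducible across runs.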
Comment extraction: Sentiment Analysis, Topic Modeling and Wordclouds
The last step consisted of extracting the comments of each post, including the first level of replies to each comment, and saving them for later analysis. This was done both for single subnetworks and for grouped subnetworks.
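Extracting comments together with their first level of replies could look like the following sketch, assuming PRAW `Submission` objects. It relies on `CommentForest.replace_more()` and on the fact that iterating `submission.comments` yields only top-level comments. The record fields and the JSON Lines output format are hypothetical choices.

```python
import json


def extract_comments(submission):
    """Flatten top-level comments and their direct replies into records."""
    # limit=0 drops the "load more comments" stubs without extra API calls;
    # limit=None would fetch them all at the cost of many more requests.
    submission.comments.replace_more(limit=0)
    records = []
    for top in submission.comments:  # iterating the forest yields top-level comments only
        records.append({"post_id": submission.id, "comment_id": top.id,
                        "parent_id": None, "depth": 0, "body": top.body})
        for reply in top.replies:  # first level of replies; deeper replies are skipped
            records.append({"post_id": submission.id, "comment_id": reply.id,
                            "parent_id": top.id, "depth": 1, "body": reply.body})
    return records


def save_jsonl(records, path):
    """Store records one JSON object per line for later analysis."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```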
Comments were then processed with Sentiment Analysis and Topic Modeling. Sentiment Analysis identifies affective states and subjective information in text, a task often tackled with supervised machine learning, although VADER itself is a rule-based, lexicon-driven model. Topic Modeling is a statistical method for discovering recurring topics in a collection of texts. The VADER lexicon was chosen for both processes so as not to underestimate the slang and typical expressions of social networks.
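The sentiment scores can be tested against the expectation that radicalized communities are mostly negative by computing the share of negative comments per subreddit. This sketch assumes a VADER analyzer that exposes `polarity_scores(text)`, which returns a dict with a `"compound"` key. It also assumes the ±0.05 compound thresholds suggested by VADER's authors. `negative_share` and `label` are hypothetical helper names.

```python
from collections import defaultdict


def label(compound, threshold=0.05):
    """Map a VADER compound score to a sentiment class."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def negative_share(comments, analyzer):
    """Fraction of negative comments per subreddit.

    comments: iterable of (subreddit, text) pairs
    analyzer: object with a VADER-style polarity_scores(text) method
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for subreddit, text in comments:
        compound = analyzer.polarity_scores(text)["compound"]
        totals[subreddit] += 1
        if label(compound) == "negative":
            negatives[subreddit] += 1
    return {sub: negatives[sub] / totals[sub] for sub in totals}
```

With the `vaderSentiment` package, this would be called as `negative_share(pairs, SentimentIntensityAnalyzer())` after `from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer`. NLTK ships an equivalent class in `nltk.sentiment.vader` once the `vader_lexicon` resource has been downloaded.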
Wordclouds were also generated from the comments using the WordCloud library.
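A wordcloud is ultimately driven by word frequencies, so the cleaning step can be sketched in plain Python. The tiny stopword list, the URL stripping and the `min_len`/`top_n` defaults are assumptions. The resulting dict can be passed to `WordCloud().generate_from_frequencies(freqs)`.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; wordcloud.STOPWORDS is far more complete.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "this", "for", "on", "with", "are", "be", "was", "you"}


def word_frequencies(comments, stopwords=STOPWORDS, min_len=3, top_n=200):
    """Count lowercase words across comments, ignoring links and stopwords."""
    counts = Counter()
    for text in comments:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop links
        for token in re.findall(r"[a-z']+", text):
            token = token.strip("'")
            if len(token) >= min_len and token not in stopwords:
                counts[token] += 1
    return dict(counts.most_common(top_n))
```

Computing frequencies explicitly makes it easy to compare communities numerically, in addition to visually, before rendering each cloud with `to_file()`.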