Through the Looking Glass: Study of Transparency in Reddit’s Moderation Practices 17:31
MC 1: AVGN, BestOfNoPolitics, Coinex, CuckoldPregnancy, DNCleaks, Diepio_, ElderScrolls, EthTrader_Test, Italian, Kelloggs, Labour, MontanaPolitics, MurderedByWords, NSFW_Snapchat, NSFWarframe, NeutralCryptoTalk, Orego_Politics, POTUSWatch, Pennsylvania_Politics, ShitClashRoyaleSays, Stu, SugarBaby, TheCinemassacre, UpliftingKhabre, ZeroCoin, batonrouge, bonehurtingjuice, chickengifs, horny, iotchain, iranian, moderatepolitics, neogaming
MC 2: AllModsAreBastards, AltcoinBeginners, AnswersFromHistorians, AsianFeet, BadRedditNoDonut, BioshockInfinite, BitcoinCashLol, Bitcoin_Exposed, Blackout2015, CardanoMarkets, CuckoldCommunity, Dcrtrader, ElonMuskTweets, EverythingFoxes, FMTClinics, GGinSF, GitInaction, HoMM, HyperSpace, JustNews, MeanJokes, MemoCash, ModerationLog, Morrowind, Ninjago, OUR_PUBLIC_ACCOUNT, Offensive_Speech, OpenFacts, POLITIC, PicEra, PrivacyCoinMatrix, ProjectMDiedForThis, SRC_Meta, ScarletSquad, Skeletal, Stranger_Things, TrumpSalt, UncensoredPolitics, WarFrameCirclejerk, WatchRedditDie, YourOnlySubreddit, askSteinSupporters, autogynephilia, btcfork, cryptotaxation, cyubeVR, dark_humor, dnl, evergreenstate, fuckthealtfurry, healthdiscussion, paradoxpolitics, picsUL, pushshift, swcouncil, trueaustralia, verylostredditors
MC 3: ArkEcosystem, Automate, Bellingham, BitcoinDiscussion, BitcoinSerious, Bitcoincash, BytecoinBCN, CAMSP, CardanoCoin, Corridor, CryptoCurrency, CryptoCurrencyMeta, CryptoMarkets, CryptoTechnology, CryptoWikis, Cuckold, Dirtybomb, EVEX, Ellenpaoinaction, EthereumClassic, FoxesInSnow, Gangstalking, Hotwife, HumanMicrobiome, IndiaNonPolitical, IndiaSpeaks, Iowa, KotakuInAction, Libertarian, Lightbulb, Lisk, LitecoinTraders, MakingaMurderer, MassEffectAndromeda, Oppression, PRPS2, PhantomForces, PhillyPA, Playdate, RBI, RedditCensors, ReportTheBadModerator, Ripple, SRSsucks, SocialistRA, SpaceStationThirteen, SubredditSentinals, TIL_Uncensored, The_Cabal, TotalWarArena, TrueSPH, Vinesauce, WeAreTheMusicalMakers, WhereIsAssange, XRP, animenocontext, arizonapolitics, btc, cardano, cfs, chrisolivertimes, conspiracy, decred, ethereum, ethtrader, gamers, i_irl, information, knives, liberalgunowners, ndp, neutralnews, nyancoins, olympia, pivx, pussypassdenied, pythoncoding, racistpassdenied, radeon, recycling, reverseanimalrescue, seedboxes, siacoin, smallboobproblems, socialism, speedrun, subredditcancer, talkcrypto, tanlines, tezos, thewitcher3, torrentlinks, uber, uberdrivers, viacoin, virgin
MC 4: ConspiracyII
MC 5: Indian_Academia
Table 6. Meta communities and constituting subreddits
For every post/comment, we calculated $p(\text{target\_body} \mid \text{topic}_x)$, the probability that a post/comment
belongs to topic x. This value is the sum, over the words present in the post/comment, of each
word's probability of occurrence in topic x. Finally, the topic for which this sum is the
highest (max_sum) is assigned to the post/comment. Step 4 in Figure 1 illustrates this approach
with an example. Then, we sorted all posts/comments belonging to topic x in decreasing order
of their max_sum and extracted the top ones. We did not employ a tie-breaking strategy: if several
posts/comments had the same max_sum, we selected among them at random for analysis. We
used the 25 highest-ranked posts/comments to interpret each of the 55 topics obtained after
the ATM step. We present the method of coding these topics, as well as the codes, briefly in Appendix B.
Note that we also study all of these topics in detail within each of the meta communities.
If we studied the top posts/comments directly from the topics, a few very large subreddits might dominate. To
ensure that subreddits are equally represented in the qualitative analysis, we study them within meta
communities.
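The topic assignment and ranking described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `topic_word_probs` mapping (topic to per-word probabilities), and the toy data are all assumptions introduced here, and words missing from a topic are assumed to contribute 0 to its sum.

```python
import random
from collections import defaultdict


def assign_topic(tokens, topic_word_probs):
    """Return (best_topic, max_sum) for one tokenized post/comment.

    The score of a topic is the sum of the probabilities, in that topic,
    of the words present in the post/comment (0 for unseen words).
    """
    scores = {
        topic: sum(word_probs.get(tok, 0.0) for tok in tokens)
        for topic, word_probs in topic_word_probs.items()
    }
    best_topic = max(scores, key=scores.get)
    return best_topic, scores[best_topic]


def top_documents(docs, topic_word_probs, k=25, seed=0):
    """Group posts/comments by assigned topic and keep the k highest-scoring ones."""
    rng = random.Random(seed)
    by_topic = defaultdict(list)
    for doc_id, tokens in docs.items():
        topic, max_sum = assign_topic(tokens, topic_word_probs)
        by_topic[topic].append((max_sum, doc_id))

    ranked = {}
    for topic, scored in by_topic.items():
        # Shuffle first, then sort stably by score only: documents with equal
        # max_sum end up in random relative order (no tie-breaking rule).
        rng.shuffle(scored)
        scored.sort(key=lambda pair: pair[0], reverse=True)
        ranked[topic] = [doc_id for _, doc_id in scored[:k]]
    return ranked


# Hypothetical toy data: two topics and three tokenized comments.
topic_word_probs = {
    "spam": {"buy": 0.4, "link": 0.3},
    "civility": {"rude": 0.5, "insult": 0.2},
}
docs = {
    "c1": ["buy", "link", "now"],
    "c2": ["rude", "insult"],
    "c3": ["rude", "link"],
}
ranked = top_documents(docs, topic_word_probs, k=25)
```

In this toy input, `c3` contains words from both topics and is assigned to the one with the larger sum. Note that `assign_topic` itself resolves an exact tie between topics by dictionary order, which is a simplification of this sketch.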
A.0.4 Community Detection using Louvain. We used the implementation of the Louvain community
detection algorithm in Python's community package. To find “meta communities” empirically,
this algorithm requires a graph input representing distance between data points. Therefore, we built
a graph in the form of an adjacency matrix containing distance between subreddits. Each subreddit
is represented by its topic distribution obtained from the Author LDA step. In other words, each
subreddit is represented by a vector of length 55 where the $i$-th entry in the vector corresponds
to the probability of occurrence of the $i$-th topic in that subreddit. Several similarity measures can
be used to quantify the distance between two probability distributions, such as Kullback-Leibler
divergence, Wasserstein distance, Bhattacharyya distance, or Hellinger distance. We chose
the Hellinger distance, a probabilistic analogue of Euclidean distance that returns a
value in the range [0, 1]. Values closer to 0 indicate that the probability distributions are more similar.
We calculated the distance between each pair of subreddits using this metric, filled in the adjacency matrix, and
fed the matrix as input to Louvain. After applying this algorithm, we obtained 5 clusters.
Each cluster is considered a “meta community” that disallows the same kinds of infractions. We
describe each cluster below and present its constituting subreddits in Table 6.
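As an illustration of the distance computation described above, the Hellinger distance between two discrete distributions $P$ and $Q$ is commonly defined as $H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_i (\sqrt{p_i} - \sqrt{q_i})^2}$. The sketch below computes it for every pair of subreddits and fills a symmetric matrix. The function names and the three-topic toy distributions are assumptions made for this example (the study uses vectors of length 55), and the Louvain step itself is not reproduced here.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def distance_matrix(topic_dists):
    """Return (names, matrix): pairwise Hellinger distances between subreddits.

    topic_dists maps a subreddit name to its topic distribution; rows and
    columns of the matrix follow the sorted order returned in names.
    """
    names = sorted(topic_dists)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hellinger(topic_dists[names[i]], topic_dists[names[j]])
            matrix[i][j] = matrix[j][i] = d  # the distance is symmetric
    return names, matrix


# Hypothetical subreddits with toy 3-topic distributions.
toy = {
    "sub_a": [0.7, 0.2, 0.1],
    "sub_b": [0.6, 0.3, 0.1],
    "sub_c": [0.1, 0.1, 0.8],
}
names, matrix = distance_matrix(toy)
```

A matrix like this can then be converted into a weighted graph and passed to a Louvain implementation. One design point to keep in mind: modularity-based implementations generally read edge weights as connection strength, so whether to use the distances directly or a similarity derived from them (for example, 1 minus the distance) is a choice this sketch leaves open.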
PACM on Human-Computer Interaction, Vol. 4, No. GROUP, Article 17. Publication date: January 2019.