Results: The bagging classifier distinguished the two subreddits with 91% accuracy, well above the baseline of 60%. The most important word features for the model were "bitcoin", "doge", "dogecoin", and "emoji_nan". In other words, these four features carried the most weight when the model decided whether a post came from the Dogecoin subreddit or the Bitcoin subreddit. In addition, an analysis of emoji frequency showed that emojis were used 0.03% of the time in Bitcoin posts and 20% of the time in Dogecoin posts.
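As a rough illustration of how a baseline score and an emoji usage rate of this kind could be computed, here is a minimal sketch using only the standard library. It assumes the baseline is the share of the majority class and that the emoji rate is the fraction of posts containing at least one emoji; neither definition is confirmed by the project. The function names, the toy data, and the use of Unicode category "So" as an emoji proxy (instead of the emoji library) are all hypothetical choices made for this example.

```python
import unicodedata
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def has_emoji(text):
    """Rough proxy (assumption): any character in Unicode category 'So'."""
    return any(unicodedata.category(ch) == "So" for ch in text)


def emoji_post_rate(texts):
    """Fraction of posts that contain at least one emoji-like character."""
    return sum(has_emoji(t) for t in texts) / len(texts)


# Toy stand-in data (hypothetical, not the scraped posts).
labels = ["Dogecoin", "Dogecoin", "Dogecoin", "Bitcoin", "Bitcoin"]
texts = [
    "doge to the moon 🚀",
    "much wow",
    "hodl 🐕",
    "bitcoin halving",
    "price talk",
]

print(majority_baseline(labels))  # 0.6
print(emoji_post_rate(texts))     # 0.4
```

The "So" category also matches non-emoji symbols such as ©, so a dedicated emoji library would give a stricter count on real data.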
Bitcoin vs Dogecoin Subreddits Classifier
A subreddit classifier that determines whether a given post came from r/Bitcoin or r/Dogecoin. Created using the Pushshift API for Reddit data, feature engineering supported by a Python emoji library, and a Bagging Classifier model.
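The overall approach described above might look something like the following sketch, assuming scikit-learn is used for both the word features and the bagging model. The toy posts, the label names, and the hyperparameters (n_estimators=25, random_state=0) are hypothetical, and the emoji-derived feature is omitted for brevity; this is not the project's actual pipeline.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Toy stand-in posts (hypothetical, not the Pushshift data).
posts = [
    "bitcoin halving and hash rate discussion",
    "is bitcoin a store of value",
    "bitcoin wallet seed phrase backup",
    "cold storage for my bitcoin",
    "doge to the moon",
    "dogecoin tipping is fun",
    "much wow such doge",
    "bought more dogecoin today",
]
labels = ["Bitcoin"] * 4 + ["Dogecoin"] * 4

# Bag-of-words counts feed an ensemble of decision trees
# (the default base estimator of BaggingClassifier), each
# trained on a bootstrap resample of the posts.
model = make_pipeline(
    CountVectorizer(),
    BaggingClassifier(n_estimators=25, random_state=0),
)
model.fit(posts, labels)

predictions = model.predict(["bitcoin price today", "doge memes"])
print(list(predictions))
```

On real data, the posts would be split into training and test sets first so that the reported accuracy reflects unseen posts.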