Tracking violent Tweets can help prevent or reduce violent events, and Voice4Impact is a non-profit organization working on this problem. In collaboration with Omdena, I led a team that applied natural language processing to identify violent Tweets in the Chicago area.
Our best-performing solution is an XGBoost model with:
- binary violent vs. non-violent targets
- character-level modeling, because of the heavy use of slang
- undersampling, because the classes are heavily imbalanced (fortunately, violent Tweets are rare)
We were also careful not to include usernames or other user data, to avoid profiling particular users.
Our F2 score was more than 50% above the baseline. Below are examples of top violent and non-violent Tweets.