While browsing studies on deception detection with natural language processing and machine learning, I stopped to read one that I consider among the most interesting and important in the field of deception detection and the study of pseudoscientific media. In the abstract, Tacchini et al. mention that their method can reach 99% accuracy in detecting hoax posts, even when only 1% of the posts are used as training data.
To clarify the terms machine learning and training data: machine learning is a set of statistical methods that learn to produce a result or solution from training data, which is a set of examples whose correct results are already known, either because experts judged them beforehand or because of an established fact about the data. The machine learning model then predicts the results for a different, unseen set, using the settings and relationships it learned from the training data. It is similar to what we do when we solve missing-value puzzles like the one in the figure.
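To make the idea of learning from training data concrete, here is a minimal sketch in Python. It is only an illustration and not taken from the study: the numbers are made up, and `fit_line` is a hypothetical helper that fits a straight line to examples with known answers and then fills in a missing value.

```python
# Toy illustration of supervised learning (made-up data, not from the study).
# Training data: inputs whose correct outputs are already known.
train_x = [1, 2, 3, 4]
train_y = [3, 5, 7, 9]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": learn the settings (slope and intercept) from the known examples.
slope, intercept = fit_line(train_x, train_y)

# "Prediction": apply the learned relationship to an input we have not seen.
prediction = slope * 5 + intercept
print(prediction)
```

The model never saw the input 5, yet it can estimate the missing value because it learned the relationship from the four known pairs.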
Deception detection has many techniques. Some depend on crowdsourcing, which means collecting information from a very large audience (an approach that has become more important with technological advances and the growing number of internet users). Others rely on mathematical and statistical methods, fact-checking methods, or other ways of finding deception automatically.
Usually you need enough training data for your machine learning model to understand the situation (mathematically) and to settle on the correct settings for predicting from the input. What Tacchini et al. have done, however, is to get correct predictions even with only 1% of the data used for training. The study relied on the huge number of Facebook likes that indicate the nature of a post. The authors first prepared a list of pro-hoax and scientific Facebook pages, then classified users according to their likes on these two groups of pages. After that, it was easy to decide whether a post was a hoax or not by using the likes of those users as the judgement.
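The idea of judging a post by who liked it can be sketched in a few lines of Python. This is a simplified illustration of the description above, not the algorithm from the paper: the post ids, user names, and the helpers `user_leanings` and `classify_post` are all hypothetical, and the scoring rule is a plain vote count that I chose for clarity.

```python
from collections import defaultdict

# Hypothetical toy data: which users liked which post.
likes = {
    "p1": {"ann", "bob"},
    "p2": {"cat", "dan"},
    "p3": {"ann", "bob", "eve"},
    "p4": {"cat", "dan", "eve"},
    "p5": {"zed"},
}

# Small labeled set: +1 for a post from a pro-hoax page, -1 for a scientific page.
labeled = {"p1": 1, "p2": -1}


def user_leanings(likes, labeled):
    """Score each user by the labels of the labeled posts they liked."""
    score = defaultdict(int)
    for post, label in labeled.items():
        for user in likes.get(post, ()):
            score[user] += label
    return score


def classify_post(post, likes, score):
    """Sum the leanings of the users who liked the post and read off the sign."""
    total = sum(score.get(user, 0) for user in likes.get(post, ()))
    if total > 0:
        return "hoax"
    if total < 0:
        return "non-hoax"
    return "unknown"  # no liker with a known leaning, or a tie


score = user_leanings(likes, labeled)
predictions = {p: classify_post(p, likes, score) for p in ("p3", "p4", "p5")}
print(predictions)
```

Only two posts are labeled here, but because the same users keep liking the same kind of content, their likes are enough to classify the posts that share likers with them. A post liked only by unseen users stays undecided in this sketch.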
This study tells us that there are two ways of thinking, one toward pseudoscience and one toward science, and that the two have little overlap. They are so far apart that high-accuracy predictions could be made easily in this way. Therefore, we should reconsider the way we fight pseudoscience, as well as the way we teach and publish popular science.
Tacchini, Eugenio, et al. “Some like it hoax: Automated fake news detection in social networks.” arXiv preprint arXiv:1704.07506 (2017).