Alan Maloney, Project Manager, Online Products
If you wanted to read every book and journal article that SAGE published in 2012, and you read 24/7 at an average pace starting right now, you wouldn’t be done until February 2016. And that’s just one publisher among many. What do you do when there is simply more information than you – or anyone – could ever read and digest?
Predictably, when something is beyond human capability, we get a machine to do it – and that’s the principle behind using text and data mining to discover information hidden in massive bodies of content. This is not really news in scientific, medical and technical publishing, where researchers have been using computers and natural language processing for decades to discover insights such as protein interactions and drug side-effects. But most of the books and journals that SAGE published in 2012 are from the humanities and social sciences, where text and data mining techniques are more experimental (and more interesting).
It’s an exciting time for text and data mining, with ever more content being made available online, and text and data mining tools becoming more and more sophisticated. But having a computer read and summarise text for you still has its challenges, especially in the humanities and social sciences. A knowledgeable human may tell you that Hamlet is a story of revenge and encourage you to check out Edward II, but a computer might see the word ‘gravity’ and recommend a work by Isaac Newton instead. Is an author talking about orange the colour, orange the fruit or Orange the company? That’s why we have to know our content really well – so that we can teach a computer the rules and exceptions it needs to produce results that make sense to the reader. Text mining will never give perfect results, but when you need an insight into hundreds of thousands of documents, it’s a good start.
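To make the ‘orange’ problem concrete, here is a minimal sketch of what a rule-based approach could look like. It is purely illustrative and is not the system SAGE uses: the function name `disambiguate_orange` and the cue words are hypothetical, and a real rule set would be far larger and tuned to the content.

```python
import re

# Hypothetical context cues for each sense of "orange" (illustrative only).
SENSE_CUES = {
    "colour": {"paint", "bright", "shade", "red", "yellow", "coloured"},
    "fruit": {"juice", "peel", "eat", "tree", "citrus", "vitamin"},
    "company": {"mobile", "network", "telecom", "customers", "broadband"},
}


def disambiguate_orange(sentence):
    """Return the best-matching sense of 'orange', or None if unclear."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if "orange" not in words:
        return None
    # Score each sense by how many of its cue words appear in the sentence.
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # With no supporting cue, decline to guess rather than pick arbitrarily.
    return best if scores[best] > 0 else None


print(disambiguate_orange("She squeezed the orange for juice"))
print(disambiguate_orange("Orange announced a new mobile network"))
print(disambiguate_orange("I like orange"))
```

The last call returns `None`: with no surrounding cue, the rules have nothing to go on, which is exactly where knowing the content well (and adding better rules and exceptions) comes in.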
SAGE wants to be a champion of text and data mining in the social sciences. This year we used text mining techniques to apply hundreds of thousands of keywords to our book content in SAGE Knowledge, and to recommend related documents in SAGE Research Methods and SAGE Journals. None of the links were made by a human.
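For readers curious how a machine might link related documents without a human in the loop, one common textbook technique is to weight words by TF-IDF and compare documents by cosine similarity. The sketch below assumes that technique for illustration only; it is not a description of how the SAGE platforms work, and the helper names and sample titles are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict of word -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents does each word appear?
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return [
        {t: count * math.log(n / df[t]) for t, count in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_related(docs, index):
    """Return the index of the document most similar to docs[index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[index], v), i)
        for i, v in enumerate(vectors)
        if i != index
    ]
    return max(scores)[1]


# Made-up sample titles, for illustration only.
docs = [
    "survey sampling methods for social research",
    "sampling error in survey design",
    "revenge tragedy in Elizabethan drama",
]
print(most_related(docs, 0))
```

Here the first title is matched to the second because they share the relatively rare words ‘survey’ and ‘sampling’. On real content, this kind of matching would need the rules and exceptions described above to avoid Hamlet-and-Newton style mistakes.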
Over the coming months, SAGE will be rolling out a number of new and experimental enhancements to its online books and journals, so watch this space. And in the meantime, if you have any thoughts on how SAGE can do more in this area, do get in touch!