As a historian, I privilege physical archives. I love to touch the original documents, smell them, and mentally go back to the time when they were created. I was therefore somewhat skeptical when I ventured into text mining the Annual Colonial Reports. Most of my research has been on colonial development history, and I have not used the Annual Colonial Reports as primary sources. As I have expanded my research interests, I have begun to wonder whether the Annual Colonial Reports from Nigeria might reveal important themes and shifts that would help me extend my research. From the outset, my interest was in shifts in colonial development.
The Annual Colonial Reports are vast documents; together they contain over 1.4 million words. They are also dry and boring to read. I wondered if a digital tool would help me accomplish my purposes. That is where Voyant came in. It is a digital text mining tool that allows you to distant read large bodies of text to see trends and shifts in word usage over time, and it also creates word clouds. I had 63 annual reports in PDF format that had been put through OCR. Some of these files I had downloaded over the years from different websites, and the bulk of them I downloaded from the HathiTrust website. The downloads took a long time because they kept being interrupted by bandwidth issues. I knew the problem was not at my end, as my connection had a download speed of 150 Mbps. After downloading the files, I needed to clean them up by removing pages that were not part of the Annual Colonial Reports. I also had to rename each document with a number and its year. Getting the documents ready took several hours. When they were ready, I zipped them into one file.
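For anyone repeating the renaming and zipping step, it could be scripted. What follows is only a minimal sketch in Python, assuming each downloaded PDF already has its four-digit year somewhere in its file name. The function names and the `01_1900.pdf` naming pattern are hypothetical illustrations, not the exact scheme used for this corpus.

```python
import re
import zipfile
from pathlib import Path


def year_from_name(name):
    """Return the first 18xx or 19xx year found in a file name, or None."""
    match = re.search(r"(18|19)\d{2}", name)
    return int(match.group(0)) if match else None


def build_archive(source_dir, archive_path):
    """Zip every PDF that has a year in its name under a numbered, year-based name.

    The original files are left untouched; only the names inside the zip change.
    Returns the list of names written to the archive, in chronological order.
    """
    pdfs = [p for p in Path(source_dir).glob("*.pdf")
            if year_from_name(p.name) is not None]
    pdfs.sort(key=lambda p: (year_from_name(p.name), p.name))

    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, pdf in enumerate(pdfs, start=1):
            # Hypothetical pattern: running number, underscore, report year.
            new_name = f"{index:02d}_{year_from_name(pdf.name)}.pdf"
            archive.write(pdf, arcname=new_name)
            names.append(new_name)
    return names
```

Calling `build_archive("reports", "reports.zip")` on a folder of PDFs would produce one zip file ready for upload. Files without a recognizable year are skipped, so they would still need to be checked by hand.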
My first approach was to use the downloaded Voyant Server. Uploading the zipped file to it was a breeze, and I did my initial word clouds and analysis there. I then ran into two issues that forced me to abandon the downloaded server. First, after I had added several stop words to my list, it cut me off and informed me that there was a limit to the number of stop words I could use. This was a problem because my word cloud was still meaningless: it still contained many general words that provided no important information about my corpus. The second problem was embedding my Voyant-generated visualizations, with their interactivity, in this WordPress website. By default, WordPress blocks iframes. I had to download the plugin “Iframe” and run it on the website. Even then, I continued to get a server error when I tried to run the visualizations. Dr. Stephen Robertson advised that I needed to use the Voyant online tools for the visualizations to work. This meant I had to abandon the downloaded server and move to the online tools.
I had thought that copying the URL and current data from the downloaded server to the online tool would work. It did not. So I had to upload the zipped file to the Voyant online tool. I started uploading the file at about noon on my MacBook, and all day it kept showing that the file had not finished uploading. After about eight hours, I assumed that there must be issues with using Wi-Fi, so I decided to upload it from my iMac. When I tried to access the Voyant tool, I kept getting the server error “Service Temporarily Unavailable.” For the next two hours I kept trying without any success. I decided to search to see if there was a Twitter account associated with Voyant. Behold, there was. I sent a tweet and got a response within about two hours saying that Voyant was working fine. I went back to my iMac and it was now working. I uploaded the zipped file and within about two hours the whole file was ready to be analyzed. That was not the end of my challenges. The Voyant online tool failed again. I tweeted at the tool’s handle and got a response within a few minutes telling me that Voyant was back up. They were very helpful and told me to contact them should I have any more problems. For a free online tool, I was amazed by their responsiveness.
The first word cloud created by the Voyant online tool was useless. It had “the” as the most used word in the corpus. I had to fine-tune the word cloud by creating a stop words list. I did not realize how many hours this process was going to take. In the peer review process, I was asked to fine-tune my word cloud further, to create pages with links, and to analyze my word cloud to see what it pointed me to. The word cloud took up most of my time, leaving me with less time for the rest. I created pages on my word clouds, word frequency graphs, a timeline of colonialism in Nigeria, a link to the corpus for people to explore, and an analysis of the word cloud. I highlighted only a few things from the corpus. I looked at the words “expenditure” and “revenue” to see how they correlated with the colonial doctrine of financial self-sufficiency. The word “slaves” in the word cloud caught my attention, and I used it to explore slavery in Northern Nigeria during this period. I analyzed the word cloud as a whole, arguing that there is an inter-relatedness among the things visualized, as these tell us about what was vital to the smooth running of the colonial state.
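To make the idea behind the stop words list and the word frequency graphs concrete, here is a minimal sketch in Python. It is not how Voyant works internally, and it is not the tool I used; it only illustrates the principle, assuming the reports are available as plain text. The function names, the tiny stop word set, and the per-1,000-words scaling are my own assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(text, stop_words, n=10):
    """Return the n most frequent tokens that are not in stop_words."""
    counts = Counter(t for t in tokenize(text) if t not in stop_words)
    return counts.most_common(n)


def rate_per_thousand(reports, term):
    """Return (year, occurrences of term per 1,000 words) pairs, sorted by year.

    `reports` is assumed to map a year to the plain text of that year's report.
    """
    trend = []
    for year in sorted(reports):
        tokens = tokenize(reports[year])
        rate = 1000 * tokens.count(term.lower()) / len(tokens) if tokens else 0.0
        trend.append((year, rate))
    return trend
```

With an empty stop word set, a word like “the” dominates the counts, which is exactly what made the first word cloud useless. Passing a set such as `{"the", "of", "and"}` lets the more meaningful words rise to the top. Calling `rate_per_thousand` for “revenue” and then for “expenditure” gives two series that can be compared year by year, and dividing by the length of each report keeps long and short reports comparable.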
This work is far from complete. There is still much to be done. The frequency trends of some of the words in the word cloud need to be looked at carefully to see the shifts over time. The word cloud itself deserves further analysis. There is more it can tell us about the colonial state. Explore the corpus here and see what you can find. I am interested in hearing from you.