Intro:
For this midterm, I set out to find the most popular X-Men characters in a section of Chris Claremont’s ‘Uncanny X-Men’ run, specifically issues 100-149. Based on the data, I believe that the most popular characters were the ones whose names were used most frequently.
Sources:
The dataset I used was the spreadsheet titled “uncanny-xmen-100-149-characters.csv”.
Processes:
The most difficult part of this assignment was refining the dataset down to what I specifically wanted. When I got to this step, I understood that my goal was to count the number of times a character does something (and by something, I mean appearing in that issue of ‘Uncanny X-Men’). The spreadsheet has the issue number as the first row, with the corresponding character in that column, along with the many actions they could have taken in that issue (number of kills, hand holding, kissing another character, etc.).
At first, I didn’t know what I should limit my dataset to, but I decided to start with OpenRefine and work from there. Going off what we learned in class, I tried to detect clusters in the dataset, but to my surprise, there were none! After a few more minutes of experimenting with OpenRefine and some help from Austin, I moved on to manipulating the spreadsheet itself, using the summation tool to count each character’s activity by row. However, there were two problems with this: (1) the sum was by row, not a total of everything a character did across the given issues, meaning I would still have to count every character by hand and was back at square one; and (2) even if I followed through with (1), I did not know how to get a total for every character efficiently! I decided it was best to move on to Voyant Tools.
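For comparison, the per-character total described above could be sketched in a few lines of Python. This is only an illustration, not the method used for this project: the column names `issue` and `character` and the sample rows are assumptions, since the real spreadsheet's headers may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real CSV's headers and values may differ.
SAMPLE_CSV = """issue,character
100,Cyclops
100,Storm
101,Cyclops
101,Phoenix
102,Storm
102,Cyclops
"""

def count_appearances(csv_text, character_column="character"):
    """Count how many rows (issue appearances) each character has."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[character_column].strip() for row in reader)

counts = count_appearances(SAMPLE_CSV)
for name, total in counts.most_common():
    print(name, total)
```

Because the count runs over every row at once, it gives one total per character across all issues instead of a sum per row.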
I used Voyant Tools to do most of my actual data cleaning. I chose Voyant specifically because of the visuals it produces. It took some trial and error to find the best settings with my limited knowledge. For the best results, I had to convert the csv file to an xlsx file. Next, in the “Tables” section of the options, I specified that documents should be extracted “from cells in each row,” combining columns 1 and 2 into one document and using column 3 for a second document (typed out as 1+2,3). Once the table was generated, I also removed the word ‘Marvel’ (the company name) so that the results would focus only on the content of the story. From here, I was able to create a table and visual that were sufficient for an analysis.
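The word counting and the removal of ‘Marvel’ can be approximated outside Voyant as well. The sketch below is a rough stand-in, not Voyant's actual algorithm: the tokenizing rule, the `word_frequencies` helper, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# Words to ignore, mirroring the removal of the company name.
STOPWORDS = {"marvel"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into words, and count the non-stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stopwords)

# Hypothetical text standing in for the combined spreadsheet cells.
sample = "Marvel Cyclops Scott Summers Marvel Phoenix Jean Grey Marvel Cyclops Scott Summers"
freqs = word_frequencies(sample)
for word, total in freqs.most_common(3):
    print(word, total)
```

The resulting counts are the same kind of numbers a word cloud scales its words by: the higher the count, the larger the word.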
Presentation:
As I said earlier, the result I wanted was a visual representation of the dataset. Through Voyant Tools, I have provided the number of times each name appears as a chart and as a “cirrus” (a word cloud that shows the most frequent words by the size at which they appear). You can view a bigger version of both by clicking on their respective images to the right.
Significance:
I believe a general assessment can be made from the data I collected. At a glance, we can see that the bigger the name, the more often it was used across the issues. This implies that the character with that name was very important to the story arc occurring during this period. However, one thing that is glaringly obvious is the word ‘unknown’, though all this means is that the character this word is tied to did not have a confirmed alter ego/secret identity (e.g., though he is not an X-Man, Spider-Man’s secret identity is Peter Parker. Don’t tell anyone!). We can see that the most prominent characters during this run were people like ‘Summers’ (i.e., Scott Summers, or Cyclops) and ‘Phoenix’ (i.e., Jean Grey). However, a big issue I am still having trouble with is characters having alter egos, because the data treats the names as separate. Because of this, words like ‘Carol’ and ‘Danvers’ appear separately, even though, presumably, both are being used to refer to the character ‘Carol Danvers’. This is an issue I still have not been able to solve.
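One possible way around the split-name problem, sketched here under assumptions rather than as a tested fix for this dataset, is to count whole names instead of single words and to map each codename to one canonical name. The `ALIASES` table and the sample cells are hypothetical and would need to be filled in by hand for the real spreadsheet.

```python
from collections import Counter

# Hypothetical character cells, one per appearance.
cells = ["Carol Danvers", "Cyclops", "Scott Summers",
         "Phoenix", "Jean Grey", "Carol Danvers"]

# Counting single words splits 'Carol' from 'Danvers'.
word_counts = Counter(word for cell in cells for word in cell.split())

# Hand-made alias table (an assumption): codename -> canonical name.
ALIASES = {"Cyclops": "Scott Summers", "Phoenix": "Jean Grey"}

def canonical(name):
    """Return the canonical name for a cell, or the cell itself if unmapped."""
    return ALIASES.get(name, name)

# Counting whole, canonical names keeps each character together.
name_counts = Counter(canonical(cell) for cell in cells)
print(name_counts.most_common())
```

The trade-off is that the alias table has to be built manually, but each character then receives a single combined total.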
Please look at the data via the images! Excelsior!