From Objects to Data

Here you can view our data
just scroll down…

About our Findings

Main Research Question:

Our main research question was to find an explanation for the "disappearance" of "Humanities" articles from the front page of the New York Times from the 1980s onward.

This seemed like an interesting research question: "How come Humanities falls off the map completely and stops being featured on the front page of The New York Times?" Sadly, the answer was much simpler than our assumptions and theories: within the New York Times API, the value Front-Page is simply no longer used for articles published from the 1980s onward.

This weird inconsistency in the New York Times API was a little bit of a surprise. But on the other hand it meant that we found our answer and could hand in our results.

But we didn't…

Instead we got together and tried to come up with a different research question, one we hoped would be a bit more of a challenge than our initial question.

And so our virtual journey continued into the digital wonderland of an inconsistent API, a lot of grep, wget, sed, AWK and jq.

our fantastic five

During our digital journey some of us got lost, but we managed to find our bearings again and stick together. Determined to tackle every problem thrown at us by our teacher, the theory and the sometimes dirty, hands-on, hardcore coding, we manned up, pulled ourselves together and came up with several questions and the accompanying answers.

It was hard… It was dirty, and someone even got lost in jq-limbo, not to be seen again until the very end of our journey.

Now we're ready to present our results to you.

Our Scripts:

New York Times Data Script

This script will perform different actions needed to accomplish its goal. (wow, that's vague…)

How to run nytimesdata.sh

  • download the script;
  • place it in a convenient location on your computer;
  • open Terminal and change into the directory that contains the script;
    • Tip: you can open Terminal, type cd followed by a space, drag the folder that contains the script into the terminal window and hit the Enter key.
  • type chmod +x nytimesdata.sh and hit the Enter key;
  • type ./nytimesdata.sh and hit the Enter key;
  • the script will run and do its thing.

Issues

  • Several issues that need addressing;
    • script exits when the directory exists;
    • no if statements;
    • no getopts;
    • many others...
  • Because of these problems somewhere in the world a kitten dies.
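To save a kitten or two, here is a minimal sketch of how the first three issues could be tackled. It is not the actual nytimesdata.sh: the -o option, the default directory name nytimes-data and the function names are all assumptions made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real nytimesdata.sh.

out_dir="nytimes-data"   # assumed default; the real script may use another name

usage() {
  echo "Usage: $0 [-o output_dir]"
}

# Parse options with getopts: -o sets the output directory (assumed option).
parse_args() {
  OPTIND=1
  while getopts ":o:" opt; do
    case "$opt" in
      o)  out_dir="$OPTARG" ;;
      :)  echo "Option -$OPTARG needs an argument" >&2; usage >&2; return 1 ;;
      \?) echo "Unknown option -$OPTARG" >&2; usage >&2; return 1 ;;
    esac
  done
}

# Use an if statement so an existing directory is reused instead of
# making the script exit.
ensure_dir() {
  if [ -d "$1" ]; then
    echo "Directory $1 already exists, reusing it"
  else
    mkdir -p "$1"
  fi
}
```

At the bottom of such a script you would call something like parse_args "$@" && ensure_dir "$out_dir" before any downloading starts.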

View and run the New York Times Data Script by downloading it.


Script One

This script draws information from the New York Times Article Search API.

  • It retrieves the total number of news items in the dataset behind the API.
  • It retrieves the total number of front-page articles in that dataset.
  • It retrieves the total number of news items from the 2000s in that dataset.
  • It retrieves the total number of front-page articles from the 2000s in that dataset.
  • It prints the results to the command line.
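The counting steps above could look roughly like the sketch below. It is not Script One itself. It assumes that curl and jq are installed, that an API key is stored in a variable we call NYT_API_KEY, and that the hit count sits at .response.meta.hits in the JSON response. The function names and the example filter are ours and purely illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: assumes curl, jq and a valid key in NYT_API_KEY.

API_URL="https://api.nytimes.com/svc/search/v2/articlesearch.json"

# Read an API response on stdin and print the total number of matches.
# Assumption: the count is stored at .response.meta.hits.
extract_hits() {
  jq -r '.response.meta.hits'
}

# Print the hit count for an optional filter query and an optional
# date range (both dates as YYYYMMDD). Empty arguments are skipped.
fetch_hits() {
  local fq="$1"
  local begin="$2"
  local end="$3"
  set -- --silent --get "$API_URL" --data-urlencode "api-key=${NYT_API_KEY}"
  if [ -n "$fq" ];    then set -- "$@" --data-urlencode "fq=$fq"; fi
  if [ -n "$begin" ]; then set -- "$@" --data-urlencode "begin_date=$begin"; fi
  if [ -n "$end" ];   then set -- "$@" --data-urlencode "end_date=$end"; fi
  curl "$@" | extract_hits
}

# Example calls (the filter value is a guess, check the API docs):
#   fetch_hits "" "" ""                           # all news items
#   fetch_hits 'print_page:1' "" ""               # front-page items
#   fetch_hits "" 20000101 20091231               # all items from the 2000s
#   fetch_hits 'print_page:1' 20000101 20091231   # front page, 2000s
```

Letting curl do the URL encoding with --data-urlencode avoids having to escape quotes and spaces in the filter by hand.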

Developments needed:

Instead of counting ALL news items, the script should exclude blogs, since those are a very specific type of news item that only came into existence over the last few decades. They do not form part of the actual newspaper. Or do they? Is somebody willing to find out?

Can someone figure out a way to show the results for each decade automatically?
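One possible way to do this, as a rough sketch: generate the begin and end date of every decade in a loop and feed those to whatever does the counting. The function name decade_ranges and the start and end years are our own assumptions, and the counting command in the usage comment is hypothetical.

```shell
#!/usr/bin/env bash
# Print one line per decade: "<label> <begin_date> <end_date>",
# with dates in the YYYYMMDD form the Article Search API expects.
# $1 = first decade (e.g. 1850), $2 = last decade (e.g. 2010).
decade_ranges() {
  year="$1"
  while [ "$year" -le "$2" ]; do
    printf '%ds %d0101 %d1231\n' "$year" "$year" "$(( year + 9 ))"
    year=$(( year + 10 ))
  done
}

# Usage sketch (count_for_range is a hypothetical function that queries
# the API for one date range and prints the number of hits):
#   decade_ranges 1850 2010 | while read -r label begin end; do
#     echo "$label: $(count_for_range "$begin" "$end")"
#   done
```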

View and run Script One by downloading it.


Script Two

This script is the sum of multiple previous scripts, from which the best scripting solutions have remained and the broken or ugly (yes, we are judgemental) bits have been thrown away. The output of the original scripts was meant to give us a more detailed look at the spread of Humanities front-page articles over different kinds of time periods.

When writing the script we were still under the impression that only articles that had "The New York Times" as source were actually published in the NYT. Now we know this is not the case, as the New York Times sometimes (re)publishes articles written by other news agencies. We've kept NYT as a selection criterion in the script though, since we value originality but, more importantly, because it is one of the 'scars' that shows our learning curve. In the end, the script now shows:

  • Percentage of Humanities articles on the front page of The New York Times per decade
  • Average percentage of Humanities articles on the front page of The New York Times per month
  • Average percentage of Humanities articles on the front page of The New York Times per day of the month.

The graphs appear to show some interesting things: May and October aren't popular Humanities months (1), and the 10th and 15th of each month seem a bad time to publish a Humanities article if your goal is to reach the front page (2). Still, it is hard to say that any of these differences are valid and significant. The most eye-catching results, the high percentages of front-page articles during the 1850s and 1870s (graph 1), can easily be discredited once we look at the low number of articles published in total during those decades. In short, we still need to spend more time on interpreting our results.
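To make the small-sample problem visible right next to the percentages, the totals can be printed alongside them. The sketch below is not part of Script Two. It assumes the counts are already available as plain text lines of the form "period front_page_count total_count", and the threshold of 100 articles is an arbitrary value picked for illustration.

```shell
#!/usr/bin/env bash
# Read lines "<period> <front_page_count> <total_count>" on stdin and
# print the front-page share per period, flagging periods whose total
# is below a threshold ($1, default 100; an arbitrary choice).
front_page_share() {
  local min_total="${1:-100}"
  awk -v min="$min_total" '
    $3 > 0 {
      pct  = 100 * $2 / $3
      note = ($3 < min) ? " (small sample)" : ""
      printf "%s %.1f%% of %d%s\n", $1, pct, $3, note
    }
    $3 == 0 { printf "%s n/a of 0\n", $1 }
  '
}

# Usage sketch (counts.txt is a hypothetical file with one line per decade):
#   front_page_share 100 < counts.txt
```

A decade with a high percentage but a "(small sample)" flag is then easy to spot before anyone starts interpreting it.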

View and run Script Two by downloading it.


Script Seven

This script extracts the total number of articles and the number of front-page articles for two search queries of your choice, directly from the API. The script, however, stops working after the 1980s, since the data field that holds the front-page indication changes from then on. Since the new data field is, ridiculously, parsed as a string by the API, the API won't allow us to do a valid search. Therefore, the entire dataset from the two queries is downloaded into .JSON files and searched with some help from jq.

However, since the 'Science' dataset is too big to download in one go, the script now uses two ready-made .JSON files. Otherwise, we would still be sitting here tomorrow... The example, ready-made .JSON files will be delivered alongside the scripts in our final product.
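Searching a downloaded .JSON file with jq could look something like the sketch below. It is not Script Seven itself. The field name print_page, the front-page value 1 and the .response.docs layout are assumptions about the saved API responses, so adjust them to whatever the files actually contain.

```shell
#!/usr/bin/env bash
# Count the articles in a saved API response ($1) that are marked as
# front-page items. Assumptions: articles live in .response.docs and
# the page is stored in .print_page, either as the string "1" or as
# the number 1. tonumber? converts strings and silently skips values
# that cannot be converted (null, missing, non-numeric text).
count_front_page() {
  jq '[.response.docs[] | select((.print_page | tonumber?) == 1)] | length' "$1"
}

# Usage sketch (humanities.json is a hypothetical file name):
#   count_front_page humanities.json
```

Converting inside jq sidesteps the string-versus-number problem: the comparison is done locally on the downloaded data, not by the API's search.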

View and run Script Seven by downloading it.


Now, let's take a look at the wonderful results

Finding number One

The difference between Science and Humanities articles in the New York Times

Finding number Two

Percentage of science vs humanities articles on the Front page of the New York Times