From Objects to Data Research Group

Our Scripts:

New York Times Data Script

This script performs the different actions needed to accomplish its goal. (Wow, that's vague…)

How to run nytimesdata.sh

- download the script;
- place it in a convenient location on your computer;
- open Terminal and change to the directory that contains the script;
- Tip: you can open Terminal, type cd followed by a space, drag the folder that contains the script into the Terminal window and hit the Enter key.
- type chmod +x nytimesdata.sh and hit the Enter key;
- type ./nytimesdata.sh and hit the Enter key;
- the script will run and do its thing.

Issues

Several issues need addressing:
- script exits when the directory exists;
- no if statements;
- no getopts;
- many others...
Because of these problems somewhere in the world a kitten dies.
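
One possible way to tackle the first three issues, as a minimal sketch rather than the actual fix. The option letters (-o, -h), the default directory name nytimes-data and the function names are all made up for illustration; none of them come from nytimesdata.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: option names, default directory and function names
# are assumptions, not taken from the original nytimesdata.sh.

usage() {
    echo "Usage: nytimesdata.sh [-o output_dir] [-h]"
}

# Parse command-line options with getopts; sets the global OUTPUT_DIR.
parse_options() {
    local opt
    OUTPUT_DIR="nytimes-data"   # assumed default
    OPTIND=1                    # reset so the function can be called again
    while getopts ":o:h" opt; do
        case "$opt" in
            o) OUTPUT_DIR="$OPTARG" ;;
            h) usage; return 2 ;;
            :) echo "Option -$OPTARG needs an argument" >&2; return 1 ;;
            \?) echo "Unknown option -$OPTARG" >&2; return 1 ;;
        esac
    done
}

# Use an if statement to reuse an existing directory instead of exiting.
prepare_directory() {
    local dir="$1"
    if [ -d "$dir" ]; then
        echo "Reusing existing directory: $dir"
    else
        mkdir -p "$dir" && echo "Created directory: $dir"
    fi
}
```

Near the end of a script these could be called as parse_options "$@" followed by prepare_directory "$OUTPUT_DIR", so that a second run no longer stops on the existing directory.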

View and run the New York Times Data Script by downloading it.

Script One

This script draws information from the NY-Times article-search API.

It determines the total number of news items in the dataset behind the API.
It determines the total number of front-page articles in the dataset behind the API.
It determines the total number of news items from the 2000s in the dataset behind the API.
It determines the total number of front-page articles from the 2000s in the dataset behind the API.
It prints the results into the command-line interface.
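
The four counts above could be requested roughly as follows. This is a sketch under assumptions, not a copy of Script One: the endpoint, the fq, begin_date and end_date parameters, the print_page:1 filter and the response.meta.hits field are taken from version 2 of the Article Search API as we understand it and should be checked against the current API documentation. The NYT_API_KEY variable and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: endpoint, parameter names and JSON path are assumptions
# (Article Search API v2); the original script may differ.

API_URL="https://api.nytimes.com/svc/search/v2/articlesearch.json"

# Read an API response on stdin and print the number of hits it reports.
extract_hits() {
    jq '.response.meta.hits'
}

# Ask the API how many items match an optional filter and date range.
# Arguments: [filter] [begin_date] [end_date], dates written as YYYYMMDD.
count_hits() {
    local filter="$1" begin="$2" end="$3"
    set -- --data-urlencode "api-key=${NYT_API_KEY:?set NYT_API_KEY first}"
    [ -n "$filter" ] && set -- "$@" --data-urlencode "fq=$filter"
    [ -n "$begin" ]  && set -- "$@" --data-urlencode "begin_date=$begin"
    [ -n "$end" ]    && set -- "$@" --data-urlencode "end_date=$end"
    curl --silent --get "$@" "$API_URL" | extract_hits
}

# Example calls (they need network access and a valid key):
#   count_hits                                    # all news items
#   count_hits 'print_page:1'                     # front-page articles
#   count_hits '' 20000101 20091231               # news items, 2000s
#   count_hits 'print_page:1' 20000101 20091231   # front-page articles, 2000s
```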

Developments needed:

Instead of counting ALL news items, the script should rule out blogs, since those are a very specific type of news item that only came into existence over the last few decades. They do not form part of the actual newspaper. Or do they? Somebody willing to find out?

Can someone figure out a way to show the results for each decade automatically?
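
A possible starting point for the decade question, offered as a sketch: a small helper that prints a begin and end date for every decade, which a loop can then feed into whatever query the script runs. The function name decade_ranges and the 1850 to 2010 range are our own assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one "begin end" pair (YYYYMMDD) per decade.
decade_ranges() {
    local first="$1" last="$2" year
    year="$first"
    while [ "$year" -le "$last" ]; do
        echo "${year}0101 $((year + 9))1231"
        year=$((year + 10))
    done
}

# Loop over the decades; replace the echo with the actual API query.
decade_ranges 1850 2010 | while read -r begin end; do
    echo "Would query the API from $begin to $end"
done
```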

View and run Script One by downloading it.

Script Two

This script combines several earlier scripts: the best scripting solutions were kept and the broken or ugly (yes, we are judgemental) bits were thrown away. The output of the original scripts was meant to give us a more detailed look at the spread of Humanities front-page articles across different kinds of time periods.

When writing the script we were still under the impression that only articles with "The New York Times" as their source were actually published in the NYT. We now know this is not the case, as The New York Times sometimes (re)publishes articles written by other news agencies. We've kept NYT as the selected source in the script though, since we value originality and, more importantly, because it is one of the 'scars' that show our learning curve. In the end, the script now shows:

Percentage of Humanities articles on the front page of The New York Times per decade
Average percentage of Humanities articles on the front page of The New York Times per month
Average percentage of Humanities articles on the front page of The New York Times per day of the month.

The graphs appear to show some interesting things: May and October aren't popular Humanities months (1), and the 10th and 15th of each month seem a bad time to publish a Humanities article if your goal is to reach the front page (2). Still, it is hard to say that any of these differences are valid and significant. The most eye-catching results, the high percentages of front-page articles during the 1850s and 1870s (graph 1), can easily be discredited once we look at the low number of articles published in total during those decades. In short, we still need to spend more time on interpreting our results.
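
To see why a high percentage from a thin decade deserves suspicion, here is a small worked example. The counts are invented for illustration and are not our results, and we assume the percentage is simply the number of front-page articles divided by the number of all articles in the same period. The function name percentage is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of front-page articles among all articles.
percentage() {
    awk -v part="$1" -v total="$2" 'BEGIN {
        if (total == 0) { print "n/a"; exit }   # avoid dividing by zero
        printf "%.1f\n", 100 * part / total
    }'
}

# Invented counts: a decade with very few articles versus a busy one.
echo "Thin decade: $(percentage 3 12)% front page (3 of 12 articles)"
echo "Busy decade: $(percentage 300 12000)% front page (300 of 12000 articles)"
```

The thin decade scores 25.0% against 2.5% for the busy one, yet two articles more or fewer on the front page would move the first figure by many percentage points and the second hardly at all.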

View and run Script Two by downloading it.

Script Seven

This script extracts the total number of articles and the number of front-page articles for two freely chosen queries directly from the API. The script, however, stops working after the 1980s, since the front-page indication moves to a different data field from then on. Because the new data field is, ridiculously, parsed as a string by the API, the API won't allow us to run a valid search on it. Therefore, the entire dataset from the two queries is downloaded into .JSON files and searched with the help of jq.

However, since the 'Science' dataset is too big to download in one go, the script now uses two ready-made .JSON files. Otherwise, we would still be sitting here tomorrow... The exemplary, ready-made .JSON files will be delivered alongside the scripts in our final product.
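
Searching a downloaded file with jq could look roughly like this. It is a sketch under assumptions: we assume each .JSON file holds a single API response with its articles under response.docs and the front-page indication in a print_page field, which may not match the layout of the ready-made files. The function names and the file name science.json are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the JSON layout (response.docs, print_page) is an assumption.

# Count all articles stored in a downloaded file.
count_articles() {
    jq '.response.docs | length' "$1"
}

# Count front-page articles; tostring lets the comparison work whether
# print_page was stored as the string "1" or as the number 1.
count_front_page() {
    jq '[.response.docs[] | select((.print_page | tostring) == "1")] | length' "$1"
}

# Example calls on a hypothetical file:
#   count_articles science.json
#   count_front_page science.json
```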

View and run Script Seven by downloading it.

Now, let's take a look at the wonderful results


Finding number One

The difference between Science and Humanities articles in the New York Times

Finding number Two

Percentage of Science vs Humanities articles on the front page of the New York Times