
Investigative Journalism: On the Face of a Woman

On the face of a woman: Investigative Journalism with ONgDB on GraphGrid

All data tells a story: a story to be investigated. The beautiful thing about data is its ability to avoid all subtext and ambiguity and go straight to the facts. Swimming through large amounts of it, however, can quickly become overwhelming and unproductive. To critically investigate the data and its story, one must be able to model and query it. On the Face of a Woman is one such story, which we created by connecting data in context.

Last year, using ONgDB with the GraphGrid Connected Data Platform, we created a property graph model for a dataset released to the public through the CHHS Open Data Portal. What struck us about this data was its enormous potential for controversy. The data contained every cosmetic product sold in California “known or suspected to cause cancer, birth defects, or other developmental or reproductive harm” (CHHS). Many Californians assume that any makeup product they see on the shelves has passed every safety and health code before it reaches the public. This dataset said otherwise. Using the property graph model, we were able to bring meaning to the numbers.

The Chemicals in Cosmetics dataset was richly interconnected. ONgDB turned this to our advantage because we could put properties on both nodes and relationships. For example, dates and times could be placed on the relationship connecting a Brand to its Product. This made it much easier to understand the information and what each brand and company was responsible for. The more we queried the data, the richer and more revealing it became. Starting with simple questions such as “Which brand has the most products reported?” helped us funnel the graph down to more specific questions and queries. With ONgDB we were able to navigate and scrutinize the data seamlessly until we had uncovered all of its subtleties.
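To make “properties on both nodes and relationships” concrete, here is a minimal in-memory sketch in Python. It is not ONgDB's API: the `Node`/`Relationship` classes, the `products_of` helper, the `dateReported` property and the sample brand and product are all hypothetical, chosen only to mirror the Brand-PRODUCES->Product shape described above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                  # e.g. "Brand" or "Product"
    props: dict = field(default_factory=dict)   # properties stored on the node


@dataclass
class Relationship:
    start: Node
    rel_type: str                               # e.g. "PRODUCES"
    end: Node
    props: dict = field(default_factory=dict)   # properties stored on the relationship itself


# Hypothetical sample data: one brand, one product, and a dated PRODUCES relationship.
brand = Node("Brand", {"name": "Example Brand"})
product = Node("Product", {"name": "Example Foundation"})
produces = Relationship(brand, "PRODUCES", product, {"dateReported": "2012-06-01"})
graph = [produces]


def products_of(brand_name, rels):
    """Follow PRODUCES relationships out of the named brand and return
    (product name, relationship properties) pairs."""
    return [
        (r.end.props["name"], r.props)
        for r in rels
        if r.rel_type == "PRODUCES"
        and r.start.label == "Brand"
        and r.start.props.get("name") == brand_name
    ]


print(products_of("Example Brand", graph))
# [('Example Foundation', {'dateReported': '2012-06-01'})]
```

The date lives on the relationship, not on either node, so the product node stays free of brand-specific details while a traversal from the brand still reads it.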

Read the full exploration and detailed writeup below, which was the winning GraphGist entry for Investigative Journalism (the submission site no longer hosts the writeup):

Our Inspiration:
Look into the face of a woman in makeup and you travel faster than the speed of light, peering into decades of women's history. The Victorian woman who wasn’t allowed to wear even a pinch of makeup (for fear of being thought promiscuous) became the 1920s flapper who defiantly traded prison-like corsets and bare faces for loose dresses and heavily smudged black eyeliner. More recently, the woman of WWII used classic red lipstick and a perfectly powdered complexion as remedies of beauty and femininity in a time of death and suffering.

A common theme that threads through the history of makeup is a longing for freedom. Women (or even men, e.g. David Bowie) who applied a trace of makeup to their faces were smearing on their battle paint, kindling a freedom fire that has yet to be quenched.

Our Problem:
However, makeup also has a dark history of exploitation. Was there mercury melted into mascara and lipstick laden with lead? These were important questions in the early 1900s. Some women even died at the hands of unethical makeup ingredients. But are dangerous chemicals in cosmetics only a thing of the past? What harmful ingredients exist in makeup now? Today we find out what is on the face of a woman.

Our Weapons of Investigation:
On the related subject of all things beautiful: data, and specifically data modeled as a graph, cuts through all the fluffy communication and gets to the cold, hard facts. Using ONgDB and the graph query language Geequel, we will model our explorations so that any average Joe or Jane can discover the truths we are going to bring to light.

We are going to use a dataset from the California Department of Public Health that lists every cosmetic product containing one or more chemicals that cause, or are suspected to cause, cancer, birth defects, or developmental or reproductive harm.
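A dataset like this arrives as a flat table, so building the graph means collapsing repeated brands, products, chemicals and categories into single nodes and turning each row into relationships. The Python sketch below shows that step, assuming one row per product-chemical report. The column names (`BrandName`, `ProductName`, `SubCategory`, `ChemicalName`, `ChemicalDateRemoved`, `DiscontinuedDate`), the `build_graph` helper and the sample rows are assumptions for illustration, not a verified copy of the published schema. Property placement follows the later queries: `discontinuedDate` on the Product node and `dateChemicalRemoved` on the USED_IN relationship. In ONgDB itself this deduplication would normally be done with `MERGE`.

```python
def build_graph(rows):
    """Turn flat report rows into deduplicated nodes and relationships.

    Nodes are keyed by (label, name), so a brand or chemical that appears in
    many rows becomes a single node, similar to what MERGE does in Geequel.
    """
    nodes = {}  # (label, name) -> node properties
    rels = {}   # (start key, relationship type, end key) -> relationship properties

    def merge_node(label, name, **props):
        key = (label, name)
        node = nodes.setdefault(key, {"name": name})
        # Copy only the properties that actually have a value in this row.
        node.update({k: v for k, v in props.items() if v is not None})
        return key

    for row in rows:
        brand = merge_node("Brand", row["BrandName"])
        product = merge_node("Product", row["ProductName"],
                             discontinuedDate=row.get("DiscontinuedDate"))
        chemical = merge_node("Chemical", row["ChemicalName"])
        category = merge_node("Category", row["SubCategory"])

        rels.setdefault((brand, "PRODUCES", product), {})
        rels.setdefault((product, "BELONGS_TO", category), {})
        used_in = rels.setdefault((chemical, "USED_IN", product), {})
        if row.get("ChemicalDateRemoved") is not None:
            used_in["dateChemicalRemoved"] = row["ChemicalDateRemoved"]

    return nodes, rels


# Hypothetical sample rows; real column names and values may differ.
rows = [
    {"BrandName": "Brand A", "ProductName": "Foundation 1", "SubCategory": "Foundations and Bases",
     "ChemicalName": "Titanium dioxide", "ChemicalDateRemoved": None, "DiscontinuedDate": None},
    {"BrandName": "Brand A", "ProductName": "Eye Shadow 1", "SubCategory": "Eye Shadow",
     "ChemicalName": "Titanium dioxide", "ChemicalDateRemoved": "2015-03-01", "DiscontinuedDate": None},
]

nodes, rels = build_graph(rows)
print(len(nodes), len(rels))  # 6 6  (the brand and chemical nodes are shared)
```

Keying nodes by label and name keeps the sketch short; a real import would more likely key on the dataset's own IDs.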

Geequel Query 1:
What brands have the most products reported?
MATCH (b:Brand)-[:PRODUCES]->(p:Product)
RETURN b.name AS Brand, count(p) AS numOfProducts
ORDER BY numOfProducts DESC;
Philosophy has 17 products and Revlon has 342, while the other brands don’t even reach double digits.

Geequel Query 2:
What types of makeup are the biggest offenders?
MATCH (p:Product)-[:BELONGS_TO]->(c:Category)
RETURN c, count(p) AS productCount
ORDER BY productCount DESC;

Foundations and Bases: 133
Eye Shadow: 113
Lip color-Lipsticks, Liners, and Pencils: 65
Lip Gloss/Shine: 30
Artificial Nails and Related Products: 22

Geequel Query 3:
Which chemical is reported for the products in each category?

MATCH (ch:Chemical)-[:USED_IN]->(p:Product)-[:BELONGS_TO]->(c:Category)
RETURN c, count(p) AS productCount, ch AS chemical
ORDER BY productCount DESC;

Titanium dioxide is the chemical in question for every single product in the five categories above.
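In Geequel, as in Cypher, the non-aggregated columns in `RETURN` (here `c` and `ch`) act as the grouping key for `count(p)`, so Query 3 yields one row per category-chemical pair. The rough Python equivalent below runs over hypothetical matched rows; the product names and the second chemical are made up, and it only illustrates the grouping, not how ONgDB executes the query.

```python
from collections import Counter

# Hypothetical matches, one per (chemical)-[:USED_IN]->(product)-[:BELONGS_TO]->(category) path.
matches = [
    ("Foundation 1", "Titanium dioxide", "Foundations and Bases"),
    ("Foundation 2", "Titanium dioxide", "Foundations and Bases"),
    ("Eye Shadow 1", "Titanium dioxide", "Eye Shadow"),
    ("Eye Shadow 1", "Other chemical", "Eye Shadow"),
]

# count(p) grouped by the non-aggregated return columns (category, chemical).
counts = Counter((category, chemical) for _product, chemical, category in matches)

# ORDER BY productCount DESC
for (category, chemical), product_count in counts.most_common():
    print(category, "|", chemical, "|", product_count)
```

Note that a product containing two reported chemicals is counted once under each chemical, because `count(p)` counts matched rows rather than distinct products; `count(DISTINCT p)` would change that.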

Geequel Query 4:
Which brands have the most products whose chemicals have not been removed and that have not been discontinued?

MATCH (b:Brand)-[:PRODUCES]->(p:Product)<-[r:USED_IN]-(ch:Chemical)
WHERE r.dateChemicalRemoved IS NULL AND p.discontinuedDate IS NULL
RETURN b AS brand, count(p) AS productCount
ORDER BY productCount DESC
LIMIT 10;

Geequel Query 5a:
Are companies still selling products with possibly harmful chemicals?

MATCH (b:Brand)-[:PRODUCES]->(p:Product)<-[r:USED_IN]-(ch:Chemical)
WHERE r.discontinuedDate IS NULL
RETURN b AS brand, collect(p), count(p) AS productCount, ch AS chemical;

All of the brands returned had only one to three products still being sold, except for Revlon, which had a whopping 342 products still for sale with a potentially harmful chemical. In other words, every one of Revlon’s products in this dataset can still be purchased in stores.

Geequel Query 5b:
Maybe Revlon kept the products but removed the potentially harmful chemical?

MATCH (b:Brand)-[:PRODUCES]->(p:Product)<-[r:USED_IN]-(ch:Chemical)
WHERE r.dateChemicalRemoved IS NULL
RETURN b AS brand, collect(p), count(p) AS productCount, ch AS chemical;

For Revlon, the results are the same as in the previous query, so the chemical remains in the products.

Geequel Query 5c:
Maybe these reports are recent and Revlon hasn’t had time to adjust these products or take them off the shelves?

MATCH (b:Brand {name: "Revlon"})-[:PRODUCES]->(p:Product)<-[r:USED_IN]-(ch:Chemical)
RETURN b AS brand, collect(p), count(p) AS productCount, r.initialDateReported, ch AS chemical;

The query returned 80 products for Revlon, all initially reported on “09/02/2009 12:00:00 AM”, so a large amount of time has passed between these reports and the present.
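Queries 4 and 5 rely on `IS NULL` to mean “not removed” or “not discontinued”. In Geequel, as in Cypher, a property cannot hold a null value (setting it to null removes it), so `r.dateChemicalRemoved IS NULL` is true for any relationship that simply lacks the property. That makes it matter which element, node or relationship, a property is stored on. The Python sketch below mimics those filters with `dict.get`, which likewise returns `None` for a missing key; the `still_for_sale` helper and the sample records are hypothetical.

```python
# Hypothetical product nodes and their USED_IN relationship properties.
# A property that was never set is simply absent from the dict.
products = {
    "Lipstick 1": {},                                   # never discontinued
    "Lipstick 2": {"discontinuedDate": "2014-01-01"},   # discontinued
    "Lipstick 3": {},                                   # never discontinued
}
used_in = {
    "Lipstick 1": {},                                     # chemical never removed
    "Lipstick 2": {},                                     # chemical never removed
    "Lipstick 3": {"dateChemicalRemoved": "2013-05-01"},  # chemical removed
}


def still_for_sale(products, used_in):
    """Mimic WHERE r.dateChemicalRemoved IS NULL AND p.discontinuedDate IS NULL.

    dict.get returns None for a missing key, much as a Geequel/Cypher
    property lookup returns null for a property that was never set.
    """
    return [
        name
        for name, props in products.items()
        if props.get("discontinuedDate") is None
        and used_in[name].get("dateChemicalRemoved") is None
    ]


print(still_for_sale(products, used_in))  # ['Lipstick 1']

# Looking for discontinuedDate on the relationship instead of the product node
# matches every row here, because no relationship carries that property.
print([name for name in products if used_in[name].get("discontinuedDate") is None])
# ['Lipstick 1', 'Lipstick 2', 'Lipstick 3']
```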