Berlin-Kay flickr

From Sedes Draconis

Show me the graph already!



Definitions of terms

Independent variable: Berlin-Kay Index

Short description: A way of indexing colors (color words) by some kind of "basicness", proposed in 1969 by Brent Berlin and Paul Kay in a work of linguistic anthropology, Basic Color Terms: Their Universality and Evolution.

Skip the longer description.

Longer Description

Berlin & Kay (1969) claim a fixed progression of basic (monolexemic, non-subset, psychologically salient) color terms, which they broke into a series of phases. For the record, I believe they were not entirely right and not entirely wrong, but that my results have interesting implications either way.

Berlin-Kay model:

Phase      | Colors included | Interpretation of claim
Phase I    | White, Black | If there are only two color terms, their prototypical values are white and black.
Phase II   | White, Black, Red | If there are three color terms, the first two are as in Phase I, and the third has a prototypical red value.
Phase IIIa | White, Black, Red, Green |
Phase IIIb | White, Black, Red, Yellow | If there are four, the fourth is prototypically either yellow or green.
Phase IV   | White, Black, Red, Green, Yellow | If five, both of Phase III colors.
Phase V    | White, Black, Red, Green, Yellow, Blue | Then blue.
Phase VI   | White, Black, Red, Green, Yellow, Blue, Brown |
Phase VII  | White, Black, Red, Green, Yellow, Blue, Brown, Purple, Pink, Orange, Grey | Then the final four colors are added. The original model does claim that these four are added all together.

I am translating the phases into an "index" value for each color which, I believe, does not change the theoretical claim of the model, but only creates a notation that is more appropriate and simpler for my purposes.

My notation, instead of indicating the progression of languages through this sequence, indicates the status assigned to each color by the model. For the most part, this means that each index value contains the color which is added in the corresponding phase. Yellow and Green are given the same index value, in recognition that the Berlin-Kay model gives them equal status in the progression. I have also started the numbers from 0 instead of 1 (out of arbitrary preference).

My Berlin-Kay index values:

  • 0: White, Black (Phase I)
  • 1: Red (added in Phase II)
  • 2: Green, Yellow (Phases IIIa, IIIb, and IV)
  • 3: Blue (Phase V)
  • 4: Brown (Phase VI)
  • 5: Purple, Pink, Orange, Grey (Phase VII)
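
The index values above can be written down as a simple lookup table. This is only a minimal sketch of the notation, not code from the original analysis; the names BERLIN_KAY_INDEX and colors_with_index are made up for illustration.

```python
# Berlin-Kay index values exactly as listed above (0 = added earliest).
# The dictionary name and the helper below are hypothetical, for illustration only.
BERLIN_KAY_INDEX = {
    "white": 0, "black": 0,                            # Phase I
    "red": 1,                                          # Phase II
    "green": 2, "yellow": 2,                           # Phases IIIa, IIIb, and IV
    "blue": 3,                                         # Phase V
    "brown": 4,                                        # Phase VI
    "purple": 5, "pink": 5, "orange": 5, "grey": 5,    # Phase VII
}


def colors_with_index(index):
    """Return the color terms assigned to a given index, sorted alphabetically."""
    return sorted(c for c, i in BERLIN_KAY_INDEX.items() if i == index)


print(colors_with_index(2))  # ['green', 'yellow']
```

Keeping Green and Yellow under one value means the lookup cannot distinguish Phase IIIa from Phase IIIb, which matches the equal status the model gives them.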

2008 Photos

Short description: the number of photos on flickr from 2008 described by each color word.

Long Description

For each color term, this value is the number of photos returned by a full-text search over photos dated between the end of 2007 and the end of 2008; the values were obtained with a query to the flickr API.

For example the API returned:

<photocounts busiest_date="2007">
<photocount count="274246" fromdate="2004" todate="2005"/>
<photocount count="831895" fromdate="2005" todate="2006"/>
<photocount count="1723734" fromdate="2006" todate="2007"/>
<photocount count="2641788" fromdate="2007" todate="2008"/>
<photocount count="70298" fromdate="2008" todate="1261440000"/>
</photocounts>

So the number of 2008 photos which are somehow described as "red" is 2 641 788. This is not the number of photos which "actually are red" (however that gets defined), but the number described with the word "red". For example, a picture titled "Red Panda" is counted the same as a picture titled "Red Rose" or tagged "red" (actually all of those linked photos are tagged "red", but the point remains that the first two would have been counted anyway).

This could be counted either as noise or as valid data, depending on your preferred analysis. I would argue this should not be considered worthless noise, because there's a reason it's called a "red panda" and not a "brown panda".
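
Reading the 2008 value out of a response like the example above can be sketched as follows. This is an illustration under assumptions, not the script that produced the data: the response is pasted in as a string (with a closing tag added so that it parses), no request to flickr is made, and the function name count_for_range is hypothetical.

```python
import xml.etree.ElementTree as ET

# The example response for "red" from above, with a closing tag added
# so that it is well-formed XML. No API call is made here.
RESPONSE = """<photocounts busiest_date="2007">
<photocount count="274246" fromdate="2004" todate="2005"/>
<photocount count="831895" fromdate="2005" todate="2006"/>
<photocount count="1723734" fromdate="2006" todate="2007"/>
<photocount count="2641788" fromdate="2007" todate="2008"/>
<photocount count="70298" fromdate="2008" todate="1261440000"/>
</photocounts>"""


def count_for_range(xml_text, fromdate, todate):
    """Return the count for the matching date range, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for photocount in root.iter("photocount"):
        if photocount.get("fromdate") == fromdate and photocount.get("todate") == todate:
            return int(photocount.get("count"))
    return None


# The "2008 Photos" value is the range from the end of 2007 to the end of 2008.
print(count_for_range(RESPONSE, "2007", "2008"))  # 2641788
```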


