2006-07-11

bcholmes
2006-07-11 07:21 pm

What's Newsworthy?

Some statistical analysis. Articles from dozens of online news sources. Over 18,000 articles sampled over a span of about two weeks. Strip out advertising, "related stories", and other non-article material, leaving only the article content.

Ignore the following words: and, but, or, the, an, a
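
For anyone who wants to try a similar count, here is a minimal sketch in Python. It is not the script behind the numbers in this post. The function name `word_frequencies`, the lowercasing, and the rule for splitting text into words are all assumptions, and the sample sentences are made up for illustration.

```python
import re
from collections import Counter

# The words this post says to ignore.
IGNORED = {"and", "but", "or", "the", "an", "a"}


def word_frequencies(articles):
    """Count words across article texts, skipping the ignored words.

    Hypothetical helper: `articles` is any iterable of plain-text strings
    that have already had ads and "related stories" stripped out.
    """
    counts = Counter()
    for text in articles:
        # Assumption: lowercase everything and treat runs of letters
        # (plus apostrophes) as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in IGNORED)
    return counts


# Made-up sample text, not real article data.
sample = [
    "He said the report was his.",
    "She said her team agreed, and he agreed.",
]
freq = word_frequencies(sample)
print(freq["he"], freq["his"], freq["she"], freq["her"])
```

Different tokenizing choices (contractions, hyphenated words, capitalization) will shift the totals somewhat, so counts from a sketch like this are only roughly comparable across runs that make different choices.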

Here are some word frequencies:

  • he: ~50,000 instances
  • his: ~34,000 instances
  • she: ~11,000 instances
  • her: ~11,000 instances

"He" was also the second-most common word, after "said". "Their" was more common than "she".