What's Newsworthy?
Jul. 11th, 2006 07:21 pm
Some statistical analysis. Articles from many dozens of online news sources. More than 18,000 articles, sampled over a span of two weeks or so. Strip out advertising, "related stories", and other page clutter, leaving only the article content.
Ignore the following words: and, but, or, the, an, a
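For anyone who wants to try something similar, here is a minimal sketch of the counting step in Python. It is not the script used for the numbers in this post; the function name `word_frequencies`, the tokenizing regex, and the two sample sentences are all assumptions made for illustration, and it presumes the article text has already been stripped of ads and other page clutter.

```python
import re
from collections import Counter

# The ignored words listed above.
STOP_WORDS = {"and", "but", "or", "the", "an", "a"}

def word_frequencies(articles):
    """Count case-insensitive word occurrences across article texts.

    `articles` is an iterable of plain-text strings (assumed to be
    article bodies with advertising and related links already removed).
    """
    counts = Counter()
    for text in articles:
        # Simplistic tokenizer (an assumption): runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

# Made-up sample text, not taken from the real data set.
sample = [
    "He said the report was his.",
    "She said her team and their work mattered.",
]
freq = word_frequencies(sample)
print(freq.most_common(5))
```

How words get split (hyphens, apostrophes, capitalization) will shift the totals somewhat, so treat any counts from a sketch like this as approximate.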
Here are some word frequencies:
- he: ~50,000 instances
- his: ~34,000 instances
- she: ~11,000 instances
- her: ~11,000 instances
"He" was also the second-most common word, after "said". "Their" was more common than "she".