From: Andy Latto <andy.latto@pobox.com>
What would you consider the "keywords" for Lisp: the special forms? That seems somewhat arbitrary; for example, Common Lisp specifies that IF is a special form and COND is a macro, but it could easily have made the opposite choice, and only a small number of programs that do analysis of other programs would need to be any different.
Andy
--I do not know. As Andy correctly points out, there is a lot of arbitrariness in defining exactly what we want these statistics to be. Library calls, common idioms, and macros seem fair game even though they are perhaps technically not "keywords", so the vague answer is "just do something reasonable." You could try Macsyma as a big LISP program, I guess. (I'm assuming various math-funners have the LISP source for it and tons of expertise, which I don't.)

To do a really good job, you really should instrument a compiler, or something, to gain a high degree of understanding of the programs you are collecting stats on. However, I'm way too lazy to do that. I used fairly crude word-counting scripts and made some arbitrary decisions about what to count. So my counts are not as accurate as they could be, but it does not matter all that much.

I redid my counts for gcc, by the way, by making my counter scripts remove comments before counting (which for some programs makes little difference, but in the case of gcc made a considerable difference), and I also added counts for some rather arbitrarily selected characters and character combinations.
The new gcc results are:

" = "=82943, "-"=73769, IF=72024, "{"=59278, ">"=39119, "=="=33504, " & "=32425, RETURN=26285, RAND=29586, "&&"=26653, CASE=26098, STR=15288 (count includes many functions), ELSE=20575, CONST=16284, GOTO=15666, " + "=14668, "<"=13228, VOID=11991, INT=11824, FOR=10148, "!="=10095, STATIC=9980, DEFINE=8750, BREAK=7925, ENDIF=6755, CHAR=6524, UNSIGNED=6512, "++"=4869, ENUM=3844, PRINTF=2798, DEFAULT=2519, TRUE=2352, BOOL=2263, FALSE=2105, MIN=1961, MAX=1927, WHILE=1929, SIZEOF=1924, SWITCH=1915, "+="=1617, "/"=1327, CONTINUE=1185, UNDEF=1159, INLINE=1138, FILE=1129, INCLUDE=845, SIGNED=656, FLOAT=541, TYPEDEF=513, DOUBLE=457, UNION=438, DO=438, CLEAR=395, STDERR=360, MEMSET=316, "^"=303, "&="=296, SORT=275, UCHAR=274, SWAP=211, VOLATILE=194, UINT=181, ASSERT=129, "*="=57, CLZ=46, CTZ=36, "^="=29, POPCOUNT=28, STDIN=24, FCLOSE=18, STDOUT=14, LSB=11.

You will notice that some of the things I counted, like MIN, POPCOUNT, RAND, SWAP, and MEMSET, are indeed not C keywords; they are macros or library calls.
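
For anyone who wants to try this on their own sources, a crude comment-stripping counter of the kind described above might look roughly like the following Python sketch. It is not the script that produced the numbers above. The names strip_comments and count_tokens, the default symbol list, and the choice to uppercase whole identifiers are all assumptions made for illustration. It leaves string and character literals in place, ignores trigraphs and backslash-continued "//" comments, and knows nothing about the preprocessor.

```python
import re
from collections import Counter

# One pattern that matches either a literal (which we keep) or a comment
# (which we drop). Scanning literals first-come-first-served keeps a "/*"
# inside a string from being mistaken for the start of a comment.
_LITERAL_OR_COMMENT = re.compile(
    r'"(?:\\.|[^"\\])*"'      # string literal
    r"|'(?:\\.|[^'\\])*'"     # character literal
    r"|/\*.*?\*/"             # block comment
    r"|//[^\n]*",             # line comment
    re.DOTALL,
)


def strip_comments(source: str) -> str:
    """Return C source text with comments replaced by a single space."""
    def replace(match):
        text = match.group(0)
        # Comments start with "/"; literals start with a quote character.
        return " " if text.startswith("/") else text
    return _LITERAL_OR_COMMENT.sub(replace, source)


def count_tokens(source: str, symbols=(" = ", "==", "&&", "++")) -> Counter:
    """Count uppercased identifiers plus a caller-chosen list of symbols.

    Assumption: identifiers are matched as whole words, so "STR" here
    would not pick up strlen or strcmp. Symbols are counted as plain
    substrings, which is why " = " is written with its spaces.
    """
    code = strip_comments(source)
    counts = Counter(w.upper() for w in re.findall(r"[A-Za-z_]\w*", code))
    for sym in symbols:
        counts[sym] = code.count(sym)
    return counts
```

Feeding it the text of each source file and summing the resulting Counter objects gives one table for the whole program; counts.most_common(20) then lists the top entries. Counting symbols as substrings is deliberately crude: "=" alone would also match every "==" and "+=", so the symbol list has to be chosen with that in mind.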