Re: [math-fun] keyword frequencies in programming languages
Allan Wechsler <acwacw@gmail.com>

Such a table fails to distinguish between programmer-defined items, and ones that are really part of the language core. I get the impression you are interested in the latter, although I confess that I'm not sure what the agenda is. Are we supposed to learn something about the languages in question? What?

All Lisp dialects have an intentionally vague boundary between terms that are defined by the language, and those defined by the programmer. Another way to put this is that all Lisp programs are Lisp language extensions. I would argue, for example, that "point", which features prominently in the list above, reveals something about the application (Emacs) rather than the language (Emacs Lisp). "Save-excursion" -- even more so. In Lisp, there is no syntactic clue that infallibly singles out language primitives.

What are we trying to find out? Perhaps there is a better way than just counting lemmas; or perhaps, for Lisp at least, we are asking the wrong question.
--I agree with AW (and knew that already). And all the counts by me and by Diffie are somewhat flawed due to our laziness about not really counting things "right". What am I trying to figure out? Well, there are probably many uses for keyword frequency data from real programs. But just one is: if you are trying to design a new language, then this data will be useful to you. I am on a likely fool's errand of trying to, at least sketchily, design a new language, and if you want to correspond with me about that, go ahead...
... all the counts by me and by Diffie are somewhat flawed due to our laziness about not really counting things "right".
It wasn't laziness on my part (although I have plenty); I didn't know what you wanted. It wouldn't be difficult to produce counts of two sorts of primitives. I could do this study on my code, counting only the lisp commands I did not write myself or I could do it on the system code and count only the ones defined in C. That is easy to get from the documentation system and I believe I can send a list without much trouble.

I believe that Allan Wechsler is right in thinking that lots of the Elisp primitives are about the data structures. Weeding those out would take more judgement.

Presumably plus and times are about the fact that a language does arithmetic. As I said earlier, I think a count of words in Common Lisp would be rather different.

Whit
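A minimal sketch of the restricted count described above, written in Python rather than Lisp. It counts only symbols that appear directly after an opening parenthesis and that belong to a supplied set of primitives. The names `SAMPLE`, `PRIMITIVES`, `head_symbols` and `count_primitives` are hypothetical, the primitive set is a hand-picked stand-in for a list that would really come from the documentation system, and the tokenizer is a simplification: it skips strings and comments but does not handle character literals such as `?(`, and it treats the first symbol of a quoted list or a `let` binding as if it were an operator.

```python
import re
from collections import Counter

# Hypothetical sample; a real study would read source files from disk.
SAMPLE = """
(defun my-next-word ()
  ;; (car is not counted inside a comment)
  (save-excursion
    (forward-word 1)
    (setq my-last (point))
    (+ (point) (car my-list))))
"""

# Hypothetical stand-in for the list of primitives defined in C.
PRIMITIVES = {"setq", "point", "car", "cdr", "+",
              "save-excursion", "forward-word"}

# Strings, comments, parentheses, then everything else as a symbol.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|;[^\n]*|[()]|[^\s()";]+')


def head_symbols(source):
    """Yield each symbol that directly follows an opening parenthesis."""
    previous = None
    for match in TOKEN.finditer(source):
        token = match.group()
        if token.startswith(";"):
            continue  # comments never count and do not reset the context
        is_atom = token not in ("(", ")") and not token.startswith('"')
        if previous == "(" and is_atom:
            yield token
        previous = token


def count_primitives(source, primitives):
    """Count operator-position symbols, keeping only known primitives."""
    return Counter(s for s in head_symbols(source) if s in primitives)


if __name__ == "__main__":
    for symbol, n in count_primitives(SAMPLE, PRIMITIVES).most_common():
        print(f"{n:4d}  {symbol}")
```

Swapping in a different primitive set (for example, only the commands the author did not write) gives the other of the two counts without changing the counting code.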
On 6/18/14, Whitfield Diffie <whitfield.diffie@gmail.com> wrote:
... all the counts by me and by Diffie are somewhat flawed due to our laziness about not really counting things "right".
It wasn't laziness on my part (although I have plenty); I didn't know what you wanted. It wouldn't be difficult to produce counts of two sorts of primitives. I could do this study on my code, counting only the lisp commands I did not write myself or I could do it on the system code and count only the ones defined in C. That is easy to get from the documentation system and I believe I can send a list without much trouble.
I believe that Allan Wechsler is right in thinking that lots of the Elisp primitives are about the data structures. Weeding those out would take more judgement.
--that is the benefit of also counting for a quite different big LISP program such as "Macsyma" rather than "Emacs" -- it will not have a lot of stuff about buffers and the cursor, but will focus artificially on other stuff. If we compared the two we could start seeing what is real versus what is an illusion. Best of all is a large corpus of different programs by different authors at different times for different purposes, but that's too much work for me. The serious studies I cited put 400 to 2000 programs in their corpus, but did only 1 language for only 1 year.
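One way to sketch that comparison in Python: normalize each corpus to relative frequencies, then separate symbols that occur at comparable rates in both from symbols that are heavily skewed toward one. The function names, the factor-of-4 threshold, and the counts in the usage example are all hypothetical toy values chosen for illustration, not measurements from Emacs or Macsyma.

```python
from collections import Counter


def relative(counts):
    """Convert raw counts to fractions of the corpus total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {symbol: n / total for symbol, n in counts.items()}


def split_by_corpus(counts_a, counts_b, ratio=4.0):
    """Return (shared, skewed_to_a, skewed_to_b) lists of symbols.

    A symbol is 'shared' when its relative frequencies in the two
    corpora differ by no more than the given ratio (an assumed cutoff).
    """
    freq_a, freq_b = relative(counts_a), relative(counts_b)
    shared, skewed_a, skewed_b = [], [], []
    for symbol in sorted(set(freq_a) | set(freq_b)):
        fa = freq_a.get(symbol, 0.0)
        fb = freq_b.get(symbol, 0.0)
        if fa > ratio * fb:
            skewed_a.append(symbol)
        elif fb > ratio * fa:
            skewed_b.append(symbol)
        else:
            shared.append(symbol)
    return shared, skewed_a, skewed_b


if __name__ == "__main__":
    # Toy counts, invented purely to exercise the function.
    editor = Counter({"setq": 40, "if": 30, "car": 20,
                      "point": 25, "save-excursion": 10})
    algebra = Counter({"setq": 50, "if": 35, "car": 30,
                       "plus": 20, "times": 15})
    shared, editor_only, algebra_only = split_by_corpus(editor, algebra)
    print("shared:      ", shared)
    print("editor-ish:  ", editor_only)
    print("algebra-ish: ", algebra_only)
```

With only two programs the "shared" list is still a rough guess; a larger corpus of unrelated programs would make it more trustworthy.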
Presumably plus and times are about the fact that a language does arithmetic. As I said earlier, I think a count of words in Common Lisp would be rather different.
--would also be of some interest... I don't think it is worth getting too buried in the details of just exactly what is the "right" question since, at least for the purposes I care about, fairly low relative-accuracy counts suffice, plus it is not obvious what the "right" question is exactly anyhow. I'd also like to see a quite different language type, such as ML, examined, to see if it changes anything. As far as I can see, the LISP counts mostly agree fairly well with what would naively be expected from the counts in C-like languages, if you just look at the C-like subset of LISP. The stuff in LISP like car and cdr, which has no C counterpart, is another thing entirely.
="Warren D Smith" <warren.wds@gmail.com>
I am on a likely fool's errand of trying to, at least sketchily, design a new language, and if you want to correspond with me about that, go ahead...
I admire your pluck, but... "be careful what you wish for"! It's always seemed a bit pretentious and maybe even arouses suspicions of "liberal-arts envy" that awkward coding systems were styled as "languages". Whereby, for example, do they support philosophical discourse, or poetry?

I'm tempted to say a great deal more, but instead will just commend to your attention the entertaining and I hope inspiring presentation of Guy Steele: "Growing a Language"

Video: http://www.youtube.com/watch?v=_ahvzDzKdB0&feature=kp
Paper: http://www.cs.virginia.edu/~evans/cs655/readings/steele.pdf

Please be patient in viewing: it takes ~9 minutes for his deeper point to emerge, but the reveal is fun, and his message hopefully thought-provoking. "Enjoy"!

--MLB

PS: these lists of keywords are reminiscent of archeological core samples, with the appearance of various fossil forms in successive strata strongly signaling dating along the history of ideas. In Lisp, for example, the primitive SETQ was conceptually (though not so much in practice) superseded by the more general, implementation-hiding, SETF. This affiliates the implementation of Emacs to a specific era in that arc of craft artifacts.
participants (3)
- Marc LeBrun
- Warren D Smith
- Whitfield Diffie