SCIgen - An Automatic CS Paper Generator
SCIgen is a program that generates random Computer Science research papers, including graphs, figures, and citations. It uses a hand-written context-free grammar to form all elements of the papers. Our aim here is to maximize amusement, rather than coherence.
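To make the grammar idea concrete, here is a minimal sketch of random expansion from a context-free grammar. It is an illustration only: the rule names (`TITLE`, `ADJ`, `NOUN`), the word lists, and the `expand` and `generate_title` helpers are all hypothetical and are not taken from SCIgen's own grammar or code.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of productions,
# and each production is a list of symbols. Not SCIgen's actual rules.
GRAMMAR = {
    "TITLE": [["ADJ", "NOUN", "for", "NOUN"], ["On the", "NOUN", "of", "NOUN"]],
    "ADJ": [["Scalable"], ["Probabilistic"], ["Empathic"]],
    "NOUN": [["Access Points"], ["Redundancy"], ["Spreadsheets"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol; anything without a rule is a terminal."""
    if symbol not in GRAMMAR:
        return symbol
    production = rng.choice(GRAMMAR[symbol])
    return " ".join(expand(part, rng) for part in production)


def generate_title(seed=None):
    """Generate one random title; a fixed seed makes the output repeatable."""
    return expand("TITLE", random.Random(seed))


if __name__ == "__main__":
    for seed in range(3):
        print(generate_title(seed))
```

A full paper generator would presumably apply the same expansion step to a much larger rule set covering sections, sentences, figures, and citations.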
One useful purpose for such a program is to auto-generate submissions to conferences that you suspect might have very low submission standards. A prime example, which you may recognize from spam in your inbox, is SCI/IIIS and its dozens of co-located conferences (check out the very broad conference description on the WMSCI 2005 website). There's also a list of known bogus conferences. Using SCIgen to generate submissions for conferences like this gives us no end of pleasure. In fact, one of our papers was accepted to SCI 2005! See Examples for more details.
We went to WMSCI 2005. Check out the talks and video. You can find more details in our blog.
Also, check out our 10th anniversary celebration project: SCIpher!
Generate a Random Paper
Want to generate a random CS paper of your own? Type in some optional author names below, and click "Generate".
SCIgen currently supports Latin-1 characters, but not the full Unicode character set.
Examples
Here are two papers we submitted to WMSCI 2005:
- Rooter: A Methodology for the Typical Unification of Access Points and Redundancy (PS, PDF)
Jeremy Stribling, Daniel Aguayo and Maxwell Krohn
This paper was accepted as a "non-reviewed" paper!
We received many donations to send us to the conference, so that we could give a randomly-generated talk.
- The Influence of Probabilistic Methodologies on Networking (PS, PDF)
Thomer M. Gil
For some reason, this paper was rejected. We asked for reviews, and got this response.
Thanks to the generous donations of 165 people, we went to WMSCI 2005 in Orlando and held our own "technical" session in the same hotel. The (randomly-generated) title of the session was The 6th Annual North American Symposium on Methodologies, Theory, and Information. The session included three randomly-generated talks:
As promised, we videotaped the whole thing. You can download the resulting movie, titled Near Science, below. Movie length: 13:15.
Trouble playing the AVI? Try downloading a DivX codec for Windows or Mac, or try the open source VideoLAN player.
You can read more about the trip here, and check out some pictures here.
Many thanks to everyone who made this possible, especially Tadd Torborg and family, Open Clipart, the PDOS research group, and of course all the SCIgen donors.
The code for SCIgen is released under the GPL, and is now available via GitHub!
If you are a time-traveler from 2002 and prefer anonymous CVS, here you go:
We're still working on documentation and making it more user-friendly, but you should be able to figure most of it out from the code. Here's what you need on your computer to run it (we've run it on FreeBSD and GNU/Linux platforms):
If you would like to contribute code to this project (i.e., by helping us expand our context-free grammar with more sentences, nouns, etc.), please contact us with any patches and we'll apply them if they seem reasonable. We hope to set up a better system sometime in the near future.
Running the code. We've been getting a lot of questions about how to run the code. There are quite a few misleading files in the source -- sorry about that. All you need to do to generate a paper is to run (also look at ). You can also use to generate any arbitrary starting target. See for most of the grammar rules.
As indicated above, one of our generated papers got accepted to WMSCI 2005. Our plan was to go there and give a completely randomly-generated talk, delivered entirely with a straight face. However, this is very expensive for grad students such as ourselves. So, we asked visitors to this site to make small donations toward this dream of ours; the response was overwhelming.
Amount of donations: $2401.43 (after PayPal fees)
Number of donations: 165
Amount of time: 72 hours
We used this money to hold our own session at the same hotel as WMSCI 2005.
Related Work
Other papers:
Other generators:
Other SCIgen successes:
The publishers Springer and IEEE are removing more than 120 papers from their subscription services after a French researcher discovered that the works were computer-generated nonsense.
Over the past two years, computer scientist Cyril Labbé of Joseph Fourier University in Grenoble, France, has catalogued computer-generated papers that made it into more than 30 published conference proceedings between 2008 and 2013. Sixteen appeared in publications by Springer, which is headquartered in Heidelberg, Germany, and more than 100 were published by the Institute of Electrical and Electronics Engineers (IEEE), based in New York. Both publishers, which were privately informed by Labbé, say that they are now removing the papers.
Among the works was, for example, a paper published in the proceedings of the 2013 International Conference on Quality, Reliability, Risk, Maintenance, and Safety Engineering, held in Chengdu, China. (The conference website says that all manuscripts are “reviewed for merits and contents”.) The authors of the paper, entitled ‘TIC: a methodology for the construction of e-commerce’, write in the abstract that they “concentrate our efforts on disproving that spreadsheets can be made knowledge-based, empathic, and compact”. (Nature News has attempted to contact the conference organizers and named authors of the paper but received no reply*; however, at least some of the names belong to real people. The IEEE has now removed the paper.)
*Update: One of the named authors replied to Nature News on 25 February. He said that he first learned of the article when conference organizers notified his university in December 2013; and that he does not know why he was a listed co-author on the paper. "The matter is being looked into by the related investigators," he said.
How to create a nonsense paper
Labbé developed a way to automatically detect manuscripts composed by a piece of software called SCIgen, which randomly combines strings of words to produce fake computer-science papers. SCIgen was invented in 2005 by researchers at the Massachusetts Institute of Technology (MIT) in Cambridge to prove that conferences would accept meaningless papers — and, as they put it, “to maximize amusement” (see ‘Computer conference welcomes gobbledegook paper’). A related program generates random physics manuscript titles on the satirical website arXiv vs. snarXiv. SCIgen is free to download and use, and it is unclear how many people have done so, or for what purposes. SCIgen’s output has occasionally popped up at conferences, when researchers have submitted nonsense papers and then revealed the trick.
Labbé does not know why the papers were submitted, or even whether the authors were aware of them. Most of the conferences took place in China, and most of the fake papers have authors with Chinese affiliations. Labbé has emailed editors and authors named in many of the papers and related conferences but received scant replies; one editor said that he did not work as a program chair at a particular conference, even though he was named as doing so, and another author claimed his paper was submitted on purpose to test out a conference, but did not respond to follow-up questions. Nature has received no replies to a few enquiries of its own.
“I wasn’t aware of the scale of the problem, but I knew it definitely happens. We do get occasional e-mails from good citizens letting us know where SCIgen papers show up,” says Jeremy Stribling, who co-wrote SCIgen when he was at MIT and now works at VMware, a software company in Palo Alto, California.
“The papers are quite easy to spot,” says Labbé, who has built a website where users can test whether papers have been created using SCIgen. His detection technique, described in a study [1] published in Scientometrics in 2012, involves searching for characteristic vocabulary generated by SCIgen. Shortly before that paper was published, Labbé informed the IEEE of 85 fake papers he had found. Monika Stickel, director of corporate communications at IEEE, says that the publisher “took immediate action to remove the papers” and “refined our processes to prevent papers not meeting our standards from being published in the future”. In December 2013, Labbé informed the IEEE of another batch of apparent SCIgen articles he had found. Last week, those were also taken down, but the web pages for the removed articles give no explanation for their absence.
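As a rough illustration of what searching for characteristic vocabulary could look like, the sketch below scores a text by the fraction of its words that appear in a marker list. This is not Labbé's published method: the `MARKERS` set, the helper names, and the 0.1 threshold are all hypothetical choices made for the example.

```python
import re

# Hypothetical marker words, chosen for illustration only.
MARKERS = {"methodology", "unification", "empathic", "knowledge-based", "symbiotic"}


def tokenize(text):
    """Lower-case the text and split it into words, keeping hyphenated terms whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def marker_fraction(text, markers=MARKERS):
    """Return the share of tokens that appear in the marker vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in markers)
    return hits / len(tokens)


def looks_generated(text, threshold=0.1):
    """Flag a text whose marker share reaches the (arbitrary) threshold."""
    return marker_fraction(text) >= threshold


if __name__ == "__main__":
    sample = ("we concentrate our efforts on disproving that spreadsheets "
              "can be made knowledge-based, empathic, and compact")
    print(marker_fraction(sample), looks_generated(sample))
```

A real detector would need a vocabulary derived from actual generator output and a threshold calibrated against genuine papers; a fixed word list like this one would produce many false positives.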
Ruth Francis, UK head of communications at Springer, says that the company has contacted editors, and is trying to contact authors, about the issues surrounding the articles that are coming down. The relevant conference proceedings were peer reviewed, she confirms — making it more mystifying that the papers were accepted.
The IEEE would not say, however, whether it had contacted the authors or editors of the suspected SCIgen papers, or whether submissions for the relevant conferences were supposed to be peer reviewed. “We continue to follow strict governance guidelines for evaluating IEEE conferences and publications,” Stickel said.
A long history of fakes
Labbé is no stranger to fake studies. In April 2010, he used SCIgen to generate 102 fake papers by a fictional author called Ike Antkare [see pdf]. Labbé showed how easy it was to add these fake papers to the Google Scholar database, boosting Ike Antkare’s h-index, a measure of published output, to 94, which at the time made Antkare the world's 21st most highly cited scientist. Last year, researchers at the University of Granada, Spain, added to Labbé’s work, boosting their own citation scores in Google Scholar by uploading six fake papers with long lists of citations to their own previous work [2].
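For readers unfamiliar with the h-index: an author has index h if h of their papers have each been cited at least h times. The sketch below computes it from a list of per-paper citation counts. The function name and the sample counts are made up for illustration and are not Antkare's actual citation data.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    # Hypothetical citation counts for five papers.
    print(h_index([10, 8, 5, 4, 3]))
```

Because the index depends only on citation counts, a batch of generated papers that cite one another heavily can raise it quickly, which is the weakness the Antkare experiment exposed.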
Labbé says that the latest discovery is merely one symptom of a “spamming war started at the heart of science” in which researchers feel pressured to rush out papers to publish as much as possible.
There is a long history of journalists and researchers getting spoof papers accepted in conferences or by journals to reveal weaknesses in academic quality controls — from a fake paper published by physicist Alan Sokal of New York University in the journal Social Text in 1996, to a sting operation by US reporter John Bohannon published in Science in 2013, in which he got more than 150 open-access journals to accept a deliberately flawed study for publication.
Labbé emphasizes that the nonsense computer science papers all appeared in subscription offerings. In his view, there is little evidence that open-access publishers — which charge fees to publish manuscripts — necessarily have less stringent peer review than subscription publishers.
Labbé adds that the nonsense papers were easy to detect using his tools, much like the plagiarism checkers that many publishers already employ. But because he could not automatically download all papers from the subscription databases, he cannot be sure that he has spotted every SCIgen-generated paper.