Comments on: 50 shades of gray: A research story
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story/
Comments on MetaFilter post "50 shades of gray: A research story"
Mon, 29 Jul 2013 11:08:52 -0800

50 shades of gray: A research story
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story
<a href="http://pps.sagepub.com/content/7/6/615.full">Psychologists recount a valuable lesson about the fragility of statistical validity and the state of publishing.</a> "Two of the present authors, Matt Motyl and Brian A. Nosek, share interests in political ideology. We were inspired by the fast growing literature on embodiment that demonstrates surprising links between body and mind to investigate embodiment of political extremism. Participants from the political left, right, and center (N = 1,979) completed a perceptual judgment task in which words were presented in different shades of gray. Participants had to click along a gradient representing grays from near black to near white to select a shade that matched the shade of the word. We calculated accuracy: How close to the actual shade did participants get? The results were stunning. Moderates perceived the shades of gray more accurately than extremists on the left and right (p = .01). Our conclusion: Political extremists perceive the world in black and white figuratively and literally. Our design and follow-up analyses ruled out obvious alternative explanations such as time spent on task and a tendency to select extreme responses. Enthused about the result, we identified Psychological Science as our fallback journal after we toured the Science, Nature, and PNAS rejection mills. The ultimate publication, Motyl and Nosek (2012), served as one of Motyl's signature publications as he finished graduate school and entered the job market.
The story is all true, except for the last sentence; we did not publish the finding." <br /><br />"Before writing and submitting, we paused. Two recent articles have highlighted the possibility that research practices spuriously inflate the presence of positive results in the published literature (John, Loewenstein, & Prelec, 2012; Simmons, Nelson, & Simonsohn, 2011). Surely ours was not a case to worry about. We had hypothesized it; the effect was reliable. But we had been discussing reproducibility, and we had declared to our lab mates the importance of replication for increasing certainty of research results. We also had an unusual laboratory situation. For studies that could be run through a Web browser, data collection was very easy (Nosek et al., 2007). We could not justify skipping replication on the grounds of feasibility or resource constraints. Finally, the procedure had been created by someone else for another purpose, and we had not laid out our analysis strategy in advance. We could have made analysis decisions that increased the likelihood of obtaining results aligned with our hypothesis. These reasons made it difficult to avoid doing a replication. We conducted a direct replication while we prepared the manuscript. We ran 1,300 participants, giving us .995 power to detect an effect of the original effect size at α = .05. The effect vanished (p = .59)."
via <a href="http://andrewgelman.com/2013/07/28/50-shades-of-gray-a-research-story/">Andrew Gelman</a>. See also Frederick Guy's <a href="http://frederickguy.com/2013/07/29/spurious-significance-junk-science/">commentary</a>.
posted by MisantropicPainforest at Mon, 29 Jul 2013 10:56:03 -0800 (tags: statistics, psychology, academia, publishing)
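The replication power analysis quoted above (".995 power to detect an effect of the original effect size at α = .05" with 1,300 participants) is straightforward to reproduce in outline. Here is a minimal sketch using statsmodels, assuming a simple two-group t-test design; the paper's actual model and effect-size estimate aren't given in the excerpt, so the Cohen's d below is a placeholder, not their number.

```python
# Sketch of a replication power analysis, assuming a two-group t-test.
# The effect size d is illustrative, NOT the paper's actual estimate.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
d = 0.20  # assumed Cohen's d (placeholder)

# Power with 1,300 participants split into two groups of 650, alpha = .05:
power = analysis.power(effect_size=d, nobs1=650, alpha=0.05, ratio=1.0)
print(f"power at d={d}, n=650 per group: {power:.3f}")

# Per-group sample size needed to reach the quoted .995 power for this d:
n_per_group = analysis.solve_power(effect_size=d, power=0.995, alpha=0.05)
print(f"n per group for .995 power: {n_per_group:.0f}")
```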
By: yoink
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5112971
Very interesting. I wonder if it would be possible in these kinds of scientific realms to get an agreement to publish <em>regardless of the results of the experiment</em>? You would circulate a paper in a form that described only the structure of the experiment, the way the results would be analyzed, the questions the experiment would address, the reasons for addressing such a question, etc., but with no hint at all as to what the results actually were. In a way the answer to that is "obviously not," and I can understand all the practical reasons why this wouldn't work, but it's interesting, at least, to think about what the world of publishing in experimental science might look like if that were the practice. Because in theory, at least, any experiment actually worth conducting should be written up whether the results come out positive or null. If there were more emphasis on "is this a question worth asking and have you asked it well" and less on "did you find something superficially startling in your results," there'd be less of a bias toward finding and publishing these kinds of statistically marginal effects.
posted at Mon, 29 Jul 2013 11:08:52 -0800
By: Potomac Avenue
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5112977
Yoink, the article addressed that possibility, in a way:
<small>
<em>Crowd sourcing replication efforts
Individual scientists and laboratories may be interested in conducting replications but not have sufficient resources available for them. It may be easier to conduct replications by crowd sourcing them with multiple contributors. For example, in 2011, the Open Science Collaboration began investigating the reproducibility of psychological science by identifying a target sample of studies from published articles from 2008 in three prominent journals: the Journal of Personality and Social Psychology, the Journal of Experimental Psychology: Learning, Memory, and Cognition, and Psychological Science (Carpenter, 2012; Yong, 2012). Individuals and teams selected a study from the eligible sample and followed a standardized protocol. In the aggregate, the results were intended to facilitate understanding of the reproducibility rate and factors that predict reproducibility. Further, as an open project, many collaborators could join and make small contributions that accumulate into a large-scale investigation. The same concept can be incorporated into replications of singular findings. Some important findings are difficult to replicate because of resource constraints. Feasibility could be enhanced by spreading the data collection effort across multiple laboratories.</em></small>
posted at Mon, 29 Jul 2013 11:11:39 -0800
By: thelonius
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5112984
I feel a lot better about having majored in philosophy now.
posted at Mon, 29 Jul 2013 11:14:15 -0800
By: eviemath
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5112985
I was thinking something similar: if high-impact-factor journals began including sections for negative results and for replications of earlier studies (or, even better, just including them, not in a segregated way), it could help change the incentives that keep research focused on novel results. Likewise, there needs to be a shift in the criteria for PhD granting, hiring, tenure, and promotion decisions to give some value to time spent working to replicate the results of others. For example, at my university, we need to have something to show for ourselves both in teaching and in research (and service, to a lesser degree); we can't get tenure with only one strength. So within the research category, experimentalists should need to show that they have been productive both in original research and in the testing and replication work that is necessary for science to function well.
posted at Mon, 29 Jul 2013 11:18:15 -0800
By: yoink
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5113090
Potomac Avenue: that suggestion doesn't quite address the problem, though, which is that there is currently an active incentive for researchers to play the "statistical outlier on high buzz-factor research" lottery in order to get publication in a prestige journal. Sure, there are all kinds of ways to make replication experiments more affordable and more common, but if they're essentially anonymous "fact checking" operations they won't really feature meaningfully in promotion and tenure decisions. In fact, in a weird way they might almost reduce whatever scruples a research team might have about rushing into print with a small-sample, high-wow-factor result ("Yeah, sure, this might be BS, but the replication crowd will eventually check it out, and by then I'll have already been hired/promoted/put it on my resume"). There needs to be some way to rebalance incentives at the front end and not just try to catch mistakes at the back end.
posted at Mon, 29 Jul 2013 11:59:00 -0800
By: Pyrogenesis
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5113150
This is pretty cool. The experiment's relation to the actual work being done under the rubric of "embodiment" is tenuous at best, but that's beside the point anyway. I do love papers that take as the object of their experiment the experimental situation itself. It was, in fact, one of the rather overlooked facets of the "science wars" of the 90s: one of the major claims from the "relativist" science studies people was that science in the making looks rather different from settled science. Ever since then I've been fascinated by studying science itself by scientific means.
posted at Mon, 29 Jul 2013 12:24:59 -0800
By: Omnomnom
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5113242
So...why did the effect vanish?
posted at Mon, 29 Jul 2013 13:06:50 -0800
By: yoink
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5113244
<em>So...why did the effect vanish?</em>
Because it was a statistical fluke.
posted at Mon, 29 Jul 2013 13:08:12 -0800
By: srboisvert
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5113310
The problem with the 'replication crowd' is that this can also be done in bad faith. It is very easy to fail to replicate research. You can do it just by being lazy and sloppy. A failed replication can also be a statistical fluke.
Further, research materials are oftentimes very time-consuming and labor-intensive to produce, and then some replicating lab demands them before your research program has finished using them. Do you share them out right away?
That said, there is a pretty active whisper network in social psychology about findings that only work in certain labs or research "families."
posted at Mon, 29 Jul 2013 13:37:38 -0800
By: no regrets, coyote
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5113373
<i>The problem with the 'replication crowd' is that this can also be done in bad faith. It is very easy to fail to replicate research. You can do it just by being lazy and sloppy. A failed replication can also be a statistical fluke.</i>
But that's kind of how science works. Any study can be lazy or sloppy or an outlier. Taking a failed replication of an experiment as the full story is as silly as taking the original study as the full story. Individual studies only have meaning in the context of the greater body of research on a topic.
<i>I wonder if it would be possible in these kinds of scientific realms to get an agreement to publish regardless of the results of the experiment?</i>
I totally agree with this. In my previous field (experimental particle physics) we were very good about this: negative results from a well-designed experiment were considered just as interesting as positive results. Now I work in a medical field, where there is much more pressure to make a novel claim, and I see papers that are pretty clearly the results of statistical fishing expeditions, Bonferroni be damned.
posted at Mon, 29 Jul 2013 14:06:57 -0800
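For context on the Bonferroni reference: when a fishing expedition runs many tests, the chance of at least one false positive grows quickly, and the Bonferroni correction simply divides α by the number of tests. A quick back-of-the-envelope illustration:

```python
# Family-wise error rate (FWER) for m independent tests at per-test alpha,
# and the Bonferroni-corrected threshold that holds it near alpha overall.
alpha, m = 0.05, 20
fwer = 1 - (1 - alpha) ** m   # P(at least one false positive) ~= 0.64
bonferroni_alpha = alpha / m  # corrected per-test threshold = 0.0025
print(f"FWER over {m} uncorrected tests: {fwer:.2f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.4f}")
```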
By: Omnomnom
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5113469
I'm reminded of a couple of posts here about scientists and university professors faking their results. And I'm really glad these guys didn't. The temptation must be great.
posted at Mon, 29 Jul 2013 14:52:25 -0800
By: Halogenhat
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5113782
A researcher's wildest dream come true: results that are actually shocking.
posted at Mon, 29 Jul 2013 18:38:27 -0800
By: chortly
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5113847
It seems crazy to me that a p-value threshold -- 0.05 -- which was selected in the early days of statistical science when almost no one was doing it, is still the standard threshold for "significant" results. Sure, replication is important, but simply reducing the significance threshold to reflect the increased number of researchers -- say, by a factor of 10 or 100 -- would go a long way to insulate the social and medical sciences from false positives. The 5-sigma of physics might be a bit much, but even 3 would presumably rule out the vast majority of junk.
posted at Mon, 29 Jul 2013 19:40:49 -0800
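For scale, the sigma thresholds chortly mentions translate to p-values as follows under a normal approximation; note that the physics 5-sigma convention is usually quoted one-sided:

```python
# Convert k-sigma thresholds to p-values under a normal approximation.
from scipy.stats import norm

for k in (2, 3, 5):
    print(f"{k}-sigma: two-sided p = {2 * norm.sf(k):.1e}, "
          f"one-sided p = {norm.sf(k):.1e}")
# 2-sigma is roughly the conventional .05; 3-sigma is ~2.7e-3 two-sided;
# 5-sigma is ~5.7e-7 two-sided (~2.9e-7 one-sided, the physics convention).
```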
By: Canageek
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5113970
The one problem I see is that if participants are using a web browser, they will be influenced by the brightness, contrast, and calibration of their monitor, and by their surroundings. It sounds like they did the replication online, where they would have no control over these conditions, which could lead to a false negative. I mean, when was the last time YOU calibrated your monitor's colour? I haven't; I don't have the (expensive) equipment for it. My brother has, as he is a professional photographer, but only once, when he could borrow the gear from a friend of his.
posted at Mon, 29 Jul 2013 22:07:43 -0800
By: Blasdelb
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5114011
"<em>I wonder if it would be possible in these kinds of scientific realms to get an agreement to publish regardless of the results of the experiment?</em>"
This could only really work for a small fraction of scientific research. Contrary to popular perception, the lion's share of what we do is not coming up with solid answers to obvious questions so much as coming up with clever questions that should produce useful answers if framed properly. Indeed, for the vast majority of us who have the potential of getting negative results, those negative results simply mean we've asked a stupid question with an answer that isn't at all interesting, which is what happened here. Negative results like "Does Vitamin NewFadThing cure X? Nope, not even close" should get published, because that is still a useful answer even if it isn't an interesting one, but results like this have no business crowding out useful and interesting research in the literature.
Also, so long as grant/hiring/tenure committees are beholden to people who are only capable of judging research by numbers crunched from a CV, rewarding stupid questions that produce useless, uninteresting results the same way as smart questions will remain a bad idea that could only be toxic to the scientific community. That said, these guys are awesome, and while the strength of character they clearly have should be a baseline expectation, their enthusiasm about it is inspiring.
posted at Tue, 30 Jul 2013 00:02:05 -0800
By: Cannon Fodder
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5114037
<em>Sure, replication is important, but simply reducing the significance threshold to reflect the increased number of researchers -- say, by a factor of 10 or 100 -- would go a long way to insulate the social and medical sciences from false positives. The 5-sigma of physics might be a bit much, but even 3 would presumably rule out the vast majority of junk.
</em>
The problem with this is that it reduces the power of the experiment (the ability to detect a difference if one is actually present). Most experiments are already rather underpowered. This matters a lot because, in drug trials for instance, we're not just testing for efficacy but for toxicity as well. The best way to boost power is, of course, to increase the number of participants.
I do agree that p = 0.05 is a problem, because it means that in lots of situations where there is no real difference, a result will nonetheless be recorded as significant.
Of course, p-values are a bit of a red herring. In practice the null hypothesis is almost never exactly true: your drug will almost always be at least marginally different from placebo. What we really care about is the effect size.
posted at Tue, 30 Jul 2013 01:33:28 -0800
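The trade-off described here is easy to make concrete: at a fixed sample size, tightening α costs power, and holding power fixed instead drives up the required number of participants. A small sketch with an illustrative effect size (the d here is a placeholder, not from any particular trial):

```python
# Tightening alpha at fixed n reduces power; holding power at 80% instead
# inflates the required per-group sample size. Effect size is illustrative.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
d = 0.30  # assumed Cohen's d (placeholder)

for alpha in (0.05, 0.005, 0.0005):
    power = analysis.power(effect_size=d, nobs1=100, alpha=alpha)
    n_req = analysis.solve_power(effect_size=d, power=0.80, alpha=alpha)
    print(f"alpha={alpha:<7} power at n=100/group: {power:.2f}  "
          f"n/group for 80% power: {n_req:.0f}")
```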
By: metaBugs
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5114156
Much as I like the idea of journals agreeing to publish before seeing the results, I really can't see for-profit journals going for it. They need to stay full of exciting data on sexy topics, otherwise no one will want to pay the subscriptions. I could imagine the various titles having a sort of "overflow" journal for pre-accepted papers whose results turned out too dull for the flagship, but this would be costly to run (all the same admin, editing, proofing, and hosting costs) and rarely cited.
Part of the problem is addressed by things like the <a href="http://www.jnrbm.com/">Journal of Negative Results</a>, but I'd worry about trying to impress a hiring committee with that on my CV. I like the approach taken by a few of the free/open journals, which claim to screen only on the basis of scientific merit. PLOS One, for example, <a href="http://www.plosone.org/static/publication#reporting">explicitly states</a> that it welcomes papers presenting negative data. It also has a <a href="http://www.plos.org/plos-one-launches-reproducibility-initiative/">reproducibility initiative</a>, encouraging labs to submit their protocols etc. for validation by (blinded?) collaborators, although I don't know how well this is going.
But it's true that this doesn't solve the problem that we tend to reward researchers for being lucky, at least to a degree. In Iain M. Banks's <i>Matter</i> (very minor spoiler warning), one of the characters tells the story of the "hundredth idiot": <blockquote>One hundred idiots make idiotic plans and carry them out. All but one justly fail. The hundredth idiot, whose plan succeeded through pure luck, is immediately convinced he's a genius</blockquote>
I'm not asserting that scientists are necessarily idiots, but I think we can all acknowledge that there's an element of this in research. We're always striking out into the unknown, and choosing research paths based on incomplete information, or educated guesses, or the equipment and reagents available to us. Most of those explorations yield uninspiring results, but every now and again, one of those roughly chosen paths yields gold. And it's the person on the lucky path whose career gets made.
It's not all luck, by any means: fortune favours the prepared mind, and all that. But it's a significant factor, and I agree that working out how to recognise and reward intelligent, industrious work that hasn't (yet) been lucky should be a big priority. I just don't know how to do it.
<b>Pyrogenesis</b> - <em>I do love papers that take as the object of their experiment the experimental situation itself. ... Ever since then I've been fascinated by studying science itself by scientific means.</em>
That sounds really interesting. Do you have any recommendations for a good place to start reading?
posted at Tue, 30 Jul 2013 06:06:06 -0800
By: hydrobatidae
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5114533
The other thing about adjusting the alpha value is that for a lot of sciences, increasing the power through increasing the sample size is next to impossible.
I'm in ecology, and our sample sizes are constrained by a lot of things that medicine and physics aren't constrained by: funding (strangely enough, no one is giving us a million dollars to run much of anything) and time (field seasons are only so long, and if you can only afford 4 people to help, you can only get so much data; then next year is another subset of data). Occasionally I'll argue that the alpha value in ecology should be relaxed to 0.1, because that's much more realistic for a system over which you have very little control.
An example from my own work: I looked at diet in an Arctic bird. You had to net the birds off their nests, which required rappelling down while anchored in very unstable soil. We were pretty happy because we got 12 birds one summer (that's all we could reach). We could have caught the partners too, for 24 birds, but then you're affected by pseudoreplication. This project was repeated for a couple of years and we actually got results at 0.08 (a "trend"). Our publishing was limited by this "non-significance," even though for the power level we were working with, we were getting some really good indications of what was going on. If we had (randomly) gotten better p-values, we could have got into some really fancy journals.
I now appreciate that my old supervisor inserts "randomly selected an alpha value = 0.05" into his papers.
posted at Tue, 30 Jul 2013 08:33:31 -0800
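hydrobatidae's point about small field samples is easy to check with a quick simulation: even a genuinely large effect is far from a sure detection at n = 12 per group, and relaxing α to .10 buys back a meaningful amount of power. A rough Monte Carlo sketch with illustrative numbers, not data from the bird study:

```python
# Monte Carlo: power of a two-sample t-test with n=12 per group when a
# large true effect (Cohen's d = 0.8) exists. Numbers are illustrative.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
d, n, sims = 0.8, 12, 20_000
pvals = np.array([
    ttest_ind(rng.normal(d, 1.0, n), rng.normal(0.0, 1.0, n)).pvalue
    for _ in range(sims)
])
print(f"power at alpha = .05: {(pvals < 0.05).mean():.2f}")  # roughly 0.45
print(f"power at alpha = .10: {(pvals < 0.10).mean():.2f}")  # roughly 0.60
```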
By: Mental Wimp
http://www.metafilter.com/130438/50-shades-of-gray-A-research-story#5115356
<em>via Andrew Gelman. </em>
Who, by the by, rocks. And, note to y'all, 2013 is the <a href="http://www.huffingtonpost.com/marie-davidian/2013-the-international-ye_b_2670704.html">International Year of Statistics</a>. Statistics, bitches!
posted at Tue, 30 Jul 2013 13:09:27 -0800
"Yes. Something that interested us yesterday when we saw it." "Where is she?" His lodgings were situated at the lower end of the town. The accommodation consisted[Pg 64] of a small bedroom, which he shared with a fellow clerk, and a place at table with the other inmates of the house. The street was very dirty, and Mrs. Flack's house alone presented some sign of decency and respectability. It was a two-storied red brick cottage. There was no front garden, and you entered directly into a living room through a door, upon which a brass plate was fixed that bore the following announcement:¡ª The woman by her side was slowly recovering herself. A minute later and she was her cold calm self again. As a rule, ornament should never be carried further than graceful proportions; the arrangement of framing should follow as nearly as possible the lines of strain. Extraneous decoration, such as detached filagree work of iron, or painting in colours, is [159] so repulsive to the taste of the true engineer and mechanic that it is unnecessary to speak against it. Dear Daddy, Schopenhauer for tomorrow. The professor doesn't seem to realize Down the middle of the Ganges a white bundle is being borne, and on it a crow pecking the body of a child wrapped in its winding-sheet. 53 The attention of the public was now again drawn to those unnatural feuds which disturbed the Royal Family. The exhibition of domestic discord and hatred in the House of Hanover had, from its first ascension of the throne, been most odious and revolting. The quarrels of the king and his son, like those of the first two Georges, had begun in Hanover, and had been imported along with them only to assume greater malignancy in foreign and richer soil. The Prince of Wales, whilst still in Germany, had formed a strong attachment to the Princess Royal of Prussia. George forbade the connection. The prince was instantly summoned to England, where he duly arrived in 1728. "But they've been arrested without due process of law. They've been arrested in violation of the Constitution and laws of the State of Indiana, which provide¡ª" "I know of Marvor and will take you to him. It is not far to where he stays." Reuben did not go to the Fair that autumn¡ªthere being no reason why he should and several why he shouldn't. He went instead to see Richard, who was down for a week's rest after a tiring case. Reuben thought a dignified aloofness the best attitude to maintain towards his son¡ªthere was no need for them to be on bad terms, but he did not want anyone to imagine that he approved of Richard or thought his success worth while. Richard, for his part, felt kindly disposed towards his father, and a little sorry for him in his isolation. He invited him to dinner once or twice, and, realising his picturesqueness, was not ashamed to show him to his friends. Stephen Holgrave ascended the marble steps, and proceeded on till he stood at the baron's feet. He then unclasped the belt of his waist, and having his head uncovered, knelt down, and holding up both his hands. De Boteler took them within his own, and the yeoman said in a loud, distinct voice¡ª HoME²¨¶àÒ°´²Ï·ÊÓÆµ ѸÀ×ÏÂÔØ ѸÀ×ÏÂÔØ
ENTER NUMBET 0016www.ltjrhy.org.cn jssjnj.com.cn jsjoxx.com.cn www.hyboao.com.cn gbngqw.com.cn www.mscpw.com.cn www.njfi.com.cn www.qzpock.com.cn wztnre.com.cn xfde.com.cn