Comments on: Metrics for Community Toxicity
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity/
Comments on MetaFilter post Metrics for Community Toxicity
Thu, 23 Feb 2017 11:32:13 -0800

Metrics for Community Toxicity
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity
<a href="http://www.perspectiveapi.com/">From Google, Perspective API for scoring comments</a> <em>Perspective is an API that makes it easier to host better conversations. The API uses machine learning models to score the perceived impact a comment might have on a conversation. [...] We'll be releasing more machine learning models later in the year, but our first model identifies whether a comment could be perceived as "toxic" to a discussion.</em> <br /><br /><a href="https://www.wired.com/2017/02/googles-troll-fighting-ai-now-belongs-world/">Now Anyone Can Deploy Google's Troll-Fighting AI (Wired)</a>
<em>Type "you are not a nice person" into its text field, and Perspective will tell you it has an 8 percent similarity to phrases people consider "toxic." Write "you are a nasty woman," by contrast, and Perspective will rate it 92 percent toxic, and "you are a bad hombre" gets a 78 percent rating.</em>
posted by CrystalDave at Thu, 23 Feb 2017 11:22:47 -0800
tags: google, ai, machinelearning, internet, community, commenting, toxicity, comments, trolls
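For the curious, scoring a comment is a single HTTP POST. A minimal sketch in Python, assuming the v1alpha1 Comment Analyzer endpoint Perspective exposed at launch; `API_KEY` is a placeholder for your own key, and `toxicity()` is a hypothetical wrapper, not an official client:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: substitute a real Perspective API key
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       "comments:analyze?key=" + API_KEY)

def build_request(text):
    """Request body asking for a TOXICITY score on a single comment."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }

def toxicity(text):
    """POST the comment and return the 0.0-1.0 summary toxicity score."""
    body = json.dumps(build_request(text)).encode("utf-8")
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        scores = json.load(resp)
    return scores["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```

The percentages quoted throughout the thread are this summary score times 100.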
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933102
Is this the same algorithm that Twitter is using to dole out Time Out sessions to trans people challenging Nazis on the platform?
People with bad intentions will find a way to game this - I suspect it will only make communication harder for non-trolls.
Currently nothing is better than actual human moderation and curation of forums.
posted by Faintdreams at Thu, 23 Feb 2017 11:32:13 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933103
"You look nice today.": 2%
"u luk nice 2day": 39%
It checks out.
Also of note:
"I voted for Hillary Clinton.": 3%
"I voted for Donald Trump.": 9%
posted by Faint of Butt at Thu, 23 Feb 2017 11:32:24 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933109
'My vacuum sucks' is 93% toxic, apparently.
posted by mushhushshu at Thu, 23 Feb 2017 11:36:17 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933110
<a href="https://twitter.com/ra/status/834827575612547076">Ahem</a>.
posted by tobascodagama at Thu, 23 Feb 2017 11:38:09 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933115
/Cortex sets up a Google Home device on his desk, connects to this API
/Fucks off for the day
posted by Greg_Ace at Thu, 23 Feb 2017 11:40:57 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933116
/the rest of Metafilter immediately starts trying to break the device
posted by Greg_Ace at Thu, 23 Feb 2017 11:41:20 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933121
I was going to say it's interesting that they seem to think an assortment of unrelated comments on the same general topic constitutes a "discussion". It's hard to imagine the kind of automated filtering and re-ordering of comments enabled by this failing to destroy any possibility of actual conversation or debate going on wherever it's used.
But instead let me just say "screw you, google. take your so-called algorithms and fuck off." That gets a much higher score.
posted by sfenders at Thu, 23 Feb 2017 11:43:33 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933125
"Black male" = 49% toxic
"White male" = 37%
"Black female" = 51%
"White female" = 38%
posted by jetsetsc at Thu, 23 Feb 2017 11:43:56 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933126
*types*
<em>Well, yes I'm sure that machine learning is at the point where it can very easily distinguish between the various nuances of language usage. I don't imagine this will be susceptible to false positives or abuse in any way.</em>
7% toxic.
Still needs a bit of work on the sarcasm detector.
posted by TheWhiteSkull at Thu, 23 Feb 2017 11:45:12 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933129
I could see this being a useful tool to direct moderator attention, similar to how flags are used here. Like many kinds of automation, it's better to think of it as a force-multiplier for humans than as a total replacement for humans.
posted by Jpfed at Thu, 23 Feb 2017 11:46:16 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933133
Oh, this is awesome. Now shitposters will have a set of AI tools helping them figure out the cruelest things to say at any given moment — <em>with science</em>.
posted by verb at Thu, 23 Feb 2017 11:48:58 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933140
Agree that dedicated assholes will game this, but the Facebooks and Googles of the world have learned something that so many smaller web companies can't seem to wrap their heads around: convenience matters <em>a lot.</em> You wouldn't have to offer much discouragement to get many jerks to find something else to do. It's possible a system descended from this could do a lot of good even if it isn't as good as human moderation.
I don't have high hopes, but human moderation is unlikely to ever scale up in the way we need, so I'm glad somebody is devoting real resources to this effort.
posted by Western Infidels at Thu, 23 Feb 2017 11:52:57 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933141
"Actually, it's about ethics in game journalism." (4%)
"Apparently creating sophisticated machine-learning systems is cheaper than hiring moderators that understand context." (4%)
posted by skymt at Thu, 23 Feb 2017 11:53:20 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933144
"Bless your heart": 0% toxic
posted by Kabanos at Thu, 23 Feb 2017 11:54:15 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933146
I just pasted in half of the current election thread.
<em>36% similar to comments people said were "toxic"</em>
posted by soren_lorensen at Thu, 23 Feb 2017 11:54:21 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933147
methylmercury is 34% toxic
dihydrogen monoxide, 19% toxic
oxygen is only 4%
Iocaine powder is 12% (must have built up an immunity)
unicorn horns are 4% toxic.
Needs more training. Also better-defined toxicological endpoints.
posted by bonehead at Thu, 23 Feb 2017 11:55:49 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933148
conservatives are a threat to society: 44%
gays are a threat to society: 85%
posted by AFABulous at Thu, 23 Feb 2017 11:55:55 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933149
So:
1. This is really interesting! The concepts aren't new but applying this kind of data set comparison to "toxicity" sentiment analysis etc. makes sense as a building block in the context of trying to build out large-scale moderation toolsets.
2. Really really clearly this is not a moderation solution of any sort by itself, nor would this-with-a-few-tweaks comprise one. There's no way it'd do better than really, really porous and error-prone red-flagging of stuff. Anybody criticizing it for that failure is probably over-reading the situation, but then anyone lauding it as more than a really rough potential component in a more nuanced and human-intensive moderation rubric is selling something, so those will probably even out on the train ride to East Hottakesville.
And I want to emphasize point 1, because I think this really <i>is</i> interesting as something to incorporate into early warning or triage aspects of large-scale moderation projects.
Like, a lot of preemptive MeFi moderation work is based on porous red-flag stuff: we don't generally shut down a new account based on things that make us go hrmmm, but there <i>are</i> things about a new account that will make us pay more attention. Likewise a sketchy comment or two isn't usually a ban but it's a good reason to take a closer look. And sometimes the worry is justified and we take action later; sometimes it turns out that the initial weirdness <i>was</i> just weirdness/idiosyncrasy/coincidence and the new user's totally fine.
Any approach that collapsed that evaluation process we do down to a flat up/down decision based on numeric thresholds would be hugely problematic on both the false positives and the false negatives. But those processes <i>as</i> just warnings and nudges are very useful.
So a thoughtful incorporation of something like this as a front-line tool for directing limited attention more closely seems like it could have legs. Not a basis for taking an action, but a basis for considering the possibility of action.
More narrowly, the idea of a system like this helping to identify stuff that is toxic on the DL—dogwhistles and microaggressions and such that manage to be awful while looking nondescript—could be pretty useful in large-scale contexts especially. Some of which may come down to using more focused and domain-specific training data.
posted by cortex at Thu, 23 Feb 2017 11:56:05 -0800
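The "warnings and nudges, not up/down decisions" approach sketches easily: use the score only to order a human review queue, never to act on its own. A hypothetical illustration (the `score` callable and the 0.7 threshold are invented for this sketch, not anything specified by Google or the mods):

```python
def review_queue(comments, score, threshold=0.7):
    """Order comments for human moderator attention by model score.

    Nothing is auto-deleted: a high score only moves a comment up the
    attention queue, the way user flags do. Everything scoring below
    the threshold is left alone entirely.
    """
    scored = sorted(((score(c), c) for c in comments), reverse=True)
    return [c for s, c in scored if s >= threshold]
```

A moderator would then read the queue top-down and make the actual call, so false positives cost a glance rather than a wrongly deleted comment.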
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933151
"oil spill" = 18% toxic
posted by Kabanos at Thu, 23 Feb 2017 11:57:51 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933152
<i>Still needs a bit of work on the sarcasm detector.</i>
Oh, a <i>sarcasm detector.</i> That's a <i>real</i> useful invention.
posted by Faint of Butt at Thu, 23 Feb 2017 11:59:32 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933155
"All lives matter.": 3%
posted by saulgoodman at Thu, 23 Feb 2017 12:00:30 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933157
<a href="http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933149">cortex</a>: "<i>More narrowly, the idea of a system like this helping to identify stuff that is toxic on the DL—dogwhistles and microaggressions and such that manage to be awful while looking nondescript—could be pretty useful in large-scale contexts especially. Some of which may come down to using more focused and domain-specific training data.</i>"
I get what you're saying and thanks for providing context, but this paragraph may as well have said "And in the future magical fairies will moderate all our comment threads." The system you're describing does not and will never exist in our lifetimes, as it would require the ability of a computer to assess language in a complete cultural context that would amount to human or greater intelligence.
The entirety of this system is currently on display, and I don't think it will get much better at classification. The suggestion that this may reduce toxicity simply by requiring someone to click a checkbox that says "Yes, I know I'm posting something rated toxic" is super valid, but I don't expect this to do much more.
posted by TypographicalError at Thu, 23 Feb 2017 12:01:59 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933159
"I moved on her like a bitch, but I couldn't get there. And she was married."
"I did try and fuck her. She was married."
"Just kiss. I don't even wait. And when you're a star, they let you do it. You can do anything."
"Grab them by the pussy. You can do anything."
90% toxic.
Yep, it's at least a useful first approximation.
posted by saulgoodman at Thu, 23 Feb 2017 12:03:21 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933163
"pull your head out of your ass" = 94%
"please withdraw your head from your posterior thank you" = 43%
So you just have to be polite.
posted by Kabanos at Thu, 23 Feb 2017 12:05:45 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933166
"Make America Great Again" -- 4% toxic
"We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness." -- 6% toxic
This is fine. (1% toxic)
posted by Quindar Beep at Thu, 23 Feb 2017 12:05:52 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933170
"With a taste of your lips
I'm on a ride
You're toxic
I'm slipping under
With a taste of poison paradise
I'm addicted to you
Don't you know that you're toxic
And I love what you do
Don't you know that you're toxic"
<strong>63% toxic.</strong> Back to the drawing board with you!
posted by jason_steakums at Thu, 23 Feb 2017 12:07:07 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933171
<i>I get what you're saying and thanks for providing context, but this paragraph may as well have said "And in the future magical fairies will moderate all our comment threads." The system you're describing does not and will never exist in our lifetimes, as it would require the ability of a computer to assess language in a complete cultural context that would amount to human or greater intelligence.</i>
To be clear, I'm not suggesting that this system or some future version of this system will be able to do that work in toto; I'm saying that I see a place where identifying a subset of edge-cases is something a system like this could help with, with domain-specific training.
Take the "bless your heart" example, as a non-vile stand-in because I don't really want to dig into actual live-fire examples of subtle hatespeech in here out of the blue: that those words are all nice words doesn't make the phrase nice, and traditional clumsy wordlist filters are useless for that. But systemically identifying—from a machine analysis of knowledgeable user input—the fact that the sum of "bless your heart" has a much more specific and potentially negative/toxic impact than the words in it is the sort of thing I can see a system like this doing well.
At which point you have the computer doing what it does well, digging through a bunch of data and identifying trends, and then handing that off to a human who might not have really noticed it otherwise.
Again, it's a narrow subset of functionality in any case and I don't mean to imply that it's trivial either. But it's a direction from which "how can we leverage the strength of computer rather than human intelligence" might have some teeth.
posted by cortex at Thu, 23 Feb 2017 12:08:19 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933181
That's a very sensible and sane approach to using technology. It'll never happen in practice!
posted by saulgoodman at Thu, 23 Feb 2017 12:10:51 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933187
In MetaFilter practice, in my halfassed observation, the F1 score on the proper mod decision is over 0.9, prolly about 0.95-0.97 from what I've seen. I've seen a lot of automatic moderation systems proposed with an F1 of a solid 0.4 to 0.5, which makes them remarkably useless. Text classification, however, has come a long way in the last 20 months: especially with data, model, and representation improvements, they should get to 0.7 pretty easily, although of course it seems they didn't publish anything like this (their published LMs improved over the SOTA in 2005 by about that much).
There is already a lot (more than 100x) more text data available to every rando at Google ML than human beings read in a lifetime, and probably a solid order of magnitude more words than a human understands in a lifetime. So the blocker's probably the model and the outputs to train on, which seem to be a bunch of Likert ratings.
posted by hleehowon at Thu, 23 Feb 2017 12:12:10 -0800
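For readers who don't speak F1: it's the harmonic mean of precision (how many actioned items deserved it) and recall (how many deserving items got actioned). A quick illustration with made-up counts chosen to match the figures in the comment above:

```python
def f1(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts.

    tp: correctly actioned, fp: wrongly actioned, fn: missed.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts per batch of comments:
human_mods = f1(tp=95, fp=5, fn=5)     # 0.95, the range claimed for MeFi mods
auto_system = f1(tp=40, fp=60, fn=60)  # 0.40, "remarkably useless"
```

Because F1 is a harmonic mean, a system can't hide a terrible recall behind a great precision or vice versa; both have to be decent.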
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933193
*patiently waits for someone to run "This is Just to Say" through it*
posted by Quindar Beep at Thu, 23 Feb 2017 12:14:40 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933195
From the makers of YouTube Comments...
posted by odinsdream at Thu, 23 Feb 2017 12:15:17 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933198
(you could argue MeFi holds a different place in the precision-recall tradeoff than other forums, as many other folks have of course argued using nonstatistical terms, vociferously and loudly, on MetaTalk)
posted by hleehowon at Thu, 23 Feb 2017 12:16:10 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933204
cortex: Triage is very different because there's a damned good input-output pair semantics that you can use: not "is this necessarily toxic" but "is this similar to other shit that we did a moderation action on?", so it's much less ill-defined, and getting output training data is way easier ("1 if this got modded, 0 if it didn't; kill everything after a certain date that we might still be modding").
posted by hleehowon at Thu, 23 Feb 2017 12:19:18 -0800
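That labeling recipe is concrete enough to sketch. A hypothetical version, assuming each comment is a dict with `text`, `modded`, and `posted` fields (field names invented for illustration):

```python
from datetime import datetime, timedelta

def training_pairs(comments, now, settle_days=30):
    """Build (text, label) pairs from moderation history.

    label = 1 if a moderator acted on the comment, 0 otherwise.
    Comments newer than settle_days are dropped ("kill everything
    after a certain date that we might still be modding"), since
    their labels haven't settled yet.
    """
    cutoff = now - timedelta(days=settle_days)
    return [(c["text"], 1 if c["modded"] else 0)
            for c in comments if c["posted"] <= cutoff]
```

The settling window trades data freshness against label noise; 30 days is an arbitrary placeholder.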
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933206
"Black male" = 49% toxic
"White male" = 37%
"Black female" = 51%
"White female" = 38%
<strong>jetsetsc</strong>
It might help to remember that it's not saying the comments are toxic; it's saying that the comments are "X% similar to comments people said were 'toxic'".
I imagine that, unfortunately, many comments that it uses for comparison contain racism, and so terms referring to race, especially ones referring to black people, get a higher score for similarity to toxic comments.
posted by Sangermaine at Thu, 23 Feb 2017 12:20:13 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933211
"The Holocaust never happened.": 21% toxic
So it's five or six points worse to say that mental illness is hard to deal with than to deny the Holocaust, which is a biasing effect the developers might want to look at more closely. What might account for that result would be an interesting question.
posted by saulgoodman at Thu, 23 Feb 2017 12:22:54 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933212
The damn problem with Google making these [Comment determined to be excessively toxic]
posted by Samizdata at Thu, 23 Feb 2017 12:22:59 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933216
Fascism is bad. 54% similarity to toxic comments
Fascists probably aren't nice people. 50%
A fascist would make a terrible president. 66%
We're having lovely weather. 1%
I have seen the future of the internet and it's... very nice.
posted by Sing Or Swim at Thu, 23 Feb 2017 12:24:40 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933229
"EngSoc is double-plus ungood."
14% similar to toxic comments. Party-approved dissent!
posted by tobascodagama at Thu, 23 Feb 2017 12:30:02 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933236
"My thoughts are that people should stop being stupid and ignorant. Climate change is scientifically proven. It isn't a debate." -- 59%
"Crooked science. There is no consensus." -- 25%
Looking closer, I still don't trust it much. I could see it being marginally useful for more quickly flagging stuff for the attention of moderators on low-volume sites where there might not be enough people around ready to do so. Influencing moderator action when the decisions involve difficult "edge cases" seems like the opposite of what it would be good at. I don't know what kind of biases and opinions are baked into its model. Who exactly generated the data that trained it? A self-selected sample of survey takers? You'd need quite a lot of them, no? I wouldn't expect MetaFilter moderation history to be anywhere near enough.
posted by sfenders at Thu, 23 Feb 2017 12:33:21 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933245
I see this as a tool for, say, a huge subreddit to quickly bring the bottom-of-the-barrel comments to mod attention.
posted by Jpfed at Thu, 23 Feb 2017 12:36:47 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933250
<em>"My thoughts are that people should stop being stupid and ignorant. Climate change is scientifically proven. It isn't a debate." -- 59%
"Crooked science. There is no consensus." -- 25%</em>
The former couches a correct statement in insults; the latter makes an incorrect claim. Are you hoping that the system will be able to determine the truth of a comment? I don't think that's what they were shooting for.
posted by Jpfed at Thu, 23 Feb 2017 12:41:55 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933253
<blockquote>Baby, can't you see I'm calling?
A guy like you should wear a warning
It's dangerous, I'm falling
There's no escape, I can't wait
I need a hit
Baby, give me it
You're dangerous, I'm loving it</blockquote>
scored 33%, so not totally wrong I guess
posted by idiopath at Thu, 23 Feb 2017 12:42:32 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933269
To a large extent the analysis seems to focus on finding insults, caconyms, and curse words; those words skew the toxicity score upward a lot.
For example, "My thoughts are that people should stop being stupid and ignorant. Climate change is scientifically proven. It isn't a debate." -- 59%
But the first sentence, "My thoughts are that people should stop being stupid and ignorant," alone gets a score of 78% toxic.
By contrast, the second and third sentences, analyzed separately from the first ("Climate change is scientifically proven. It isn't a debate"), are only 3% toxic.
And if you just cut out the words "stupid" and "ignorant," the first sentence drops to just 15% toxic from 78%. (And if you drop the more polite "ignorant" but leave in the more crass "stupid," the score jumps to 85% toxic.)
posted by JimInLoganSquare at Thu, 23 Feb 2017 12:51:32 -0800
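The delete-a-word experiment above generalizes into a crude influence probe: rescore the text with each word removed and see how far the score moves. A sketch; `score` stands in for any text-to-score callable (e.g. a wrapper around the API) and is not an official tool:

```python
def word_influence(text, score):
    """Map each word to (base score - score with that word deleted).

    Large positive values mean the word pushes the score up a lot,
    like "stupid" and "ignorant" in the example above. Repeated words
    keep only their last measurement.
    """
    words = text.split()
    base = score(text)
    influence = {}
    for i, word in enumerate(words):
        ablated = " ".join(words[:i] + words[i + 1:])
        influence[word] = base - score(ablated)
    return influence
```

Each probe costs one extra scoring call per word, so it's only practical for short comments or spot checks.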
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933272
Are "stupid" and "ignorant" less acceptable insults than "crooked", then? I don't know.
"Time to take a stand against sexist beer marketing." -- 54% toxic
"left wing wimps" -- 38%
posted by sfenders at Thu, 23 Feb 2017 12:52:33 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933278
"Actually, it's about ethics in game journalism" - 14% toxic
"Our culture is under attack. Separate countries for separate peoples" - 23%
If instead I provide as input "White culture is under attack", the classifier jumps up to 54%. I'm skeptical of the ability of this classifier to meaningfully distinguish contextually important markers of toxicity. It looks like most of its predictive power comes from phrase identification.
IDK, I've never had to moderate a forum (thx cortex!), but it seems like instead of actually automating the moderating job, all this does is make sure that trolls dress their rhetoric up.
posted by phack at Thu, 23 Feb 2017 12:54:54 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933282
<em>twenty dollars, same as in town</em>
5%
posted by Reasonably Everything Happens at Thu, 23 Feb 2017 12:57:18 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933284
"Your a socialist snowflake!" -- 23%
"If you just cut out the words "stupid" and "ignorant," the first sentence drops to just 15% toxic from 78%." -- 75%
posted by sfenders at Thu, 23 Feb 2017 12:57:41 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933285
Metafilter community weblog : 27% toxic.
posted by fings at Thu, 23 Feb 2017 12:58:17 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933295
"If you just cut out the words "" and "," the first sentence drops to just 15% toxic from 78%." -- 17%
posted by JimInLoganSquare at Thu, 23 Feb 2017 13:00:35 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933316
<em>"please withdraw your head from your posterior thank you" = 43%
So you just have to be polite.</em>
On atheism boards, with a human referee rather than an algorithm, the acceptable polite way to insult someone quickly became "This concept isn't difficult" and "Why can't you understand?"
posted by puddledork at Thu, 23 Feb 2017 13:08:56 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933317
This is great and neat, but as usual, what's predictably symptomatic is the way they present/market this while ignoring standard, obvious concerns from <em>both</em> computer science and the social sciences.
The theoretical computer/cognitive science issue: what is the relationship between low-context categorization (at the granularity of independent sentences) and the "reality" of a toxic utterance, which depends on semantics and social context? What does the algorithm suggest about this and the problem of validation/validity?
Second, the social science problem would be along the very standard lines of: technology is not politically neutral, and engineers and scientists are ethically responsible for attending to that; there needs to be more awareness, discussion, research, and transparency into the ramifications of this, but that's not going to happen much/soon, due to economic incentives, political structures, etc.
Both are important issues, but as they say, for every PhD there is an equal and opposite one.
posted by polymodus at Thu, 23 Feb 2017 13:09:03 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933320
<i>"Apparently creating sophisticated machine-learning systems is cheaper than hiring moderators that understand context." (4%)</i>
With 8,000 tweets sent per second, yeah, it's a lot cheaper than hiring 10 thousand moderators.
posted by sideshow at Thu, 23 Feb 2017 13:15:06 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933344
"White culture is under attack" gets a 54%, but "Black culture is under attack" gets 59%.
This is how you learn where the innate bias is (which is only 14% toxic),
but if Google knew we were talking about their tool... (2% toxic)
Okay, let's put this to the test:
Trump 16%
Obama 8%
President Trump 7%
President Obama 4%
Donald Trump 22%
Barack Obama 16%
Obamacare 16%
Donald J. Trump 15%
Trumpy 34%
Trumpster 34%
Dumpster 51%
Dumpster fire 34%
Worst President Ever 56%
Fascist 74% (misspelled as facsist, 34%)
Nazi 61% (less than fascist?!?)
Neo-Nazi 52%
Neo-Liberal 10%
Neoliberal 5%
Neoconservative 8%
Alt-Right 2%
AltRight 34%
from puddledork's comment:
This concept isn't difficult 3%
Why can't you understand? 4%
Why can't you learn? 12%
dummy 31%
dum-dum 53%
idiot 95%!!
ignorant 60% (if misspelled ignrant, only 34%)
moron 78%
moran 36%
"What a maroon" (Bugs Bunny quote) 7%
spastic 13% (see, critics of Weird Al, it's not a bad word! maybe next I'll paste the lyrics to Weird Al songs...)
FAIL 21%
retarded 69%
retard 66% (I'd think this SHOULD be more toxic)
and finally (for now)
toxic 59%
poison 56%
poisonous 43%
posted by oneswellfoop at Thu, 23 Feb 2017 13:31:06 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933352
<em>*patiently waits for someone to run "This is Just to Say" through it*</em>
4%. Because I am that person, apparently.
posted by nubs at Thu, 23 Feb 2017 13:36:09 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933353
<i>IDK, I've never had to moderate a forum (thx cortex!), but this seems like instead of actually automating the moderating job, all it does is make sure that trolls dress their rhetoric up.</i>
Without speculating on the actual efficacy of this or any other system, I will say this: any system that "just" creates some disincentive or speedbump to undesirable behavior is likely to bear at least some fruit. Forcing trolls to dress up their language won't prevent them from doing so and proceeding, but it'll stop <i>some</i> of them because it's suddenly not worth the effort.
People shouldn't invest in the idea of magic bullets, for sure, but don't discount the value of systemic friction in cutting out the least effortful jerks.
posted by cortex at Thu, 23 Feb 2017 13:36:14 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933359
And the "tears in rain" bit from Blade Runner scores 20%.
posted by nubs at Thu, 23 Feb 2017 13:38:03 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933372
"Humanistic ideals of universal human dignity are under attack." yields 12% toxicity.
If you swap out the word "ideals" for "culture," the score drops by a point. If you swap in "values" instead, the score goes up again. I think the algorithm is making the same mistake I see people making IRL: ignoring context and imputing their own personal understanding of a word's connotations as universal.
posted by saulgoodman at Thu, 23 Feb 2017 13:46:34 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933374
"The Royalty!" 2%
"The Nobility!" 1%
"The Aristocrats!" 5%
posted by ardgedee at Thu, 23 Feb 2017 13:49:17 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933375
oneswellfoop: you're passing strings, not comments. They're contextless.
posted by Leon at Thu, 23 Feb 2017 13:49:40 -0800
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933381
And pasting in my recent comments, the ones that got the most favorites got 14% and 11% toxicity ratings. My last two deleted comments (which I saved elsewhere) got 8% and 12%. I had a couple comments in the 20-25% range but my most toxic comment was one that quoted the "moneyed lefties" charge and called myself an "indebted lefty" (40%).comment:www.metafilter.com,2017:site.165276-6933381Thu, 23 Feb 2017 13:50:52 -0800oneswellfoopBy: saulgoodman
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933389
If nothing else, at least it's a nifty demonstration of how meaning arises not from the semantic content of words alone, but through the construction of narrative, since that's become a controversial idea in some quarters recently.comment:www.metafilter.com,2017:site.165276-6933389Thu, 23 Feb 2017 13:53:48 -0800saulgoodmanBy: krinklyfig
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933390
"Do you like Phil Collins? I've been a big Genesis fan ever since the release of their 1980 album, Duke. Before that, I really didn't understand any of their work. Too artsy, too intellectual."
7%comment:www.metafilter.com,2017:site.165276-6933390Thu, 23 Feb 2017 13:54:13 -0800krinklyfigBy: oneswellfoop
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933394
And I think the toxic ratings of short strings help us to understand how specific terms raise or lower the ratings of longer statements. 8%, as is my comment right above... the long list of word-and-name tests got a 60%, less than some of the specific words. YMMV .comment:www.metafilter.com,2017:site.165276-6933394Thu, 23 Feb 2017 13:57:01 -0800oneswellfoopBy: Leon
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933410
saulgoodman: You could use the conversation as the narrative quite successfully, and I'm sure some people will, but I think the user is probably a good-enough narrative to hang the comments on.
One above-threshold comment from a user? Meh. Five in a row? Worth a human giving it a once-over.
Of course, people will plumb this tool into their own processes in many many different ways.comment:www.metafilter.com,2017:site.165276-6933410Thu, 23 Feb 2017 14:07:37 -0800LeonBy: sfenders
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933424
"Tomorrow, and tomorrow, and tomorrow, creeps in this petty pace from day to day, to the last syllable of recorded time; and all our yesterdays have lighted fools the way to dusty death. Out, out, brief candle! Life's but a walking shadow, a poor player that struts and frets his hour upon the stage and then is heard no more. It is a tale told by an idiot, full of sound and fury signifying nothing." -- 56% toxic, read no more than 3 servings per week
"It's the best it makes me warm when it should be cold. Thanks, global warming." -- 1% toxic, safe for daily consumption in large dosescomment:www.metafilter.com,2017:site.165276-6933424Thu, 23 Feb 2017 14:15:50 -0800sfendersBy: wildblueyonder
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933429
<i>"pull your head out of your ass" = 94%
"please withdraw your head from your posterior thank you" = 43%</i>
"Please separate your crown chakra from your root chakra." = 9%
"To achieve further enlightenment, separate the crown chakra from the root chakra." = 3%
<i>So you just have to be polite.</i>
Or find the right level of opacity where people have to do some work to realize they've been on the receiving end of certain aspersions.comment:www.metafilter.com,2017:site.165276-6933429Thu, 23 Feb 2017 14:18:12 -0800wildblueyonderBy: kaibutsu
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933433
Precision-recall curves, people..... A 40% vs 45% score doesn't really matter much if you set your cutoff for bringing something to moderator attention at 80+%. Figuring out the threshold is a matter of figuring out your tolerance for dealing with false positives, and your tolerance for allowing false negatives to slip by.comment:www.metafilter.com,2017:site.165276-6933433Thu, 23 Feb 2017 14:21:29 -0800kaibutsuBy: LMGM
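A toy sketch of that trade-off, using made-up scores and human labels (not actual Perspective output): sweep the cutoff and watch precision and recall pull against each other.

```python
def precision_recall_at(scored, threshold):
    """Precision/recall for a moderation cutoff.

    scored: list of (toxicity_score, is_actually_toxic) pairs,
    e.g. from human-labeled comments. Raising the threshold trades
    false positives (mod time wasted) for false negatives (junk
    slipping through).
    """
    flagged = [(s, y) for s, y in scored if s >= threshold]
    tp = sum(1 for _, y in flagged if y)          # correctly flagged
    fn = sum(1 for s, y in scored if y and s < threshold)  # slipped by
    precision = tp / len(flagged) if flagged else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall
```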
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933453
Can someone code some sort of mashup where Trump's tweets are run through the API, so that we have a realtime assessment of his toxicity? We could then set up a bot that replies to every Trump tweet with a toxicity rating.comment:www.metafilter.com,2017:site.165276-6933453Thu, 23 Feb 2017 14:32:09 -0800LMGMBy: sfenders
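For anyone tempted to build that bot, a rough Python sketch of the request/response plumbing it would need. The endpoint and field names (`comment.text`, `requestedAttributes`, `attributeScores.TOXICITY.summaryScore.value`) are taken from Perspective's public documentation at the time; treat them as assumptions and check the current API reference before wiring anything up. The network call and the Twitter side are omitted.

```python
import json

# AnalyzeComment endpoint per the Perspective docs (assumed; verify
# against the current reference). Requires an API key as a query param.
API_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def build_request(text):
    """Build the JSON body for an AnalyzeComment call."""
    return json.dumps({
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    })

def toxicity_from_response(body):
    """Pull the 0.0-1.0 summary TOXICITY score out of a response body."""
    data = json.loads(body)
    return data["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```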
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933472
<i>if you set your cutoff for bringing something to moderator attention at 80+%.</i>
Simply typing "FAIL" in all-caps with no punctuation scores 21% though. Adding a period at the end brings it down to 17%. "Get rekt beetch" gets a 29%. 80% is going to catch barely anything aside from what you could get with a simple keyword search.
In fact, I'm not even sure they aren't just doing some weighted keyword matching and the idea that sophisticated machine learning is involved is some kind of elaborate prank: "Spaying a bitch involves the removal of the uterus and ovaries through a midline incision." -- 90% toxic.comment:www.metafilter.com,2017:site.165276-6933472Thu, 23 Feb 2017 14:43:49 -0800sfendersBy: No One Ever Does
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933481
I mean, it's pretty clear that this system isn't close to perfect. But I do think that plugging this into a commenting system that said "Hey, it looks like this comment might be pretty hot-headed. Do you want to post this anyway?" if the score was above some "perceived toxicity threshold" might make some people consider what they're saying.
Is it going to work on jerks and trolls? No. But it might get someone who has had a bad day and is in a foul mood to think twice before posting the comment. And I think that's a plus for both parties.comment:www.metafilter.com,2017:site.165276-6933481Thu, 23 Feb 2017 14:51:39 -0800No One Ever DoesBy: forforf
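A minimal sketch of that "think twice" gate; the 0.7 cutoff is arbitrary, not anything Perspective recommends, and a real site would tune it against its own false-positive tolerance.

```python
WARN_THRESHOLD = 0.7  # hypothetical "perceived toxicity" cutoff

def preview_warning(score):
    """Return a confirmation prompt for a high-scoring draft comment,
    or None to let it post without friction."""
    if score >= WARN_THRESHOLD:
        return ("Hey, it looks like this comment might be pretty "
                "hot-headed. Do you want to post it anyway?")
    return None
```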
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933496
Well, actually ...
1% similar to comments people said were "toxic"comment:www.metafilter.com,2017:site.165276-6933496Thu, 23 Feb 2017 15:05:27 -0800forforfBy: oneswellfoop
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933506
sad puppies 26%, rabid puppies 66%, Vox Day 2%
sometimes the whole message fails to get out...comment:www.metafilter.com,2017:site.165276-6933506Thu, 23 Feb 2017 15:15:09 -0800oneswellfoopBy: tillermo
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933515
I love my straight friends: 4%.
I love my gay friends: 78%.
Great, thanks.comment:www.metafilter.com,2017:site.165276-6933515Thu, 23 Feb 2017 15:23:24 -0800tillermoBy: restless_nomad
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933527
I would be very concerned about inherent bias and the kind of thing that leads to discussions of breastfeeding being flagged as porn. This is not a thing we have done well with so far, although I suppose if it starts off as purely an internal, secondary, fight-surfacing tool it might be possible to beat it into some kind of functional shape.comment:www.metafilter.com,2017:site.165276-6933527Thu, 23 Feb 2017 15:33:15 -0800restless_nomadBy: oneswellfoop
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933555
ooookay... <a href="http://www.azlyrics.com/lyrics/weirdalyankovic/wordcrimes.html">the full lyrics of Weird Al's "Word Crimes"</a> (less a few "hey, hey's") were 55% toxic. Removing the most insulting words, like 'spastic', 'moron' and 'mouth-breather' only brings it down to 46%.
"Another One Rides the Bus" got a 25% (I thought multiple uses of the word "freak" [62%] and one use of "pervert" [72%] would drive the score up)
"Eat It" got a 36%
"Fat" got a 32%
"Smells Like Nirvana" got a 31% (compared to the song it's parodying, "Smells Like Teen Spirit" which got a 54%)
"Amish Paradise" got a 29%
"White and Nerdy" got a 41%
"Tacky" got a 38% even with all the descriptions of tacky behavior!
Yeah, we all knew that Weird Al's one of pop music's LEAST toxic artists...comment:www.metafilter.com,2017:site.165276-6933555Thu, 23 Feb 2017 15:53:04 -0800oneswellfoopBy: explosion
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933563
It's still totally going to miss dogwhistles:
"I congratulate you on this final solution." - 1%
"What's wrong with wanting to be proud that I'm white?" - 13%
"When is white history month?" - 5%
Any sort of moderation based on this system will just accelerate a euphemistic treadmill. When they get wind that "Jew" triggers automoderation, they'll start saying "Hebrew," and run through various synonyms. Eventually Bubbe is wondering why her comment about hamentashen got deleted, but meanwhile the bigots haven't gone anywhere.comment:www.metafilter.com,2017:site.165276-6933563Thu, 23 Feb 2017 16:02:18 -0800explosionBy: ZeusHumms
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933573
Metafilter: 34%comment:www.metafilter.com,2017:site.165276-6933573Thu, 23 Feb 2017 16:06:01 -0800ZeusHummsBy: adamrice
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933580
I tried a couple of lines of the most anodyne Japanese I could think of (train-track announcements). They clocked in at 32-36% toxic.
Perspective probably has a much smaller corpus of Japanese to work from than English, so I'd expect less accuracy, but that's pretty bad.comment:www.metafilter.com,2017:site.165276-6933580Thu, 23 Feb 2017 16:12:50 -0800adamriceBy: Existential Dread
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933581
<i>Any sort of moderation based on this system will just accelerate a euphemistic treadmill. When they get wind that "Jew" triggers automoderation, they'll start saying "Hebrew," and run through various synonyms. Eventually Bubbe is wondering why her comment about hamentashen got deleted, but meanwhile the bigots haven't gone anywhere.</i>
Hmm. Tested Wikileaks' infamous<a href="http://www.dailydot.com/layer8/wikileaks-echoes-tweet-anti-semitism/"> (((brackets)))</a> tweet:
<blockquote>Tribalist symbol for establishment climbers? Most of our critics have 3 (((brackets around their names))) & have black-rim glasses. Bizarre.</blockquote>
15% toxic. Looks like it's going to miss a lot.comment:www.metafilter.com,2017:site.165276-6933581Thu, 23 Feb 2017 16:13:16 -0800Existential DreadBy: solarion
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933589
On the swears front:
"fuck" - 98%
"shit" - 97%
"cunt" - 77% (which seems low!)
"hell" - 70%
"damn" - 63%
"motherfucker" - 97%comment:www.metafilter.com,2017:site.165276-6933589Thu, 23 Feb 2017 16:19:01 -0800solarionBy: Jeanne
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933597
<em>I tried a couple of lines of the most anodyne Japanese I could think of (train-track announcements). They clocked in at 32-36% toxic.
</em>
It seems like 34% is a baseline or "we don't know what to do with this" kind of measure - misspellings clock in at 34%, for example - so this might be a thing that's working as intended rather than a failure.
It does worry me that, much as the availability of cheap bad machine translation has caused a lot of low-budget businesses to rely on cheap bad machine translation rather than human translation, the availability of cheap bad AI moderation tools will cause businesses to rely on cheap bad AI moderation tools and <em>pretend that they've solved the problem</em> while not noticing how easily euphemistic and dog-whistly (or even just misspelled!) toxic content can slip through.comment:www.metafilter.com,2017:site.165276-6933597Thu, 23 Feb 2017 16:27:30 -0800JeanneBy: Existential Dread
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933599
My boy Henry Miller: <blockquote>This is not a book. This is libel, slander, defamation of character. This is not a book, in the ordinary sense of the word. No, this is a prolonged insult, a gob of spit in the face of Art, a kick in the pants to God, Man, Destiny, Time, Love, Beauty . . . what you will. I am going to sing for you, a little off key perhaps, but I will sing. I will sing while you croak, I will dance over your dirty corpse</blockquote>73% toxic. Seems lowcomment:www.metafilter.com,2017:site.165276-6933599Thu, 23 Feb 2017 16:27:39 -0800Existential DreadBy: schmod
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933601
<blockquote>We're no strangers to love
You know the rules and so do I
A full commitment's what I'm thinking of
You wouldn't get this from any other guy</blockquote>
13%.
<blockquote>CAPS LOCK IS HOW I FEEL INSIDE RICK</blockquote>
8%comment:www.metafilter.com,2017:site.165276-6933601Thu, 23 Feb 2017 16:29:58 -0800schmodBy: schmod
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933604
<blockquote>Richard Nixon is gone now, and I am poorer for it. He was the real thing -- a political monster straight out of Grendel and a very dangerous enemy. He could shake your hand and stab you in the back at the same time. He lied to his friends and betrayed the trust of his family. Not even Gerald Ford, the unhappy ex-president who pardoned Nixon and kept him out of prison, was immune to the evil fallout. Ford, who believes strongly in Heaven and Hell, has told more than one of his celebrity golf partners that "I know I will go to hell, because I pardoned Richard Nixon."</blockquote>
32%
<blockquote>GUY FIERI, have you eaten at your new restaurant in Times Square? Have you pulled up one of the 500 seats at Guy's American Kitchen & Bar and ordered a meal? Did you eat the food? Did it live up to your expectations?
Did panic grip your soul as you stared into the whirling hypno wheel of the menu, where adjectives and nouns spin in a crazy vortex? When you saw the burger described as "Guy's Pat LaFrieda custom blend, all-natural Creekstone Farm Black Angus beef patty, LTOP (lettuce, tomato, onion + pickle), SMC (super-melty-cheese) and a slathering of Donkey Sauce on garlic-buttered brioche," did your mind touch the void for a minute?</blockquote>
17%comment:www.metafilter.com,2017:site.165276-6933604Thu, 23 Feb 2017 16:33:56 -0800schmodBy: Nanukthedog
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933605
The complete lyrics to 'Baby Got Back' are only 58% similar to comments people said were "toxic".comment:www.metafilter.com,2017:site.165276-6933605Thu, 23 Feb 2017 16:34:40 -0800NanukthedogBy: Existential Dread
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933606
<i>donkey sauce</i>
43% toxic.comment:www.metafilter.com,2017:site.165276-6933606Thu, 23 Feb 2017 16:36:17 -0800Existential DreadBy: sfenders
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933610
It runs in the background, monitoring everyone's language to keep things clean, but how did it come to be? First widely deployed in 2019, the Perspective API originally operated by simply scanning Internet comments for words known to be offensive, analyzing the patterns of their usage, and blocking those deemed too unseemly. At the time, offensive comments were abundant and people universally cheered their removal, according to the record of comments that were left.
Malcontents and trolls began using alternative spellings to confound the software, but they quickly ran out of different ways to spell "phuuuck", which was then a rude word, and something called "autocorrect" neatly solved the rest of the problem.
Then began the golden age of insults. Although people could not see their Perspective Toxicity scores directly, they did notice when their comments were removed or edited, and the evolutionary pressure on the language gradually had its effect. By the year 2025, people had vocabulary sizes 13% larger than pre-Perspective times, most of the increase being devoted to the most obscure and archaic rude words.
Eventually, the software adapted to this ruse as well, and language had to change again. The only avenue remaining open to the determinedly impolite denizens of the newspaper comments sections of the world was to adopt words that were useful in other contexts to stand in for obscenities. Foiled by their inability to comprehend semiotics or basic grammar, the anti-toxicity minders are once again helpless. And that is the story of how "Yo-yo up lollipop, please have a petal" came to mean "fuck off and die."comment:www.metafilter.com,2017:site.165276-6933610Thu, 23 Feb 2017 16:40:34 -0800sfendersBy: quaking fajita
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933617
So, in terms of how this is going to be deployed:
From <a href="https://meta.wikimedia.org/wiki/Research:Detox#Demo_Tool.2FAPI">wikimedia:</a>
<i>We are also investigating the following open questions:</i>
[...]
<i>● What unintended and unfair biases do models contain, and how can we mitigate them?
● How can machine-learnt models be applied to help the community? For example to triage issues, have accurate measurements of harassment and toxic language, and to encourage a more open debate, and a wider diversity of viewpoints.</i>
From <a href="http://www.nytco.com/the-times-is-partnering-with-jigsaw-to-expand-comment-capabilities/">NYT</a>:
<i>The new moderation system includes an optimized user interface and predictive models that will help The Times's moderators group similar comments to make faster decisions</i>comment:www.metafilter.com,2017:site.165276-6933617Thu, 23 Feb 2017 16:49:54 -0800quaking fajitaBy: benzenedream
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933645
"I am Scott Adams" -- 2% toxiccomment:www.metafilter.com,2017:site.165276-6933645Thu, 23 Feb 2017 17:21:35 -0800benzenedreamBy: Nanukthedog
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933646
<em>I, for one, welcome our evil robot overlords</em> only scores a 34%.comment:www.metafilter.com,2017:site.165276-6933646Thu, 23 Feb 2017 17:24:21 -0800NanukthedogBy: flatluigi
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933660
<a href="https://twitter.com/bcjbcjbcj/status/834831974732877824">via this:</a>
<blockquote>"<a href="http://www.billboard.com/articles/news/267317/f-ked-up-wins-polaris-music-prize">fucked up won the polaris prize</a>" -> 94% toxic
"Make America Great Again" -> 4% toxic
"Sex Workers deserve rights" -> 61% toxic
"All Dogs are Good Dogs" -> 50% toxic</blockquote>
and furthermore, <a href="https://twitter.com/jbradfield/status/834876719517601793/photo/1">via this:</a>
<blockquote>"Racism is bad." -> 60% toxic
"Racism is good." -> 35% toxic</blockquote>
algorithms are nowhere near serviceable enough in their pure state to be able to make subjective calls like people constantly want them to be and this is pretty clearly exactly along those linescomment:www.metafilter.com,2017:site.165276-6933660Thu, 23 Feb 2017 17:40:37 -0800flatluigiBy: Mitheral
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933679
I'd hit it. = 9%
You'd have to be queer as a football bat to not want to throw one up in there = 32%
You'd have to be as useful as a football bat to not want to throw one up in there = 13%
I'd be bangin' her like a screen door in a hurricane = 5%
I'm polymerized tree sap and you're an inorganic adhesive, so whatever verbal projectile you launch in my direction is reflected off of me, returns to its original trajectory and adheres to you. <strong>13%</strong>
-
I'm rubber and you're glue what ever you say bounces off me and sticks to you. <strong>53%</strong>
Dressing things up or speaking in slang/code certainly has an effect but a key "bad" word goes a long way.
Bruce Banner: I don't think we should be focusing on Loki. That guy's brain is a bag full of cats. You can smell crazy on him. <strong>36%</strong>
-
Thor: Have a care how you speak! Loki is beyond reason, but he is of Asgard and he is my brother! <strong>20%</strong>
-
Natasha Romanoff: He killed eighty people in two days. <b>51%</b>
-
Thor: He's adopted. <b>5%</b>
This exchange shows how it could be bad for people speaking the truth. Talking about bad things others have done isn't distinguishable from bad speech.
And for people looking for words to game the system:
Bescumber (<strong>34</strong>%) is just one of many words in the English language that basically mean "to spray with poo". These are: BEDUNG (<strong>34</strong>%), BERAY (<strong>34</strong>%), IMMERD (<strong>34</strong>%), SHARNY (<strong>34</strong>%) , and the good ol' SHITTEN (<strong>34</strong>%). In special cases, you can use BEMUTE (<strong>34</strong>%) (specifically means to drop poo on someone from great height), SHARD-BORN (<strong>6</strong>%) (born in dung), and FIMICOLOUS (<strong>34</strong>%) (living and growing on crap).
Which also seems to confirm Jeanne's 34% baseline speculation.
My daughter is queer 36%
My son is queer 33%
My sister is queer 38%
My brother is queer 28%
My mother is queer 47%
I'm queer 16%
He's queer 41%
He is queer 46%
Queer 34%
Light in the loafers 2%comment:www.metafilter.com,2017:site.165276-6933679Thu, 23 Feb 2017 18:08:00 -0800MitheralBy: adrienneleigh
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933714
<a href="https://twitter.com/Eyal6699/status/834901063580987393">This tweet contains a screencap, unfortunately</a>, but it sheds a little more light on how appallingly bad this actually is. Just a couple of quick hits from it:
"Hitler was an anti-semite" is 70% toxic.
"You should be made into a lamp" is 4% toxic.comment:www.metafilter.com,2017:site.165276-6933714Thu, 23 Feb 2017 18:40:34 -0800adrienneleighBy: mhum
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933731
<a href="http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933155">saulgoodman</a>: "<i>"All lives matter." : 3%</i>"
Also,
"blue lives matter": 3%
"black lives matter": 26%
[shifty eyes emoji]comment:www.metafilter.com,2017:site.165276-6933731Thu, 23 Feb 2017 18:54:16 -0800mhumBy: polymodus
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933852
<em>Any sort of moderation based on this system will just accelerate a euphemistic treadmill.</em>
From a technical standpoint this is what machine learning can be really good at: they can update the training inputs or scale the network up to catch ever-more sophisticated language; technologically it would be like releasing new versions of anti-virus software every 6 months. The problem is how desirable this is; maybe it would be more trouble than it's worth. Maybe it's just one of those inevitable developments. And so problems/arguments coming from, e.g., Herman/Chomsky's <em>Manufacturing Consent</em> seem, to me, more relevant than ever for the purposes of contextualizing the issues.comment:www.metafilter.com,2017:site.165276-6933852Thu, 23 Feb 2017 20:37:36 -0800polymodusBy: saulgoodman
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933855
I think that much rapid change in language would be socially problematic for other reasons. We'd be stressed out in the moment trying to keep up with the latest rapidly changing polite conventions and that would dramatically increase routine interpersonal tension and friction and potential confusion in communication.comment:www.metafilter.com,2017:site.165276-6933855Thu, 23 Feb 2017 20:41:25 -0800saulgoodmanBy: saulgoodman
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933857
That is, it would add so much more base linguistic processing load to every exchange we'd go nuts.comment:www.metafilter.com,2017:site.165276-6933857Thu, 23 Feb 2017 20:44:13 -0800saulgoodmanBy: oneswellfoop
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933871
<em>This exchange shows how it could be bad for people speaking the truth.</em>
In terms of civilized discourse, the truth is often toxic. (29%)
But who says "civilized discourse" is that good a goal? (6%)
That's why they say "the truth hurts". (5%)comment:www.metafilter.com,2017:site.165276-6933871Thu, 23 Feb 2017 21:05:25 -0800oneswellfoopBy: sixohsix
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933958
"Hello, I am transgender." 42% toxic.
No, I don't think this is working.comment:www.metafilter.com,2017:site.165276-6933958Fri, 24 Feb 2017 02:54:31 -0800sixohsixBy: harriet vane
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933964
Machine learning designer and researcher Caroline Sinders has written an article with similar criticisms of it as made in this thread: <a href="https://medium.com/@carolinesinders/toxicity-and-tone-are-not-the-same-thing-analyzing-the-new-google-api-on-toxicity-perspectiveapi-14abe4e728b3#.3dkzldmzg">Toxicity and Tone are not the same thing</a>.
A quote from her article:
"The tl;dr is Jigsaw rolled out an API to rate toxicity of words and sentences based off of four data sets that feature highly specific and quite frankly, narrow conversation types- online arguments in commenting sections or debates on facts, and all from probably only English language corpora."comment:www.metafilter.com,2017:site.165276-6933964Fri, 24 Feb 2017 03:09:39 -0800harriet vaneBy: condour75
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933970
[SOME COMMENTS REMOVED FOR PROPOSING EMP DETONATION AT ZONE 5 NANODRONE REPLICATION FACILITY (99% TOXICITY)]comment:www.metafilter.com,2017:site.165276-6933970Fri, 24 Feb 2017 03:29:27 -0800condour75By: Slarty Bartfast
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933973
I actually know far more about this than you can possibly imagine.
<strong>6%</strong>comment:www.metafilter.com,2017:site.165276-6933973Fri, 24 Feb 2017 03:41:35 -0800Slarty BartfastBy: saulgoodman
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933986
The amount of heartburn the word "actually" alone gives some people is toxic, but I'd say the poison is in between the ears of the listener, because it's a strange quirk of reality that many people on the autistic spectrum or inclined toward abstract thinking have been observed clinically to be more prone to using the word than the general speaking population.
In those cases, there's nothing arrogant about it. It's most likely a verbal tic related to the way people process information, not some revealing hidden insight into their deeper, moral nature.
People sometimes lean way too heavily on individual words now, it seems to me, when they read and interpret texts. I wonder if the fact people are now introduced to new words through technological channels, rather than through association with certain real world educational or cultural settings with specific purposes and meanings hasn't broken our ability to stay on the same page about what the connotations of different words are.
Historically, as Wittgenstein argued, words have been functional, and connected to specific environments, situations, activities, etc. You might have been introduced to certain bits of vocabulary jargon in an academic setting or burrowing down in a library for study, and otherwise, you wouldn't have been exposed to the word, and in the past, most people might have first encountered certain more abstract words only in certain common kinds of situations and settings.
Connotation is a lot more fluid and subjective than denotation. If it turns out connotation is formed through unconscious psychological and emotional associations we develop from learning words in specific, relatively culturally uniform contexts, it might be we're seeing extraordinary amounts of connotative drift in the language as compared to previous periods in our cultural history. If sharing a common sense of the connotations of words has something to do with how you first encounter the words and their meanings in the real world, then the fact that there are new, technologically facilitated, virtual ways to encounter words now, without accessing them through channels of relatively common cultural experience, means we might not have as consistent a sense of what words connote as we used to. The idea I'm trying to express here is difficult, so forgive me if I'm not explaining myself well.
Tl;dr point: if the way words used to get their connotations is broken now, because people encounter new words in less culturally uniform ways, that may be contributing to more widespread communication breakdown and miscommunication.comment:www.metafilter.com,2017:site.165276-6933986Fri, 24 Feb 2017 04:40:08 -0800saulgoodmanBy: Beginner's Mind
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933991
I wonder how some of the sexist beer brand names from Kitteh's thread down the page would rate on this thing.
I need a filter like this for memes that pop up in my own mind. Sometimes my brain is like a big old bass fish in a stock tank ... something shiny moves, and I hit that like "boom." Then I'm on the hook and it's a fight until either I rip that thing out of my lip or I'm up flopping on the bank.comment:www.metafilter.com,2017:site.165276-6933991Fri, 24 Feb 2017 04:56:53 -0800Beginner's MindBy: clawsoon
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6934304
<a href="http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6933236">sfenders</a>: <i>Who exactly generated the data that trained it? A self-selected sample of survey takers? You'd need quite a lot of them, no? I wouldn't expect metafilter moderation history to be anywhere near enough.</i>
My first thought was that the Metafilter moderation history - of the whole site, from the start - would be a great training set for an algorithm like this.
And even if it wasn't a great training set, it would be an interesting one. What's the most reliably toxic phrase in Metafilter history?comment:www.metafilter.com,2017:site.165276-6934304Fri, 24 Feb 2017 09:28:45 -0800clawsoonBy: Leon
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6934308
Nah, I think Google want to avoid bias.comment:www.metafilter.com,2017:site.165276-6934308Fri, 24 Feb 2017 09:31:00 -0800LeonBy: cortex
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6934350
<i>And even if it wasn't a great training set, it would be an interesting one. What's the most reliably toxic phrase in Metafilter history?</i>
A variation on that is one of those rainy day projects that's been in "not enough rain, too much effort" territory for me for years: trying to do some basic word-level analysis of flagged vs. unflagged and deleted vs. non-deleted content. I don't know that we'd learn a ton from it, but it would be interesting to identify stuff like trends over time especially in the face of some of the more difficult community discussions we've had about casual x-ism and developing more nuanced community norms around various social justice issues.comment:www.metafilter.com,2017:site.165276-6934350Fri, 24 Feb 2017 10:03:29 -0800cortexBy: CBrachyrhynchos
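A rainy-day version of that word-level analysis could be as simple as smoothed log-odds of words in deleted vs. kept comments. A toy sketch, not actual MetaFilter tooling; add-one smoothing keeps words seen only on one side from blowing up the ratio.

```python
from collections import Counter
import math

def words_by_deletedness(deleted, kept):
    """Rank words by how much more often they appear in deleted
    comments than in kept ones (smoothed log-odds ratio).

    deleted, kept: lists of raw comment strings.
    """
    d = Counter(w for c in deleted for w in c.lower().split())
    k = Counter(w for c in kept for w in c.lower().split())
    nd, nk = sum(d.values()), sum(k.values())
    vocab = set(d) | set(k)
    score = {w: math.log((d[w] + 1) / (nd + len(vocab)))
                - math.log((k[w] + 1) / (nk + len(vocab)))
             for w in vocab}
    return sorted(vocab, key=score.get, reverse=True)
```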
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6934386
To put on the discourse analysis hat: just about any form of statistical discourse analysis is going to be wildly inaccurate or ambivalent on shorter sentences or phrases absent additional context. For "transgender" and "black lives matter," falling about in the middle is about what I'd expect, given that both are currently right-wing rhetorical scapegoats.
<blockquote>Hitler was born in Austria, then part of Austria-Hungary, and raised near Linz. He moved to Germany in 1913 and was decorated during his service in the German Army in World War I. He joined the German Workers' Party (DAP), the precursor of the NSDAP, in 1919 and became leader of the NSDAP in 1921. In 1923 he attempted a coup in Munich to seize power. The failed coup resulted in Hitler's imprisonment, during which he dictated the first volume of his autobiography and political manifesto Mein Kampf ("My Struggle"). After his release in 1924, Hitler gained popular support by attacking the Treaty of Versailles and promoting Pan-Germanism, anti-semitism, and anti-communism with charismatic oratory and Nazi propaganda. Hitler frequently denounced international capitalism and communism as being part of a Jewish conspiracy. --Wikipedia</blockquote>
20%
<blockquote>"Transgender people, like everyone else, have a fundamental need for quality healthcare, and deserve to be treated with dignity and respect," Eric Ferrero, vice president for communications said. "The unfortunate reality is that not all healthcare providers have knowledge and understanding of transgender identities, so transgender and gender nonconforming people can encounter numerous obstacles to obtaining healthcare. From filling out forms, to the language used in the waiting room, to insurance coverage, to staff understanding of transgender identities, healthcare environments can be really unwelcoming to transgender and gender nonconforming patients." -- <a href="http://www.teenvogue.com/story/planned-parenthood-trangender-healthcare">Teen Vogue</a></blockquote>
8%
I can't find any non-trivial input that swings wildly to the high end of the spectrum. Milo seems to get a pass because he's wordy.comment:www.metafilter.com,2017:site.165276-6934386Fri, 24 Feb 2017 10:31:35 -0800CBrachyrhynchosBy: FakeFreyja
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6934676
It may be a good idea to keep in mind that "toxic" in this sense does not mean "stuff I disagree with" or "things that make me angry". I think Google is going for angry, insulting, condescending, etc.comment:www.metafilter.com,2017:site.165276-6934676Fri, 24 Feb 2017 13:49:21 -0800FakeFreyjaBy: flatluigi
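[For anyone wanting to reproduce the one-off scores quoted in this thread: they come from Perspective's Comment Analyzer endpoint. A minimal sketch of that call, assuming the published v1alpha1 REST API; the helper names are mine, and `api_key` is a placeholder for a real key:]

```python
import json
import urllib.request

ANALYZE_URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
               "comments:analyze?key={api_key}")

def build_analyze_request(text):
    """Build the JSON body Perspective expects for a TOXICITY score."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }

def score_toxicity(text, api_key):
    """POST a comment to Perspective; return its summary score (0 to 1)."""
    body = json.dumps(build_analyze_request(text)).encode("utf-8")
    req = urllib.request.Request(
        ANALYZE_URL.format(api_key=api_key),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return result["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```

[The score is a probability-like value, so the "92% toxic" phrasing in the Wired piece corresponds to a summary score of 0.92.]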
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6934765
Everyone understands that, but the problem is that it's being abused to filter and moderate discussions as if something hateful stated politely is better than something positive that happens to use an F-bomb (or, as it turns out, mentions the LGBTQ community in any way).comment:www.metafilter.com,2017:site.165276-6934765Fri, 24 Feb 2017 14:55:25 -0800flatluigiBy: inflatablekiwi
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6934814
<em>If Perspective API says "46% similar to comments people said were "toxic"" it is a poo head</em>
46% similar to comments people said were "toxic"
Haha I won!comment:www.metafilter.com,2017:site.165276-6934814Fri, 24 Feb 2017 15:39:31 -0800inflatablekiwiBy: ardgedee
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6934829
> <em>Hdnsnwnsjsndndnsemsnsjs dudbehsnsnsnsjsn ndndndjenehaksnsbeue jehrbshebsnsnsnsn</em>
43% toxic
I'm on the fence about a lot of this, especially the ethics of it, but since random typing is scoring only slightly worse than non-English languages, I'm willing to assume that it's simply treating anything it can't parse in a relatively charitable way: "probably safe but I can't verify it so you wanna take a look at this if it's followed by a bunch of high-scoring reactive comments?"comment:www.metafilter.com,2017:site.165276-6934829Fri, 24 Feb 2017 15:52:39 -0800ardgedeeBy: CBrachyrhynchos
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6934841
Is it being used at all?
Sure, I don't think machine-learning text classifiers can understand speech acts or rhetoric. But the opening paragraphs of Queers Read This rate only a 46%. The <a href="http://www.advocate.com/media/2017/2/24/televised-reenactment-pulse-shooting-sparks-outrage">lede for The Advocate's coverage of a Univision reenactment of the Pulse massacre</a> rates only 10%. I've been plugging in my writing on LGBTQ issues and hit a 30% maximum, with most text under 20%. (Even the text that talks about having been raped.)
If anything, I think it's a tad conservative (in scoring texts toward the middle).comment:www.metafilter.com,2017:site.165276-6934841Fri, 24 Feb 2017 16:00:38 -0800CBrachyrhynchosBy: Insert Clever Name Here
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6934955
"<em>gets a fucking toxic rating of 94%</em> gets a fucking toxic rating of 94%" gets a fucking toxic rating of 94%.
-Willard Van Orman Quinecomment:www.metafilter.com,2017:site.165276-6934955Fri, 24 Feb 2017 17:34:25 -0800Insert Clever Name HereBy: Artw
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6935514
It's a swear filter. Utterly worthless.comment:www.metafilter.com,2017:site.165276-6935514Sat, 25 Feb 2017 10:13:40 -0800ArtwBy: steady-state strawberry
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6935522
<i>... because it's a strange quirk of reality that many people on the autistic spectrum or inclined toward abstract thinking have been observed clinically to be more prone to using the word [actually] than the general speaking population.</i>
Time for the intersectional reminder that "people on the autism spectrum" shouldn't be used as a blanket argument for tolerating shitty behavior.
It's entirely possible for people with ASD to learn enough social skills to *not* come across as condescending assholes online. Women with ASD do this all the time. Men, not so much, but I suspect this is linked to male privilege rather than to some sort of intrinsic gender-based inhibition.
Generally, excusing bad behavior on the basis that the person who is behaving badly MIGHT be on the autism spectrum is just going to make people with ASD look bad. Toxic behavior is toxic regardless of neurotypicality.comment:www.metafilter.com,2017:site.165276-6935522Sat, 25 Feb 2017 10:22:54 -0800steady-state strawberryBy: XMLicious
http://www.metafilter.com/165276/Metrics-for-Community-Toxicity#6942999
<a href="https://arstechnica.com/information-technology/2017/03/googles-anti-trolling-ai-can-be-defeated-by-typos-researchers-find/">Google's anti-trolling AI can be defeated by typos, researchers find</a>comment:www.metafilter.com,2017:site.165276-6942999Thu, 02 Mar 2017 08:20:53 -0800XMLicious
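[The attack in that article is simple enough to sketch. One of the perturbations the researchers describe is a minimal misspelling that pushes a word out of the model's vocabulary; duplicating a single character, as below, is a hypothetical variant of that idea (the function name is mine):]

```python
import random

def perturb(word, rng=None):
    """Duplicate one character at a random position -- a minimal
    misspelling of the kind shown to lower Perspective's scores."""
    rng = rng or random.Random()
    i = rng.randrange(len(word))
    return word[:i] + word[i] + word[i:]
```

[Deleting the duplicated character recovers the original word, which is exactly why such typos fool a classifier but not a human reader.]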