
Facebook has a misinformation problem, and is blocking access to data about how much there is and who is affected

Leaked internal documents suggest Facebook – which recently renamed itself Meta – is doing far worse than it claims at minimizing COVID-19 vaccine misinformation on the Facebook social media platform. 

Online misinformation about the virus and vaccines is a major concern. In one study, survey respondents who got some or all of their news from Facebook were significantly more likely to resist the COVID-19 vaccine than those who got their news from mainstream media sources.

As a researcher who studies social and civic media, I believe it’s critically important to understand how misinformation spreads online. But this is easier said than done. Simply counting instances of misinformation found on a social media platform leaves two key questions unanswered: How likely are users to encounter misinformation, and are certain users especially likely to be affected by misinformation? These questions are the denominator problem and the distribution problem.

The COVID-19 misinformation study, “Facebook’s Algorithm: a Major Threat to Public Health”, published by public interest advocacy group Avaaz in August 2020, reported that sources that frequently shared health misinformation — 82 websites and 42 Facebook pages — had an estimated total reach of 3.8 billion views in a year.

At first glance, that’s a stunningly large number. But it’s important to remember that this is the numerator, the part of a fraction above the line. To understand what 3.8 billion views in a year means, you also have to calculate the denominator, the part of the fraction below the line that the numerator is divided by.

Getting some perspective

One possible denominator is 2.9 billion monthly active Facebook users, in which case, on average, every Facebook user has been exposed to at least one piece of information from these health misinformation sources. But these are 3.8 billion content views, not discrete users. How many pieces of information does the average Facebook user encounter in a year? Facebook does not disclose that information.

Without knowing the denominator, a numerator doesn’t tell you very much. The Conversation U.S., CC BY-ND

Market researchers estimate that Facebook users spend from 19 minutes a day to 38 minutes a day on the platform. If the 1.93 billion daily active users of Facebook see an average of 10 posts in their daily sessions – a very conservative estimate – the denominator for that 3.8 billion pieces of information per year is 7.044 trillion (1.93 billion daily users times 10 daily posts times 365 days in a year). This means roughly 0.05% of content on Facebook is posts by these suspect Facebook pages. 
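That back-of-the-envelope calculation is easy to check. A few lines of Python reproduce it; note that the per-user post count is the article's conservative assumption, not a Facebook-reported figure:

```python
# Denominator estimate for yearly Facebook content views.
daily_active_users = 1.93e9   # Facebook's reported daily active users
posts_seen_per_day = 10       # conservative assumption, not a reported figure
days_per_year = 365

denominator = daily_active_users * posts_seen_per_day * days_per_year
numerator = 3.8e9             # yearly views of the flagged sources (Avaaz)

share = numerator / denominator
print(f"posts seen per year: {denominator:.2e}")   # ~7.04e+12
print(f"share from flagged sources: {share:.3%}")  # ~0.054%
```

Any larger estimate of posts seen per day only shrinks the flagged sources' share further, which is why the article calls 10 posts a conservative choice.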

The 3.8 billion views figure encompasses all content published on these pages, including innocuous health content, so the proportion of Facebook posts that are health misinformation is smaller than one-twentieth of a percent.

Is it worrying that there’s enough misinformation on Facebook that everyone has likely encountered at least one instance? Or is it reassuring that 99.95% of what’s shared on Facebook is not from the sites Avaaz warns about? Neither. 

Misinformation distribution

In addition to estimating a denominator, it’s also important to consider the distribution of this information. Is everyone on Facebook equally likely to encounter health misinformation? Or are people who identify as anti-vaccine or who seek out “alternative health” information more likely to encounter this type of misinformation? 

Another social media study, focusing on extremist content on YouTube, offers a method for understanding the distribution of misinformation. An Anti-Defamation League team recruited a large, demographically diverse sample of U.S. web users, collected browser data from 915 of them, and oversampled two groups: heavy users of YouTube, and individuals who showed strong negative racial or gender biases on a set of questions asked by the investigators. Oversampling means surveying a small subset of a population at a higher rate than its share of the population, in order to gather enough data about that subset.
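A toy sketch with made-up numbers shows why oversampling matters: a proportional sample of a rare subgroup may contain too few members to analyze, so researchers deliberately draw that subgroup at a higher rate (and must later weight its responses back down for population-wide estimates):

```python
import random

random.seed(42)

# Hypothetical population of 100,000 people, 5% in a small subgroup
# of interest (say, heavy users of a platform).
population = ["subgroup"] * 5_000 + ["general"] * 95_000

# A proportional sample of 1,000 yields only ~50 subgroup members,
# often too few to analyze on their own.
proportional = random.sample(population, 1_000)

# Oversampling: draw 300 subgroup members outright and 700 from the
# rest, so the subgroup can be studied separately.
subgroup = [p for p in population if p == "subgroup"]
general = [p for p in population if p == "general"]
oversampled = random.sample(subgroup, 300) + random.sample(general, 700)

print(proportional.count("subgroup"))  # around 50
print(oversampled.count("subgroup"))   # exactly 300, by construction
```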

The researchers found that 9.2% of participants viewed at least one video from an extremist channel, and 22.1% viewed at least one video from an alternative channel, during the months covered by the study. An important piece of context to note: A small group of people were responsible for most views of these videos. And more than 90% of views of extremist or “alternative” videos were by people who reported a high level of racial or gender resentment on the pre-study survey.

While roughly 1 in 10 people found extremist content on YouTube and 2 in 10 found content from right-wing provocateurs, most people who encountered such content “bounced off” it and went elsewhere. The group that found extremist content and sought more of it were people who presumably had an interest: people with strong racist and sexist attitudes. 

The authors concluded that “consumption of this potentially harmful content is instead concentrated among Americans who are already high in racial resentment,” and that YouTube’s algorithms may reinforce this pattern. In other words, just knowing the fraction of users who encounter extreme content doesn’t tell you how many people are consuming it. For that, you need to know the distribution as well.
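The point generalizes: two platforms can have identical numerators (the same number of users who ever saw the content) while the views are spread completely differently. A small sketch, with invented view counts, makes the contrast concrete:

```python
def top_share(views, frac=0.10):
    """Fraction of all views accounted for by the heaviest `frac` of viewers."""
    k = max(1, int(len(views) * frac))
    heaviest = sorted(views, reverse=True)[:k]
    return sum(heaviest) / sum(views)

# Both lists: 100 users who each saw at least one video, so the
# "fraction of users exposed" is identical. The distributions are not.
even = [1] * 100                # every viewer watched exactly once
skewed = [91] * 10 + [1] * 90   # ten heavy viewers dominate consumption

print(top_share(even))    # 0.1  (top 10% of viewers hold 10% of views)
print(top_share(skewed))  # 0.91 (top 10% of viewers hold 91% of views)
```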

Superspreaders or whack-a-mole?

A widely publicized study from the anti-hate-speech advocacy group Center for Countering Digital Hate, titled Pandemic Profiteers, showed that in the 30 anti-vaccine Facebook groups it examined, 12 anti-vaccine celebrities were responsible for 70% of the content circulated, and the three most prominent were responsible for nearly half. But again, it’s critical to ask about denominators: How many anti-vaccine groups are hosted on Facebook? And what percent of Facebook users encounter the sort of information shared in these groups? 

Without information about denominators and distribution, the study reveals something interesting about these 30 anti-vaccine Facebook groups, but nothing about medical misinformation on Facebook as a whole.

These types of studies raise the question, “If researchers can find this content, why can’t the social media platforms identify it and remove it?” The Pandemic Profiteers study, which implies that Facebook could solve 70% of the medical misinformation problem by deleting only a dozen accounts, explicitly advocates for the deplatforming of these dealers of disinformation. However, I found that 10 of the 12 anti-vaccine influencers featured in the study have already been removed by Facebook.

Consider Del Bigtree, one of the three most prominent spreaders of vaccination disinformation on Facebook. The problem is not that Bigtree is recruiting new anti-vaccine followers on Facebook; it’s that Facebook users follow Bigtree on other websites and bring his content into their Facebook communities. It’s not 12 individuals and groups posting health misinformation online – it’s likely thousands of individual Facebook users sharing misinformation found elsewhere on the web, featuring these dozen people. It’s much harder to ban thousands of Facebook users than it is to ban 12 anti-vaccine celebrities.

This is why questions of denominator and distribution are critical to understanding misinformation online. Denominator and distribution allow researchers to ask how common or rare behaviors are online, and who engages in those behaviors. If millions of users are each encountering occasional bits of medical misinformation, warning labels might be an effective intervention. But if medical misinformation is consumed mostly by a smaller group that’s actively seeking out and sharing this content, those warning labels are most likely useless.


Getting the right data

Trying to understand misinformation by counting it, without considering denominators or distribution, is what happens when good intentions collide with poor tools. No social media platform makes it possible for researchers to accurately calculate how prominent a particular piece of content is across its platform. 

Facebook restricts most researchers to its CrowdTangle tool, which shares information about content engagement, but engagement is not the same as content views. Twitter explicitly prohibits researchers from calculating a denominator: either the number of Twitter users or the number of tweets shared in a day. YouTube makes it so difficult to find out how many videos are hosted on its service that Google routinely asks interview candidates to estimate the number of YouTube videos hosted in order to evaluate their quantitative skills. 

The leaders of social media platforms have argued that their tools, despite their problems, are good for society, but this argument would be more convincing if researchers could independently verify that claim.

As the societal impacts of social media become more prominent, pressure on the big tech platforms to release more data about their users and their content is likely to increase. If those companies respond by increasing the amount of information that researchers can access, look very closely: Will they let researchers study the denominator and the distribution of content online? And if not, are they afraid of what researchers will find?

This article was originally published on The Conversation by Ethan Zuckerman and was republished under the Creative Commons Attribution-NonCommercial-NoDerivatives license (CC BY-NC-ND 4.0).



Check out Lynxotic on YouTube

Find books on Big Tech and many other topics at our sister site: Cherrybooks on Bookshop.org

Enjoy Lynxotic at Apple News on your iPhone, iPad or Mac.

Lynxotic may receive a small commission based on any purchases made by following links from this page

Why Web Scraping Is Vital to Democracy

Photo Credit / Fabio / Unsplash

Journalists have used scrapers to collect data that rooted out extremist cops, tracked lobbyists, and uncovered an underground market for adopted children

By: The Markup Staff

The fruits of web scraping—using code to harvest data and information from websites—are all around us.

People build scrapers that can find every Applebee’s on the planet or collect congressional legislation and votes or track fancy watches for sale on fan websites. Businesses use scrapers to manage their online retail inventory and monitor competitors’ prices. Lots of well-known sites use scrapers to do things like track airline ticket prices and job listings. Google is essentially a giant, crawling web scraper.
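At its core, a scraper is just code that fetches pages and pulls structured fields out of the markup. Here is a minimal sketch using only Python's standard-library HTML parser; the page and the watch listings are invented for illustration, and in practice the HTML would come from an HTTP request rather than a hard-coded string:

```python
from html.parser import HTMLParser

# Made-up stand-in for a fetched page of watch listings.
PAGE = """
<ul>
  <li class="listing"><span class="name">Diver 300m</span>
      <span class="price">$4,100</span></li>
  <li class="listing"><span class="name">Speedmaster</span>
      <span class="price">$5,300</span></li>
</ul>
"""

class ListingParser(HTMLParser):
    """Collects name/price pairs from spans tagged with those classes."""

    def __init__(self):
        super().__init__()
        self.field = None   # which labeled span we are inside, if any
        self.rows = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self.field = cls
            if cls == "name":        # a name span starts a new row
                self.rows.append({})

    def handle_data(self, data):
        if self.field and data.strip():
            self.rows[-1][self.field] = data.strip()
            self.field = None

parser = ListingParser()
parser.feed(PAGE)
print(parser.rows)
# [{'name': 'Diver 300m', 'price': '$4,100'},
#  {'name': 'Speedmaster', 'price': '$5,300'}]
```

Real-world scrapers layer fetching, rate limiting, and error handling on top, but the essential act is the same: turning markup meant for browsers into rows of data.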

Scrapers are also the tools of watchdogs and journalists, which is why The Markup filed an amicus brief in a case before the U.S. Supreme Court this week that threatens to make scraping illegal.

The case itself—Van Buren v. United States—is not about scraping but rather a legal question regarding the prosecution of a Georgia police officer, Nathan Van Buren, who was bribed to look up confidential information in a law enforcement database. Van Buren was prosecuted under the Computer Fraud and Abuse Act (CFAA), which prohibits unauthorized access to a computer network such as computer hacking, where someone breaks into a system to steal information (or, as dramatized in the 1980s classic movie “WarGames,” potentially start World War III).

In Van Buren’s case, since he was allowed to access the database for work, the question is whether the court will broadly define his troubling activities as “exceeding authorized access” to extract data, which is what would make it a crime under the CFAA. And it’s that definition that could affect journalists.

Or, as Justice Neil Gorsuch put it during Monday’s oral arguments, lead in the direction of “perhaps making a federal criminal of us all.”

Investigative journalists and other watchdogs often use scrapers to illuminate issues big and small, from tracking the influence of lobbyists in Peru by harvesting the digital visitor logs for government buildings to monitoring and collecting political ads on Facebook. In both of those instances, the pages and data scraped are publicly available on the internet—no hacking necessary—but the sites involved could easily change the fine print on their terms of service to label the aggregation of that information “unauthorized.” And the U.S. Supreme Court, depending on how it rules, could decide that violating those terms of service is a crime under the CFAA.

“A statute that allows powerful forces like the government or wealthy corporate actors to unilaterally criminalize newsgathering activities by blocking these efforts through the terms of service for their websites would violate the First Amendment,” The Markup wrote in our brief.

What sort of work is at risk? Here’s a roundup of some recent journalism made possible by web scraping:

  • The COVID Tracking Project, from The Atlantic, collects and aggregates data from around the country on a daily basis, serving as a means of monitoring where testing is happening, where the pandemic is growing, and the racial disparities in who’s contracting and dying from the virus.
  • This project, from Reveal, scraped extremist Facebook groups and compared their membership rolls to those of law enforcement groups on Facebook—and found a lot of overlap.
  • Reveal also used scrapers to find that hundreds of millions of dollars in property taxes should have never been charged to Detroit residents who then lost their homes through foreclosure.
  • The Markup’s recent investigation into Google’s search results found that it consistently favors its own products, leaving some websites from which the web giant itself scrapes information struggling for visitors and, therefore, ad revenue. The U.S. Department of Justice cited the issue in an antitrust lawsuit against the company. 
  • In Copy, Paste, Legislate, USA Today found a pattern of cookie-cutter laws, pushed by special interest groups, circulating in legislatures around the country.
  • Reuters scraped social media and message boards to find an underground market for adopted children whose parents, who had usually adopted the children from abroad, decided the children were too much for them. A couple featured in the piece was later convicted of kidnapping as a result of the investigation.
  • Gizmodo was able to use similar tools to find the probable locations of tens of thousands of Ring surveillance cameras.
  • The Trace and The Verge, using scrapers, found people using an online market to sell guns without a license and without performing background checks.

This article was originally published on The Markup and was republished under the Creative Commons Attribution-NonCommercial-NoDerivatives license.


Media Layoffs Increase Across Content Industry

Ripple Effect of Digital Deflation?

photo: Adobe Stock

Recent announcements from BuzzFeed and a slew of other digital media outlets that they are culling hundreds of content producers (Axios reports a total of over one thousand layoffs today alone) appear to indicate two distinct themes emerging in the 2019 content media landscape.

On the one hand, after more than a decade of growth and of pure digital news models showing potential financial viability, profitability remains elusive. On the other, a tsunami of artificial intelligence, software and hardware improvements over the last two years in particular has begun to reduce overheads, especially for bulk production techniques such as those employed by BuzzFeed.

While legacy publishers are showing some marginal improvement in subscription revenues, the business model for online media and content production continues to shift in search of a sustainable income mix. The cost of human contributions to content is an obvious issue.

“The restructuring we are undertaking will reduce our costs and improve our operating model so we can thrive and control our own destiny, without ever needing to raise funding again”

@PERETTI / BUZZFEED

Read More: Online Media next Fatality after Coronavirus Causes 50% Ad Income Decline?

Basic Cost Cutting or Shifting the Mix?

Lowering overhead by cutting staff is the message sent as the primary explanation for the industry-wide layoffs. Digging just beneath the surface, however, reveals the not-so-gradual shift in technology that allows digital content production organizations to ramp up output while reducing costs.

Read: more work done by A.I., software and robots and fewer humans required. Result: Accelerated Digital Deflation.

The entire history of tech advances since the ubiquity of the personal computer has engendered digital deflation (ask anyone in the music industry). Meanwhile, advancements over the last couple of years are creating a massive acceleration of this trend, based on a shift toward automated production in all facets of content creation.

From a content creator’s perspective, the situation seems amusingly bifurcated: improved tools for visual and verbal expression must be welcomed and adapted to, all while working in a monetization system that does not yet favor the individual creator.

Perhaps a headline: “Thousands of Humans Rendered Obsolete By Improved Robot Software” would seem less compassionate than merely indicating that they (the humans) stood in the way of reaching profitability?

Read More: Big Tech headed for a Storm of Changes once the Novel Coronavirus Fades from Center Stage

In the long run, using more A.I., together with improved hardware and software tools to increase productivity in digital communication, will lead to lower barriers to entry. And for content creators, including those 1,000-plus professionals just liberated, it might be best to consider all their options. One of the best may be to establish a freelance future using the very technology that erased those jobs in the first place.

