As we see it:
- Think of “Big Data” as a jigsaw puzzle. There are lots of very different pieces, each one fairly easily identified. But “Big Data” exists (in all its applications and its social and political implications) only when the pieces are fit together.
- This Backgrounder is the first of several on “Big Data” and its implications. Here we deal with the basics: what we ourselves needed to understand before we could form even a rough idea of the whole.
- Each of the pieces we discuss here is basic to the “system” as a whole.
- There are more pieces to be added later, each to be covered in a Backgrounder of its own: for example, the new modes of financial transactions, the implications of “Big Data” for political life, its impact on labour (both people and process), the impact of changing discourses on technology, government surveillance, hacking and, of course, privacy. Later, we will also consider what can be done to curb the more deleterious effects of “Big Data” on individuals and the polity.
- It would be easy to dismiss the impact of “Big Data” as secondary to the very real forces altering both the mode of production and every aspect of social life, or as minor beside “Trumpism”, nuclear scenarios and, of course, climate change.
- This is an easy mistake to make. The impact of “Big Data” is not catastrophic, nor episodic, nor even obvious in day-to-day life. The main players are corporations, and much of what they do with “Big Data” is either done behind the scenes or openly promoted in the name of innovation, heavily laden with technology and technological mumbo-jumbo, and seemingly without much significance. Moreover, we are in some senses also beneficiaries, so simply calling “Big Data” a danger would be misleading.
- The “Big Data” phenomenon is a slow creeper (think of it as a weed of sorts): easy to dismiss in its most immediate manifestations, but slowly taking over and profound in its implications.
What’s big and small about “Big Data”:
- “Big Data” depends upon masses of information being collected. What makes “Big Data” big is that all kinds of information can be collected (and combined) from virtually any source.
- For the most part, the information being collected comes in very small chunks: a single purchase, a trip across town on Uber, a mobile phone call, or the many small physical points of contact between a person sitting in a “smart” chair and the chair itself.
- What makes “Big Data” useful is the size and diversity of the collection: vast quantities of different kinds of information. Also important is how small and detailed each piece of data is. Furthermore, because the collection is so vast, an analyst can break out smaller sub-groups for separate analysis.
Sources of information:
- A significant portion of the information comes from governments, especially governments committed to open data and transparency (one of many contradictions). This information is mainly free for the taking.
- In fact, a great deal of information is available for free from multiple sources: corporations, civil society groups and other organizations. Many people also answer online questionnaires, another big source of data.
- But much of the data is provided unintentionally. Think of it this way: as one goes about one’s daily personal and work life, one is continually dropping bits of information into various repositories. For example, one drives a smart car, or uses smart appliances or energy conservation technologies, or buys something from Amazon or any other shopping site, or plays a game or purchases an app, or subscribes to a magazine, or takes a ride in a taxi or an Uber, or even takes public transportation (Presto/MetroCard).
- Spellcheck a document or pick a movie on Netflix (or any entertainment source that lets you choose what and when), or join a customer loyalty program, or use a credit card, or enter any place with a camera (including street cameras, especially their street views).
- Most significantly, use a search engine (any one), Facebook (or any other social networking site), Twitter or Instagram (in fact, any site with your photos). The contents of each message or post contain information on opinions, sentiments, affect, “likes” and, of course, social networks. None of this is personal as far as the collecting is concerned; what matters to the collection is the amalgam of information from different sources.
- Add yet more: check in at work, and use a computer, or operate a machine, or make phone calls, or register information on customers, suppliers, distributors and retail outlets, or take orders in a restaurant or bar, or operate a care facility (lots of information collected).
- Anything that deposits information regularly into a computer provides the raw material for “Big Data”.
The role of technology:
- None of this would be possible without significant advances in technology and analytical capacity.
- The data needs to be analyzed to be useful; someone needs to identify the patterns, trends and correlations (more on this below).
- The technological and analytical capabilities to do so exist now, and are continually being expanded at a rapid rate.
- Nor would any of this be possible without huge advances in storage capacity (Dropbox being yet another data source).
- Needless to say, technological capacity is not evenly spread across the world, nor is “catch-up” likely or feasible given the magnitude of expertise required.
Who owns it all?
- No one owns it all. One of the most important features of “Big Data” is that the data, and its ownership, are decentralized. There is no reason to bring it all under a single corporate umbrella.
- This said, a relatively few corporations and only a few governments hold most of the data, collected by themselves or from other sources.
- Take a fictional corporate example: Easy Corp collects data on those who purchase its services. It knows who the purchasers are, and thus can provide each purchaser with customized lists of “preferences” based on their previous purchases. It incorporates this feedback into its data collection.
- Easy Corp owns (at least) three other businesses. One uses Easy Corp’s data on trends, etc., to provide marketing services to other companies. One sells or licenses Easy Corp data as a package to other corporations. And one is an analytics business, that is, a business built on Easy Corp’s expertise (readily marketable today).
- If one were to look only at Easy Corp’s assets, one would be hard pressed to understand its valuation on the stock market, or its value to private equity companies. Neither its products (or services), nor its plant and inventory, nor even its organizational capacity and expertise add up to this valuation.
- Big investors know about Easy Corp’s three or more other businesses, and they also know about the ever-increasing importance of “Big Data”. They buy Easy Corp stock at seemingly inflated prices, and Easy Corp’s owners benefit. They are buying the value in “Big Data”.
- The service being provided by companies like Easy Corp can be essentially beside the point. These businesses can be mainly a means of collecting information. Take Via (a real company that provides shared-ride services in NYC). Unlike Uber (a major player in “Big Data”), Via pays its workers on an hourly basis. There is little likelihood that Via makes money on its service (it doesn’t charge enough) or by leasing its cars and equipment to drivers. But Via is a font of data, and its data is important enough that it has little trouble raising money from investors.
- The value of the data is significant but not just because of its immediate uses: Easy Corp or Via can use its data over and over, with no loss or extra costs. Once collected, the data package is more or less free to be licensed or sold without loss, degradation or additional cost to Easy Corp or Via or any other user.
- Easy Corp makes and sells a product or service. Today, there are many other companies whose business is exclusively either analytics and/or collection and analysis of data packages.
- These other companies collect the data packages from free/open sources including governments, international organizations and think tanks, and/or they purchase data packages from Easy Corp and others. They license or re-sell these data packages or they analyze them for particular purposes for their customers.
- Anyone, in theory, can be a purchaser of analyses produced by “Big Data”.
- All of this costs a lot of money, of course. This is not a game for small players (corporate or otherwise). Individuals need not apply.
Ten Basic Assumptions of “Big Data”:
- First: the size of the collections matters, each in its own right and all collections taken together. The vastness of the information makes it possible to “wash out” the inaccuracies and messiness that attend conventional data collection and research.
- Second: it cannot be stressed enough how important it is for those using “Big Data” to have multiple different sources of data, various data packages that can be combined and analyzed in different ways. Google searches can be combined with Uber rides, for example.
- Third: sampling is not needed; everything is collected. The vastness of the data allows for parceling out and analyzing portions of it. Unlike conventional research, collection of information comes before any particular patterns are identified as interesting.
- Fourth: virtually any activity, behavior, feeling or viewpoint can be turned into data points. As noted, Facebook is quite capable of analyzing the feelings and viewpoints expressed by its vast user base. Instagram (or any photo program) can reduce a photograph to data points, in terms of both its content and its technical character.
- Fifth: a smart appliance or a car continually feeds information through its use and, in the case of a car, through its navigation system. The data are thus constantly evolving in response to changing situations.
- Sixth: just by recording, say, a touch, information can be collected on what a worker does, how, and how often.
- Seventh: cameras capture movement everywhere, recording not just the intended image but also the rest of the picture (the street view, traffic, etc.).
- Eighth: what matters are the patterns, trends, correlations (often seemingly serendipitous) and profiles of groups and sub-groups made possible by analysis of the data packages, separately or in combination. “Big Data” answers the question “What is happening here?” It does not address or care about “why”. If sales of tomatoes correlate with hospitals in the vicinity, that is enough to decide how to structure tomato sales or where to build hospitals. Serendipity is just fine; there need be no reason why a correlation or pattern emerges. The only requirement is that the correlations or patterns prove stable enough to support predictions.
- Ninth: “Big Data” depends on machine learning (that is, the machine learns to correct its mistakes from the feedback it gets) and on technological capacity.
- Tenth: the usefulness of “Big Data” also depends on human judgment, but this judgment is not applied in deciding what to collect. Human judgment is needed because, after the fact, any large collection of data packages can yield huge numbers of patterns, profiles and correlations. Judgment (and interests) are exercised in determining which ones might be useful for which particular purposes.
The World of “Nudge”:
- In much of academe, government and policy making, conventional regulation is in bad odor: it is seen as “command and control” (a widely held view, notwithstanding the realities of regulatory regimes) and as the tool of big (public) government (itself in bad odor).
- Those in favour of regulation’s public interest goals have offered an alternative, something they call “nudge”.
- “Nudge” involves changing the incentive structure (using carrots and sticks, “likes” and dislikes) and the means of affecting moods, behaviors and emotions, so that people and corporations make what “nudge” proponents regard as better decisions: decisions that reflect public interest goals.
- “Nudge” is a libertarian approach inasmuch as individuals and corporations retain their right to choose without interference or rules from government and corporate trade associations (that is, without reference to standards).
- Goals are achieved through the way options are presented. Obviously, the patterns (sentiments, viewpoints, social linkages, etc.) made evident through “Big Data” feed into defining the options that “nudge” proponents use “to regulate” behavior.
- Daniel Kahneman won the Nobel Prize in economics (2002) for insights into how people form judgments and make decisions, based on extensive research (the old-fashioned kind). Today, these same insights are part of much academic writing but also of conventional wisdom, a self-reinforcing situation if ever there was one.
- The original argument is that people think in two modes: (1) slowly, based on rational thought, judgment, evidence, observation, etc.; and (2) quickly and impetuously, relying on affect and sentiment, social conventions and previously held views (confirmation bias), as well as intuitions, etc. The argument was popularized in Kahneman’s Thinking, Fast and Slow (2011).
- For a long time, both the process and the contents of rational thought have been studied. But now psychologists have done countless experiments (on university students?) on how people think and act irrationally (i.e. thinking fast) and what they are inclined to do in specific situations. It turns out that “laws” can be derived from experiments on fast or intuitive thinking and its specific effects on decision-making. These “laws” indicate which options people will likely choose and how they will probably react. Among them is people’s strong inclination to reinforce one another’s moods, behaviors, viewpoints, “likes” and dislikes through their social networks.
- It is a small leap to consider how the “laws” of fast thinking can apply to consumer behavior or to political campaigns. Combine the notion of “nudge” with the capacity of “Big Data” to identify patterns, trends, profiles of large and small groups, social networks and correlations. Now one can not only profile groups but also predict how they (and others in their social networks) will act, think, feel or react in response to a set of “nudges”.
- Needless to say, there are other ways to approach human thought and judgment, but Thinking, Fast and Slow plus “nudge” have caught the attention of governments, political actors, policy makers, corporations and members of the public. The approach seems to explain so much, and it is useful for their various purposes.
- We will deal with privacy and other aspects of “Big Data” later. Suffice it to say, forget about guarantees that your own information is lost in the mix, anonymous though it may be. This does not necessarily mean that you personally are being watched. More to come on this.
- The significance of “Big Data” may also lie in who has access to it; in how it can, and perhaps will, change the labour process (both production and work); in how it generates a new kind of “value”, that is, a new factor of production or resource that can be, and is, bought and sold without diminishing its value as a resource; and in its apparent capacity to “nudge” and sway behavior, sentiment and public opinion.
Some of the sources used for this backgrounder:
- Big Data, Viktor Mayer-Schönberger and Kenneth Cukier, Houghton Mifflin Harcourt, 2014.
- Thinking, Fast and Slow, Daniel Kahneman, Farrar, Straus and Giroux, 2011.
- Thinking Does Not Imply Subjugating, Steven Pinker, 2016.
- The Competitive Value of Data: From Analytics to Machine Learning, Irving Wladawsky-Berger, WSJ, February 3, 2017.
- Why CEOs Aren’t Prepared for Big Data, and Tech’s High-Stakes Arms Race: Costly Data Centres, WSJ, April 7, 2017.
- Invisible Manipulators of Your Mind, Tamsin Shaw (review of Michael Lewis, The Undoing Project), New York Review of Books, April 20, 2017.