The 4th Editions
A year ago I wrote “for a book to be republished in a third edition barely two years after its first is highly unusual, but we were compelled to update the book by the extremely positive reception it has received.” I also wrote that each new course that adopts the book “ratifies the idea that multiple perspectives can reinforce a shared focus on organizing, while at the same time highlighting the concepts, technologies, and methods that distinguish those points of view.”
It is happening again. In the last year the idea of data science as a new career field has led many universities to add new courses, modify existing ones, hire new faculty, and even change the names of schools or departments. Many people teaching with or studying The Discipline of Organizing have suggested that the book incorporate more discussion of data science concepts, and we’ve done that in this 4th edition.
The new methods and tools of data science and machine learning let us organize more information, do it faster, and make predictions based on what people have clicked on, bought, or said. Data science introduces new considerations of scale and speed when massive computational power and new statistical techniques are harnessed to organize and act on information.
But this is not the first time that new ideas and technologies have challenged how people organized and interacted with resources, and it won’t be the last. Data science will not replace human organizers, any more than any other science has replaced humans.
A data scientist needs to learn statistics, machine learning, and other new methods and technologies; this book briefly sketches them but does not try to teach them in any detail. However, data scientists also need to understand the fundamental concepts of information organization, resource description, category design, and classification that are at the heart of this book. They need to select resources wisely and decide how best to describe them; to understand that resource description and categorization can be biased; to understand the tradeoffs and complements between people and computers; and to understand when the interpretability of features and organizing principles is more important than a bit more classification accuracy in a machine learning model.
The 4th edition builds a bridge between organizing and data science. It reframes descriptive statistics as organizing techniques, expands the treatment of classification to include computational methods, and incorporates many new examples of data-driven resource selection, organization, maintenance, and personalization. It introduces a new “data science” category of discipline-specific content, both in the chapter text and in endnotes, marked with [DS] in editions that contain endnotes.
New sections and sidebars include:
New sidebar: The Distinction between Data and Information
New sidebar: Data Science and the Discipline of Organizing
New section: §1.9, “The Concept of ‘Interaction Resource’”
New section: §3.3.4, “Organizing With Descriptive Statistics”
New section: “Exploratory Analysis to Understand Data”
New section: “Resource Description for Sensemaking and Science”
New sidebar: Sensemaking and Organizing
New sidebar: Geometric Distance Functions
New section: §7.5.3, “Implementing Categories Defined by Probability and Similarity”
New figure: Figure 7.1, “Rule-based Decision Tree”
New figure: Figure 7.3, “Probabilistic Decision Tree”
New sidebar: Finding Friends and Dates: Lessons for Learning Categories
New sidebar: Statistical Bias and Variance
New sidebar: Bias and Variance on Dartboards
Just as with the 2nd and 3rd editions, we are publishing the 4th edition in a “Professional Edition” that contains all of the discipline-tagged supplemental content and endnotes, and in a simplified “Core Concepts Edition” that omits all supplemental content. In addition, the 4th edition is being published in an “Informatics Edition” that includes all the new content related to data science, but omits the discipline-specific content about library science, museums, and document archives.
Many instructors, students, and readers identified content in previous editions that was inaccurate, confusing, redundant, or missing, and I thank them as a group. We have worked hard to resolve every concern, but as prefaces often say, any remaining flaws are our responsibility.
However, there are some whose contributions to this 4th edition have been so substantial that it would be thoughtless not to thank them by name. Most of them were participants in a “book club” seminar at Berkeley during the 2015-16 academic year that thoroughly deconstructed a number of books to help us design, build, and cross the bridge between organizing and data science. These books included The Signal and the Noise by Nate Silver, Predictive Analytics by Eric Siegel, and The Master Algorithm by Pedro Domingos. The book club participants were Pascual Arrechea, Stacey Baradit, Dina Bseiso, Phil Braddock, Bill Chambers, Jason Danker, Laura Desmond-Black, Paul Glenn, Daniel Griffin, Rob Kuvinka, Molly Mahar, Emily Paul, Robyn Perry, Keshav Potluri, Shom Sarkar, Jordan Shedlock, Vijay Velagapudi, and Emily Witt. We learned a lot together, and the important things we learned are now in this 4th edition.
Other Berkeley students and alums who reviewed the 4th edition include Andy Brooks, Lisa Jervis, Ian MacFarland, Jason Ost, and Richa Prajipati.
Instructors who teach with The Discipline of Organizing can easily see places to improve it, and David Bamman, John King, Vivien Petras, Isabelle Sperano, Mikael Gunnarsson, Yasar Tonta, and Nina Wacholder went further by reviewing the book carefully and proposing new content.
Without Robyn Perry’s contributions as an author, reviewer, and graphic artist, we would not have finished this edition in time for the 2016-2017 academic year. Murray Maloney produced the 4th edition as markup and production editor.
Robert J. Glushko, 5 August 2016