Data science, advanced analytics, machine learning, artificial intelligence, cognitive computing, and natural language processing are all buzzwords popular in the business world today because so many use cases have demonstrated how leveraging these tools can lead to significant competitive advantages.
Despite the proven power of these tools, many still struggle with successful implementation, not because the tools are losing their power, but because many data science teams, vendors, and individuals fail to properly integrate the tools of data science within the context of human decision making. Thus, great data science products are built, but their true impact is lost through the often irrational, biased, and difficult-to-predict humans who are tasked with using them.
This problem is not new, and what we explore in this post is the idea that we may be able to learn a thing or two from the past in order to develop new roadmaps for successful data science. Herein we look at different models that bridge products with people.
At the intersection of data science and human psychology lies a multidisciplinary field that is ripe for implementation.
What is the design of everyday data science?
In 1988, Donald Norman published The Psychology of Everyday Things, which was later reissued under the title The Design of Everyday Things. The ideas contained in these books were simple, powerful, and disruptive because, prior to this time, few had formalized how to merge engineering with human psychology. These books have inspired the field of user experience (UX) design, more formally known as Human-Centered Design (HCD).
Flash forward 30 years, and data science in many businesses may be failing in the same way that design failed before people started to actually incorporate the study of humans into design engineering. But the problem with data science goes beyond the design of everyday things because the products of data science are often not things. Rather they are insights, automations, and models of human skills and abilities. Thus, we must not only take ideas from HCD to improve the user experience with the products of data science but we must also leverage other disciplines to fully grasp a roadmap to successful data science implementation.
What should the design of everyday data science be?
Because the products of data science are increasingly integrated with things, be they refrigerators, toasters, cars, or applications, the design of everyday data science would indeed benefit from some of the principles of HCD that were the bedrock of Dr. Norman’s original ideas.
Before we get started, it is important to define a few key concepts (from Bruce Tognazzini’s extensive work on HCD):
- Discoverability: “ensures that users can find out and understand what the system can do.”
- Affordances: “A relationship between the properties of an object and the capabilities of the agent that determine just how the object could possibly be used.”
- Signifiers: “Affordances determine what actions are possible. Signifiers communicate where the action should take place.”
- Mappings: “Spatial correspondence between the layout of the controls and the devices being controlled.”
- Feedback: Immediate reaction and appropriate amount of response.
To that end, we, as data scientists, must ensure discoverability in our products. We often fail here because we believe that insights derived from statistical models or advanced analytics are in and of themselves discoveries and are therefore already discoverable. This assumption, however, is incorrect because insights are only as valuable as they are applicable to the business or user. Therefore, we must articulate what it means to deliver data science products that are more discoverable. This covers all the elements of discoverability: identifying affordances, signifiers, mappings, and opportunities for feedback.
A data science product is delivered in the context of interacting humans and is thus only as good as its ability to let users discover how its affordances improve their experiences. An affordance is not an attribute of the product but rather a relationship between the user and the product (Norman, 1988). If a data science classification model replaces the need for someone to click through thousands of documents to find information, then its affordances are time savings, improved quality, and augmented performance. These should be clearly discoverable in the way the product is delivered, including its documentation and signifiers.
Signifiers signal to users possible points of use that create affordances. In data science this can mean delivering key drivers with models so that users have clarity on why, in the case of the above example, different documents are being categorized, tagged, or labeled by the model. Doing so lends itself to the discovery of affordances such as improved quality and performance augmentation.
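One way to deliver key drivers alongside a prediction can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes a linear document classifier, and the feature names, weights, and the `key_drivers` helper are all hypothetical. Each feature's contribution is taken to be its weight multiplied by its value, and the largest contributions are surfaced to the user.

```python
# Hypothetical weights from a linear document classifier (illustrative only).
WEIGHTS = {
    "mentions_invoice": 2.0,
    "mentions_refund": 1.5,
    "has_signature": 0.25,
    "page_count": -0.5,
}


def key_drivers(features: dict[str, float], top_n: int = 2) -> list[tuple[str, float]]:
    """Return the top_n features ranked by absolute contribution to the score.

    Contribution is weight * value, which only makes sense for a linear model;
    other model types would need a different attribution method.
    """
    contributions = {
        name: WEIGHTS[name] * value
        for name, value in features.items()
        if name in WEIGHTS
    }
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


# A made-up document, described by hypothetical feature values.
document = {
    "mentions_invoice": 1.0,
    "mentions_refund": 0.0,
    "has_signature": 1.0,
    "page_count": 3.0,
}
drivers = key_drivers(document)
for name, contribution in drivers:
    print(f"{name}: {contribution:+.2f}")
```

Showing signed contributions lets a user see not only which features mattered but in which direction they pushed the label.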
For Dr. Norman, mappings referred to how design elements correspond to their functions. For example, light switches map to light bulbs by enabling them to turn on or off. In data science we often map the function of models to their probabilities or to decisions coded as 1s and 0s, but for users this is not typically intuitive, so the mapping is not all that useful. Thus, we can adjust our mappings to include qualifiers that represent a more intuitive application of our data science products. For example, probabilities become buckets labeled “High Risk,” “Moderate Risk,” and “Low Risk,” which improve the ability of users to map the outputs of our models to their functions.
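The probability-to-label mapping described above can be sketched in a few lines. The cut points of 0.33 and 0.66 and the `risk_label` helper are assumptions made for illustration; in practice the thresholds would likely be chosen with users and validated against the model's calibration.

```python
from bisect import bisect_right

# Hypothetical cut points (assumed for illustration, not from any real model).
THRESHOLDS = (0.33, 0.66)
LABELS = ("Low Risk", "Moderate Risk", "High Risk")


def risk_label(probability: float) -> str:
    """Map a model probability in [0, 1] to a user-facing risk label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    # bisect_right counts how many thresholds are <= probability,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(THRESHOLDS, probability)]


scores = [0.05, 0.40, 0.91]
labels = [risk_label(p) for p in scores]
print(labels)
```

Keeping the thresholds in one named constant makes them easy to revise once users give feedback on whether the buckets match their intuition.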
In many ways, optimal mappings will not be apparent until we have had the chance to obtain feedback from users. For business users, feedback can be explicit and carried out in ways that follow the principles of good design (simple, easy, and unobtrusive). In the rare case where our users are actually customers of our insights (for example, a model that predicts someone’s likelihood of getting a job or their success in a relationship), feedback must also be intuitive and responsive (see also below, where we expand on responsive feedback design through voice).
But it is not enough to simply borrow concepts from HCD to improve the success of data science products. Because these products are deployed to interact with people, both customers and business users alike, our success pipeline must be sensitive to the political and social psychological relationships that define how these individuals interact with each other and our products.
For example, machines that deliver automation or even augmentation to a business user can feel threatening. The threat can be to job security or to one’s feelings of efficacy and expertise. Thus, our data science pipeline must be sensitive to this outcome by directly addressing feelings of threat in order to achieve buy-in. Social psychologists have long recognized that to increase buy-in, people need to feel as though a new process is fair, and to ensure fairness the change process requires voice. Voice is the opportunity granted to users to take part in how the process actually unfolds. From a data science perspective this means that we create opportunities not just for feedback, as we learned from HCD, but also to demonstrate how that feedback actually changed our product.
For example, explain the key model features to users and solicit feedback on different ways to group those features into meaningful and actionable groupings. In one such instance, a client had the idea to group features by the outreach medium that could affect them (e.g. personal phone call, email nudge, etc.). With this feedback incorporated into the product, users were already thinking about how to creatively develop content that could address these differences when they saw individuals with high probabilities (e.g. risk scores) along with key drivers that better matched different modes of outreach. Users saw the affordances because they were now active participants in using the product to improve their own impact.
But voice is not the only perspective in psychology that can help to develop a successful data science product pipeline. Indeed, one could incorporate concepts from political psychology or motivation to understand the relational aspects of a product’s success. We leave this to the imagination and creativity of you, the reader. Feel free to comment below with ideas to continue this conversation and push the envelope further in pursuing more effective models for data science success.
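The grouping idea from that client example might look something like the sketch below. The feature names, the feature-to-medium assignments, and the `group_drivers` helper are hypothetical; only the outreach mediums (personal phone call, email nudge) come from the example above. In a real project the mapping itself would be the artifact that users help define.

```python
from collections import defaultdict

# Hypothetical assignment of model features to outreach mediums,
# standing in for a mapping that users would help define.
OUTREACH_BY_FEATURE = {
    "missed_appointments": "personal phone call",
    "late_payments": "personal phone call",
    "unopened_emails": "email nudge",
}


def group_drivers(drivers: list[str]) -> dict[str, list[str]]:
    """Group a record's key drivers by the outreach medium that could affect them.

    Drivers without an assigned medium are collected under "unassigned"
    so that gaps in the mapping stay visible to users.
    """
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for feature in drivers:
        medium = OUTREACH_BY_FEATURE.get(feature, "unassigned")
        grouped[medium].append(feature)
    return dict(grouped)


plan = group_drivers(["unopened_emails", "late_payments", "tenure_months"])
print(plan)
```

Surfacing an explicit "unassigned" group gives users a natural prompt for further feedback, which is one way to show that their input continues to shape the product.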
End-to-end success checklist
A useful checklist to consider in developing a successful data science pipeline might look something like this:
- What characteristics make up the primary user groups for this product?
- How do those characteristics suggest different possible affordances of my product? What does my product enable or prevent (anti-affordances) for those specific users?
- What delivery or deployment method makes the most sense to achieve these affordances?
- How do I signal these affordances to my user base?
- What mappings make the most sense from my users’ perspective?
- Am I providing opportunities for feedback that are simple, easy, and unobtrusive?
- Can I demonstrate how the feedback has changed the product?
This concludes our post on successful data science product pipelines. We appreciate you taking the time to read this and look forward to seeing your continued ideas in the comments below. Although this post was high-level and rather theoretical, stay tuned as we will be including future topics that explore more practical issues in coding for data science and human decision making.
I would also emphasize that this is merely one application that attempts to merge different fields, but there are many other approaches. The key is to recognize the value of cross-pollination from fields as diverse as data science, data engineering, app development, user experience, and psychology. Cheers!
Norman, D. (2013). The design of everyday things: Revised and expanded edition. Constellation.
Tognazzini, B. (2014). First Principles of Interaction Design (Revised & Expanded). AskTog.