The News: The Stanford University team behind DAWNBench recently announced its retirement, signaling change ahead for benchmarking the performance of integrated AI hardware/software stacks. Stanford announced that DAWNBench will stop accepting rolling submissions on March 27th in order to help consolidate industry benchmarking efforts. The team responsible for developing the end-to-end learning performance benchmarking methodology introduced with DAWNBench has been working with MLPerf to expand that functionality into a more comprehensive offering of tasks and scenarios. With the MLPerf Training and MLPerf Inference benchmark suites now put through testing, the team decided to “pass the torch” from DAWNBench to MLPerf moving forward. Read the full post from the Stanford DAWN team here.
Analyst Take: The retirement of DAWNBench by its creators at Stanford is of particular interest to me as it relates to what’s next for benchmarking the next generation of infrastructure for industrialized data science. When it launched in 2017, DAWNBench was tremendously valuable; now it’s exciting to look ahead to the next iteration of this industry standard. But first, a look at the back story.
DAWNBench was launched in 2017 as an integral component of the larger five-year DAWN research project, part of the industrial affiliates program at Stanford University, which is financially supported in part by founding members including Intel, Microsoft, NEC, Teradata, VMware, and Google. DAWN’s charter is to address the infrastructure challenges of the age of machine learning and artificial intelligence.
DAWNBench was the first open benchmark to compare end-to-end training and inference across multiple deep learning frameworks and tasks. Its benchmarks support comparison of AI stack performance across diverse model architectures, software frameworks, hardware platforms, and optimization procedures. DAWNBench provided benchmark specifications for image classification and question answering workloads, and it benchmarked stack metrics such as accuracy, computation time, and cost.
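To make that methodology concrete, here is a minimal, framework-agnostic sketch of the kind of time-to-accuracy and cost measurement DAWNBench popularized. It is an illustrative assumption, not DAWNBench’s actual harness: the `train_one_epoch` and `evaluate` callables, the accuracy target, and the hourly price are placeholders a real submission would replace with its own model, dataset, and cloud pricing.

```python
import time

def time_to_accuracy(train_one_epoch, evaluate, target_accuracy,
                     price_per_hour, max_epochs=100):
    """Train until a validation-accuracy target is reached, reporting
    wall-clock time and estimated cost (the end-to-end metrics
    DAWNBench popularized). Illustrative sketch only."""
    start = time.time()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()              # one pass over the training data
        accuracy = evaluate()          # validation accuracy after this epoch
        elapsed_hours = (time.time() - start) / 3600
        if accuracy >= target_accuracy:
            return {
                "epochs": epoch,
                "hours_to_target": elapsed_hours,
                "estimated_cost_usd": elapsed_hours * price_per_hour,
            }
    raise RuntimeError("Target accuracy not reached within max_epochs")

# Toy usage with stand-in callables and a placeholder hourly price.
if __name__ == "__main__":
    state = {"acc": 0.0}

    def fake_train_epoch():
        state["acc"] += 0.2            # pretend accuracy improves each epoch

    def fake_evaluate():
        return state["acc"]

    print(time_to_accuracy(fake_train_epoch, fake_evaluate,
                           target_accuracy=0.93, price_per_hour=3.00))
```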
The AI solution market has matured to the point where users demand reliable benchmarks of the comparative performance of alternative hardware/software stacks.
Though DAWNBench itself will soon be a historical footnote, it was a key catalyst in the larger industry push for open AI performance benchmarking frameworks. This pioneering project predated and substantially inspired MLPerf and other AI industry benchmarking initiatives that have taken off in the past few years.
Today, and with increasing regularity, AI vendors claim to offer the fastest, most scalable, and lowest-cost solutions for handling natural language processing, machine learning, and other data-intensive algorithmic workloads. To bolster these claims, more vendors are turning to MLPerf benchmarks, and we are already seeing major AI vendors such as NVIDIA and Google boast of their superior performance on the MLPerf training and inference benchmarks.
The retirement of DAWNBench was inevitable and should be considered a success. Its discontinuance after three years is a clear sign that its sponsors agree that its core mission has been accomplished.
I expect MLPerf to carry on DAWNBench’s practice of expanding the range of AI performance metrics that can be benchmarked. DAWNBench provided a framework for assessing competitive AI system performance on such metrics as accuracy, computation time, and cost, whereas previous AI accelerator benchmarks had focused narrowly on raw processing speed. As edge deployments become more common for AI, I expect that MLPerf will add memory footprint and power efficiency as metrics for benchmarking a wide range of device-level AI workloads.
What’s not clear following the announcement of the retirement of DAWNBench is how the larger DAWN project should proceed now that its most important contribution to the development of standardized AI infrastructures is behind it.
Open AI benchmarks are clearly useful for assessing whether a deployed AI hardware/software stack meets any or all of the applicable productization metrics. Having benchmarks gives AI professionals confidence that a deployed hardware/software stack can meet the stringent productionization requirements of industrialized data science pipelines.
But the other subprojects under DAWN (MacroBase, Spatial, Weld, NoScope, and Snorkel) feel like a potpourri of interesting initiatives without any clear organizing vision.
Conceivably, the DAWN project could try to bring some benchmarking focus to each of these projects.
That said, trying to retrofit each of the DAWN projects with a benchmarking focus may not be what AI professionals need most.
Perhaps it would be best for Stanford to bring one of these projects, Snorkel, for instance, to the forefront of its attention. High-volume programmatic data labeling addresses a key pain point felt by working data scientists: the industrialization of training data generation and preparation.
So it’s no surprise that Snorkel has the most active community of any of the remaining DAWN projects. Most notably, Google, working with researchers from Stanford University and Brown University, has extended the open-source Snorkel tool to suit it for enterprise-grade data integration workloads. Under the “DryBell” project, Snorkel has been modified to support higher-volume data processing and label creation. The researchers changed the optimization function used in Snorkel’s generative model to speed up how Snorkel processes data and applies labels. They also integrated Snorkel with the MapReduce distributed computation method so that it can be run loosely coupled across multiple computers.
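For readers unfamiliar with programmatic labeling, here is a minimal sketch in the spirit of Snorkel’s labeling functions. It is plain Python with a simple majority vote standing in for Snorkel’s actual API and its learned label model, and the spam-detection heuristics are invented purely for illustration.

```python
from collections import Counter

ABSTAIN, HAM, SPAM = -1, 0, 1

# Labeling functions: cheap heuristics that vote SPAM/HAM or abstain.
def lf_contains_link(text):
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_shouting(text):
    return SPAM if text.isupper() and len(text) > 10 else ABSTAIN

def lf_greeting(text):
    return HAM if text.lower().startswith(("hi", "hello", "thanks")) else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_shouting, lf_greeting]

def weak_label(text):
    """Aggregate labeling-function votes into one training label.
    A real Snorkel pipeline would learn how much to trust each
    function; a majority vote stands in for that here."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN                 # no heuristic fired; leave unlabeled
    return Counter(votes).most_common(1)[0][0]

unlabeled = [
    "Hello, attaching the meeting notes from yesterday.",
    "CLICK NOW https://win-a-prize.example to claim your reward!!!",
]
print([(text[:30], weak_label(text)) for text in unlabeled])
```

The point of the technique is that a handful of such functions, applied programmatically, can label far more training data than manual annotation could, which is exactly the scaling problem DryBell tackles at enterprise volumes.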
DAWNBench’s contribution to the AI industry ecosystem is clear. DAWNBench catalyzed creation of an open framework and forum—MLPerf—within which hardware/software stacks can be benchmarked for a wide range of AI workloads.
What’s next for benchmarking next-generation infrastructure for industrialized data science? At the midway point in the DAWN project’s five-year journey, it would behoove Stanford to assess which, if any, of the remaining workstreams provides value in the development of an open ecosystem for industrialized AI pipelines.
Conceivably, the DAWN project could try to bring some benchmarking focus where appropriate to any or all of the remaining projects: MacroBase, Spatial, Weld, NoScope, and Snorkel. But I believe it would be more fruitful to emphasize Snorkel, given the significant industry traction it has already received as a next-generation open-source platform for programmatic labeling of training data.
In fact, there is no clear alternative to Snorkel in today’s open AI ecosystem. Training workflows are becoming more automated, manual labeling is running into scalability limits, and synthetic data generation is proving insufficient for many AI DevOps challenges. That’s why I’m bullish on Snorkel and the benefit a focus there could provide to the industrialized data science community as a whole.
Futurum Research provides industry research and analysis. These columns are for educational purposes only and should not be considered in any way investment advice.
The original version of this article was first published on Futurum Research.