Promises and challenges of Big Data in the dairy industry [Longread]

Since the 1950s, computers have been used as a management tool in dairy farming (Lissemore, 1989). Over subsequent decades, dairy herd management software has evolved steadily, and the personal computer has become an important tool for monitoring production, reproduction and health (Gloy and Akridge, 2000). Meanwhile, technologies to collect and store data have evolved faster than new insights in dairy science have been discovered. The exponential growth in the volume and speed at which data is created in the post-dotcom decade is commonly referred to as Big Data. Although “Big Data” has become a buzzword, there is neither a consistent definition of the term nor a detailed analysis of this new and emerging technology. Most discussions to date have taken place in the blogosphere, where active contributors have generally converged on its most important features and incentives. One driver of the excitement around Big Data has been the expectation that it will yield new insights that ultimately support decision making. Machine learning is viewed as a key technology for unlocking such insights, and machine learning toolkits of varying quality and popularity are widely available (Domingos, 2012). However, a review paper by Rutten et al. (2013) documents the lack of integrated information and, in particular, of decision-support tools in dairy research. It illustrates that while the technology has been widely accepted and published in dairy science, effective implementation, validation and valorization is often minimal. This experience paper focuses on the authors’ perspectives on the current challenges in data-intensive animal science.



“On the origin of information by means of data collection, or the preservation of favoured knowledge in the struggle for wisdom.”
The above quote is a wink at the title of the most illustrious work by Charles Darwin (1859). It emphasizes the intrinsic value of data and the potential advantage of raising data to a higher level. Data could be called the biggest and most powerful asset in modern dairy farming, a view supported by Ackoff’s Data, Information, Knowledge and Wisdom pyramid (Ackoff, 1989). However, data is not directly meaningful: only when data points are put in relation to one another does useful information arise, from which knowledge can be extracted. Data only gains value when it is transformed into information and knowledge, which then lead to wisdom.


With regard to the dairy sector, animals produce data that can be seen as a representation of their behaviors, characteristics, events or environment; these data are the product of observations made over time. It is specifically at this level that technology has helped small data grow into big data. Due to the rapid development of Precision Livestock Farming (PLF) technologies and the availability of high-throughput sensor information, massive data has become available on-farm. In the ‘Internet-of-Things’ (IoT) (r)evolution, these PLF technologies have been proposed as a roadmap to aid researchers in making decisions. For example, milking robots, milk meters and automated concentrate feeders collect data on production parameters. Heat detection sensors, such as pedometers and accelerometers, are attached to the front or hind legs and neck (Roelofs et al., 2017) and, more recently, the ears (Bikker et al., 2014; Rutten et al., 2017) to keep track of cow movement and position (Roelofs et al., 2005). Newer sensor technologies, such as ruminal and vaginal temperature loggers (Bewley et al., 2008a), automated weight scales and 3D-imaging technologies, are now being used to collect data on the metabolism and health status of animals (Friggens et al., 2007). So-called secondary off-farm data centers also exist, mainly containing pedigree and milk recording data. High-dimensional genomic and diagnostic datasets are created in different countries across Europe, each containing a subset of data representing the real world of dairy cows (Egger-Danner et al., 2015; Frost et al., 1997; Pietersma et al., 1998; Spahr, 1993; Sun et al., 2013; Tomaszewski, 1993).
In short, the definition proposed for Big Data is a collection of data from traditional and digital sources, inside and outside a company or industry, that represents a source for ongoing discovery and analysis. As noted above, much of the excitement stems from the expectation that new insights can be discovered through its use. However, some specific characteristics of Big Data in the dairy industry need to be addressed before data can be transformed into prediction and decision-making models.


The VOLUME or format of the data no longer presents a major constraint (as storage has become less costly); hence, the total volume of cow-related data collected per day has increased rapidly. In 1984, it was estimated that a software program to manage 100 farms with 100 cows each would need about 6 megabytes (MB) of storage capacity per year (Noordhuizen and Buurman, 1984). In 2000, Canadian research reported a need for 1.3 GB of random access memory (RAM) to hold the solutions and right-hand sides of the mixed model equations during the iteration process of the test-day milk recording model, with nearly 2 million Holstein animals (cows with records plus ancestor dams and sires) and 36 genetic regression coefficients per animal. To save solutions and diagonal blocks for all animals, as well as other information needed for the publication of results, a total of 16 GB of disk storage was required (Schaeffer et al., 2000). Genomic evaluation of dairy cattle became available in the United States in 2008 (Wiggans et al., 2011). Ever since, multiple researchers have
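As a rough sanity check (not taken from the cited paper), the reported RAM requirement can be approximated with back-of-envelope arithmetic, assuming double-precision (8-byte) values and one solution vector plus one right-hand-side vector of equal size:

```python
# Back-of-envelope estimate of the memory needed to hold solutions
# and right-hand sides for the test-day model (assumed layout:
# one value per genetic regression coefficient per animal).
animals = 2_000_000          # cows with records plus ancestor dams and sires
coeffs_per_animal = 36       # genetic regression coefficients per animal
bytes_per_value = 8          # IEEE 754 double precision (assumption)

values_per_vector = animals * coeffs_per_animal
total_bytes = 2 * values_per_vector * bytes_per_value  # solutions + RHS

print(f"{total_bytes / 1024**3:.2f} GB")  # prints "1.07 GB"
```

This lands in the same ballpark as the 1.3 GB reported by Schaeffer et al. (2000); the difference is plausibly working memory and bookkeeping overhead during iteration.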
