Thanos Gentimis and Lawson Connor
The U.S. Department of Agriculture is one of the chief providers of yield forecasts for farmers. These forecasts for yield and price have been known to shift agricultural market prices and influence farmers’ planting decisions because of the magnitude of the information they bear. Ways to improve these forecasts at the national level are continuously being sought, though the extent of improvement using current statistical models is limited due to three primary reasons:
The USDA forecasts, like other forecasts based on traditional methods, are subject to these issues, and research has shown that the assumptions about variables included in those models lead to over- or underestimates of yields in some years.
Making these forecasts more robust requires applying machine learning techniques, which means teaching a computer program to recognize patterns in a massive amount of data and develop models that can make predictions more accurately. Because machine learning is wholly data-driven, it can reduce, if not eliminate, forecaster assumptions and bias. Furthermore, the strengths of machine learning position it as a primary candidate for problems like yield prediction, where large amounts of input data are required. Despite these advantages, agriculture has not yet taken full advantage of the potential of machine learning.
Machine learning algorithms are tied to the five Vs (Figure 1) that people normally connect with big data: volume, value, variety, velocity and veracity. First, they thrive when the input is large in volume and variety. They also tend to tolerate problems in the quality of the data (veracity), and they are not affected if the data are coming in fast or the predictions need to happen immediately (velocity). Agriculture appears to be a natural candidate to benefit from the use of machine learning because of the complexity of agricultural systems, enormous data requirements and need for timely predictions of value, especially for things like yield.
A local yield forecast for rice, for example, might sound simple if we fix the crop in a certain field under certain agronomic practices. One should theoretically be able to predict that yield by combining information from previous years with the elaborate models coming from various agricultural disciplines and making changes based on current phenomena, such as disease pressure and weather patterns. To our knowledge, however, no complete statistical model exists for such a general prediction. We believe such a model can be created by applying machine learning.
Many factors that determine yield come into play in a rice field from planting to harvest. Plant genetics is one. Using genetics, scientists create new varieties. Prediction becomes far more complicated because different varieties perform differently under various climates, soils and agronomic practices, and all of that needs to be considered for any successful yield prediction. Instead of fine-tuning a mathematical model for each of these variables, a neural network, the primary machine learning tool, would include all the data in a raw tabulated format as input. (See Figure 2.)
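To make the idea of raw tabulated input concrete, here is a minimal sketch in plain Python. It is not the authors' model: the field records, the variety and soil labels, the network size and the training settings are all invented for illustration. It only shows how one table row can be turned into numbers and passed to a small neural network that learns yield directly from the table.

```python
import math
import random

# Hypothetical field records, invented for illustration only:
# (variety, soil type, nitrogen applied in kg/ha, observed yield in t/ha)
RECORDS = [
    ("A", "clay", 120.0, 7.9),
    ("A", "silt", 150.0, 8.4),
    ("B", "clay", 120.0, 7.1),
    ("B", "silt", 150.0, 7.8),
    ("A", "clay", 150.0, 8.2),
    ("B", "silt", 120.0, 7.4),
]
VARIETIES = ["A", "B"]
SOILS = ["clay", "silt"]


def encode(variety, soil, nitrogen):
    """Turn one table row into numbers: one-hot categories plus a scaled amount."""
    row = [1.0 if variety == v else 0.0 for v in VARIETIES]
    row += [1.0 if soil == s else 0.0 for s in SOILS]
    row.append(nitrogen / 100.0)  # rough scaling so inputs have similar size
    return row


def init_model(n_in, hidden, seed):
    """Small random starting weights for a network with one hidden layer."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
        "b2": 0.0,
    }


def forward(model, x):
    """Return the hidden activations and the predicted yield for input x."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(model["w1"], model["b1"])]
    y = sum(w * hj for w, hj in zip(model["w2"], h)) + model["b2"]
    return h, y


def mse(model, records):
    """Mean squared error of the model over a list of records."""
    errors = [(forward(model, encode(v, s, n))[1] - t) ** 2 for v, s, n, t in records]
    return sum(errors) / len(errors)


def train(records, hidden=4, epochs=2000, lr=0.01, seed=0):
    """Fit the network with plain stochastic gradient descent on squared error."""
    model = init_model(len(encode(*records[0][:3])), hidden, seed)
    for _ in range(epochs):
        for v, s, n, t in records:
            x = encode(v, s, n)
            h, y = forward(model, x)
            err = y - t
            for j in range(hidden):
                # Hidden-layer gradient is computed before w2[j] is updated.
                dz = err * model["w2"][j] * (1.0 - h[j] ** 2)
                model["w2"][j] -= lr * err * h[j]
                model["b1"][j] -= lr * dz
                for i in range(len(x)):
                    model["w1"][j][i] -= lr * dz * x[i]
            model["b2"] -= lr * err
    return model


def predict(model, variety, soil, nitrogen):
    """Predicted yield (t/ha) for one hypothetical field description."""
    return forward(model, encode(variety, soil, nitrogen))[1]


model = train(RECORDS)
print("training MSE:", round(mse(model, RECORDS), 3))
print("predicted yield:", round(predict(model, "A", "silt", 120.0), 2))
```

Adding another factor, such as planting date or a disease score, would only mean adding another column to `encode`; the network itself would not need to be redesigned. A real application would need far more records and a separate data set to check the predictions.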
Temperature is another factor. Although temperature in Louisiana is relatively stable, specific highly unpredictable weather patterns have a dramatic effect on yield, so they must be incorporated and even predicted if a successful algorithm is to be created. Machine learning tools can incorporate whole time series and weather data in the same models as before without many changes.
Research coming from geneticists, plant pathologists, entomologists and many more disciplines must be distilled and applied to the model as well; this turns the application of pesticides and fertilizers into a delicate art rather than a standardized process, and the effect on yield can vary dramatically if the balance is broken.
It has been shown that machine learning predictive algorithms improve by incorporating predictions from other specific models as input. Various disciplines approach each problem from their own unique perspective, and the various scientific models are dependent on keeping practically all variables steady and tweaking one or two to see their effects on the output variable. With machine learning, however, the algorithms adjust themselves, provided a long history of data is available.
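Feeding the predictions of specialist models into a second model is often called stacking. The sketch below is one simple way to do it, assuming two hypothetical specialist forecasts (an agronomic model that tends to run low and a weather-based model that tends to run high). All numbers are invented, and a least-squares linear combination stands in for the more flexible learners the article has in mind.

```python
# Hypothetical yields (t/ha) and forecasts from two specialist models,
# invented for illustration only.
OBSERVED = [7.9, 8.4, 7.1, 7.8, 8.2, 7.4]
AGRONOMIC_PRED = [7.5, 8.1, 7.0, 7.2, 8.0, 7.1]  # tends to underestimate
WEATHER_PRED = [8.3, 8.6, 7.6, 8.1, 8.3, 7.9]    # tends to overestimate


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_stack(base_preds, targets):
    """Least-squares weights (one per base model, plus an intercept)."""
    rows = [list(p) + [1.0] for p in zip(*base_preds)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def stack_predict(weights, base_values):
    """Combine one forecast from each base model into a single prediction."""
    return sum(w * v for w, v in zip(weights, base_values)) + weights[-1]


def mse(preds, targets):
    """Mean squared error between predictions and observed values."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


weights = fit_stack([AGRONOMIC_PRED, WEATHER_PRED], OBSERVED)
stacked = [stack_predict(weights, pair) for pair in zip(AGRONOMIC_PRED, WEATHER_PRED)]

print("agronomic model MSE:", round(mse(AGRONOMIC_PRED, OBSERVED), 4))
print("weather model MSE:  ", round(mse(WEATHER_PRED, OBSERVED), 4))
print("stacked model MSE:  ", round(mse(stacked, OBSERVED), 4))
```

On the data it was fitted to, the combined model can do no worse than either specialist model, because using one of them unchanged is among the combinations it can choose. That guarantee does not carry over to new seasons, so in practice the comparison should be made on years held out from the fitting.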
Essential to an efficiently functioning agricultural market is the expected demand and supply condition throughout the season, and at the heart of this are accurate yield forecasts. Those forecasts inform many aspects of the markets, the most important being the prediction of market price. In general, agricultural prices display tremendous sensitivity to relatively small changes in crop yield expectations.
Accurate yield forecasts, and price predictions by extension, allow for improved inventory management, market coordination (when to sell and when to buy), efficient portfolio management for market financiers, and reduction of risk inherent in agricultural markets and thus reduced costs of doing business. Accurate yield forecasts improve the efficiency of vertically integrated firms as well, such as food processors, packers, mills and eventually the consumers of these products. Improved yield forecast accuracy is where machine learning tools can demonstrate their value compared to current methods.
Agriculture has indeed entered the era of big data, and it is generally accepted that the most appropriate tools today to tackle the challenges will come from machine learning. The LSU AgCenter is sensitive to these changes and is taking a lead toward digital agriculture, creating multidisciplinary teams from all areas related to agriculture. The ultimate goal is helping farmers make better management decisions and mitigate risk.
Thanos Gentimis is an assistant professor in the Department of Experimental Statistics, and Lawson Connor is an assistant professor in the Department of Agricultural Economics and Agribusiness.
(This article appears in the summer 2019 issue of Louisiana Agriculture.)
Figure 1. The five Vs of big data.
Figure 2. A neural network is the primary machine learning tool.