A Study of Seasonal ARIMA Model-Based Forecasting Method for Intelligent Food Control in a Livestock Environment

Most of the high- and medium-quality hay used in livestock feedlots is imported from other countries. As a result, rising production cost is becoming one of the primary problems in livestock production. To minimize the cost of hay imports, the forecast of the required stock has to be precise. The accumulated feed intake of the beef cattle can give a more accurate forecast than the previous year's food stock data. Therefore, in this paper, the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is used to forecast the food stock requirement of a livestock barn from simulated data. The best-fit SARIMA model is identified, and its predicted values are compared with the actual data to provide an accurate forecast of the food supply.


Introduction
Because some countries have insufficient land, hay is usually imported from major producers such as America and East and Mid-East Asia [1]. Along with economic development, hay prices and import charges are increasing steadily. Even though the country has taken many measures to grow high-quality pastures, the amount produced is still insufficient for the cattle. In order to reduce the production cost, an accurate forecast of the required hay is needed. Most farmers simply reuse the previous year's stock quantity, or predict the amount from the purchases of consecutive years, which leads to an inaccurate stock quantity and, in turn, to a high production cost. As the feed intake of cattle varies with age and season, the required food quantity varies accordingly. Among beef cattle, a mature beef cow weighing approximately 450 kg consumes 12 kg of high-quality hay per day, while a heifer weighing 350 kg consumes 8 kg of high-quality hay per day [2]. The feed intake of an individual cow can change due to many factors such as illness, seasonal changes, food quality and pregnancy [3]. Hence, the food stock is predicted from the feed intake of the beef cattle instead of from the previous year's data. The current data is obtained from the feed bunk through wireless sensor networks: weight sensors attached to the feed bunk measure the amount of food intake and transmit it over the wireless sensor network.
A broad knowledge of forecasting methods helps to determine the most suitable forecasting strategy for any given data. The most important criteria for a forecasting strategy include choosing the simplest and most cost-effective method that achieves the required accuracy, choosing a forecasting system that fits the provided data, fixing a precise forecasting period that is unaffected by the past-present calculation, and providing a way to test the accuracy of the forecasting system. Such a process helps to determine the best forecasting technique and also removes bias from the system. Although regression analysis and Kalman filtering are the most widely used traditional methods, the autoregressive integrated moving average (ARIMA) model is the most popular, owing to its simple algorithm for linear data. The purpose of this paper is to apply a time series analysis technique, namely the Box-Jenkins autoregressive integrated moving average (ARIMA) technique [4], to the development of weekly forecasting models for the feed intake of beef cattle.

Related Works
The ARIMA model was first formulated by Box and Jenkins [8] and is mainly used for identifying patterns in, and predicting future values of, time series data. Akaike [9] describes a stationary time series with an autoregressive process AR(p), where p is a finite order assumed to be bounded by a known integer. Moving average (MA) models were first used by Slutzky [10]; they calculate mean data points by averaging over the data set. Related work on pure AR models by Hannan and Quinn [11] and on ARMA models by Hannan [12] suggests obtaining the order of a time series model by minimizing the errors. Much research is underway in livestock management, including milk production [13], sales manipulation [14], disease prediction [15] and so on. Deshmukh et al [16] and Sanchez et al [17] use the ARIMA model to predict milk production in order to manage dairy costs.
The dairy demand for livestock is also predicted with an ARIMA model by Ahmad et al [18]. Harris et al [19] demonstrate the effectiveness of the ARIMA model in livestock price prediction by comparing it with other models. Since livestock food intake is lower during the winter season, it shows a strong seasonal pattern that repeats every year. Therefore, the seasonal ARIMA (SARIMA) model [20] is found to be better suited than other models such as linear regression, random walk and vector regression. Lira et al [21] explain the usefulness of the SARIMA model in milk price prediction with the winter seasonal effect. The SARIMA model is commonly used to estimate future demand from past years' data; Mombeni et al [22], for example, estimate water demand using a SARIMA model. Compared to the other models, the SARIMA model is said to be the best model for predicting seasonal data when a large amount of data is available. Therefore, in this study, we use the SARIMA model to predict and forecast the feed intake of the livestock, which gives a clear picture of the food stock demand for the beef cattle.

The Suggested Forecasting Method for Food Supplement
This section explains the models and related works used to forecast the beef cattle food stock. First, the ARIMA model is explained, followed by a summary of the Ljung-Box test used to verify the fitted model. The final section discusses the related research works that helped in the identification of the ARIMA model.

ARIMA Process
The autoregressive integrated moving average (ARIMA) model combines differencing of the time series with the AR and MA models. The autoregressive process of order p is:

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$

The moving average process of order q is:

$$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$$

The ARIMA model of order (p, d, q) is:

$$w_t = c + \phi_1 w_{t-1} + \dots + \phi_p w_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$$

where $w_t$ is the series $y_t$ after differencing d times and $\varepsilon_t$ is the error, normally distributed with zero mean and constant variance $\sigma^2$ for t = 1, 2, ..., n. The model changes with different combinations of the autoregressive and moving average parts. Therefore, to determine the best model among the various combinations, the model with the lowest Akaike Information Criterion (AIC) [5] is selected. The AIC is defined as:

$$\mathrm{AIC} = -2 \log L + 2m$$

where m = p + q + P + Q and L is the likelihood function. Since $-2 \log L$ is approximately equal to $n(1 + \log 2\pi) + n \log \sigma^2$, where $\sigma^2$ is the mean square error, the AIC can also be written as:

$$\mathrm{AIC} = n(1 + \log 2\pi) + n \log \sigma^2 + 2m$$

The adequacy of the model is checked on the residuals using the Q statistic. A modified Q statistic, the Box-Ljung Q statistic [6], is defined by:

$$Q = n(n+2) \sum_{k=1}^{h} \frac{r_k^2}{n-k}$$

where $r_k$ is the residual autocorrelation at lag k, n is the number of residuals, and h is the number of lags tested. The Q statistic is compared to a critical value from the chi-square distribution. If the p-value associated with the Q statistic is small (p < α), the residuals are still autocorrelated and the model is considered inadequate; otherwise, the model is considered adequate. Once a tentative model has been selected, its parameters are used to forecast the future periods.

The seasonal ARIMA (SARIMA) model extends the ARIMA model and has the same structure, with moving average (MA), autoregressive (AR) and differencing factors. In addition to the ordinary ARIMA terms, all of these factors also operate at lags that are multiples of the seasonal period. SARIMA is used when the time series exhibits seasonal variation. A seasonal autoregressive order (P) and a seasonal moving average order (Q) form the multiplicative SARIMA process, written (p,d,q)(P,D,Q)s. The subscript s is the length of the seasonal period. For example, s = 7 for daily data with a weekly cycle, s = 4 for quarterly data, and s = 12 for monthly data.
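In order to formalize the model, the backshift operator B is used. $B^k$ shifts a time series observation backward in time by k periods, such that $B^k y_t = y_{t-k}$. Formally, the backshift operator is used to express a general stationarity transformation, where a time series is stationary if its statistical properties (mean and variance) are constant through time. The general stationarity transformation is:

$$z_t = (1 - B^s)^D (1 - B)^d y_t$$

where $z_t$ is the differenced time series, d is the degree of nonseasonal differencing, D is the degree of seasonal differencing, and s is the length of the seasonal period. Then, the general form of the SARIMA (p,d,q)(P,D,Q)s model is:

$$\phi_p(B)\, \Phi_P(B^s)\, z_t = c + \theta_q(B)\, \Theta_Q(B^s)\, \varepsilon_t$$

where $\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ and $\Phi_P(B^s) = 1 - \Phi_1 B^s - \dots - \Phi_P B^{Ps}$ are the nonseasonal and seasonal autoregressive operators, $\theta_q(B) = 1 + \theta_1 B + \dots + \theta_q B^q$ and $\Theta_Q(B^s) = 1 + \Theta_1 B^s + \dots + \Theta_Q B^{Qs}$ are the nonseasonal and seasonal moving average operators, c is a constant, and $\varepsilon_t$ is the error term.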
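As a rough illustration of the stationarity transformation, the following Python sketch applies nonseasonal and seasonal differencing to a plain list of observations. The function names, the toy series and the choice s = 12 are assumptions made for this example and are not taken from the study.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag) once: z_t = y_t - y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def stationarity_transform(series, d=0, D=0, s=12):
    """Apply (1 - B^s)^D (1 - B)^d to a list of observations."""
    z = list(series)
    for _ in range(d):          # nonseasonal differencing, d times
        z = difference(z, lag=1)
    for _ in range(D):          # seasonal differencing, D times
        z = difference(z, lag=s)
    return z


# Toy monthly series (made up): linear trend plus a repeating 12-month pattern.
pattern = [0, 1, 3, 5, 6, 7, 7, 6, 4, 2, 1, 0]
y = [0.5 * t + pattern[t % 12] for t in range(48)]

# One seasonal difference removes the repeating pattern and leaves the
# constant yearly increase of the trend.
z_seasonal = stationarity_transform(y, d=0, D=1, s=12)

# Adding one nonseasonal difference removes that constant as well.
z_both = stationarity_transform(y, d=1, D=1, s=12)

print(len(z_seasonal), len(z_both))
```

Each seasonal difference shortens the series by s observations and each nonseasonal difference by one, which is worth keeping in mind when the series is short.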

Ljung-Box Test
The Ljung-Box test [23] is applied mainly to the residuals to check for autocorrelation. In other words, for a fitted model the residual autocorrelations must be zero or small; otherwise the model is considered unfit. The null hypothesis of the test is $H_0: \rho_1(\varepsilon) = \rho_2(\varepsilon) = \dots = \rho_h(\varepsilon) = 0$, where $\rho_k(\varepsilon)$ is the autocorrelation of the residuals at lag k.

The Ljung-Box statistic can be written as

$$Q^* = N(N+2) \sum_{k=1}^{h} \frac{r_k^2}{N-k}$$

where N is the number of observations used to estimate the model, $r_k$ is the residual autocorrelation at lag k, and h is the number of lags tested. The statistic Q* approximately follows a chi-square distribution with h - q degrees of freedom, where q is the number of parameters estimated in the model. When Q* is large, the residuals of the model are autocorrelated, and the model should therefore be reformulated.
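The Ljung-Box statistic can be computed directly from the residual autocorrelations. The sketch below is a minimal pure-Python version under the usual definition of the statistic; the function names and both example series are made up for illustration, and the comparison against a chi-square critical value is left to the reader.

```python
import random


def autocorrelation(x, k):
    """Sample autocorrelation r_k of the sequence x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(residuals, h):
    """Q* = N (N + 2) * sum over k = 1..h of r_k^2 / (N - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, h + 1)
    )


# Example 1 (made up): pseudo-random noise, which should show little
# autocorrelation.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(40)]

# Example 2 (made up): a perfectly alternating sequence, which is strongly
# autocorrelated at every lag.
alternating = [1.0, -1.0] * 20

q_noise = ljung_box_q(noise, h=5)
q_alternating = ljung_box_q(alternating, h=5)
print(q_noise, q_alternating)
```

For reference, the 5% critical value of the chi-square distribution with 5 degrees of freedom is about 11.07; the degrees of freedom should be reduced by the number of estimated parameters when the residuals come from a fitted model.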

Data Set
In this study, the simulated data set consists of daily observations of the feeding behavior of beef cattle. The daily data were aggregated to monthly data covering the period 2000-2015. The data were simulated to include the seasonal effect (low intake) on feeding behavior in winter, caused by the cold season and by the considerable decrease in the cattle's digestion. The main aim of the paper is to find the best-fit model for forecasting and to compare that forecast with the yearly food supply forecast. Figure 1 shows the feed intake data of the beef cattle from 2000 to 2015. The smoothed series of the weekly moving average of the feed data is plotted in figure 2. The prediction and forecast of the feed stock are performed using the ARIMA model.
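The paper does not give the simulation procedure, so the following Python sketch only illustrates one way a monthly feed intake series with a winter dip could be generated and smoothed. The base level, dip size, noise level, seed and window length are all hypothetical values chosen for the example.

```python
import math
import random


def simulate_monthly_intake(years=16, base=300.0, dip=60.0, noise_sd=8.0, seed=0):
    """Monthly feed intake (arbitrary units) with lower intake in winter."""
    rng = random.Random(seed)
    series = []
    for t in range(12 * years):
        month = t % 12  # 0 = January
        # The cosine peaks in January, so subtracting it lowers winter intake.
        seasonal = -dip * math.cos(2 * math.pi * month / 12)
        series.append(base + seasonal + rng.gauss(0.0, noise_sd))
    return series


def moving_average(series, window):
    """Simple moving average over each run of `window` consecutive points."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


intake = simulate_monthly_intake()        # 16 years, as in 2000-2015
smoothed = moving_average(intake, 12)     # 12-month window (an assumption)

january_mean = sum(intake[0::12]) / 16
july_mean = sum(intake[6::12]) / 16
print(len(intake), len(smoothed), january_mean < july_mean)
```

A 12-month window averages out a yearly cycle, so the smoothed series mainly shows the level of the data rather than its seasonal swings.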

Model Identification
As the first step of the prediction, the data are analyzed for stationarity and for seasonal effects. The autocorrelation function (ACF) and partial autocorrelation function (PACF) [24] are used for this purpose. Fitting an ARIMA model requires the series to be stationary: a series is said to be stationary when its mean, variance, and autocovariance are time invariant, so that its values vary around a constant mean with a steady variance. Because the ARIMA model uses previous lags of the series to model its behaviour, these consistent properties reduce the uncertainty of the model. Before the ACF and PACF are examined, stationarity is checked using the augmented Dickey-Fuller (ADF) test, a commonly used statistical test for stationarity whose null hypothesis is that the series is non-stationary. The test procedure determines how much of the change in the output variable can be explained by its lagged value and a linear trend. If the lagged value does not help to explain the change in the output variable, the null hypothesis cannot be rejected, which points towards non-stationarity.
Autocorrelation plots serve as an important visual tool for confirming stationarity, and they also help to find the order parameters of the ARIMA model. The ACF shows the correlation between a series and its lags; from this correlation, trend or seasonal components can be identified and then removed so that the statistical properties become constant. Together with the order of differencing, ACF plots help to choose the order of the moving average model (q). Similarly, partial autocorrelation (PACF) plots display the correlation between a variable and its lags that is not explained by the intermediate lags, and they help to determine the order of the autoregressive model (p).
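To make the ACF and PACF concrete, the sketch below computes both for a toy series in pure Python. The PACF is obtained with the Durbin-Levinson recursion, which is one standard way to compute it and not necessarily the one used in the study; the AR(1) coefficient of 0.7, the seed and the series length are arbitrary choices for the example.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(x, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    phi_prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    values = [1.0]
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        values.append(phi_kk)
    return values


def confidence_bound(n):
    """Approximate 95% bound for the autocorrelation of white noise."""
    return 1.96 / math.sqrt(n)


# Toy AR(1) series y_t = 0.7 * y_{t-1} + e_t (coefficient chosen arbitrarily).
rng = random.Random(42)
y = [0.0]
for _ in range(499):
    y.append(0.7 * y[-1] + rng.gauss(0.0, 1.0))

r = acf(y, 5)
partial = pacf(y, 5)
bound = confidence_bound(len(y))
print([round(v, 2) for v in r])
print([round(v, 2) for v in partial])
```

For an AR(1) series the ACF decays gradually while the PACF is expected to drop inside the confidence bound after lag 1, which is the kind of pattern used to suggest p = 1.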
Apart from the ADF test, a few other tests can be used to assess stationarity, such as the Phillips-Perron (PP) test [25], the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test [26] and the Mann-Kendall (MK) trend test.

Forecasting and Validation
The residual values at each lag in the ACF and PACF need to be within the confidence interval for a well-fitted model. Prediction and forecasting are performed after fitting the model. The ARIMA model fit can be validated using various performance measures such as the root mean square error (RMSE) and the mean absolute error (MAE), but the most popular criteria are the AIC and the Bayesian Information Criterion (BIC). They help to choose the best predictor subsets and can also be used to compare complicated non-nested models. The AIC or BIC of a model is expressed as -2 log L + kp, where L is the likelihood function, p is the number of parameters in the model, and k is 2 for the AIC and log(n) for the BIC.
In other words, the AIC estimates the distance between the unknown true likelihood function of the data and the fitted likelihood function of the model, so a lower AIC means that a model is closer to the best-fit model. The BIC estimates the posterior probability of a model being true. As with the AIC, the best-fitted model is the one with the lowest BIC value, although the two criteria rest on different assumptions. Despite the similarity, the BIC penalizes model complexity more heavily, and therefore the AIC tends to choose a larger model than the BIC.
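The expression -2 log L + kp and the two error measures are simple enough to write out. In the Python sketch below, the log-likelihoods, parameter counts and sample size are invented numbers used only to show how the AIC and BIC can disagree; they are not results from the study.

```python
import math


def aic(log_likelihood, n_params):
    """AIC = -2 log L + 2 p."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """BIC = -2 log L + log(n) p."""
    return -2.0 * log_likelihood + math.log(n_obs) * n_params


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Two hypothetical fits on n = 180 observations (numbers are made up):
# a small model with 3 parameters and a larger one with 6 parameters.
n_obs = 180
small = {"log_l": -500.0, "p": 3}
large = {"log_l": -496.0, "p": 6}

aic_small, aic_large = aic(small["log_l"], small["p"]), aic(large["log_l"], large["p"])
bic_small = bic(small["log_l"], small["p"], n_obs)
bic_large = bic(large["log_l"], large["p"], n_obs)

print(aic_small, aic_large)   # 1006.0 1004.0: the AIC prefers the larger model
print(bic_small < bic_large)  # True: the BIC prefers the smaller model
```

Because log(180) is greater than 2, each extra parameter costs more under the BIC than under the AIC, which is why the two criteria rank these hypothetical models differently.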

Identification of Seasonality
Using the feed data of the beef cattle from 2000 to 2014, stationarity is checked with the ADF test, which indicates non-stationarity. The periodogram is plotted in figure 3, where the lag point at each fifth month shows the seasonal effect in the feed intake data. Colas et al [28] describe the periodogram as a strong way to assess the significance of seasonality in time series data. Figure 3 shows the periodogram of the beef cattle feed intake data, signifying the seasonality with its peak and marking the bandwidth as 0.0192.
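A periodogram can be sketched with a plain discrete Fourier transform. The Python example below finds the dominant period of a toy series; the series, its 12-month cycle and the function names are assumptions for illustration and do not reproduce the data or the bandwidth reported for figure 3.

```python
import cmath
import math


def periodogram(series):
    """Return (frequencies, power) at the Fourier frequencies k/n, k = 1..n//2."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]   # remove the mean before the DFT
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        freqs.append(k / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def dominant_period(series):
    """Period (in observations) of the frequency with the largest power."""
    freqs, power = periodogram(series)
    peak = max(range(len(power)), key=power.__getitem__)
    return 1.0 / freqs[peak]


# Toy monthly series (made up) with a pure 12-month cycle over 10 years.
toy = [300.0 - 60.0 * math.cos(2 * math.pi * t / 12) for t in range(120)]
period = dominant_period(toy)
print(round(period, 6))
```

This direct sum costs on the order of n squared operations, which is acceptable for a few hundred monthly observations; a fast Fourier transform would be the usual choice for longer series.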
The periodogram peak at the same frequency gives more confidence in deciding on the seasonality. Therefore, the initial plot of the beef cattle feed intake data and the periodogram indicate a seasonal condition in the feed intake: the intake of feed is lower during the winter. Figure 4 shows the decomposed data, which separates the trend and the seasonality from the data. It is clear that the data have no trend aside from the seasonal effect. The ACF and PACF plots also explain the seasonality of the feed data. The residual values for most of the lags were within the 95% tolerance limits, indicating very little seasonality in the lag difference and no correlation between the residuals. The ACF plot in figure 5 shows the lag difference of the previous data, and the PACF plot in figure 6 shows the lag difference of the period, where the seasonality spikes at an interval of 11. This indicates a seasonality of one per year. Therefore, we start with differencing of order one; the differenced values are plotted in figure 7, which displays a visible spike at an interval of 11 months. The model fitting is performed on the differenced values of the feed data. The ACF and PACF are plotted once again, in figures 8 and 9 respectively, to check the effect of the differencing.

Implementation of SARIMA model
The residuals of the differenced data series are plotted in figure 10 to check for the seasonal spike in the data. Having concluded that the series is non-stationary, we use the SARIMA (Seasonal Autoregressive Integrated Moving Average) model to find the best-fit model. Based on the autocorrelation plot in figure 12, the approximate values of p and q are taken to be less than or equal to 4. The partial autocorrelation plot in figure 13 confirms the same. The plots also show first-order seasonal changes, so all models with p = 0-4 and q = 0-4, with a constant and first-order seasonal differencing (D = 1), were tested. All 25 models are tested and the corresponding AIC values are tabulated in Table 1.
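The search over the 25 candidate models can be organised as a small grid. The Python sketch below only enumerates the candidates and picks the one with the lowest score; the scoring function is a deliberate placeholder and not a real AIC, the seasonal period s = 12 is an assumption, and the seasonal orders (2,1,2) are taken from the caption of Table 1. In practice the placeholder would be replaced by a routine that fits each SARIMA model with a statistics library and returns its AIC.

```python
from itertools import product


def candidate_orders(max_p=4, max_q=4, d=0, seasonal=(2, 1, 2), s=12):
    """List every ((p, d, q), (P, D, Q, s)) pair with p, q in 0..max."""
    seasonal_order = tuple(seasonal) + (s,)
    return [
        ((p, d, q), seasonal_order)
        for p, q in product(range(max_p + 1), range(max_q + 1))
    ]


def select_by_aic(candidates, fit_aic):
    """Score each candidate with fit_aic and return (best, score table)."""
    table = {cand: fit_aic(*cand) for cand in candidates}
    best = min(table, key=table.get)
    return best, table


def placeholder_aic(order, seasonal_order):
    """Stand-in score, NOT a real AIC: replace with an actual model fit."""
    p, _, q = order
    return (p - 2) ** 2 + (q - 3) ** 2


candidates = candidate_orders()
best, table = select_by_aic(candidates, placeholder_aic)
print(len(candidates), best)
```

Keeping the fitting routine behind a single callable makes it easy to swap in a different library or a different criterion, such as the BIC, without changing the search itself.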
The AIC (Akaike Information Criterion), which helps to identify the relative quality of a statistical model, is calculated for all the models. The SARIMA (1,0,0)(2,1,2) model is one of the best models for forecasting the feed intake. To further confirm the model, the auto-arima function was run and gave the same result. With the obtained model, we fit the model to the data set. The residuals of the fit are computed and tested for autocorrelation. In figure 11 and in the autocorrelation and partial autocorrelation functions plotted in figures 12 and 13, the spikes lie within the significance limits, so the residuals appear to be white noise. A Ljung-Box test also shows that the residuals have no remaining autocorrelation. As the seasonal ARIMA model has passed the required checks, the forecasting is performed. The forecasts for the differenced data and for the original data for the next year (2015) are shown in figures 13 and 14, respectively. The forecasts follow the trend in the data that arises from the double differencing. The large and rapidly increasing prediction intervals show that the feed intake index could start increasing or decreasing at any time, while allowing for the data to trend upwards during the forecast period. The quantity of food supplies is commonly ordered on the basis of the previous month's or previous year's data, without considering the feed behaviour of the beef cattle. Therefore, the feed intake of the beef cattle is considered in order to reduce the production cost. Table 2 shows the predicted value and the actual value of the food stock ordered for the year 2015. Although there is no significant variation between the actual data and the data predicted from the feed intake behavior, changes occur when the environment changes or the number of beef cattle increases.
Hence, it is preferable to calculate the forecast of the food supply for the beef cattle from the feeding behaviour of the cattle.

Conclusion
In this paper, the best-fit model is found and fitted to forecast the food stock supply, and the yearly food stock is compared with the yearly feed intake of the beef cattle. Using the simulated data, stationarity was assessed using the autocorrelation function. Due to the seasonal effect, the seasonal ARIMA model (1,0,0)(2,1,2) was considered to be the best-fit Box-Jenkins model for forecasting the feed intake data. The comparison study indicates that the model fits statistically well. Thus, the fitted model forecasts well both during and beyond the given period, satisfying both cases. Hence, the feed intake behavior with the ARIMA model can be used widely in each barn to forecast the food stock and reduce the production cost effectively.

Table 1: AIC values for ARIMA (0,0,0)(2,1,2) to ARIMA (4,0,4)(2,1,2)