Amman, Jordan. Correspondence should be addressed to Mahmoud El-Banna. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The Mahalanobis Taguchi System (MTS) is considered one of the most promising binary classification algorithms for handling imbalanced data. Unfortunately, MTS lacks a method for determining an efficient threshold for the binary classification.
The Modified Mahalanobis Taguchi System (MMTS) outperforms the benchmarked algorithms, especially at high imbalance ratios. A real-life case study in the manufacturing sector is used to demonstrate the applicability of the proposed model and to compare its performance with the Mahalanobis Genetic Algorithm (MGA). Classification is one of the supervised learning approaches, in which a new observation needs to be assigned to one of the predetermined classes or categories.
If the number of predetermined classes is more than two, it is a multiclass classification problem; otherwise, the problem is known as a binary classification problem.
At present, these problems have found applications in different domains such as product quality [1] and speech recognition [2]. The classification accuracy depends on both the classifier and the data types. Classifiers can be categorized according to supervised versus unsupervised learning, linear versus nonlinear hyperplane, and feature selection versus feature extraction based approach [3].
On the other hand, Sun et al. [...] If the data distribution of one class is different from the distributions of the others, then the data is considered imbalanced.
The border that separates balanced from imbalanced data is vague; for example, the imbalance ratio, which is the ratio of majority-class to minority-class observations, has been reported over a wide range of values. The assumption of an equal number of observations in each class is elementary in using the common classification methods such as decision tree analysis, Support Vector Machines, discriminant analysis, and neural networks [6].
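As a small illustration of the imbalance ratio defined above, the following sketch counts the observations per class and divides the largest count by the smallest. The function name `imbalance_ratio` and the label values are hypothetical and chosen only for this example.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())

# Hypothetical labels: 0 = negative (majority), 1 = positive (minority).
labels = [0] * 990 + [1] * 10
print(imbalance_ratio(labels))  # 99.0
```

A ratio of 1.0 corresponds to perfectly balanced classes; larger values indicate stronger imbalance.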
Imbalanced data occurs often in real life, for example in text classification [7]. Treating applications that have imbalanced data with the common classifiers leads to bias in the classification accuracy.
To handle the classification of imbalanced data, the research community uses data approaches, algorithmic approaches, or both. In the data approach, the main idea is to balance the class density, either randomly or informatively. In the algorithmic approach, the main idea is to adapt the classifier algorithm towards the small class. A combination of the data-level and algorithmic-level approaches is also used and is known as cost-sensitive learning.
Problems with the data approach are reported in [4]. The problem reported in [4] with the algorithmic approach is that it needs a deep understanding of both the classifier used and the application area. Finally, the problem with the cost-sensitive learning approach is that it assumes previous knowledge of the costs of the different error types and imposes a higher cost on the minority class to improve the prediction accuracy.
Knowing the cost matrices is practically difficult in most cases. While data and algorithmic approaches constitute the majority of efforts in the area of imbalanced data, several other approaches have also been proposed, which will be reviewed in Literature Review.
To overcome the pitfalls of the data and algorithmic approaches to imbalanced data classification, the classification algorithm needs to be capable of dealing with imbalanced data directly, without resampling, and should have a systematic foundation for determining the cost matrices or the threshold. One of the promising classifiers is the Mahalanobis Taguchi System (MTS), which has shown good classification results for imbalanced data without resampling; it does not require any distribution assumption for the input variables, and it can be used to measure the degree of abnormality.
Three operating point selection criteria (shortest distance, harmonic mean, and antiharmonic mean) have been compared, and the results in [9] showed that there is no difference among the classifiers' performances. The aim of this work is to enhance the performance of the Mahalanobis Taguchi System (MTS) classifier by providing a scientific, rigorous, and systematic method that uses the ROC curve to determine the threshold that discriminates between the classes.
The organization of the paper is as follows: Section 2 reviews the previous work on imbalanced data classification methods, the Mahalanobis Taguchi System, and its applications. Section 5 presents a case study to demonstrate the applicability of the proposed research. In Section 6, the results obtained from this research are summarized.
In this section, an overview of the imbalanced classification approaches, the Mahalanobis Taguchi System concept, its different areas of application, its weak points, and its variants is presented.
Solutions to the imbalanced learning problem can be summarized into the following approaches [10]. The data-level approach [11] mainly restores a balanced distribution between the classes through resampling techniques; several resampling types exist, and problems with data approaches have been reported. The algorithmic-level approach is based on creating an algorithm biased towards the positive class. It has been used in many popular classifiers such as decision trees, Support Vector Machines (SVMs), association rule mining, back-propagation (BP) neural networks, one-class learning, active learning methods, and the Mahalanobis Taguchi System (MTS).
The decision tree classifier can be adapted to imbalanced data by adjusting the probabilistic estimate at the tree leaf or by developing new trimming approaches [14]. Support Vector Machines (SVMs) showed good classification results for slightly imbalanced data [15], while for highly imbalanced data researchers [16, 17] reported poor classification results, since SVMs try to reduce the total error, which produces results shifted towards the negative (majority) class.
Modified Mahalanobis Taguchi System for Imbalance Data Classification
To handle imbalanced data, there are proposals such as using penalty constants for different classes, found in Lin et al. Therefore, in this paper, SVM was selected as one of the benchmarked algorithms to compare with ours; the results showed that SVM classification performance degrades considerably at a high imbalance ratio, which supports the previous findings of the researchers (more details will be presented in Results).
Association rule mining is a recent classification approach that combines association mining and classification into one approach [20-22].
To handle imbalanced data, multiple minimum supports must be determined for the different classes to reflect their varied frequencies [23]. On the other hand, one-class learning [24, 25] uses the target class only to determine whether a new observation belongs to this class or not. The BP neural network [26] and SVMs [27] have been examined as one-class learning approaches.
In the case of highly imbalanced data, one-class learning showed good classification results [28]. Unfortunately, the drawbacks of one-class learning algorithms are that the training data must be relatively larger than for multiclass approaches and that it is hard to reduce the dimension of the features used for separation.
The active learning approach is used to handle problems related to unlabeled training data. Research on active learning for imbalanced data was reported by Ertekin et al.
Computational Intelligence and Neuroscience
Unfortunately, one of the pitfalls of this approach is that it can be computationally expensive [30]. The problem with the algorithmic approach is that it needs extensive knowledge of the specific classifier. Cost-sensitive methods use both data and algorithmic approaches, where the objective is to optimize the misclassification cost. Cost-sensitive methods use different costs or penalties for different misclassification types.
For example, consider the cost of wrongly classifying a positive instance as a negative one and the cost of the contrary case. In imbalanced data classification, revealing a positive instance is usually more important than revealing a negative one; hence, the cost of misclassifying a positive instance outweighs the cost of misclassifying a negative one.
Different types of cost-sensitive approaches have been reported in the literature. The problem with the cost-sensitive approach is that it relies on previous knowledge of the cost matrix for the misclassification types, which in most cases is unavailable.
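The cost asymmetry described above can be made concrete with a short sketch. It is only an illustration: the function name `total_cost`, the parameters `cost_fn` and `cost_fp`, and the cost values 10 and 1 are assumptions made for this example and do not come from the paper.

```python
def total_cost(y_true, y_pred, cost_fn, cost_fp):
    """Sum misclassification costs; label 1 is positive, 0 is negative."""
    # False negative: a positive instance classified as negative.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # False positive: a negative instance classified as positive.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return cost_fn * fn + cost_fp * fp

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100                    # misses all 5 positives
cautious = [0] * 90 + [1] * 5 + [1] * 4 + [0]  # 5 false alarms, 1 miss

for name, pred in [("always_negative", always_negative), ("cautious", cautious)]:
    print(name, accuracy(y_true, pred), total_cost(y_true, pred, cost_fn=10, cost_fp=1))
# always_negative 0.95 50
# cautious 0.94 15
```

The classifier that ignores the minority class has the higher accuracy but the higher cost, which is the bias that cost-sensitive methods try to correct. The example also shows their weak point: the result depends entirely on cost values that have to be known beforehand.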
MTS is a multivariate supervised learning approach that aims to classify a new observation into one of two classes. MTS was used previously in predicting weld quality [3], exploring the influence of chemical composition on hot-rolled manufactured products [34], and selecting the significant features in automotive handling [35].
The MTS approach starts with collecting a considerable number of observations from the investigated dataset, followed by separating out the unhealthy observations.
The Mahalanobis Distance (MD) is calculated first using the negative observations, followed by scaling. The scaled MDs for the positive dataset are supposed to be different from those for the negative dataset.
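The calculation described above can be sketched as follows. Because the paper's own MD equation is not reproduced in this text, the sketch assumes the scaled MD commonly used in the MTS literature: each observation is standardized with the mean and standard deviation of the negative (reference) group, and the quadratic form with the inverse correlation matrix of that group is divided by the number of features k. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_reference(negatives):
    """Estimate mean, std, and inverse correlation matrix from the negative class."""
    negatives = np.asarray(negatives, dtype=float)
    mean = negatives.mean(axis=0)
    std = negatives.std(axis=0, ddof=1)
    corr = np.corrcoef(negatives, rowvar=False)
    return mean, std, np.linalg.inv(corr)

def scaled_md(x, mean, std, corr_inv):
    """Scaled Mahalanobis Distance z^T C^-1 z / k for each row of x."""
    z = (np.atleast_2d(x) - mean) / std
    k = z.shape[1]
    # Row-wise quadratic form: sum over j, k of z[i, j] * C^-1[j, k] * z[i, k].
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / k

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
negatives = rng.normal(size=(200, 3))          # reference (healthy) group
positives = rng.normal(loc=3.0, size=(10, 3))  # shifted (unhealthy) group

reference = fit_reference(negatives)
md_neg = scaled_md(negatives, *reference)
md_pos = scaled_md(positives, *reference)
print(md_neg.mean(), md_pos.mean())
```

Under these conventions the scaled MDs of the reference group average (n - 1)/n, which is close to 1, so MDs well above 1 indicate observations that differ from the negative group.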
Since many features are used to calculate the MD, the probability of having significant features in the multivariable dataset is high, so a Taguchi orthogonal array is used to screen these features. The criterion for selecting the appropriate features is to select the features that possess high MD values for the positive observations.
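A minimal sketch of this screening step is given below for three features, using the standard two-level L4 orthogonal array. It rests on assumptions not stated in the text: level 1 means the feature is used and level 2 means it is left out, the quality of each run is summarized by the larger-the-better signal-to-noise ratio of the positive MDs (a common choice in the MTS literature), and a feature is kept when its average gain is positive. All names and the synthetic data are hypothetical.

```python
import numpy as np

# Standard L4(2^3) orthogonal array: level 1 = feature used, level 2 = left out.
L4 = np.array([[1, 1, 1],
               [1, 2, 2],
               [2, 1, 2],
               [2, 2, 1]])

def scaled_md(negatives, x):
    """Scaled MD of the rows of x against a reference built from the negatives."""
    mean, std = negatives.mean(axis=0), negatives.std(axis=0, ddof=1)
    # atleast_2d handles runs with a single feature, where corrcoef is a scalar.
    corr_inv = np.linalg.inv(np.atleast_2d(np.corrcoef(negatives, rowvar=False)))
    z = (x - mean) / std
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / z.shape[1]

def feature_gains(negatives, positives, array=L4):
    """Gain per feature: mean S/N of runs with the feature minus runs without it."""
    sn = []
    for run in array:
        cols = np.flatnonzero(run == 1)
        md = scaled_md(negatives[:, cols], positives[:, cols])
        sn.append(-10.0 * np.log10(np.mean(1.0 / md)))  # larger-the-better S/N
    sn = np.array(sn)
    return np.array([sn[array[:, j] == 1].mean() - sn[array[:, j] == 2].mean()
                     for j in range(array.shape[1])])

# Synthetic data: features 0 and 1 separate the classes, feature 2 is noise.
rng = np.random.default_rng(0)
negatives = rng.normal(size=(300, 3))
positives = rng.normal(size=(30, 3))
positives[:, :2] += 5.0

gains = feature_gains(negatives, positives)
selected = np.flatnonzero(gains > 0)
print(gains, selected)
```

Features whose inclusion raises the MDs of the positive observations receive a positive gain and are retained; for more features a larger array such as L8 or L12 would be used in the same way.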
It is worth noting that MTS constructs a continuous scale from the observations of a single class; in other classification techniques, by contrast, learning is done directly from both the positive and negative observations.
This characteristic helps the MTS classifier to deal with imbalanced data problems. Determining the optimal threshold is a critical step for an effective MTS classifier. To determine the appropriate threshold, a loss function approach was proposed in [36]; however, it is not practical because of the difficulty in specifying the relative cost [37]. It has been shown in [6] that the PTM classifier outperformed the MTS classifier; therefore, it has been selected as a benchmark for the proposed classifier.
Unfortunately, the PTM method is based on previously assumed parameters, and the accuracy of its classification results was lower than that of the benchmarked classifiers (this is one of the findings of this research, which will be discussed in Results). The other research area in MTS is related to the modification of the Taguchi method rather than to the threshold determination.
Both the MGA and the MTS Particle Swarm Optimization methods deal with the orthogonal array part of the Taguchi system, while the threshold determination still lacks a solid foundation or is hard to carry out in practice. Finally, the aim of this research is to enhance the performance of the Mahalanobis Taguchi System (MTS) classifier by providing a scientific, rigorous, and systematic method of determining the binary classification threshold that discriminates between the two classes, which can be applied to the MTS and its variants.
The proposed model, Algorithm 1, provides an easy, reliable, and systematic way to determine the threshold for the Mahalanobis Taguchi System (MTS) and its variants. The currently used approaches either are difficult to use in practice, such as the loss function [36], because of the difficulty in evaluating the cost in each case, or are based on previously assumed parameters [6].
As shown in Figure 1, the point (0, 1) represents the optimum theoretical solution (best performance) for any classifier. The closer the classifier performance is to this point, the better it is. The curve drawn in the figure represents the MTS classifier performance for different threshold values. Changing the threshold changes the location of the point on the curve.
Therefore, the problem of finding the optimum threshold can be reformulated as the problem of finding the point on the curve that is closest to the optimum point.
Step 1 (construction of the initial model). Assume there are two classes, negative and positive, and a set of data is sampled from both classes. Using the negative observations only, reference Mahalanobis Distances are calculated using (1) with all features. The Mahalanobis Distances (MDs) for the positive observations are also calculated by using the same equation with all features, using the inverse of the correlation matrix of the negative observations.
Selection of the new features is performed by using the orthogonal array approach; then the MDs for the negative and the positive observations are recalculated. An arbitrary threshold is assumed.
Step 2 (optimization stage). If the stopping criteria are not met, the threshold is updated. Accordingly, new features are selected using the orthogonal array approach, and the true positive rate, the false positive rate, and the fitness function are also updated.
If the stopping criteria are met, then the training stage is done, and the model is ready for testing observations.
Step 3 (testing stage). In this stage, the optimum threshold and the associated features are determined from the previous stage, and the Mahalanobis Distance for the new observation is calculated based on those parameters.
If the Mahalanobis Distance for this observation is less than the optimum threshold, then it will be classified as negative; otherwise, it will be classified as positive.
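The threshold search and the classification rule described in the steps above can be sketched in a simplified form. The sketch makes several assumptions: the ideal point is taken as (false positive rate, true positive rate) = (0, 1), the usual ROC convention; the fitness function is the Euclidean distance to that point; and the optimization is replaced by a plain scan over the observed MD values, without repeating the orthogonal array feature selection at each step. The function names and MD values are hypothetical.

```python
import math

def rates(md_neg, md_pos, threshold):
    """Return (false positive rate, true positive rate) at a threshold.

    An observation is called positive when its MD is >= threshold.
    """
    fpr = sum(md >= threshold for md in md_neg) / len(md_neg)
    tpr = sum(md >= threshold for md in md_pos) / len(md_pos)
    return fpr, tpr

def best_threshold(md_neg, md_pos):
    """Scan observed MDs and keep the one whose ROC point is closest to (0, 1)."""
    def distance(t):
        fpr, tpr = rates(md_neg, md_pos, t)
        return math.hypot(fpr - 0.0, tpr - 1.0)
    return min(sorted(set(md_neg) | set(md_pos)), key=distance)

def classify(md, threshold):
    """MD below the threshold is negative; otherwise it is positive."""
    return "negative" if md < threshold else "positive"

# Hypothetical scaled MDs from the training stage.
md_neg = [0.4, 0.7, 0.9, 1.1, 1.3, 2.6]
md_pos = [2.1, 3.5, 4.8]

t = best_threshold(md_neg, md_pos)
print(t)                                   # 2.1
print(classify(1.0, t), classify(5.0, t))  # negative positive
```

At the selected threshold one of the six negatives is misclassified and all three positives are detected, which is the point of this small curve closest to (0, 1).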