22-23 October 2012 Pacengo del Garda (VR) - Italy www.caeconference.com
Data Analysis in a Six Sigma context with modeFRONTIER®
One of the most important concerns of modern companies is the production of high quality products and services. With the globalization of markets and the ever-growing competition, customer satisfaction has become one of the key issues for success.
Figure 1: External data can be easily loaded and organized in modeFRONTIER® thanks to a data import wizard.
Over the last decades, a series of disciplines tailored to specific industrial contexts, together with methodologies of wide and general applicability, have been developed to improve product quality while reducing both cost and time. Six Sigma is certainly one of them; it has gained wide popularity thanks to its ease of application and to the considerable savings in money and time reported by the companies that adopted it.
The Six Sigma methodology rests on a simple observation: every process implemented and every product manufactured in an industrial context shows some variation around its desired optimal value. In the real world, the actual performance of a process or the quality of a product can differ substantially from what was expected at the design stage, leading to low quality and customer dissatisfaction. The aim of Six Sigma is to eliminate defects and waste by measuring and reducing variation, which becomes the main enemy to fight in the effort to improve quality.
Figure 2: Multi-history plots allow the user to monitor variations over time and discover relations between series.
One of the most important ingredients of the success of Six Sigma is the systematic adoption, throughout the company and at different levels, of simple but efficient data measurement and analysis.
In this article we present a simple industrial case in which some data are analysed using the Designs Space of modeFRONTIER®. During the Measure phase of the Six Sigma DMAIC approach, the number of defects (discards) of several production lines was collected in a spreadsheet (e.g. Excel).
Figure 3: The user can visualize the data distribution, compare series and find outliers thanks to the Box-Whiskers.
The main objectives are now to show how it is possible to
- monitor the behavior of discards of the production lines over time,
- use different graphical tools to compare data,
- find out if there is any relation between the series,
- perform an analysis of variance (ANOVA) on some series.
These are the usual steps that a Six Sigma practitioner should perform.
In modeFRONTIER®, external databases can be loaded by following a step-by-step wizard. With a few mouse clicks, the number of defects per week of each production line can be loaded and organized. It is also possible to add new columns containing additional information, such as the total and the average number of defects per week.
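For readers who want to reproduce the same preparation step outside the tool, the following is a minimal sketch in Python. It is not how modeFRONTIER® works internally; the CSV layout, the function name `load_with_totals` and all the numbers are assumptions made for illustration (only the line names come from the article).

```python
import csv
import io

# Hypothetical weekly defect counts: the line names follow the article,
# but every number here is invented for illustration only.
RAW = """Week,JohnB,LizT,TomC
1,14,35,9
2,20,28,12
3,11,41,10
"""

def load_with_totals(text):
    """Parse CSV text and append per-week 'Total' and 'Average' columns."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        week = int(record.pop("Week"))
        # Remaining columns are the production lines.
        counts = {line: int(value) for line, value in record.items()}
        total = sum(counts.values())
        rows.append({"Week": week, **counts,
                     "Total": total, "Average": total / len(counts)})
    return rows

table = load_with_totals(RAW)
for row in table:
    print(row)
```

The derived columns are computed once at load time, so every later chart or statistic can treat "Total" and "Average" like any other series.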
Figure 4: The Correlation matrix together with the correlation ranking and the multi-history plot can be used to detect linear relations between variables.
In modeFRONTIER®, the user can find a series of graphical tools to represent data. It is important to remember that the combined use of different tools helps the user gain a better understanding of the database, by finding relations between data and discovering trends.
For example, it is possible to simultaneously plot all series with a multi-history or a multi-history 3D chart; this way, one can check if common trends are present between series, compare data to the mean and check if there is any “outlier” series. In this example, it immediately appears that LizT has a different history with respect to the others.
Another useful tool is the Scatter plot, which can be used for example to plot the mean and other series over time together with the regression line, to understand the behavior of the process under study and try to estimate its future trend.
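The regression line drawn on such a scatter plot can be sketched as an ordinary least-squares fit. The example below is an illustration under stated assumptions: the function name `fit_line` and the weekly values are hypothetical, and the article does not say which fitting method the tool uses.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical mean number of defects per week (invented values).
weeks = [1, 2, 3, 4, 5, 6]
mean_defects = [22, 20, 21, 17, 16, 14]

slope, intercept = fit_line(weeks, mean_defects)
# Naive extrapolation to week 7; only meaningful if the trend persists.
forecast_week_7 = intercept + slope * 7
print(slope, intercept, forecast_week_7)
```

A negative slope indicates that defects tend to decrease over time; the extrapolated value should be read as a rough indication, not as a prediction with known accuracy.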
A tool that provides a good summary view of the data is the Box-Whiskers plot. It reports classical information on the distribution of each data series (mean, standard deviation, etc.), including quartiles, confidence intervals of the means and outliers.
In this case, it can be seen that some production lines have a large variability (LizT and JohnB, for example), others have a low mean but quite a large variability (HugG and JessR), while another (TomC) has a low mean and low variability but shows some outliers.
This graph may suggest further investigations in order to identify the reasons for unexpected variability of some production lines or the presence of outliers, in an attempt to improve the quality level.
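The quantities summarized by a box-and-whiskers chart can be computed directly, as in the sketch below. Two points are assumptions rather than facts from the article: outliers are flagged with the common 1.5 × IQR fence rule (the rule used by modeFRONTIER® is not stated), and the sample values are invented.

```python
import statistics

def box_whisker_summary(values, k=1.5):
    """Mean, standard deviation, quartiles and fence-based outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr    # assumed rule: 1.5 * IQR fences
    high_fence = q3 + k * iqr
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Hypothetical weekly defects for one line, with one unusually bad week.
sample = [8, 9, 10, 10, 11, 12, 12, 13, 40]
summary = box_whisker_summary(sample)
print(summary)
```

A single extreme week inflates both the mean and the standard deviation while leaving the quartiles almost unchanged, which is why the box and the outlier markers are worth reading together.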
The correlation matrix helps the user find out whether any linear relation exists between the variables. The correlation ranking is useful to immediately discover the most important relations: in this case, there is an inverse relation between HugG and the variable "Week". This means that the number of discards of the HugG production line tends to decrease over time.
Conversely, there is a strong positive relation between JackN and LizT.
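The idea behind a correlation matrix with a ranking can be sketched with the Pearson coefficient, which measures linear association between two series. In the example below the function names and all the numbers are hypothetical; the values were chosen only so that the signs match the relations described above.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient between two series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_correlations(columns):
    """All column pairs, sorted by decreasing absolute correlation."""
    names = list(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            pairs.append((first, second, r))
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)

# Hypothetical data (invented values, article's series names).
data = {
    "Week": [1, 2, 3, 4, 5, 6],
    "HugG": [12, 11, 9, 8, 6, 5],
    "JackN": [20, 24, 19, 27, 30, 26],
    "LizT": [31, 36, 30, 40, 44, 39],
}

for first, second, r in rank_correlations(data):
    print(f"{first:>5} vs {second:<5} r = {r:+.3f}")
```

Ranking by absolute value keeps strong inverse relations at the top of the list alongside strong direct ones. A coefficient near zero only rules out a linear relation, not a non-linear one.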
The Quantile-Quantile (Q-Q) plot is used to check whether a data series is normally distributed. In this example, JackN is plausibly normally distributed, since its values lie very close to the diagonal of the graph. The Six Sigma methodology is based on the hypothesis that all the measured data series are normally distributed; it is therefore important to verify that this assumption actually holds in the case under examination. The Q-Q plot and the histogram with the data fitting tool should be used for this purpose.
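The points of a normal Q-Q plot can be generated as in the following sketch. The plotting position (i - 0.5)/n is one of several conventions in use and is an assumption here, as are the function names and the sample values; the article does not state which convention the tool applies.

```python
import statistics

def normal_qq_points(values):
    """Return (theoretical, observed) pairs for a normal Q-Q plot.

    Theoretical quantiles come from a normal distribution with the
    sample mean and sample standard deviation of the data.
    """
    ordered = sorted(values)
    n = len(ordered)
    fitted = statistics.NormalDist(statistics.mean(ordered),
                                   statistics.stdev(ordered))
    # Assumed plotting position: (i - 0.5) / n for the i-th ordered value.
    return [(fitted.inv_cdf((i - 0.5) / n), observed)
            for i, observed in enumerate(ordered, start=1)]

def max_gap_from_diagonal(points):
    """Largest vertical distance between a point and the line y = x."""
    return max(abs(theoretical - observed) for theoretical, observed in points)

# Hypothetical weekly defects for one production line (invented values).
series = [18, 21, 23, 24, 25, 26, 27, 29, 32]
points = normal_qq_points(series)
print(max_gap_from_diagonal(points))
```

Points lying close to the diagonal support the normality assumption. The maximum gap is only a rough numeric companion to the visual check, not a formal normality test.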
Looking at the Box-Whiskers mentioned above, we can note that JohnB, LizT and TomC have very different means (15.833, 30.708 and 14.083 respectively) and similar standard deviations (7.872, 11.188 and 11.279 respectively).
One may want to know whether the difference in the means of these series is statistically significant or merely due to random causes; in other words, can one conclude that LizT is worse than the other two lines?
The statistical tool that helps answer this kind of question is the ANOVA. In modeFRONTIER® the ANOVA test can be performed very easily: the user just has to choose the data series to examine, and a report containing all the relevant information is generated automatically.
First, a check is made that the standard deviations of the data series are the same; for this purpose, the Hartley and Bartlett tests are performed. Then the classical ANOVA table is reported, with some suggestions for interpreting the results. To make the data easier to understand, the Box-Whiskers is plotted together with the Multi Range Test and the Table of Means.
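The arithmetic behind two of these steps can be sketched from the summary statistics quoted above for JohnB, LizT and TomC. This is not the modeFRONTIER® report: the group size n = 24 is an assumption (the article does not state how many weeks were recorded), the standard deviations are assumed to be sample standard deviations, and the Bartlett test and the p-value are left out.

```python
def hartley_f_max(stdevs):
    """Hartley's F_max statistic: largest variance over smallest variance."""
    variances = [s ** 2 for s in stdevs]
    return max(variances) / min(variances)

def anova_from_summary(means, stdevs, n):
    """One-way ANOVA for k groups of equal size n, from group summaries."""
    k = len(means)
    grand_mean = sum(means) / k          # valid because group sizes are equal
    ss_between = n * sum((m - grand_mean) ** 2 for m in means)
    ss_within = (n - 1) * sum(s ** 2 for s in stdevs)
    df_between = k - 1
    df_within = k * (n - 1)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }

# Means and standard deviations as quoted in the article.
means = [15.833, 30.708, 14.083]     # JohnB, LizT, TomC
stdevs = [7.872, 11.188, 11.279]
N_WEEKS = 24                         # ASSUMPTION: not given in the article

print("F_max:", hartley_f_max(stdevs))
print("ANOVA:", anova_from_summary(means, stdevs, N_WEEKS))
```

Both statistics are then compared with tabulated critical values for the corresponding degrees of freedom. An F value well above the critical value indicates that at least one mean differs from the others; identifying which one is the job of a follow-up comparison such as the range test mentioned above.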
Figure 5: The Q-Q plot can be used to verify if a data series is distributed according to a given theoretical distribution.
Thanks to the ANOVA report, it is possible to conclude, on a statistical basis, that the LizT production line performs significantly worse than the other two.
In a Six Sigma context, this kind of question often arises; it is really important to determine whether differences in data distributions are due to random causes or not.
The use of these graphical and statistical tools, particularly when large databases have to be considered, helps the user understand the processes better and optimize them by focusing on real problems.
All the previous steps can be performed quickly and easily; the database can be updated when new data are measured and, in this way, real-time monitoring of processes becomes possible.
For any questions on this article, please email the author:
Massimiliano Margonari
EnginSoft S.p.A.
info@enginsoft.it