Risk and Reliability Improvement Analysis of Boiler System Using the Failure Mode Effect Analysis & Critical Analysis (FMECA) Method

ABSTRACT

boiler's working conditions, which call for it to be able to withstand high heat and pressure levels (Rahmania et al., 2020).
Given the significance of the boiler for a PLTU, a malfunction of or damage to any one of the system's components affects the operation of the production components and can stop the PLTU from operating (Iing, Irawan, & Azis, 2021). According to historical records for the period 2019-2021 from PT. PJB Service Kendari, PLTU Nii Tanasa Kendari frequently encounters disturbances that lead to excessive downtime and derating. Downtime is the interruption of a system's operation due to maintenance, intentional or unintentional hardware or software failure, or damage (Wahyuridho & Asep, 2022). Derating is a loss of power output brought on by damage or interference.
PLTU Nii Tanasa Kendari has experienced substantial downtime and derating. Combined downtime and derating of the two generating units of PLTU Nii Tanasa totaled 46.95% in 2019. The figure rose by about 15 percentage points to 61.8% in 2020, then fell by 11.4 percentage points to 50.4% in 2021. According to Iing, Arhami, & M. (2019), disturbance or damage occurs most frequently in the boiler system compared with other PLTU systems such as the Turbine & Generator system, Heavy Vehicles, Coal Handling Transport System, Main Cooling System, and Water Treatment Plant. According to Brahim et al. (2019), disruption or damage to components in a system is mostly caused by a lack of knowledge of the system's critical components; if left unaddressed, this can lead to system failure. In addition, imprecise mapping of problems, missing recommendations for the maintenance system, and a lack of reliability evaluation also lower the performance of a system and contribute to high downtime (Iing et al., 2021). Service providers and customers both experience losses as a result of significant downtime and derating because the delivery of electricity to customers is disrupted. Identifying the root causes of failure, the effects of failure, and their potential impact can help reduce downtime and derating in a system (Melani, Murad, Netto, Souza, & Nabeta, 2018), but this knowledge alone is not enough; a risk analysis is required to improve availability and reliability and to reduce the risks associated with operating a system (Brahim et al., 2019).
Research on identifying critical components together with their risks and reliability has been carried out in recent years, but few studies address these aspects simultaneously. For example, some studies only identify critical components (Melani et al., 2018; Putra & Purba, 2018; Singh, Singh, & Singh, 2019), while others only evaluate reliability or calculate reliability values (Iing et al., 2019; Iing et al., 2021; Putri, Bahauddin, & Ferdinant, 2013). Research from 2012 reveals that reliability analysis can increase a product's operational reliability by extending component life and identifying important components and associated risks. As a result, improving a system's reliability can be prioritized by focusing on the capability of critical components and their reliability values (Puthillath & Sasikumar, 2012). Before starting the research, it is necessary to review previous research on the identification of critical components along with the risk and reliability of components. Risk analysis is used to find the causes of failures and to prevent such failures from occurring in the future. The results of the risk analysis can be used to optimize the process. Among the most commonly used risk analysis methodologies are (Cristea & Constantinescu, 2017):
1. Failure Mode and Effect Analysis (FMEA).

Structured What-If Technique (SWIFT).
Failure Mode Effect Analysis (FMEA) has been shown to identify risks accurately. The use of FMEA in risk management allows for effective risk control for both goods and components (Brahim et al., 2019). The FMEA approach can also be used to identify crucial system components. It can improve decision-making, offer stronger assurances for addressing potential risks, and have an impact on the degree of process and component oversight (Hisprastin & Musfiroh, 2020). A system's risk analysis often has levels or criteria. By definition, there are normally three levels of risk in a system (Suharjo, Suharyo, & Bandono, 2019); Table 1 provides more information.
ISSN 1693-6590 Vol. 21, No. 1, 2023 Muhammad Hudzaly Hatala (Risk and Reliability Improvement Analysis of Boiler System Using the Failure Mode Effect Analysis & Critical Analysis (FMECA) Method)
Putra & Purba (2018) applied the FMEA method to historical failure data for the Boiler, Feed Water Pump, Electrical, Control System, Turbine, and other components in the 2013-2016 period; this helps analyze the general causes of failure and enables service providers to consistently maintain the primary component equipment of the boiler power plant. The FMEA method also has disadvantages. Its limitations prevent it from categorizing risks that arise with complexities (Melani et al., 2018). A further disadvantage is the method's limited ability to improve the design (Brahim et al., 2019). With the advancement of research, a technique called Failure Mode and Effect Criticality Analysis (FMECA) has been developed that can identify components with complicated issues.
Using FMECA, it is feasible to categorize the criticality of these components by methodically analyzing potential failure modes of product or process components, evaluating the risks connected to these failure modes, and determining the effects on system operations (Brahim et al., 2019; Mohanty, Dash, & Pradhan, 2020).
In a later study, Singh et al. (2019) ranked the component failure modes at the criticality analysis (CA) stage by the cumulative effect of severity (S), occurrence (O), and detection (D), assigning each failure a Risk Priority Number (RPN). CA itself has clearly defined criteria for assessing, categorizing, and prioritizing potential effects (Mohanty et al., 2020). Risk often refers to the likelihood of some unfavorable occurrence happening and leading to several kinds of failures. Risk analysis is used to identify the reasons for failure and to stop similar failures from happening in the future. FMEA/FMECA is one of the most popular methods for risk analysis (Cristea & Constantinescu, 2017).
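The RPN described above is conventionally the product of the three ratings, RPN = S x O x D. The following is a minimal sketch of that calculation, assuming each rating lies on a 1-10 scale; the scale bounds, the class name `FailureMode`, and the example ratings are illustrative assumptions and are not taken from the study's worksheets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One FMEA worksheet row (hypothetical structure for illustration)."""
    component: str
    severity: int    # S
    occurrence: int  # O
    detection: int   # D

    def __post_init__(self):
        # Assumption: ratings use a 1-10 scale; adjust to the scale in use.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        # Risk Priority Number: product of the three ratings.
        return self.severity * self.occurrence * self.detection


# Illustrative ratings only, not values from the paper.
example = FailureMode("hypothetical pump", severity=8, occurrence=6, detection=4)
example_rpn = example.rpn
```

Because the three ratings are multiplied, different rating combinations can yield the same RPN, which is why a further ordering rule is needed for ties.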
Further research by Iing et al. (2019), Iing et al. (2021), Ismail, Alkaff, & Gamayanti (2014), and Putri et al. (2013) shows that reliability evaluation is important for determining the performance of a component. These studies show that reliability techniques are able to describe actual engine performance and provide evaluation material for improving its effectiveness. Reliability is one measure of the success of a maintenance system, so there is a correlation between the reliability value and the maintenance system of a machine. Reliability can be determined in several ways, including simulation and analytical reliability calculation. Performance evaluation is therefore required in terms of both the risk and the reliability of components.
Iing et al. (2021) obtained the reliability value from historical data on the time between failures, summarized as the mean time to failure (MTTF), and on repair time, summarized as the mean time to repair (MTTR). The first stage of reliability analysis is selecting the system to be analyzed, which is then classified into levels such as assembly, subassembly, and component (Patil & Bewoor, 2020). After failure data have been collected and a criticality analysis has been carried out, the important parts are identified. The next step is to estimate the distribution parameters and, finally, to find the reliability characteristics (Patil & Bewoor, 2020). A high reliability value reflects a good maintenance system carried out by a company. One way to improve reliability is Preventive Maintenance (PM), a maintenance strategy in which a component is maintained at fixed intervals or when it reaches certain conditions (Iing et al., 2019).
The contribution of this research is to combine risk analysis and reliability analysis: the FMECA method is used to identify critical components along with their risks, and reliability calculations are used to increase their reliability values. This addresses the problems described in previous research and the fact that no study has yet examined risk analysis, reliability, and reliability improvement strategies together in a systematic way.

Method
This research was conducted from July 2022 to January 2023 at PLTU Nii Tanasa Kendari, located in Konawe Regency, Southeast Sulawesi Province, Indonesia. Risk analysis is carried out with the series of steps of the FMECA method. The data used in this method are primary data from discussions and interviews with respondents. The result of the FMECA method is the identification of critical components along with their risks and prevention recommendations. A critical component is determined based on its RPN value and a criticality score labeled "Very Critical" in the CA stage. The parameters of this stage can be seen in Table 2. The criticality matrix is the final stage of the FMECA method; afterwards, reliability and reliability improvement are calculated for all critical components using secondary data, namely time to failure (TTF) and time to repair (TTR) data.

Boiler System & Specification
The first stage in data collection was to determine the level of damage to the existing systems at PLTU Nii Tanasa Kendari, as input for the FMECA data processing and reliability calculations. Fifteen systems running at PLTU Nii Tanasa Kendari were observed, with a total damage frequency of 1113 over the last five years. With these data, the most critical system can be determined using the Pareto diagram shown in Figure 1.
The Pareto chart in Fig. 1 serves to determine the critical systems at PLTU Nii Tanasa Kendari. Based on the 80:20 Pareto concept, six systems are the most critical, namely the Boiler System, Turbine & Generator system, Heavy Vehicles, Coal Handling Transport System, Main Cooling System, and Water Treatment Plant. This research is devoted to the boiler system, together with the supporting components for the water-steam-water cycle, because it has the largest share of damage at 23.72%. Based on data obtained by the researchers, PLTU Nii Tanasa Kendari, operated by PT PJB Service, uses a stoker-type boiler. The specifications can be seen in Table 3.
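The Pareto selection described above can be sketched in a few lines: systems are sorted by failure frequency and accumulated until their combined share reaches a threshold. The function name `pareto_vital_few`, the 80% threshold as a parameter, and the failure counts below are illustrative assumptions; the counts are not the plant's records.

```python
def pareto_vital_few(failure_counts, threshold=0.80):
    """Return (name, share, cumulative_share) rows for the systems that
    together account for at least `threshold` of all recorded failures."""
    total = sum(failure_counts.values())
    if total <= 0:
        raise ValueError("total failure count must be positive")

    # Sort systems from most to least frequent failures.
    ranked = sorted(failure_counts.items(), key=lambda kv: kv[1], reverse=True)

    rows = []
    cumulative = 0
    for name, count in ranked:
        cumulative += count
        rows.append((name, count / total, cumulative / total))
        if cumulative / total >= threshold:
            break  # the "vital few" have been found
    return rows


# Hypothetical failure counts for five unnamed systems.
hypothetical_counts = {"A": 45, "B": 25, "C": 15, "D": 10, "E": 5}
vital_few = pareto_vital_few(hypothetical_counts)
vital_names = [name for name, _, _ in vital_few]
```

With these made-up counts, systems A, B, and C together cover 85% of failures, so they form the group that a Pareto chart would flag for further analysis.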

Functional Block Diagram
The results of the interviews and analysis, in the form of information on the work processes of the boiler in the steam-water-steam cycle, are summarized in a Functional Block Diagram (FBD), which illustrates the process flow and material flow of the boiler in a simple diagram. The FBD of the process flow and material flow of the PLTU Nii Tanasa Kendari boiler can be seen in Fig. 2.

FMECA Result
FMEA is based on three factors, namely severity, occurrence, and detection, which are used to prioritize existing problems. In this case study, the first stage, FMEA, focuses on the boiler system at PLTU Nii Tanasa Kendari, which has two boiler units, because it has the highest damage frequency of all running systems. In this paper, 31 boiler components from each unit are assessed using the Risk Priority Number (RPN). The failure modes, causes of failure, and failure effects of the boiler components are identified first. Table 4 shows the failure mode and effects analysis (FMEA) of the critical components of the PLTU Nii Tanasa Kendari boiler. Once the failure mode, cause of failure, and effect of failure are known, the process continues by finding the RPN value of each component. After the RPN values are obtained, the risk of each component is ranked and labeled using the risk level and criticality score (Tanjung et al., 2019). All components that have a risk level of "Unaccepted" and a criticality level of "Very critical" are included in the list of critical components.
The last stage of FMECA is the criticality matrix. The criticality matrix is a graphical or visual means of identifying and comparing failure modes for all components in a given system or subsystem and their probability of occurring (ARMY, 2006). The key function of the matrix is to re-rank components that have the same RPN value by looking at their severity and occurrence values (Rahman & Fahma, 2021). The results of data processing with FMECA for unit 1 and unit 2 of the PLTU Nii Tanasa Kendari boiler can be seen in Table 5.
Based on the unit 1 column of Table 5, 18 components have an "unaccepted" risk level and 13 components have a "tolerable" risk level. Of the 18 components, seven have a criticality level in the "very critical" category; these seven are included as critical components of boiler unit 1 of PLTU Nii Tanasa Kendari. In the unit 2 column, 25 components have a risk level in the "unaccepted" category and only six components have a "tolerable" risk level. These 25 components are then prioritized again using the criticality score, which places six components in the "very critical" category. All components that have the same RPN value are sorted using the criticality matrix and included as unit 2 critical components.
All components with a "tolerable" risk level require a review for hazard acceptability by operators, HSE staff, and maintenance staff at PLTU Nii Tanasa Kendari. The main concern from the FMECA risk analysis, however, is the 18 components in boiler unit 1 and 25 components in boiler unit 2 whose risk level is in the "Unaccepted" category (cannot be tolerated). All of these components must be given special attention and be the focus of maintenance, because they can have a major impact on derating (loss of power) and can even result in downtime due to component damage accompanied by hazards to the environment, humans, and the components themselves.
The results of the FMECA worksheet, ranked by RPN, criticality score, and criticality level, are prioritized again using the criticality matrix when components have the same RPN value, namely by considering the severity and occurrence values; this is the final stage of data processing with the FMECA method. Reliability is then calculated for the components resulting from the FMECA worksheet. The results of the identification of critical components can be seen in Table 6. Based on Table 6, there are 13 critical components of the PLTU Nii Tanasa boiler system with a criticality score in the "very critical" category. In boiler unit 1 there are seven, namely the coal spreader, forced draft fan, water ejector pump, steam drum, economizer, boiler feed pump, and condenser. In boiler unit 2 there are six, namely the coal spreader, water ejector pump, coal feeder (scrapper), forced draft fan, steam drum, and boiler feed pump.
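The re-ranking of components with equal RPN values can be sketched as a sort with a compound key. The sketch below assumes that, among equal RPNs, the higher severity ranks first and the higher occurrence breaks any remaining tie; the text names both parameters but not their order, so this ordering is an assumption, as are the component labels and ratings.

```python
def rank_failure_modes(modes):
    """Sort (name, severity, occurrence, detection) tuples from highest to
    lowest priority: RPN first, then severity, then occurrence (assumed order)."""
    def priority(mode):
        _, severity, occurrence, detection = mode
        rpn = severity * occurrence * detection
        return (rpn, severity, occurrence)

    return sorted(modes, key=priority, reverse=True)


# Hypothetical ratings; "component B" and "component C" share RPN = 180.
hypothetical_modes = [
    ("component B", 6, 6, 5),   # RPN 180
    ("component A", 5, 8, 5),   # RPN 200
    ("component C", 9, 4, 5),   # RPN 180, higher severity than B
]
ranking = [name for name, *_ in rank_failure_modes(hypothetical_modes)]
```

In this illustration component A ranks first on RPN alone, while component C is placed ahead of component B because its severity is higher at the same RPN.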

Reliability Improvement Result
The first stage in calculating reliability is to identify the initial distribution of the time to failure (TTF) and time to repair (TTR) data for the critical components through the index of fit, using the least-squares curve fitting method. The selected distribution is then tested for goodness of fit with two tools, namely the Anderson-Darling test and the Pearson correlation. After these results are obtained, the distribution parameters are determined by Maximum Likelihood Estimation (MLE) using the Minitab 19 program. A recapitulation of the distributions and parameters can be seen in Table 7 for the TTF data and in Table 8 for the TTR data.
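To illustrate the index-of-fit step, the sketch below fits a Weibull distribution by least squares on the linearized form ln(-ln(1 - F(t))) = β ln t - β ln θ and reports the correlation coefficient r as the index of fit. It is not the Minitab 19 procedure used in the study: the use of Bernard's median-rank approximation for F(t), the function names, and the synthetic data are all assumptions made for illustration.

```python
import math


def median_ranks(n):
    """Bernard's approximation of the median rank for ordered failures 1..n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def weibull_least_squares(times):
    """Least-squares Weibull fit on the probability-plot scale.

    Returns (beta, theta, r): shape, scale, and the correlation
    coefficient used here as the index of fit.
    """
    t = sorted(times)
    n = len(t)
    if n < 3 or t[0] <= 0:
        raise ValueError("need at least three positive failure times")

    x = [math.log(v) for v in t]
    y = [math.log(-math.log(1.0 - f)) for f in median_ranks(n)]

    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("failure times must not all be equal")

    beta = sxy / sxx                       # slope of the fitted line
    intercept = mean_y - beta * mean_x     # equals -beta * ln(theta)
    theta = math.exp(-intercept / beta)
    r = sxy / math.sqrt(sxx * syy)
    return beta, theta, r


# Synthetic times placed exactly on a Weibull(beta=1.5, theta=100) line.
synthetic = [100.0 * (-math.log(1.0 - f)) ** (1.0 / 1.5) for f in median_ranks(8)]
beta_hat, theta_hat, index_of_fit = weibull_least_squares(synthetic)
```

Repeating the same regression with the linearization of each candidate distribution and comparing the resulting r values is one common way to pick the initial distribution before a formal goodness-of-fit test.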
After the distributions and parameters of the TTF and TTR data are obtained, the MTTF and MTTR values are calculated. According to Ebeling (1997), reliability can be increased in several ways: with the age replacement method, in which the component is returned to its initial condition after repair or replacement (R(t-nT)), and with preventive maintenance (Rm(t)). An example of the MTTF calculation for the Water Ejector Pump Unit 1 with the Weibull distribution can be seen in equation (1). The calculation of MTTR with a lognormal distribution can be seen in equation (2). An example of calculating the increase in reliability with corrective action based on the MTTR value, so that the component returns to its initial condition (R(t-nT)), for the Weibull-distributed Water Ejector Pump Unit 1 can be seen in equation (4). All results of the calculations for increasing the reliability of the critical components of boiler unit 1 and unit 2 can be seen in Table 9.
Based on Table 9, preventive maintenance increases the reliability of several critical components. For each of the following components, preventive maintenance or component replacement at the stated interval is recommended so that component reliability is maintained above 60%. For coal spreader unit 1, reliability increased from 28.04% to 30%, and the recommended maintenance interval is 25 days. For water ejector pump unit 1, reliability increased from 38% to 42%, with a recommended interval of 37 days. For condenser unit 1, reliability increased from 50% to 54%, with a recommended interval of 36 days. For economizer unit 1, reliability increased from 52% to 99%, with a recommended interval of 46 days. For water ejector pump unit 2, reliability increased from 39% to 41%, with a recommended interval of 45 days. For boiler feed pump unit 2, reliability increased from 50% to 51%, with a recommended interval of 74 days.
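The quantities used in this section follow standard closed forms: for a Weibull distribution with shape β and scale θ, MTTF = θ Γ(1 + 1/β) and R(t) = exp(-(t/θ)^β); for a lognormal distribution with location μ and scale σ, the mean is exp(μ + σ²/2). The sketch below evaluates these formulas. The parameter values are illustrative and are not those of Table 7 or Table 8, and solving R(T) = 0.60 for T is only one possible way to obtain a maintenance interval; the text does not state how the reported intervals were derived.

```python
import math


def weibull_mttf(beta, theta):
    """Mean time to failure of a Weibull(beta, theta) distribution."""
    return theta * math.gamma(1.0 + 1.0 / beta)


def lognormal_mean(mu, sigma):
    """Mean of a lognormal distribution, e.g. MTTR for lognormal repair times."""
    return math.exp(mu + sigma ** 2 / 2.0)


def weibull_reliability(t, beta, theta):
    """Probability that the component survives beyond time t."""
    return math.exp(-((t / theta) ** beta))


def weibull_time_to_reliability(target, beta, theta):
    """Time at which Weibull reliability falls to `target` (0 < target < 1)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target reliability must be between 0 and 1")
    return theta * (-math.log(target)) ** (1.0 / beta)


# Illustrative parameters only (hypothetical, in days).
beta, theta = 1.8, 60.0
mttf = weibull_mttf(beta, theta)
interval_60 = weibull_time_to_reliability(0.60, beta, theta)
mttr = lognormal_mean(mu=0.5, sigma=0.8)
```

Under this assumption, maintaining or replacing the component no later than `interval_60` days after its last renewal keeps its reliability at or above 60% within each cycle.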
However, not all components show an increase in reliability under preventive maintenance, because of the failure rate characteristics and the type of distribution of each component. This occurs in components with a Weibull distribution whose β (shape) value is below 1, that is 0 < β < 1: boiler feed pump unit 1 with a β value of 0.947022, coal spreader unit 2 with a β value of 0.243116, and coal feeder (scrapper) unit 2 with a β value of 0.518609. All of these components have a failure rate characteristic called Decreasing Failure Rate (DFR), so preventive maintenance has no effect on them. This corresponds to phase 1 of the bathtub curve, namely early damage (the burn-in, early failure, or wear-in region) (Ebeling, 1997), which means that all of these components are still in good condition. So that the components work with a reliability above 60%, it is recommended to replace them periodically so that they return to their initial condition (R(t-nT)): at intervals of 40 days for boiler feed pump unit 1, 31 days for coal spreader unit 2, and 40 days for coal feeder (scrapper) unit 2.
Apart from Weibull components with β below 1, components with a lognormal distribution also have a DFR failure rate characteristic, so preventive maintenance does not affect them. This applies to forced draft fan unit 1 and steam drum unit 1. So that these components work with a reliability above 60%, it is recommended to replace them periodically so that they return to their initial condition (R(t-nT)): at intervals of 31 days for forced draft fan unit 1 and 50 days for steam drum unit 1. Forced draft fan unit 2 also did not show an increase in reliability with preventive maintenance (Rm(t)); its MTTR value was 89 days, and preventive maintenance showed no increase. To maintain its reliability above 60%, periodic replacement is likewise recommended so that the component returns to its initial condition (R(t-nT)).
Components with an exponential distribution, namely steam drum unit 2, also do not show an increase in reliability. This is due to a failure rate characteristic called the Constant Failure Rate (CFR) in the bathtub curve. The CFR phase is characterized by a constant breakdown rate and is often called the useful life period; in this phase damage is difficult to predict and tends to occur randomly (Ebeling, 1997), so preventive maintenance has no effect. So that steam drum unit 2 works with a reliability above 60%, it is recommended to replace it periodically at intervals of 25 days so that the component returns to its initial condition (R(t-nT)).
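The dependence of the preventive maintenance effect on the failure rate characteristic can be checked numerically with the idealized model from Ebeling (1997), Rm(t) = R(T)^n R(t - nT) for nT <= t < (n+1)T, where T is the maintenance interval. The sketch below assumes a Weibull distribution and that each maintenance action is instantaneous and restores the component to as-good-as-new condition; the function names and parameter values are illustrative assumptions.

```python
import math


def weibull_reliability(t, beta, theta):
    """Reliability of a Weibull(beta, theta) component without maintenance."""
    return math.exp(-((t / theta) ** beta))


def reliability_with_pm(t, interval, beta, theta):
    """Rm(t) = R(T)^n * R(t - n*T) under ideal periodic maintenance.

    n is the number of completed maintenance cycles before time t.
    """
    if interval <= 0:
        raise ValueError("maintenance interval must be positive")
    n = int(t // interval)
    survived_cycles = weibull_reliability(interval, beta, theta) ** n
    current_cycle = weibull_reliability(t - n * interval, beta, theta)
    return survived_cycles * current_cycle


# Compare the three bathtub-curve regimes at t = 100 with T = 30 (hypothetical).
t, T, theta = 100.0, 30.0, 100.0
comparison = {
    label: (weibull_reliability(t, beta, theta),
            reliability_with_pm(t, T, beta, theta))
    for label, beta in (("DFR", 0.5), ("CFR", 1.0), ("IFR", 2.0))
}
```

In this model the maintained reliability exceeds the unmaintained one only for an increasing failure rate (β > 1); for β = 1 the two coincide, and for β < 1 periodic renewal lowers reliability, which is consistent with the observation that preventive maintenance does not raise the reliability of DFR and CFR components.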

Conclusion
Based on the description and analysis in the previous sections, several conclusions can be drawn. The critical system selected at PLTU Nii Tanasa Kendari is the system with the greatest failure frequency based on the Pareto diagram, namely the boiler system. From this critical system, critical components are then selected. The critical components of the boiler are determined using the FMECA method, based on a criticality score in the "very critical" category. All components with this label are included as critical components of each unit. Reliability calculations are performed to evaluate the performance of each critical component of the boiler system based on the MTTF value; recommendations for reliability improvement are then calculated based on the MTTR value and the implementation of preventive maintenance.