CBR: Case-Based Reasoning
Then the proposed solution is tried out. If the solution succeeds, it is stored as a working solution in the case memory; if it fails, it must be repaired and tested again. There are therefore four key issues in developing any CBR system: (a) representing cases and identifying their key features, (b) indexing and retrieving similar cases from the case memory, (c) measuring case similarity to select the best match, and (d) modifying the existing solution to fit the new problem. CBR has already been applied in a number of different applications in medicine.
CBR is appropriate in medicine for several important reasons: cognitive adequateness, explicit experience, duality of objective and subjective knowledge, automatic acquisition of subjective knowledge, and system integration. Some real CBR systems are: CASEY, which gives a diagnosis for heart disorders; GS.52, a diagnostic support system for dysmorphic syndromes; NIMON, a renal function monitoring system; COSYL, which gives a consultation for a liver-transplanted patient; and ICONS, which presents suitable calculated antibiotic therapy advice for intensive care patients.
This paper presents the development of a case-based expert system prototype for supporting diagnosis of heart diseases and is organized as follows: the next section deals with the domain and knowledge acquisition. Section 3 deals with case representation and feature extraction. Section 4 deals with case indexing and retrieval strategies. Section 5 presents the experimental results, and finally, Section 6 summarizes the most important findings.

Abstract
In this paper, we have used the Case-Based Reasoning methodology to develop a case-based expert system prototype for supporting diagnosis of heart diseases. 110 cases were collected for 4 heart diseases, namely mitral stenosis, left-sided heart failure, stable angina pectoris and essential hypertension. Each case contains 207 attributes concerning both demographic and clinical data. After removing the duplicated cases, the system has a training set of 42 cases for Egyptian cardiac patients. Statistical analysis has been done to determine the importance values of the case features. Two retrieval strategies were investigated, namely the induction and nearest-neighbor approaches.
The results indicate that the nearest-neighbor strategy is better than the induction strategy, with retrieval accuracies of 100% and 53.8%, respectively. Cardiologists have evaluated the overall system performance, and the system was able to give a correct diagnosis for thirteen new cases.

Keywords: Expert Systems, Case-Based Reasoning, Medical Informatics.

1. Introduction
Case-Based Reasoning (CBR) is a general artificial intelligence paradigm for reasoning from experience. The CBR methodology has been investigated for improving human decision-making and has received much attention in developing knowledge-based systems in medicine.
A special issue that includes papers on CBR theory and applications was published [8, 9]. Unlike the traditional rule-based approach, in which expert knowledge must be represented in “if-then” rules, a case-based approach allows knowledge to be grouped and stored as cases. This approach has emerged as a key tool for developing a new generation of expert systems. Following the CBR approach, when a new problem is introduced to the system, the ...

For the four diseases considered, 110 cases were collected and each case comprises 207 attributes. Cases were represented using the concept of frames. The table shows a portion of the case memory.

2. Domain and Knowledge Acquisition
Heart disease is a vital health care problem, affecting millions of Egyptians each year. We have classified heart diseases into 25 different ones, for example: left-sided heart failure, right-sided heart failure, stable angina pectoris, unstable angina pectoris, essential hypertension and mitral stenosis. Diagnostic knowledge concerns the way in which a diagnosis is performed, and it is divided into two types. The first type, procedural diagnostic knowledge, reflects the diagnostic procedure and is based on demographic and clinical data of the patient.
This is then used to decide whether or not to request an ECG for the patient. The second type of diagnostic knowledge, heuristic diagnostic knowledge, represents experience accumulated over the years and concerns the way an expert uses the patient data to make diagnoses. We acquired heuristic knowledge by interviewing experts in the field and constructed a diagnostic tree based on criteria such as the sex and the age of the patient and the existence, acuteness and duration of symptoms (e.g., pain, fever). A major relationship in the heart diseases domain is the “cause” relationship.
Defining attributes that describe different classes (diseases) is considered a simple task if the classes are similar to real-world classes. Examples are age, sex, marital status, and chest pain. These real parameters will be the attributes used.

3.1. Assigning Importance Values to Case Features
Feature weights for most problem domains are context dependent. The weight assigned to each feature of the case tells how much attention to pay to matches and mismatches in that field when computing the distance measure of a case. One way to assign importance values is to have a human expert assign them as the case library is being built.
The expert might have some feeling about which dimensions and combinations of dimensions make good predictors. Another way to assign importance values is to do a statistical evaluation of a known corpus of cases to determine which features predict outcomes and/or solutions best. Those that are good predictors are then assigned higher importance for matching. Both methods are applied to assign importance values to the different features (attributes) in the case base. Each attribute Ai is assigned a number SF, called a significance factor, corresponding to the assigned weight of this attribute.
Each significance factor represents the significance of the corresponding attribute in drawing the conclusion. All attributes with their corresponding significance factors are asserted in the system’s knowledge base. Figure 1 illustrates a sample of the assigned values of the different features. The figure reflects the statistical study of the different parameters. The knowledge of physicians consists of general knowledge they have obtained from medical books plus their experiences connected with cases they have treated themselves or colleagues have told them about.
Particularly in diagnostic tasks, the thoughts of physicians circle around typical cases. They consider the differences between a current patient and typical or known exceptional cases. The main purpose of such generalized knowledge is to guide the retrieval process and sometimes to decrease the memory requirements by erasing redundant cases.

4. Case Indexing & Retrieval
In this paper, we focus our discussion on case indexing and retrieval strategies. Case indexing and retrieval are two separate but closely related processes.
Since a case memory may contain thousands of cases, case indices organize their key features to expedite the search process. Case retrieval searches the case base to find candidate cases that share significant features with the new case. The existing literature on case-based reasoning has proposed several mechanisms for case indexing and retrieval. A good review of early literature can be found in . Much of the effort in building a case-based system goes into case collection. If it is impossible or especially difficult to collect cases, then case-based reasoning will be too difficult, or impossible, to apply.
Cases need to be collected according to the needs of the system, that is, to provide as much coverage as possible for achieving the reasoning goals. The first pass at building a case library is unlikely to provide full coverage, so refinement is needed to achieve the required coverage.

4.1. Retrieval Using Nearest-Neighbor Technique
The intended system is to suggest a diagnosis for a new patient case, given all the attributes (features) in the patient’s report. Several closest-match cases will be retrieved.
The system’s accuracy, precision, speed, and efficiency thus depend on its ability to recall a set of cases that are significantly similar to the new problem case. The features of the input case are assigned as indices characterizing the case. These indices are used to retrieve one or more similar past cases from the case memory. The system uses the nearest-neighbor algorithm, which finds the closest matches to the new case among the cases already stored in the database using a distance calculation that determines how similar two cases are by comparing their features.

3. Case Representation & Features Extraction
In the present work, cases have been collected from expert doctors in the heart domain at El-Maadi Military Egyptian hospital, the Egyptian health insurance institute and Al-Azhar University, and from medical reference books. The cases were collected for these diseases: Mitral Stenosis (MS), Left-sided Heart Failure (LSHF), Stable Angina Pectoris (SAP) and Essential Hypertension (EHT).

Inductive retrieval obviously depends on pre-indexing, since the decision tree is built off-line before retrieval can start. This is also a time-consuming process for a large case base, and it has to be redone every time a new case is added to the case base. However, retrieval times using inductive indexing trees are extremely quick and only increase slowly as the number of cases in the case base increases. One major disadvantage of this technique is that, if case data are missing, it may not retrieve a case at all. That means one of the questions in the tree has no answer and so the sequence cannot be completed. Therefore, no case can be retrieved.

The pseudocode of the nearest-neighbor algorithm can be written as follows:

For each feature in the input case:
    Find the corresponding feature in the stored case
    Compare the two values to each other and compute the degree of match
    Multiply by a coefficient representing the importance of the feature to the match
Add the results to derive an average match score

This number represents the degree of match of the old case to the input.
A case can be chosen by choosing the item with the largest score. The main distance measure equation used is

$$\frac{\sum_{i=1}^{n} w_i \cdot sim(f_i^I, f_i^R)}{\sum_{i=1}^{n} w_i}$$

5. SYSTEM VERIFICATION & RESULTS
System verification using the nearest-neighbor approach includes the following tests: (a) check case retrieval accuracy, (b) check retrieval consistency, (c) check for case duplication and (d) global tests.

(a) Check case retrieval accuracy, which means that if the case base is queried with one of its own cases, it should return the same case with a distance measure equal to 100%.
Figure 4 shows the distance measure against the case identification number, where it is clear that only one case ...

(b) Check retrieval consistency, which means that if exactly the same search is performed twice, the same source cases should be retrieved with the same accuracy.

(c) Check for case duplication, in which a case should exactly match itself but should not be identical to other cases.

(d) Global tests, which are important to verify the overall performance of the system.
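Tests (a) to (c) can be automated. The following Python sketch is one possible way to do so and is not the authors' implementation: the `similarity` function is a simplified stand-in (the weighted fraction of features with equal values), and the function names, feature names and case identifiers are all hypothetical.

```python
from typing import Dict, List, Tuple

Case = Dict[str, str]
CaseBase = Dict[str, Case]


def similarity(query: Case, stored: Case, weights: Dict[str, float]) -> float:
    """Stand-in measure: weighted fraction of features with equal values."""
    total = sum(weights.values())
    matched = sum(
        w for name, w in weights.items()
        if name in query and name in stored and query[name] == stored[name]
    )
    return matched / total if total else 0.0


def retrieve(query: Case, case_base: CaseBase,
             weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score every stored case, best first (ties broken by case id)."""
    scored = [(cid, similarity(query, case, weights))
              for cid, case in case_base.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def check_retrieval_accuracy(case_base: CaseBase,
                             weights: Dict[str, float]) -> bool:
    """(a) Each stored case, used as the query, must score 100% against itself."""
    for cid, case in case_base.items():
        scores = dict(retrieve(case, case_base, weights))
        if abs(scores[cid] - 1.0) > 1e-9:
            return False
    return True


def check_retrieval_consistency(query: Case, case_base: CaseBase,
                                weights: Dict[str, float]) -> bool:
    """(b) Running the same search twice must give the same ranked result."""
    return retrieve(query, case_base, weights) == retrieve(query, case_base, weights)


def find_duplicates(case_base: CaseBase) -> List[Tuple[str, str]]:
    """(c) Pairs of different case ids whose contents are identical."""
    ids = sorted(case_base)
    return [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if case_base[a] == case_base[b]]


# Hypothetical weights and cases, for illustration only.
WEIGHTS = {"sex": 0.2, "chest_pain": 1.0, "dyspnea": 0.8}
CASE_BASE = {
    "c1": {"sex": "male", "chest_pain": "yes", "dyspnea": "no"},
    "c2": {"sex": "female", "chest_pain": "no", "dyspnea": "yes"},
    "c3": {"sex": "male", "chest_pain": "no", "dyspnea": "yes"},
}
```

With this sample data, checks (a) and (b) pass and `find_duplicates` returns an empty list. The sketch assumes that every stored case has a value for every weighted feature; a case missing one would score below 100% against itself and fail check (a).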
The steps of this test are as follows: (1) Did the system retrieve a useful set of cases? (2) Was the retrieval time acceptable? Table 2 shows the answers to these questions for 13 patients. From the table we notice that the results of the induction algorithm were not satisfactory, as the system came to a correct diagnosis in only 53.8% of the cases. A disadvantage of this technique is that it cannot deal with missing data, so a wrong case may be retrieved or no case may be retrieved at all.
On the other hand, the results of the nearest-neighbor algorithm were acceptable, so the answer to the first question is yes. In experiments 1, 2, 3, 5, 6, 7, 8, 9 and 10, the diagnoses of all the closest retrieved cases are the same, which suggests that the query case has the same diagnosis, and this is true. In experiments 4, 11, 12 and 13, however, there were two different outputs.
For example, in experiment 4 two diagnoses were suggested for the query case: essential hypertension (EHT) and stable angina pectoris (SAP). We overcame this by using the K-Nearest Neighbor algorithm, where K is the number of retrieved cases and is always an odd number, and then voting for the most probable class (diagnosis). Here there is one vote for stable angina pectoris and two votes for essential hypertension, so the result is in favor of essential hypertension, which is the actual diagnosis.
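The voting step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the system's actual code: the function name `knn_vote` and the similarity scores in the sample data are hypothetical, chosen only to reproduce the two-to-one vote described for experiment 4.

```python
from collections import Counter
from typing import List, Tuple


def knn_vote(scored_cases: List[Tuple[float, str]], k: int = 3) -> str:
    """Return the majority diagnosis among the k highest-scoring cases.

    scored_cases holds (similarity score, diagnosis) pairs.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd number")
    if not scored_cases:
        raise ValueError("no retrieved cases to vote on")
    # Keep the k most similar cases, best first.
    top_k = sorted(scored_cases, key=lambda pair: pair[0], reverse=True)[:k]
    # One vote per retrieved case for its diagnosis.
    votes = Counter(diagnosis for _, diagnosis in top_k)
    return votes.most_common(1)[0][0]


# Hypothetical scores; only the 2-to-1 split mirrors the text.
RETRIEVED = [
    (0.91, "essential hypertension"),
    (0.88, "stable angina pectoris"),
    (0.85, "essential hypertension"),
    (0.40, "mitral stenosis"),
]
```

Calling `knn_vote(RETRIEVED, k=3)` selects essential hypertension with two votes against one. An odd K rules out ties only when two diagnoses compete; with three or more candidate diagnoses a tie is still possible, and a tie-breaking rule would then have to be chosen.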
In the distance measure equation, $w_i$ is the importance of feature (slot) $i$, $f_i^I$ and $f_i^R$ are the values for feature $f_i$ in the source and target cases, respectively, and $n$ is the number of attributes in each case. The similarity function is defined as follows:

$$sim(f_i^I, f_i^R) = \begin{cases} 1 - \dfrac{|f_i^I - f_i^R|}{|f_{max} - f_{min}|} & \text{if feature } f_i \text{ is numeric} \\ 0 & \text{if feature } f_i \text{ is symbolic and } f_i^I \neq f_i^R \\ 1 & \text{if feature } f_i \text{ is symbolic and } f_i^I = f_i^R \end{cases}$$

So, the weight is introduced in case retrieval, and the similarity between cases is considered to be the weighted summation of the similarities between attributes.
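The weighted similarity can be illustrated with a short Python sketch. It is a reading of the description above rather than the authors' code, and it makes several assumptions that are labeled in the comments: the score is divided by the sum of the weights so that identical cases score 1.0, a feature missing from the stored case counts as a mismatch, and numeric similarity is clamped at 0. All function names, feature names, weights and ranges are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, float, str]
Case = Dict[str, Value]


def feature_similarity(a: Value, b: Value, value_range: float) -> float:
    """Similarity of two feature values, between 0 and 1."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        if value_range <= 0:
            # Assumption: with no usable range, fall back to exact match.
            return 1.0 if a == b else 0.0
        # Numeric feature: 1 - |a - b| / |f_max - f_min|, clamped at 0
        # (the clamp is an assumption for values outside the known range).
        return max(0.0, 1.0 - abs(a - b) / value_range)
    # Symbolic feature: 1 if the values are equal, otherwise 0.
    return 1.0 if a == b else 0.0


def case_similarity(query: Case, stored: Case,
                    weights: Dict[str, float],
                    ranges: Dict[str, float]) -> float:
    """Weighted sum of feature similarities divided by the sum of weights."""
    weighted_sum = 0.0
    weight_total = 0.0
    for name, query_value in query.items():
        weight = weights.get(name, 0.0)
        if weight <= 0.0:
            continue  # features without a weight do not affect the score
        if name in stored:
            sim = feature_similarity(query_value, stored[name],
                                     ranges.get(name, 0.0))
        else:
            sim = 0.0  # assumption: a missing stored value is a mismatch
        weighted_sum += weight * sim
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else 0.0


def retrieve(query: Case, case_base: Dict[str, Case],
             weights: Dict[str, float], ranges: Dict[str, float],
             threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Cases scoring above the threshold, most similar first."""
    scored = [(cid, case_similarity(query, case, weights, ranges))
              for cid, case in case_base.items()]
    kept = [pair for pair in scored if pair[1] > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights, value ranges and cases, for illustration only.
WEIGHTS = {"age": 0.5, "sex": 0.2, "chest_pain": 1.0}
RANGES = {"age": 80.0}  # assumed f_max - f_min for the numeric feature
CASE_BASE = {
    "case_01": {"age": 60, "sex": "male", "chest_pain": "yes"},
    "case_02": {"age": 40, "sex": "male", "chest_pain": "no"},
}
QUERY = {"age": 60, "sex": "male", "chest_pain": "yes"}
```

With this sample data, `case_01` matches the query on every feature and scores 1.0, while `case_02` scores (0.5 × 0.75 + 0.2 × 1 + 1.0 × 0) / 1.7 ≈ 0.34 and falls below the 0.5 threshold, so `retrieve` returns only `case_01`.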
Then cases whose descriptions are similar to the new situation are ranked higher than those whose descriptions are less similar, and those cases whose ranks are higher than a specified threshold (Th = 0.5) are displayed. Figure 2 shows the results of the distance measure between the cases and one query case. The numbers on the X-axis are just serial numbers of the cases, and the Y-axis represents the distance measure. Cases are sorted according to the distance measure.

4.2. Retrieval Using Induction Technique
An alternative retrieval technique involves a process called induction.
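As a rough illustration of induction-based retrieval, the Python sketch below assumes that a decision tree over the case base has already been built off-line and shows only the lookup step. The tree layout, feature names and case identifiers are hypothetical and carry no clinical meaning; the tree-building step itself is not shown.

```python
from typing import Dict, List, Optional, Union

# A leaf is a list of case ids; an inner node asks about one feature.
Node = Union[List[str], Dict]

# Hypothetical pre-built index tree, for illustration only.
TREE: Node = {
    "feature": "chest_pain",
    "branches": {
        "yes": {
            "feature": "pain_on_exertion",
            "branches": {"yes": ["case_07"], "no": ["case_12"]},
        },
        "no": ["case_03", "case_21"],
    },
}


def retrieve_by_induction(tree: Node,
                          query: Dict[str, str]) -> Optional[List[str]]:
    """Walk the tree by answering one question per node.

    Returns the case ids stored at the leaf that is reached, or None when
    the query has no usable answer for a question on its path.
    """
    node = tree
    while isinstance(node, dict):
        answer = query.get(node["feature"])
        if answer is None or answer not in node["branches"]:
            return None  # the question sequence cannot be completed
        node = node["branches"][answer]
    return node
```

A query of `{"chest_pain": "no"}` reaches the leaf holding `case_03` and `case_21`, whereas `{"chest_pain": "yes"}` without a value for `pain_on_exertion` returns `None`. The lookup visits one node per question, which is why retrieval is fast, and it stops as soon as an answer is missing.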