# ChiMerge

[https://sci2s.ugr.es/keel/pdf/algorithm/congreso/1992-Kerber-ChimErge-AAAI92.pdf](https://sci2s.ugr.es/keel/pdf/algorithm/congreso/1992-Kerber-ChimErge-AAAI92.pdf)

The ChiMerge algorithm uses the chi-squared statistic to discretize numeric attributes. In toad, Char/Object attributes are first transformed into numeric values with the WOE function. The algorithm itself is described clearly in the paper (see the ChiMerge Algorithm section).
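The bottom-up merging loop can be sketched in pure Python as follows. This is a minimal sketch, not toad's implementation. The names `chi2_pair` and `chimerge` and the parameters `max_intervals` and `threshold` are hypothetical. It assumes a numeric attribute (for example, after the WOE transform) and a class label per row. It starts with one interval per distinct value and repeatedly merges the adjacent pair with the lowest chi-squared statistic. Merging continues while there are too many intervals or some adjacent pair is still below the threshold.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def chi2_pair(a: Sequence[int], b: Sequence[int]) -> float:
    """Chi-squared statistic of two adjacent intervals.

    a and b hold per-class counts, e.g. [n_class0, n_class1].
    """
    total = sum(a) + sum(b)
    chi2 = 0.0
    for j in range(len(a)):
        col = a[j] + b[j]
        if col == 0:
            continue  # class absent from both intervals: observed and expected are 0
        for row in (a, b):
            expected = sum(row) * col / total
            chi2 += (row[j] - expected) ** 2 / expected
    return chi2


def chimerge(values: Sequence[float], labels: Sequence[Hashable],
             max_intervals: int = 6, threshold: Optional[float] = None) -> List[float]:
    """Return the cut points (lower bounds of every interval except the first)."""
    index = {c: k for k, c in enumerate(dict.fromkeys(labels))}
    counts: Dict[float, List[int]] = {}
    for v, y in zip(values, labels):
        counts.setdefault(v, [0] * len(index))[index[y]] += 1
    bounds = sorted(counts)  # one initial interval per distinct value
    freq = [counts[v] for v in bounds]
    while len(freq) > 1:
        chis = [chi2_pair(freq[i], freq[i + 1]) for i in range(len(freq) - 1)]
        i = min(range(len(chis)), key=chis.__getitem__)
        small_enough = len(freq) <= max_intervals
        separated = threshold is None or chis[i] >= threshold
        if small_enough and separated:
            break
        # merge interval i + 1 into interval i
        freq[i] = [x + y for x, y in zip(freq[i], freq[i + 1])]
        del freq[i + 1]
        del bounds[i + 1]
    return bounds[1:]


values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(chimerge(values, labels, max_intervals=2))  # [5]
```

With two classes, a threshold of 3.841 is the 0.95 quantile of the chi-squared distribution with one degree of freedom. This is one common way to pick a significance-based stopping value.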

# Stepwise Regression

[https://link.springer.com/article/10.1007%2FBF02576123](https://link.springer.com/article/10.1007%2FBF02576123) [1]

[https://www.sciencedirect.com/science/article/pii/S0950584917305153?via%3Dihub](https://www.sciencedirect.com/science/article/pii/S0950584917305153?via%3Dihub) [2]

[http://www.jstor.org/stable/1434071](http://www.jstor.org/stable/1434071) [3]

Stepwise regression (forward, backward, or stepwise selection; see [2], Section 3.6 "Stepwise Linear Regression") is used to remove attributes with low information gain and to simplify the final model.

The stepwise regression process [2]:

```eval_rst
.. image:: images/stepwise.png
   :width: 80%
   :align: center
```
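The forward/backward selection loop can be sketched in pure Python. This is a minimal sketch, not toad's implementation. It assumes an ordinary least squares model (as in [2], Section 3.6). It also uses AIC as the entry and removal criterion, in place of the F-test or p-value thresholds that many stepwise procedures use. The names `ols_rss`, `aic`, and `stepwise` are hypothetical. Each round adds the feature that lowers AIC the most. It then drops any already-selected feature whose removal lowers AIC further, and stops when no addition helps.

```python
import math
from typing import Dict, List, Sequence


def ols_rss(columns: List[Sequence[float]], y: Sequence[float]) -> float:
    """Residual sum of squares of an OLS fit (with intercept) via the normal equations.

    Assumes the columns are not collinear, otherwise a pivot below is zero.
    """
    n = len(y)
    X = [[1.0] + [float(col[r]) for col in columns] for r in range(n)]
    p = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    c = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    for k in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        c[k], c[piv] = c[piv], c[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for j in range(k, p):
                A[r][j] -= f * A[k][j]
            c[r] -= f * c[k]
    b = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        b[k] = (c[k] - sum(A[k][j] * b[j] for j in range(k + 1, p))) / A[k][k]
    return sum((y[r] - sum(X[r][j] * b[j] for j in range(p))) ** 2 for r in range(n))


def aic(columns: List[Sequence[float]], y: Sequence[float]) -> float:
    """Gaussian AIC up to a constant: n * ln(RSS / n) + 2 * (number of coefficients)."""
    n = len(y)
    rss = max(ols_rss(columns, y), 1e-12)  # guard against log(0) on a perfect fit
    return n * math.log(rss / n) + 2 * (len(columns) + 1)


def stepwise(features: Dict[str, Sequence[float]], y: Sequence[float]) -> List[str]:
    """Forward selection by AIC, with a backward elimination pass after each addition."""
    selected: List[str] = []
    best = aic([], y)
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        adds = {f: aic([features[s] for s in selected + [f]], y) for f in candidates}
        name = min(adds, key=adds.get)
        if adds[name] >= best:
            break  # no remaining feature lowers AIC
        selected.append(name)
        best = adds[name]
        while len(selected) > 1:  # backward step: drop a feature if that lowers AIC
            drops = {s: aic([features[t] for t in selected if t != s], y) for s in selected}
            name = min(drops, key=drops.get)
            if drops[name] >= best:
                break
            selected.remove(name)
            best = drops[name]
    return selected


x1 = [float(i) for i in range(20)]
x2 = [float((3 * i) % 7) for i in range(20)]
x3 = [float((5 * i) % 11) for i in range(20)]
noise = [0.1 * ((7 * i) % 5 - 2) for i in range(20)]
y = [2 * a + 5 * b + e for a, b, e in zip(x1, x2, noise)]
print(stepwise({"x1": x1, "x2": x2, "x3": x3}, y))
```

Every accepted step strictly lowers AIC. As a result, the loop can never return to a feature set it has already visited, and it always terminates.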

# Scorecard Transformation

Naeem Siddiqi, *Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring*, John Wiley & Sons, Inc. (Final Scorecard Production section)


Formula:

Score = Offset + Factor * ln(odds), where odds = good : bad

Score + pdo = Offset + Factor * ln(2 * odds), where pdo = points to double the odds

Subtracting the first equation from the second gives:

pdo = Factor * ln(2)

Factor = pdo / ln(2)

Offset = Score - Factor * ln(odds)

For example, if a scorecard were being scaled where the user wanted odds of 50:1 at 600 points and wanted the odds to double every 20 points (i.e., pdo = 20), the factor and offset would be:

Factor = 20 / ln(2) = 28.8539

Offset = 600 - 28.8539 * ln(50) = 487.123

The score corresponding to any given odds is then:

Score = 487.123 + 28.8539 * ln(odds)
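As a quick check of the arithmetic, the same scaling can be computed in Python. The function names `scaling` and `score` are hypothetical helpers for illustration, not part of toad. `base_odds` is expressed as good:bad, matching the formula above.

```python
import math


def scaling(pdo: float, base_score: float, base_odds: float):
    """Return (factor, offset) so that base_odds (good:bad) maps to base_score
    and every doubling of the odds adds pdo points."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def score(odds: float, factor: float, offset: float) -> float:
    """Score = Offset + Factor * ln(odds)."""
    return offset + factor * math.log(odds)


factor, offset = scaling(pdo=20, base_score=600, base_odds=50)
print(round(factor, 4), round(offset, 3))    # 28.8539 487.123
print(round(score(100, factor, offset), 6))  # 620.0
```

Doubling the odds from 50:1 to 100:1 adds exactly pdo = 20 points, as the derivation requires.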

When the scorecard is developed with WOE-transformed inputs, the formula can be rewritten as:

```eval_rst
.. image:: images/scorecard.png
   :width: 80%
   :align: center
```

where:

- WOE = weight of evidence for each grouped attribute
- β = regression coefficient for each characteristic
- a = intercept term from logistic regression
- n = number of characteristics
- k = number of groups (of attributes) in each characteristic
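The per-attribute points can be sketched in Python under two assumptions. First, the figure is assumed to show the points formula from Siddiqi's book, points = -(WOE * β + a / n) * Factor + Offset / n. Second, the logistic regression is assumed to be fit with bad = 1, so that ln(good:bad) = -(a + Σ β * WOE). Under these assumptions, summing the points over the n characteristics recovers Score = Offset + Factor * ln(odds). The helper names, coefficients, and counts below are hypothetical. WOE is computed here as ln(%good / %bad), which is one of several conventions in use.

```python
import math
from typing import Dict


def woe_table(good: Dict[str, int], bad: Dict[str, int]) -> Dict[str, float]:
    """WOE of each group as ln(%good / %bad).

    Sign conventions differ between sources; every group is assumed to
    contain at least one good and one bad.
    """
    total_good, total_bad = sum(good.values()), sum(bad.values())
    return {g: math.log((good[g] / total_good) / (bad[g] / total_bad)) for g in good}


def attribute_points(woe: float, beta: float, a: float, n: int,
                     factor: float, offset: float) -> float:
    """Points for one grouped attribute: -(WOE * beta + a / n) * factor + offset / n."""
    return -(woe * beta + a / n) * factor + offset / n


factor = 20 / math.log(2)               # pdo = 20
offset = 600 - factor * math.log(50)    # 600 points at good:bad odds of 50:1

# Hypothetical model fit with bad = 1: intercept a and one coefficient per characteristic.
a = -3.0
betas = {"age": -0.9, "income": -0.6}
woe = {
    "age": woe_table({"<30": 40, "30+": 160}, {"<30": 20, "30+": 20}),
    "income": woe_table({"low": 60, "high": 140}, {"low": 25, "high": 15}),
}

# An applicant in the "<30" age group and the "high" income group.
applicant = {"age": "<30", "income": "high"}
n = len(betas)
points = {c: attribute_points(woe[c][applicant[c]], betas[c], a, n, factor, offset)
          for c in betas}
total = sum(points.values())

# Same score straight from the model: ln(good:bad) = -(a + sum(beta * WOE)).
log_odds_bad = a + sum(betas[c] * woe[c][applicant[c]] for c in betas)
assert abs(total - (offset - factor * log_odds_bad)) < 1e-9
print({c: round(p, 1) for c, p in points.items()}, round(total, 1))
```

Splitting the intercept and offset evenly across the n characteristics (the a / n and Offset / n terms) gives each attribute a self-contained point value. The final score is then simply the sum of the points for the groups an applicant falls into.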