This is a study note for myself, so that I don't forget what this theory is about after a while XD

Logistic regression (aka the logit or MaxEnt classifier) is also known as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier.
In this model, the probabilities describing the possible outcomes are modeled using a logistic function.


So what is a logistic function?

Let's talk about it.

Here is the definition from Wikipedia:

A logistic function or logistic curve is a common S-shaped curve (sigmoid curve) with the equation: $$ f(x) = \frac{L}{1+e^{-k(x-x_0)}}$$ where:

  • $L$ is the supremum of the values of the function
  • $k$ is the logistic growth rate, the steepness of the curve
  • $x_0$ is the $x$ value of the function’s midpoint

In other words:

  • $L$ is the upper bound of the function (the ceiling of the system); as $x \to \infty$, $f(x) \to L$
  • $k$ controls how steep the curve is: the smaller $k$, the slower the change around the midpoint; the larger $k$, the faster that transition
  • $x_0$ is the point where the curve changes fastest

We can simplify this equation:

The standard logistic function, where $L=1, k=1, x_0=0$, has the equation: $$ f(x) = \frac{1}{1+e^{-x}}$$

and is also called the sigmoid function. It looks like this:
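
A minimal sketch to draw the curve yourself (just numpy and matplotlib, the same libraries imported in the modeling section below):

import numpy as np
import matplotlib.pyplot as plt

# standard logistic (sigmoid) function: L = 1, k = 1, x0 = 0
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-10, 10, 200)
plt.plot(x, sigmoid(x))
plt.axhline(0.5, linestyle='--', color='gray')   # f(0) = 1/2
plt.axvline(0.0, linestyle='--', color='gray')   # midpoint x0 = 0
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Standard logistic (sigmoid) function')
plt.show()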

From the curve we can see that:

  • When $x=0$, $f(0)= \frac{1}{2}$
  • As $x \to \infty$, $f(x) \to 1$
  • As $x \to -\infty$, $f(x) \to 0$

Therefore, we use this function because the logistic function is bounded between 0 and 1 (so its output can be read as a probability) and is relatively simple among smooth functions.


But how does logistic regression find the best estimators for making predictions?

Here is the process:

1. Target

We suppose that: $$ f(X) = P(Y=1 \mid X) = \frac{1}{1+e^{-(W^TX)}} $$ so that:

$$ P(Y \mid X) = f(X) \quad \text{if } Y=1 $$

$$ P(Y \mid X) = 1 - f(X) \quad \text{if } Y=0 $$

or, written compactly, $P(Y=y \mid X) = f(X)^{y}\left(1-f(X)\right)^{1-y}$ for $y \in \{0, 1\}$.


2. How to

Logistic regression estimates its parameters by maximizing the likelihood of the observed data. So we define the likelihood function: $$ L(W, b) = \prod^n_{i=1}P\left(y_i \mid x_i \right) $$


3. Mathematical derivation

Then we plug $P(Y\mid X)$ into the formula, so we have: $$ L(W, b) = \prod^n_{i=1}\left(\frac{1}{1+e^{-(W^Tx_i)}}\right)^{y_i}\left(1-\frac{1}{1+e^{-(W^Tx_i)}}\right)^{1- y_i} $$ and we take the log of the likelihood function (the log-likelihood): $$ \ln L(W, b) = \sum^n_{i=1}\left[y_i\ln\left(\frac{1}{1+e^{-(W^Tx_i)}}\right) + (1- y_i)\ln\left(1-\frac{1}{1+e^{-(W^Tx_i)}}\right)\right]$$

Then we use calculus and gradient descent (on the negative log-likelihood, i.e. gradient ascent on the log-likelihood) to find the optimal parameters $W$, so that the likelihood reaches its maximum; this maximizer is the MLE solution.
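
As a concrete sketch of this recipe, here is a minimal numpy version that fits $W$ by gradient descent on the negative log-likelihood. The data, learning rate and iteration count are made-up illustrative values, not the banknote data used below:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def neg_log_likelihood(W, X, y):
    p = sigmoid(X @ W)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# made-up data: 100 points, an intercept column plus 2 features
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]
true_W = np.array([0.5, 2.0, -1.0])
y = (rng.random(100) < sigmoid(X @ true_W)).astype(float)

# gradient descent on the negative log-likelihood
W = np.zeros(3)
learning_rate = 0.1
for _ in range(5000):
    p = sigmoid(X @ W)
    gradient = X.T @ (p - y) / len(y)   # d(-ln L)/dW, averaged over the sample
    W -= learning_rate * gradient

print(W)                            # should end up roughly close to true_W
print(neg_log_likelihood(W, X, y))  # the negative log-likelihood at the fitted W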


In my opinion

It is easy to see that using linear regression to predict or classify binary classes can be heavily influenced by outliers.
A small change in the data can then flip a prediction from one class to the other in an unpredictable way, which is not reasonable behaviour for a classifier. To address this issue and get a regression model that suits binary classification, we use a logistic function, which fits binary class data much better.
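
A small sketch of that effect on made-up 1-D data, thresholding a LinearRegression fit at 0.5 and comparing it with LogisticRegression (the numbers are invented just for the demo):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# made-up 1-D data: class 0 for small x, class 1 for large x, plus one extreme outlier at x=100
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

lin = LinearRegression().fit(x, y)
log = LogisticRegression().fit(x, y)

# turn the linear fit into a classifier by thresholding at 0.5
lin_pred = (lin.predict(x) >= 0.5).astype(int)
log_pred = log.predict(x)

# the outlier drags the fitted line down, so points near the true class boundary
# tend to get misclassified by the thresholded linear fit, while the logistic
# model is far less affected
print("linear + 0.5 threshold:", lin_pred)
print("logistic regression:   ", log_pred)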


🔧 Modeling with sklearn.LogisticRegression

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
columns_name = ['Length', 'Left', 'Right', 'Bottom', 'Top', 'Diagonal']
data = pd.read_csv('bank2.dat', delim_whitespace=True, header=None, names=columns_name)
# bank2.dat itself has no label column; in the Swiss banknote data the first 100 rows
# are the genuine notes and the last 100 the counterfeit ones, so add the Class label here
data['Class'] = ['Genuine'] * 100 + ['Counterfeit'] * 100
data.head()
--------------------------------------------------
   Length   Left  Right  Bottom   Top  Diagonal    Class
0   214.8  131.0  131.1     9.0   9.7     141.0  Genuine
1   214.6  129.7  129.7     8.1   9.5     141.7  Genuine
2   214.8  129.7  129.7     8.7   9.6     142.2  Genuine
3   214.8  129.7  129.6     7.5  10.4     142.0  Genuine
4   215.0  129.6  129.7    10.4   7.7     141.8  Genuine
data.info()
--------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Length    200 non-null    float64
 1   Left      200 non-null    float64
 2   Right     200 non-null    float64
 3   Bottom    200 non-null    float64
 4   Top       200 non-null    float64
 5   Diagonal  200 non-null    float64
 6   Class     200 non-null    object 
dtypes: float64(6), object(1)
memory usage: 11.1+ KB
data.isna().sum()
--------------------------------------------------
Length      0
Left        0
Right       0
Bottom      0
Top         0
Diagonal    0
Class       0
dtype: int64
data.describe().T
--------------------------------------------------
          count      mean       std    min    25%     50%      75%    max
Length    200.0  214.8960  0.376554  213.8  214.6  214.90  215.100  216.3
Left      200.0  130.1215  0.361026  129.0  129.9  130.20  130.400  131.0
Right     200.0  129.9565  0.404072  129.0  129.7  130.00  130.225  131.1
Bottom    200.0    9.4175  1.444603    7.2    8.2    9.10   10.600   12.7
Top       200.0   10.6505  0.802947    7.7   10.1   10.60   11.200   12.3
Diagonal  200.0  140.4835  1.152266  137.8  139.5  140.45  141.500  142.4
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

X = data.drop(columns=['Class'])
y = data['Class']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

lr = LogisticRegression(random_state=42, penalty=None)
model = lr.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)
y_proba = model.predict_proba(X_test_scaled)
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
--------------------------------------------------
[[19  0]
 [ 1 20]]
0.975
print("\nModel Accuracy:", model.score(X_test_scaled, y_test))

print(classification_report(y_test, y_pred))

Model Accuracy: 0.975

              precision    recall  f1-score   support

 Counterfeit       0.95      1.00      0.97        19
     Genuine       1.00      0.95      0.98        21

    accuracy                           0.97        40
   macro avg       0.97      0.98      0.97        40
weighted avg       0.98      0.97      0.98        40
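
To connect the fitted model back to the $W$ in the formulas above, we can read off the learned parameters and check that predict_proba is just the sigmoid of $W^Tx + b$ on the standardized features (a quick sanity check using the documented coef_, intercept_ and predict_proba attributes):

# learned parameters, on the scaled features: intercept b and weights W
print(model.intercept_)
print(dict(zip(X.columns, model.coef_[0])))

# predict_proba for the 'Genuine' class should match the sigmoid applied by hand
z = X_test_scaled @ model.coef_[0] + model.intercept_[0]
manual_proba = 1 / (1 + np.exp(-z))
print(np.allclose(manual_proba, model.predict_proba(X_test_scaled)[:, 1]))  # expect True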

🔨 Modeling with statsmodels.Logit

note:
Because the betas do not converge when this model is fitted directly,
fit_regularized is used instead of fit for the fitting step,
with method set to 'l1' (an L1 penalty) and alpha (the weight of the L1 term) set to 0.01,
so that summary() actually produces sensible numbers.

import statsmodels.api as sm

# Logit needs a numeric 0/1 response: LabelEncoder gives Counterfeit -> 0, Genuine -> 1
y_encoded = LabelEncoder().fit_transform(y)

constant = sm.add_constant(X)

model = sm.Logit(y_encoded, constant)
result = model.fit_regularized(method='l1', alpha=0.01)

print(result.summary())
--------------------------------------------------
Optimization terminated successfully    (Exit mode 0)
            Current function value: 0.002174041962487562
            Iterations: 131
            Function evaluations: 136
            Gradient evaluations: 131
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:                  200
Model:                          Logit   Df Residuals:                      193
Method:                           MLE   Df Model:                            6
Date:                Thu, 20 Mar 2025   Pseudo R-squ.:                  0.9994
Time:                        16:39:59   Log-Likelihood:              -0.083416
converged:                       True   LL-Null:                       -138.63
Covariance Type:            nonrobust   LLR p-value:                 6.587e-57
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const       7.851e-18   1.11e+04   7.07e-22      1.000   -2.18e+04    2.18e+04
Length        -4.8724     46.740     -0.104      0.917     -96.481      86.737
Left        6.129e-16     83.355   7.35e-18      1.000    -163.372     163.372
Right        -6.8e-16     80.137  -8.49e-18      1.000    -157.066     157.066
Bottom       -11.9135     14.936     -0.798      0.425     -41.187      17.360
Top           -9.3913     15.731     -0.597      0.551     -40.224      21.441
Diagonal       8.9620     10.880      0.824      0.410     -12.362      30.286
==============================================================================

Possibly complete quasi-separation: A fraction 0.95 of observations can be
perfectly predicted. This might indicate that there is complete
quasi-separation. In this case some parameters will not be identified.
c:\Users\USER\anaconda3\Lib\site-packages\statsmodels\base\l1_solvers_common.py:71: ConvergenceWarning: QC check did not pass for 1 out of 7 parameters
Try increasing solver accuracy or number of iterations, decreasing alpha, or switch solvers
  warnings.warn(message, ConvergenceWarning)
c:\Users\USER\anaconda3\Lib\site-packages\statsmodels\base\l1_solvers_common.py:144: ConvergenceWarning: Could not trim params automatically due to failed QC check. Trimming using trim_mode == 'size' will still work.
  warnings.warn(msg, ConvergenceWarning)
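
Since the summary warns about quasi-separation, it is worth looking at the fitted probabilities themselves. A short check like the following (result.predict evaluated on the same design matrix, with the Genuine = 1 encoding above) should show that most notes get a probability extremely close to 0 or 1:

fitted = result.predict(constant)                  # P(Genuine) for each of the 200 notes
print(fitted.min(), fitted.max())
print(((fitted > 0.01) & (fitted < 0.99)).sum())   # how many notes are not (almost) perfectly separated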