
Exploring Differences in Offensive Performance and Salary Structure Across Player Positions: The Case of 2023 NBA Players
This is a learning record I keep for myself, so I don't forget what this theory is about later on XD Table of Contents: Introduction, Research Objectives, Research Data, Research Workflow, Summary 1, Summary 2, Results and Discussion, References
Background: the LDA here refers to Latent Dirichlet Allocation, not Linear Discriminant Analysis. About LDA: a topic model proposed by Blei, D. M. et al. in 2003; it is a form of unsupervised learning. Its main use is to express the topics of a text as probability vectors, and each topic has its own corresponding words to refer to. The structure is essentially a multi-level Bayesian network; the parameters were originally estimated with the EM algorithm and later with Gibbs Sampling. For details, see Wikipedia (opens the website) or the paper (opens a PDF file). Purpose of this post: after reading the related literature, and since the core idea comes from a multi-level Bayesian network (see the figure taken from the paper), I wrote down my own view of the LDA structure and fed it to the ChatGPT-4o model to correct my understanding. Below is my conversation with GPT. Me: Given a parameter alpha drawn from a Dirichlet distribution, the d-th document's theta is a probability vector over topic1, topic2, topic3, ...; each topic in turn has a probability distribution over the shared vocabulary w1, w2, w3, .... What we want is, under the d-th document's theta, the sum over topics of the product of the topic probability and the probability of w1 under that topic (a conditional-probability idea), which gives the probability that document d contains w1, w2, w3, .... Finally, we multiply by the k topics under beta given the parameter eta, to decide which topic each w comes from, and therefore which topic the document belongs to. GPT: ✅ What you got right: ✔️ $α$ is a hyperparameter of the Dirichlet distribution that controls each document's topic distribution $θ$. ✔️ Each topic $k$ has a word distribution $φ_k$ that controls the probability of that topic generating each word ($w1, w2, …$). ✔️ The probability of a document generating a word is given by "topic probability × probability of the word under that topic". ✔️ For each word $w$, the model uses $θ$ and $φ_k$ to compute the probability that it came from each topic. ✔️ $η$ is the Dirichlet hyperparameter that controls the sparsity of each $φ_k$ (the topic's word distribution). 🔧 Things to adjust and strengthen: ...
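To connect these symbols to something runnable, here is a minimal sketch using scikit-learn's LatentDirichletAllocation, where doc_topic_prior plays the role of $α$ and topic_word_prior the role of $η$; the toy corpus and the hyperparameter values are my own assumptions, not part of the post.

```python
# Minimal LDA sketch with scikit-learn; corpus and hyperparameter values are illustrative
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are friendly pets",
    "stock prices rose on the market today",
    "investors watch the stock market closely",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)                 # document-word count matrix

lda = LatentDirichletAllocation(
    n_components=2,          # number of topics k
    doc_topic_prior=0.5,     # alpha: Dirichlet prior on each document's theta
    topic_word_prior=0.1,    # eta: Dirichlet prior on each topic's word distribution
    random_state=0,
)
theta = lda.fit_transform(X)                # rows ≈ P(topic | document), i.e. theta_d
phi = lda.components_                       # topic-word weights (phi_k up to normalization)

words = vec.get_feature_names_out()
print(theta.round(2))                       # each document's topic mixture
for k, row in enumerate(phi):
    top = row.argsort()[-3:][::-1]          # three most probable words in topic k
    print(f"topic {k}:", [words[i] for i in top])
```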
This is a learning record I keep for myself, so I don't forget what this theory is about later on XD Expectation-maximization (EM) algorithm: the computation alternates between two steps. The first is the expectation step (E): using the current estimates of the hidden variables and parameters, compute the expected log-likelihood. The second is the maximization step (M): maximize the expected likelihood found in the E step to obtain new parameter estimates. The parameter estimates found in the M step are then used in the next E step, and the process keeps alternating. (Quoted from Wikipedia.) Example from the final exam: Assume that $Y_1, Y_2, \dots, Y_n \sim \text{Exp}(\theta)$. Consider the MLE of $\theta$ based on $Y_1, Y_2, \dots, Y_n$. Suppose that 5 samples are collected from an experiment that measures the lifetime of a light bulb. Assume $y_1=1.5$, $y_2=0.58$, $y_3=3.4$ come from completed experiments. Because of the time limit, the fourth and fifth experiments were terminated at times $y^*_4=1.2$ and $y^*_5=2.3$ before the light bulbs died. Based on ($y_1, y_2, y_3, y^*_4, y^*_5$), use the EM algorithm to estimate $\theta$. Solution: With observed lifetimes $y_1=1.5$, $y_2=0.58$, $y_3=3.4$ and censoring times $y^*_4=1.2$, $y^*_5=2.3$, the actual lifetimes $Z_4>1.2$ and $Z_5>2.3$ are unknown. So we treat $Z_4$ and $Z_5$ as latent variables and write the complete-data likelihood as: ...
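To make the E/M alternation concrete, here is a short sketch of the iteration for this censored-exponential example, assuming $\text{Exp}(\theta)$ is parameterized by its rate (mean $1/\theta$); by the memoryless property, the E step replaces each censored lifetime by $y^*_j + 1/\theta$, and the M step is the usual exponential MLE on the completed data. This is my own illustration, not the original exam solution.

```python
# EM for exponential lifetimes with right-censored observations
# (rate parameterization: mean lifetime = 1/theta)
complete = [1.5, 0.58, 3.4]     # fully observed lifetimes
censored = [1.2, 2.3]           # experiments stopped at these times (true lifetime is larger)

theta = 1.0                     # arbitrary starting value
for _ in range(100):
    # E step: expected lifetime of a censored bulb, given it survived past y*,
    # is y* + 1/theta by the memoryless property of the exponential.
    expected = [y + 1.0 / theta for y in censored]
    # M step: exponential MLE on the "completed" data -> n / total time
    theta_new = (len(complete) + len(censored)) / (sum(complete) + sum(expected))
    if abs(theta_new - theta) < 1e-10:
        theta = theta_new
        break
    theta = theta_new

print(round(theta, 4))  # converges to 3 / 8.98 ≈ 0.3341, the censored-data MLE
```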
This is a learning record I keep for myself, so I don't forget what this theory is about later on XD 🦹 XGBoost What is XGBoost? Think of XGBoost as a team of smart tutors, each correcting the mistakes made by the previous one, gradually improving your answers step by step. 🗝 Key Concepts in XGBoost Tree Building: Start with an initial guess (e.g., the average score). Measure how far off the prediction is from the real answer (this is called the residual). The next tree learns how to fix these errors. Every new tree improves on the mistakes of the previous trees. 🥢 How to Divide the Data (Not Randomly): XGBoost doesn't split data based on traditional criteria like information gain. It uses a formula called Gain, which measures how much a split improves the prediction. A split only happens if: (Left + Right Score) > (Parent Score + Penalty). ❓ How do we know if a split is good? Use a value called the Similarity Score: the higher the score, the more consistent (similar) the residuals are in that group. 🐢 Two Ways to Find Splits, Accurate: Exact Greedy Algorithm. Try all possible features and split points; very accurate but very slow. 🐇 Two Ways to Find Splits, Fast: Approximate Algorithm. Uses feature quantiles (e.g., the median) to propose a few candidate split points, groups the data based on these splits, and evaluates the best one. Two options: Global Proposal: use global info to suggest splits; Local Proposal: use local (node-specific) info. 🏋 Weighted Quantile Sketch: Some data points are more important (like how teachers focus more on students who struggle). Each data point has a weight based on how wrong it was (the second-order gradient). Use these weights to suggest better and more meaningful split points. 🕳 Handling Missing Values: What if some feature values are missing? XGBoost learns a default path for missing data. This makes the model more robust even when the data isn't complete. 🧚♀️ Controlling Model Complexity: Regularization Gamma (γ) ...
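To illustrate the Similarity Score and the split rule above, here is a small sketch for a squared-error objective (so every residual's second-order gradient is 1); the function names, the λ and γ values, and the residuals are my own choices for illustration, not code from the post.

```python
# Similarity score and split gain for a squared-error objective,
# where each residual's second-order gradient (hessian) equals 1.
def similarity(residuals, lam=1.0):
    # (sum of residuals)^2 / (number of residuals + lambda)
    return sum(residuals) ** 2 / (len(residuals) + lam)

def gain(left, right, lam=1.0, gamma=0.0):
    parent = left + right
    # Keep the split only if (left + right similarity) exceeds
    # the parent's similarity by more than the penalty gamma.
    return similarity(left, lam) + similarity(right, lam) - similarity(parent, lam) - gamma

residuals_left = [-10.5, -8.0]    # made-up residuals of the left child
residuals_right = [7.5, 9.0]      # made-up residuals of the right child
print(gain(residuals_left, residuals_right, lam=1.0, gamma=1.0))  # positive -> keep the split
```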
This is a learning record I keep for myself, so I don't forget what this theory is about later on XD 👶 Naive Bayes By Bayes' theorem $$ P(y \mid x_1, x_2, …, x_n) = \frac{P(y)P(x_1, x_2, …, x_n \mid y)}{P(x_1, x_2, …, x_n)} $$ where $P(y)$ is the prior probability of class $y$, $P(x_1, x_2, …, x_n \mid y)$ is the likelihood, i.e., the probability of observing features $x_1, x_2, …, x_n$ given class $y$, and $P(x_1, x_2, …, x_n)$ is the marginal probability of the feature set $x_1, x_2, …, x_n$. With the Naive Bayes assumption of conditional independence, $$ P(x_i \mid y, x_1, …, x_{i-1}, x_{i+1}, …, x_n) = P(x_i \mid y) $$ ...
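Under that independence assumption the posterior simplifies to $P(y \mid x_1,\dots,x_n) \propto P(y)\prod_i P(x_i \mid y)$. Below is a minimal sketch with scikit-learn's GaussianNB; the toy data is my own, and Gaussian likelihoods are just one common choice for $P(x_i \mid y)$.

```python
# Naive Bayes with Gaussian likelihoods P(x_i | y); data is a toy illustration
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])

clf = GaussianNB().fit(X, y)
print(clf.predict([[1.1, 2.0]]))          # predicted class, here [0]
print(clf.predict_proba([[1.1, 2.0]]))    # posterior P(y | x) for each class
```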
This is a learning record I keep for myself, so I don't forget what this theory is about later on XD 🤔 What is a decision tree? A decision tree is a model that makes decisions, such as classification or regression, by evaluating conditions as True or False. When the tree classifies something into class A or class B, or even into multiple classes (which is called multi-class classification), we call it a classification tree; on the other hand, when the tree performs regression to predict a numerical value, we call it a regression tree. ...
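As a quick illustration of the classification-tree vs. regression-tree distinction, here is a minimal sketch with scikit-learn; the tiny dataset and the depth limit are assumptions made for the example.

```python
# A classification tree and a regression tree on toy data (data is illustrative only)
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0, 0], [1, 1], [0, 1], [1, 0]]

# Classification tree: predicts a discrete class label
clf = DecisionTreeClassifier(criterion="gini", max_depth=2).fit(X, [0, 1, 0, 1])
print(clf.predict([[1, 1]]))

# Regression tree: predicts a numerical value
reg = DecisionTreeRegressor(max_depth=2).fit(X, [0.1, 0.9, 0.4, 0.6])
print(reg.predict([[1, 1]]))
```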
This is a learning record I keep for myself, so I don't forget what this theory is about later on XD (1) Given: $$X_1, X_2, …, X_n \overset{\text{iid}}{\sim}p(x)$$ Compute: $$ E( \hat{I}_M)=E\left[\frac{1}{n} \sum^n_{i=1} \frac{f(X_i)}{p(X_i)} \right]=\frac{1}{n}E\left[ \sum^n_{i=1} \frac{f(X_i)}{p(X_i)} \right] $$ For each independent $X_i$, we only need to compute: $$E\left[\frac{f(X_i)}{p(X_i)} \right]$$ Therefore: $$ E\left[\frac{f(X)}{p(X)} \right] = \int^b_a\frac{f(x)}{p(x)}p(x)dx =\int^b_af(x)dx = I $$ which shows that $$E\left[\frac{f(X_i)}{p(X_i)} \right] =I, \forall i $$ and hence $$ E(\hat{I}_M) =\frac{1}{n}\sum^n_{i=1}I=I $$ (2) Compute the variance $$Var(\hat{I}_M)=E\left[(\hat{I}_M-I)^2\right]$$ Since, by independence, $$ \begin{aligned} Var(\widehat{I}_M) &= Var\left(\frac{1}{n} \sum_{i=1}^{n} \frac{f(X_i)}{p(X_i)}\right) = \frac{1}{n^2}\sum_{i=1}^{n}Var\left(\frac{f(X_i)}{p(X_i)}\right) = \frac{1}{n}Var\left(\frac{f(X)}{p(X)}\right) \\ &= \frac{1}{n}\left(E\left[\left(\frac{f(X)}{p(X)}\right)^2\right]-I^2\right) \end{aligned} $$ and we are given that $$E\left[\left(\frac{f(X)}{p(X)}\right)^2\right] < \infty$$ it follows that as $n \to \infty$, $$Var(\hat{I}_M) \to 0$$ ...
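A short numerical check of the estimator $\hat{I}_M$ above: estimate $I=\int_0^1 e^x\,dx = e-1$ by sampling from a proposal density $p(x)$ on $[0,1]$ and averaging $f(X_i)/p(X_i)$. The proposal $p(x)=(1+x)/1.5$ and the sample size are my own choices for illustration.

```python
# Importance-sampling estimate of I = ∫_0^1 e^x dx  (exact value: e - 1 ≈ 1.71828)
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Sample from the proposal p(x) = (1 + x) / 1.5 on [0, 1] via its inverse CDF
u = rng.uniform(size=n)
x = np.sqrt(1.0 + 3.0 * u) - 1.0

weights = np.exp(x) / ((1.0 + x) / 1.5)   # f(X_i) / p(X_i)
I_hat = weights.mean()                    # the estimator I_hat_M

print(I_hat, np.exp(1) - 1)               # estimate vs. exact value
```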
This is a learning record I keep for myself, so I don't forget what this theory is about later on XD Logistic regression (the logit or MaxEnt classifier) is also known as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier. In this model, the probabilities describing the possible outcomes are modeled using a logistic function. And what is a logistic function? Let's talk about it. Here is the definition from Wikipedia: a logistic function or logistic curve is a common S-shaped curve (sigmoid curve) with the equation: $$ f(x) = \frac{L}{1+e^{-k(x-x_0)}}$$ where: ...
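A tiny sketch of this curve with $L=1$, $k=1$, $x_0=0$, which is the standard sigmoid used in logistic regression; the sample inputs are arbitrary.

```python
# Standard logistic (sigmoid) function: L = 1, k = 1, x0 = 0
import numpy as np

def logistic(x, L=1.0, k=1.0, x0=0.0):
    return L / (1.0 + np.exp(-k * (x - x0)))

# S-shaped: values run from 0 toward L, with f(x0) = L / 2
print(logistic(np.array([-4.0, 0.0, 4.0])))   # ≈ [0.018, 0.5, 0.982]
```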
This is a learning record I keep for myself, so I don't forget what this theory is about later on XD 🌳 Random forest basics: a forest formed by aggregating many decision trees. Method: draw samples from the original data with replacement to build each decision tree's training dataset; some samples are therefore selected more than once, and this sampling scheme is called Bootstrapping. When the original dataset is large, you will find that after sampling some samples were never drawn; these are called the Out-of-Bag (OOB) data. Besides resampling the training data, the same idea is applied to the features: a random forest does not consider all features at once, but randomly samples a subset of features (set via the max_features parameter) when training each tree. These two mechanisms make the trees nearly independent, with the goal of reducing the high correlation between trees; the advantages are better generalization, protection against overfitting, and improved prediction stability and accuracy. Algorithm: the core idea of the random forest algorithm is the same as that of the decision tree; the only difference is how the trees are built (as described above). That is, for classification problems it uses Gini impurity or entropy as the splitting criterion, and for regression problems it typically uses the mean squared error (MSE) (among others, such as Poisson deviance).
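The ingredients above (bootstrapping, max_features, OOB data, Gini/entropy criterion) map directly onto scikit-learn's RandomForestClassifier; the dataset and hyperparameter values in this sketch are illustrative assumptions.

```python
# Random forest with bootstrapping, a random feature subset per split, and OOB evaluation
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

rf = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",   # random subset of features considered at each split
    criterion="gini",      # or "entropy" for classification
    bootstrap=True,        # sample training rows with replacement
    oob_score=True,        # evaluate on the out-of-bag samples
    random_state=0,
).fit(X, y)

print(rf.oob_score_)       # accuracy estimated from the OOB data
```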
This is a learning record I keep for myself, so I don't forget what this theory is about later on XD This article was translated into English with ChatGPT; I typed it out while reading it aloud (entirely hand-typed, no copy-paste) as a way to practice my English XD We can assume that the data come from a sample with a normal distribution; in this context, the eigenvalues asymptotically follow a normal distribution. Therefore, we can estimate the 95% confidence interval for each eigenvalue using the following formula: $$ \left[ \lambda_\alpha \left( 1 - 1.96 \sqrt{\frac{2}{n-1}} \right); \lambda_\alpha \left(1 + 1.96 \sqrt{\frac{2}{n-1}} \right) \right] $$ where: $\lambda_\alpha$ represents the $\alpha$-th eigenvalue and $n$ denotes the sample size. By calculating the 95% confidence intervals of the eigenvalues, we can assess their stability and determine the appropriate number of principal component axes to retain. This approach helps decide how many principal components to keep in PCA to reduce data dimensionality while preserving as much of the original information as possible. ...
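A small sketch of the interval formula above: compute the eigenvalues of the sample covariance matrix and wrap each one in its approximate 95% confidence interval $\lambda_\alpha\,(1 \pm 1.96\sqrt{2/(n-1)})$; the randomly generated data and the matrix choice (covariance rather than correlation) are assumptions for illustration only.

```python
# Approximate 95% confidence intervals for PCA eigenvalues
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))            # n = 200 samples, 4 variables (toy data)
n = X.shape[0]

eigenvalues = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]   # sorted, largest first
half_width = 1.96 * np.sqrt(2.0 / (n - 1))

for i, lam in enumerate(eigenvalues, start=1):
    lower, upper = lam * (1 - half_width), lam * (1 + half_width)
    print(f"lambda_{i}: {lam:.3f}  95% CI = [{lower:.3f}, {upper:.3f}]")
```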