For a training set X of \(m\) samples, each sample have \(n\) features. \(y\) is the responder. Both X and y can be written in matrix form as:

We have log likelihood of logistic regression:

Where \(h(x^{(i)})\) is sigmoid function of \(x^{(i)}\)

In order to maximize \(l(\theta)\), I use Newton method to find \(\theta\). We have:

Where H is Hessian matrix:

and \( \nabla_{\theta}l(\theta)\) is the vector of partial derivatives of \(l(\theta)\) with respect to the \(\theta\)

The matrix form of \(H_{i,j}\) is:

here S is the matrix of:

Having all the thing I need, let implement it! For simplicity, I have used Matlab to calculate the above equation

function theta = newton(X, y, theta)
  dl = (y - sigmoid(X * theta))' * X;
  S = diag((sigmoid(X * theta) - 1) .* sigmoid(X * theta));
  h = X' * S * X;
  theta = theta - inv(h) * dl';
end

P/S: using MathJax for Latex is so convenience!