For a training set $$X$$ of $$m$$ samples, each sample has $$n$$ features; $$y$$ is the response vector. Both $$X$$ and $$y$$ can be written in matrix form as:

$$X = \begin{bmatrix} (x^{(1)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \quad y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

The log likelihood of logistic regression is:

$$l(\theta) = \sum_{i=1}^{m} y^{(i)} \log h(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h(x^{(i)})\right)$$

where $$h(x^{(i)})$$ is the sigmoid function of $$x^{(i)}$$:

$$h(x^{(i)}) = \frac{1}{1 + e^{-\theta^T x^{(i)}}}$$

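As a quick sanity check of these two formulas, here is a small NumPy sketch (the function names `sigmoid` and `log_likelihood`, and the toy data, are my own, not from this post):

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    """l(theta) = sum_i [ y_i log h_i + (1 - y_i) log(1 - h_i) ]."""
    h = sigmoid(X @ theta)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy data: first column is the intercept feature.
X = np.array([[1.0, 0.5], [1.0, -0.5], [1.0, 1.5], [1.0, -1.5]])
y = np.array([1.0, 0.0, 1.0, 0.0])

# At theta = 0 every h_i = 0.5, so l(0) = m * log(0.5).
print(log_likelihood(np.zeros(2), X, y))  # -> 4 * log(0.5) ≈ -2.7726
```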
To maximize $$l(\theta)$$, I use Newton's method to find $$\theta$$. The update rule is:

$$\theta := \theta - H^{-1} \nabla_{\theta} l(\theta)$$

where $$H$$ is the Hessian matrix:

$$H_{i,j} = \frac{\partial^2 l(\theta)}{\partial \theta_i \, \partial \theta_j}$$

and $$\nabla_{\theta}l(\theta)$$ is the vector of partial derivatives of $$l(\theta)$$ with respect to $$\theta$$.

The matrix form of $$H$$ is:

$$H = -X^T S X$$

where $$S$$ is the diagonal matrix:

$$S = \mathrm{diag}\left(h(x^{(1)})\left(1 - h(x^{(1)})\right), \dots, h(x^{(m)})\left(1 - h(x^{(m)})\right)\right)$$

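In code, the gradient $$X^T(y - h)$$ and the Hessian in this matrix form are only a few lines of NumPy. This sketch assumes the conventions above (rows of $$X$$ are samples, $$h$$ is the vector of $$h(x^{(i)})$$); the function names are mine:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y):
    """Gradient of l(theta): X^T (y - h)."""
    h = sigmoid(X @ theta)
    return X.T @ (y - h)

def hessian(theta, X):
    """Hessian of l(theta): -X^T S X with S = diag(h_i * (1 - h_i))."""
    h = sigmoid(X @ theta)
    S = np.diag(h * (1.0 - h))
    return -X.T @ S @ X
```

A handy sanity check: at $$\theta = 0$$ every $$h_i = 1/2$$, so $$S = I/4$$ and $$H = -X^T X / 4$$.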
Having everything I need, let's implement it! For simplicity, I used Matlab to compute the equations above.
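For readers without Matlab, here is an equivalent sketch in Python/NumPy. It is a direct transcription of the update rule above, with no safeguards (for instance, it will diverge on perfectly separable data, where the maximum likelihood estimate is not finite); the function name and data are my own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, iters=25):
    """Fit logistic regression with Newton's method:
    theta := theta - H^{-1} grad l(theta)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (y - h)          # gradient of l(theta)
        S = np.diag(h * (1.0 - h))
        H = -X.T @ S @ X              # Hessian of l(theta)
        # Solve H d = grad rather than forming H^{-1} explicitly.
        theta = theta - np.linalg.solve(H, grad)
    return theta

# Small non-separable example (first column is the intercept feature).
X = np.array([[1.0, 0.5], [1.0, -0.5], [1.0, 1.5],
              [1.0, -1.5], [1.0, 0.2], [1.0, -0.2]])
y = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])
theta = newton_logistic(X, y)
```

Newton's method typically converges in a handful of iterations here; a fixed iteration count could be replaced by stopping once the gradient norm is small.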

P/S: using MathJax for LaTeX is so convenient!