Linux ODBC and Microsoft Access
The other day I need to do some data analysis from a Microsoft Access file in my FreeBSD box, here are the steps that I’ve done
Goal: load data from Access into Python to perform data analysis
Library/program used:
UnixODBC
UnixODBC is ODBC implementation on Linux/Unix. Installing it is quite easy, we can simply download the source code, ./configure
, make
and make install
.
MDBTools
To actually connect to a database using ODBC, we’ll also need a driver for that database. The ODBC for Access provided by Microsoft doesn’t have a Linux installation so we’ll need to find 3rd party driver.
There are paid libraries such as the one from Easysoft(closed source but may have better quality). Luckily, I found one open-source driver from MDBTools. MDBTools is a collection of cli tools to interact with a Access database, it also contains a ODBC driver. We’ll install this driver.
The source code is at https://github.com/mdbtools/mdbtools
Installation of MDBTools is quite easy, we just need to add the flag --with-unixodbc
and set the path to our unixODBC. If we installed unixODBC in a non-standard path($HOME
for example), we also need to set CPPFLAGS
and LDFLAGS
so that MDBTools knows where to find the library and header files.
Config unixODBC
Now we’ll need to config unixODBC so that it knows about our Access Driver, there are two files that we need:
- odbcinst.ini (in etc folder): contains config of ODBC drivers
- odbc.ini (in etc folder): contains config of Database
We’ll add Access driver to odbcinst.ini
as follow:
[MDBTools]
Driver = <path to mdbtools installation>/lib/odbc/libmdbodbc.so
odbc.ini
is where we config our database so for example I have an access file name test.accdb, I will config it as follow
[test]
Driver = MDBTools
Database = <path to our database file>/test.accdb
After this, we can use unixODBC to connect to access, for example using its isql command:
isql test
> select * from test_table
Connect from Python
I also want to load Access data into Python so I used pyodbc to do that. It’s a wrapper to unixODBC so we’ll need unixODBC installed before we can actually run pyodbc.
To check if pyodbc can actually load unixODBC, we need to import it
# pip install pyodbc
import pyodbc
If there is error such as ImportError: Shared object "libodbc.so.2" not found, required by "pyodbc.cpython-310.so"
, we need to check our unixODBC installation. If we installed it in a non-standard folder, we can set the LD_LIBRARY_PATH environment variable to our unixODBC’s lib folder.
After properly import pyodbc, we can read our Access database as follow:
import pyodbc
file_name = "<path to access file>"
con = pyodbc.connect(f'DRIVER=MDBTools;DBQ={file_name}')
cursor = con.cursor()
cursor.execute('SELECT * FROM test_table').fetchone()
# we can even load our Access data into Pandas dataframe
import pandas as pd
df = pd.read_sql('SELECT * FROM test_table', con)
Conclusion
Although I can connect and load data from Access, sometimes I get some weird errors from pyodbc(or MDBTools). My suggestion is as soon as we can load Access data, save it to other natively supported database systems on Linux such as sqlite. It’s easy to do that from pandas
import pyodbc
import sqlite3
import pandas as pd
access_con = pyodbc.connect('DRIVER=MDBTools;DBQ=test.accdb')
sqlite_con = sqlite3.connect('test.sqlite3')
df = access_con.read_sql('SELECT * FROM test_table', access_con)
df.to_sql('test_table', sqlite_con)