I ran into an issue where one particular query sometimes resulted in the following error:
(pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
Sometimes the error condition would manifest as:
(2006, "MySQL server has gone away (TimeoutError(110, 'Connection timed out'))")
I could not get my head around why this was happening. Everything on the Internets suggested that tuning MySQL parameters for timeouts was needed. So I did that, but nothing with respect to the errors changed. I tried as many changes to how SQLAlchemy, my database ORM, connects to MySQL hosted using Amazon RDS.
I logged everything possible in RDS and saw messages such as:
Got an error reading communication packets
But that messages lead me no where.
When I compared the overall code path for queries that work fine with the one that generated the errors I found one distinction:
For the queries that worked I was getting a database session using SQLAlchemy within the function that executes the query. For the query that would timeout I was getting a database session at the top of the file that contained the function.
The connection to the database was happening when the application started up, but would eventually timeout as activity subsided. By the time the function got triggered, the session that was created from the code at the top of the tile was already closed. So the application would encounter the “Timeout" and "Lost connection” errors.
Moving the call to get the database session into the body of the function itself solved the problem.