[PATCH] docker: call docker_client.close() to prevent connection leaks

Mark Wielaard mark@klomp.org
Sat Jun 4 21:42:35 GMT 2022


In DockerLatentWorker both _thd_start_instance and _thd_stop_instance
use a new docker_client instance that is never cleaned up. This can
cause (ssh) connections to the docker socket to linger for a long
time. Explicitly call docker_client.close() when done with the client.

Also add a mock close def to test/fake/docker.py.
---

Patch can also be found at:
https://code.wildebeest.org/git/user/mjw/buildbot/commit/?h=docker-py-close

We are running a buildbot instance at https://builder.sourceware.org/
that also has a couple of DockerLatentWorkers. These workers use
docker_host connection strings like "ssh://builder@bb.wildebeest.org:2021"

However after a couple of weeks we found hundreds of lingering network
connections to the container host machines. This is caused by the
DockerLatentWorker creating a new connection each time a container is
initialized or stopped, but never closing the client connection.

This patch makes sure the docker client is always closed and we are
not seeing any lingering network connections anymore.

Our full configuration can be found in this git repository:
https://sourceware.org/git/builder.git

 master/buildbot/test/fake/docker.py | 4 ++++
 master/buildbot/worker/docker.py    | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/master/buildbot/test/fake/docker.py b/master/buildbot/test/fake/docker.py
index 46aab1cc5..a46920d93 100644
--- a/master/buildbot/test/fake/docker.py
+++ b/master/buildbot/test/fake/docker.py
@@ -115,6 +115,10 @@ class Client:
     def remove_container(self, id, **kwargs):
         del self._containers[id]
 
+    def close(self):
+        # dummy close, no connection to cleanup
+        pass
+
 
 class APIClient(Client):
     pass
diff --git a/master/buildbot/worker/docker.py b/master/buildbot/worker/docker.py
index 49a08843e..c2ef707d6 100644
--- a/master/buildbot/worker/docker.py
+++ b/master/buildbot/worker/docker.py
@@ -299,6 +299,7 @@ class DockerLatentWorker(CompatibleLatentWorkerMixin,
         if not self._image_exists(docker_client, image):
             msg = f'Image "{image}" not found on docker host.'
             log.msg(msg)
+            docker_client.close()
             raise LatentWorkerCannotSubstantiate(msg)
 
         volumes, binds = self._thd_parse_volumes(volumes)
@@ -319,6 +320,7 @@ class DockerLatentWorker(CompatibleLatentWorkerMixin,
 
         if instance.get('Id') is None:
             log.msg('Failed to create the container')
+            docker_client.close()
             raise LatentWorkerFailedToSubstantiate(
                 'Failed to start container'
             )
@@ -331,6 +333,7 @@ class DockerLatentWorker(CompatibleLatentWorkerMixin,
         try:
             docker_client.start(instance)
         except docker.errors.APIError as e:
+            docker_client.close()
             # The following was noticed in certain usage of Docker on Windows
             if 'The container operating system does not match the host operating system' in str(e):
                 msg = f'Image used for build is wrong: {str(e)}'
@@ -346,6 +349,7 @@ class DockerLatentWorker(CompatibleLatentWorkerMixin,
                 if self.conn:
                     break
             del logs
+        docker_client.close()
         return [instance['Id'], image]
 
     def stop_instance(self, fast=False):
@@ -374,3 +378,4 @@ class DockerLatentWorker(CompatibleLatentWorkerMixin,
                 docker_client.remove_image(image=instance['image'])
             except docker.errors.APIError as e:
                 log.msg('Error while removing the image: %s', e)
+        docker_client.close()
-- 
2.30.2



More information about the Buildbot mailing list