How To Speed Up Shared File Access In Docker For Mac

Posted by admin

Dockerizing a Node.js Web Application

Docker has significantly improved the way we build, ship, and run apps. Read this tutorial to learn how to integrate Docker with your Node.js application.

Introduction

If you've ever developed anything that needs to 'live' somewhere besides your local machine, you know that getting an application up and running on a different machine is no simple task. There are countless considerations, from the very basics of 'how do I get my environment variables set' to which runtimes you'll need and which dependencies those will rely on, not to mention the need to automate the process. It's simply not feasible for software teams to rely on a manual deploy process anymore.

A number of technologies have sought to solve this problem of differing environments, automation, and deployment configuration, but the most well-known and perhaps most notable attempt in recent years is Docker. By the end of this tutorial you should be able to:

- understand what Docker is and what it does
- create a simple Dockerfile
- run a Node.js application using Docker

What is Docker, Anyway?

Docker's homepage describes Docker as follows: 'Docker is an open platform for building, shipping and running distributed applications.

It gives programmers, development teams and operations engineers the common toolbox they need to take advantage of the distributed and networked nature of modern applications.' Put differently, Docker is an abstraction on top of low-level operating system tools that allows you to run one or more containerized processes or applications within one or more virtualized Linux instances.

Advantages of Using Docker

Before we dive in, it's important to stress the potential usefulness of Docker in your software development workflow. It's not a 'silver bullet', but it can be hugely helpful in certain cases. Note the many potential benefits it can bring, including:

- Rapid application deployment
- Portability across machines

- Version control and component reuse
- Sharing of images/Dockerfiles
- Lightweight footprint and minimal overhead
- Simplified maintenance

Prerequisites

Before you begin this tutorial, ensure the following is installed on your system:

- Node.js

- Docker (Mac users: it's recommended to use docker-machine)
- Git, to track changes

Directory Structure

We'll be using a basic application as our example Node.js application to run in our Docker container. To keep things moving, we'll use Express's scaffolding tool to generate our directory structure and basic files.

# This will make the generator available to use anywhere
$ npm i -g express-generator
$ cd
$ git init   # if you haven't set up your repository already
$ express

$ npm install

This should have created a number of files in your directory, including the bin, views, and routes directories. Make sure to run npm install so that npm can get all of your Node.js modules set up and ready to use.

Setting Up Express

Now that we've got our basic Express files generated for us, let's write some basic tests to ensure that we're working with good development practices and can have some idea of when we're done. To run our tests, we'll use just two tools: Tape and SuperTest.

Let's get them installed first as development dependencies, so they won't be installed in production:

$ npm install --save-dev tape supertest

Since we're not focusing on Express in this tutorial, we won't go too deeply into how it works or test it extensively. At this point, we just want to know that the application will send back some basic JSON responses when we make GET requests. SuperTest will spin up an instance of our application, assign it an ephemeral port, and let us send requests to it with a fluent API. We also get a couple of assertions we can run; we run them against the response type and the 'Content-Type' header. We can set up the tests that focus on our routes in a test/routes.js file. Normally, we would break related tests into several different files, but our application is so lightweight that this will suffice. We'll also want to keep a few paths out of the image we build, typically listed in an ignore file:

.git
.gitignore
node_modules

Now we're ready to create a Dockerfile. You can think of a Dockerfile as a set of instructions to Docker for how to create our container, very much like a procedural piece of code.
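Before moving on to the Dockerfile, here is a rough sketch of what test/routes.js might contain, written with Tape and SuperTest. It assumes the generated Express app is exported from app.js, as in the default express-generator layout; the route and assertions are illustrative, not the article's exact tests:

```shell
# Hypothetical sketch: scaffold a minimal Tape + SuperTest spec.
# Assumes the Express app is exported from ../app.js.
mkdir -p test
cat > test/routes.js <<'EOF'
var test = require('tape');
var request = require('supertest');
var app = require('../app');

test('GET / responds with JSON', function (t) {
  request(app)
    .get('/')
    .expect(200)
    .expect('Content-Type', /json/)
    .end(function (err) {
      t.error(err, 'request completed without error');
      t.end();
    });
});
EOF
```

Running `node test/routes.js` (or wiring it into `npm test`) then exercises the route.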

To get started, we need to choose which base image to pull from. We are essentially telling Docker 'start with this.' This can be hugely useful if you want to create a customized base image and later create other, more specific containers that 'inherit' from it. We'll be using the debian:jessie base image, since it gives us what we need to run our application and has a smaller footprint than the Ubuntu base image. This will save us some time during builds and let us use only what we really need. Using a Dockerfile is one way to tell Docker how to build images for us.

# Dockerfile
FROM debian:jessie

# Replace shell with bash so we can source files
RUN rm /bin/sh && ln -s /bin/bash /bin/sh

# Set environment variables
ENV appDir /var/www/app/current

The RUN command executes any commands in a new layer on top of the current image and then commits the results.

The resulting image will then be used in the next steps. This command starts to get us into the incremental aspect of Docker that we mentioned briefly as one of its benefits.

Each RUN command acts as a sort of git commit-like action in that it takes the current image, executes commands on top of it, and then returns a new image with the committed changes. This creates a build process that has high granularity — any point in the build phases should be a valid image — and lets us think of the build more atomically (where each step is self-contained). With that in mind, let's install some packages that we'll need to run our Node.js application later.

# Run updates and install deps
RUN apt-get update

# Install needed deps and clean up after
RUN apt-get install -y -q --no-install-recommends \
      apt-transport-https build-essential ca-certificates \
      curl g++ gcc git make nginx sudo wget \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get -y autoclean

Note that we grouped all the apt-get install-related actions into a single command. Because of that, this phase of the build only does things related to installing needed packages with apt-get and the subsequent cleanup. Next, we'll install nvm so we can install any version of Node.js that we want.

There are base images out there that let you install Node.js with Docker, but there are several reasons why you might not want to use them:

- Speed: nvm lets you upgrade to the latest version of Node.js immediately. There are sometimes critical security fixes that get released, and you shouldn't need to wait for a new image.
- Clean separation of concerns: changing to/from a version of Node.js is done with nvm, which is dedicated to managing Node.js installations.
- Lightweight: you get what you need with a simple curl-to-bash installation.

We'll add the Node.js-related commands to our Dockerfile.

# Dockerfile
# ...

ENV NVM_DIR /usr/local/nvm
ENV NODE_VERSION 5.1.0

# Install nvm with node and npm
# (the install-script URL was omitted here; nvm's repository provides it)
RUN curl -o- <nvm-install-script-url> | bash \
    && source $NVM_DIR/nvm.sh \
    && nvm install $NODE_VERSION \
    && nvm alias default $NODE_VERSION \
    && nvm use default

# Set up our PATH correctly so we don't have to long-reference npm, node, &c.
ENV NODE_PATH $NVM_DIR/versions/node/v$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/versions/node/v$NODE_VERSION/bin:$PATH

We just ran the nvm install script, installed the version of Node.js we want, made sure it is set as the default for later, and set some environment variables to use later (PATH and NODE_PATH). One thing to note: we highly recommend downloading a copy of the nvm install script and hosting it yourself if you're going to use this setup in production, since you really don't want to be relying on the persistence of a hosted file for your entire build process.

Now that we have Node.js installed and ready to use, we can add our files and get ready to run everything. First, we need to create a directory to hold our application files. Then, we'll set the working directory with the WORKDIR instruction, so Docker knows where to add files later. WORKDIR affects any RUN, CMD, ENTRYPOINT, COPY, and ADD instructions that follow it in the Dockerfile. We waited to set it until now because our commands have not needed to be run from a particular directory.

# Dockerfile
# ...
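The directory creation and working-directory step described above might look like the following sketch. The paths reuse the appDir variable set earlier, and the pm2 install anticipates the CMD that comes next; the exact steps are assumptions, since this copy of the article omits them:

```dockerfile
# Create the directory that will hold the application
RUN mkdir -p /var/www/app/current

# Set the working directory for subsequent RUN, CMD, COPY, and ADD steps
WORKDIR ${appDir}

# Copy the application in and install dependencies
ADD . ${appDir}
RUN npm install --production

# pm2 is used by the CMD below to run the app
RUN npm install -g pm2
```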

# Expose the port
EXPOSE 4500

CMD ["pm2", "start", "processes.json", "--no-daemon"]
# --no-daemon is a minor workaround to prevent the Docker container from
# thinking pm2 has stopped running and ending itself

EXPOSE will open up a port on our container, but not necessarily on the host system. Remember, these instructions are for Docker, not the host environment. We can map container ports to external ports later, so choosing a privileged port like 80 or 443 isn't absolutely necessary here. CMD is what will happen when you run your container using docker run from the command line. It takes its arguments as an array, somewhat similar to how Node's API works. Our final Dockerfile combines all of the snippets above, more or less in that order.
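To see the port mapping in action, the build-and-run workflow looks roughly like this (requires Docker; the my-node-app tag is a placeholder name):

```shell
# Build an image from the Dockerfile in the current directory
docker build -t my-node-app .

# Map host port 8080 to the container's exposed port 4500
docker run -p 8080:4500 my-node-app
```

With the container running, the app is reachable on the host at port 8080 even though the container only exposes 4500.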


File system sharing (osxfs)

osxfs is a new shared file system solution, exclusive to Docker for Mac. osxfs provides a close-to-native user experience for bind mounting macOS file system trees into Docker containers. To this end, osxfs features a number of unique capabilities as well as differences from a classical Linux file system.

Case sensitivity

With Docker for Mac, file systems operate in containers in the same way as they operate in macOS. If a file system on macOS is case-insensitive, that behavior is shared by any bind mount from macOS into a container.
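A quick way to check whether a given directory sits on a case-sensitive file system is a small shell probe (illustrative, not a Docker feature; /tmp stands in for the directory you care about):

```shell
# Probe case sensitivity: create a file, then look it up under a
# different capitalization. $$ (the shell PID) keeps the name unique.
dir=/tmp
touch "$dir/CaseProbe.$$"
if [ -e "$dir/caseprobe.$$" ]; then
  echo "case-insensitive"
else
  echo "case-sensitive"
fi
rm -f "$dir/CaseProbe.$$"
```

Run it against a bind-mounted path on the host and inside the container to compare the two views.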

On macOS Sierra and lower, the default file system is HFS+. On macOS High Sierra, the default file system is APFS. Both are case-insensitive by default, but available in case-sensitive and case-insensitive variants. To get case-sensitive behavior, format the volume used in your bind mount as case-sensitive HFS+ or APFS. Reformatting your root partition is not recommended, as some Mac software relies on case-insensitivity to function.

Access control

osxfs, and therefore Docker, can access only those file system resources that the Docker for Mac user has access to. osxfs does not run as root.

If the macOS user is an administrator, osxfs inherits those administrator privileges. We are still evaluating which privileges to drop in the file system process to balance security and ease-of-use. Osxfs performs no additional permissions checks and enforces no extra access control on accesses made through it. All processes in containers can access the same objects in the same way as the Docker user who started the containers.

Namespaces

Much of the macOS file system that is accessible to the user is also available to containers using the -v bind mount syntax. The following command runs a container from an image called r-base and shares the macOS user's Desktop directory as /Desktop in the container:

root@2h30fa0c600e:/# ls
Desktop  boot  etc   lib    lib64   media  opt   root  sbin  sys  usr
bin      dev   home  lib32  libx32  mnt    proc  run   srv   tmp  var

By default, you can share files in /Users/, /Volumes/, /private/, and /tmp directly. To add or remove directory trees that are exported to Docker, use the File sharing tab in Docker preferences (Preferences -> File sharing). All other paths used in -v bind mounts are sourced from the Moby Linux VM running the Docker containers, so arguments such as -v /var/run/docker.sock:/var/run/docker.sock should work as expected.
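A bind mount invocation matching that Desktop example might look like this (a sketch; the -it flags and the bash entrypoint are assumptions):

```shell
# Share the macOS user's Desktop into an r-base container at /Desktop
docker run -it -v ~/Desktop:/Desktop r-base bash
```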

If a macOS path is not shared and does not exist in the VM, an attempt to bind mount it fails rather than create it in the VM. Paths that already exist in the VM and contain files are reserved by Docker and cannot be exported from macOS.

See the performance section below to learn about new configuration options available with the Docker 17.04 CE Edge release.

Ownership

Initially, any containerized process that requests ownership metadata of an object is told that its uid and gid own the object. When any containerized process changes the ownership of a shared file system object, such as by using the chown command, the new ownership information is persisted in the com.docker.owner extended attribute of the object.
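From the macOS side, that extended attribute can be inspected with the stock xattr utility (illustrative; the file path is a placeholder, and the attribute only appears after a container has changed the file's ownership):

```shell
# On the macOS host: list extended attributes on a shared file.
# com.docker.owner shows up once a container has chown'ed it.
xattr -l ~/shared/some-file
```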

Subsequent requests for ownership metadata return the previously set values. Ownership-based permissions are enforced only at the macOS file system level, with all accessing processes behaving as the user running Docker. If the user does not have permission to read extended attributes on an object (such as when that object's permissions are 0000), osxfs attempts to add an access control list (ACL) entry that allows the user to read and write extended attributes. If this attempt fails, the object appears to be owned by the process accessing it until the extended attribute is readable again.

File system events

Most inotify events are supported in bind mounts, and dnotify and fanotify are likely also supported (though they have not been tested). This means that file system events from macOS are sent into containers and trigger any listening processes there. The following file system events are supported:

- Creation
- Modification
- Attribute changes
- Deletion
- Directory changes

The following file system events are partially supported:

- Move events trigger IN_DELETE on the source of the rename and IN_MODIFY on the destination of the rename

The following file system events are unsupported:

- Open

- Access
- Close events

- Unmount events

Some events may be delivered multiple times. These limitations do not apply to events between containers, only to those events originating in macOS.

Mounts

The macOS mount structure is not visible in the shared volume, but volume contents are visible. Volume contents appear in the same file system as the rest of the shared file system. Mounting or unmounting macOS volumes that are also bind mounted into containers may result in unexpected behavior in those containers.

Unmount events are not supported. Mount export support is planned but is still under development.

Symlinks

Symlinks are shared unmodified. This may cause issues when symlinks contain paths that rely on the case-insensitivity of the default macOS file system.

File types

Symlinks, hardlinks, socket files, named pipes, regular files, and directories are supported. Socket files and named pipes only transmit between containers and between macOS processes; no transmission across the hypervisor is supported yet. Character and block device files are not supported.

Extended attributes

Extended attributes are not yet supported.

Technology

osxfs does not use OSXFUSE. osxfs does not run under, inside, or between macOS userspace processes and the macOS kernel.

Performance issues, solutions, and roadmap

See the osxfs caching documentation to learn about new configuration options available with the Docker 17.04 CE Edge release.

With regard to reported performance issues, and a similar thread on the forums, this topic provides an explanation of the issues, recent progress in addressing them, how the community can help us, and what you can expect in the future. This explanation derives from a post by David Sheets (@dsheets) of the Docker development team in the forum topic just mentioned.

We want to surface it in the documentation for wider reach.

Understanding performance

Perhaps the most important thing to understand is that shared file system performance is multi-dimensional. This means that, depending on your workload, you may experience exceptional, adequate, or poor performance with osxfs, the file system server in Docker for Mac. File system APIs are very wide (20-40 message types) with many intricate semantics involving on-disk state, in-memory cache state, and concurrent access by multiple processes.

Additionally, osxfs integrates a mapping between macOS’s FSEvents API and Linux’s inotify API which is implemented inside of the file system itself, complicating matters further (cache behavior in particular). At the highest level, there are two dimensions to file system performance: throughput (read/write IO) and latency (roundtrip time). In a traditional file system on a modern SSD, applications can generally expect throughput of a few GB/s. With large sequential IO operations, osxfs can achieve throughput of around 250 MB/s which, while not native speed, is not likely to be the bottleneck for most applications which perform acceptably on HDDs. Latency is the time it takes for a file system call to complete. For instance, the time between a thread issuing write in a container and resuming with the number of bytes written. With a classical block-based file system, this latency is typically under 10μs (microseconds).

With osxfs, latency is presently around 130μs for most operations or 13× slower. For workloads which demand many sequential roundtrips, this results in significant observable slowdown. Reducing the latency requires shortening the data path from a Linux system call to macOS and back again. This requires tuning each component in the data path in turn - some of which require significant engineering effort. Even if we achieve a huge latency reduction of 65μs/roundtrip, we still “only” see a doubling of performance.
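The arithmetic behind those figures is easy to sanity-check with the numbers given in this section:

```shell
# ~130us osxfs latency vs ~10us for a classical block-based file system
echo "$(( 130 / 10 ))x slower"          # prints "13x slower"

# shaving 65us per roundtrip: 130 / (130 - 65) = 2x speedup
echo "$(( 130 / (130 - 65) ))x faster"  # prints "2x faster"
```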

This is typical of performance engineering, which requires significant effort to analyze slowdowns and develop optimized components. We know a number of approaches that may reduce the roundtrip time, but we haven't implemented all those improvements yet (more on this below in What we are doing). A second approach to improving performance is to reduce the number of roundtrips by caching data.
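In recent Docker for Mac releases, this kind of caching can be requested per bind mount with the cached flag; a sketch (the project path and image name are placeholders):

```shell
# Allow the container's view of /app to lag the host slightly,
# trading strict consistency for fewer shared-file-system roundtrips
docker run -v ~/project:/app:cached my-image
```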

Recent versions of Docker for Mac (17.04 onwards) include caching support that brings significant (2-4×) improvements to many applications. Much of the overhead of osxfs arises from the requirement to keep the container's and the host's views of the file system consistent, but full consistency is not necessary for all applications, and relaxing the constraint opens up a number of opportunities for improved performance. At present there is support for read caching, with which the container's view of the file system can temporarily drift apart from the authoritative view on the host. Further caching developments, including support for write caching, are planned.

What we are doing

We continue to actively work on increasing caching and on reducing the file system data path latency.

This requires significant analysis of file system traces and speculative development of system improvements to try to address specific performance issues. Perhaps surprisingly, application workload can have a huge effect on performance. As an example, here are two different use cases contributed on the forums and how their performance differs and suffers due to latency, caching, and coherence:

- A rake example (see below) appears to attempt to access 37,000+ different files that don't exist on the shared volume. Even with a 2× speedup via latency reduction, this use case still seems 'slow'. With caching enabled, the performance increases around 3.5×. We expect to see further performance improvements for rake with a 'negative dcache' that keeps track, in the Linux kernel itself, of the files that do not exist.

However, even this is not sufficient the first time rake is run on a shared directory. To handle that case, we actually need to develop a Linux kernel patch which negatively caches all directory entries not in a specified set. This cache must be kept up-to-date in real time with the macOS file system state, even in the presence of missing macOS FSEvents messages, and so must be invalidated if macOS ever reports an event delivery failure.

- Running ember build in a shared file system results in ember creating many different temporary directories and performing lots of intermediate activity within them. An empty ember project is over 300MB. This usage pattern does not require coherence between Linux and macOS and is significantly improved by write caching.

These two examples come from performance use cases contributed by users, and they are incredibly helpful in prioritizing aspects of file system performance to improve.

We are developing statistical file system trace analysis tools to characterize slow-performing workloads more easily and to decide what to work on next. Under development, we have:

- A growing performance test suite of real-world use cases (more on this below in What you can do)
- Further caching improvements, including negative, structural, and write-back caching, and lazy cache invalidation
- A Linux kernel patch to reduce data path latency by 2/7 copies and 2/5 context switches

- Increased macOS integration to reduce the latency between the hypervisor and the file system server

What you can do

When you report shared file system performance issues, it is most helpful to include a minimal real-world reproduction test case that demonstrates poor performance. Without a reproduction, it is very difficult for us to analyze your use case and determine what improvements would speed it up.

When you don't provide a reproduction, one of us needs to figure out the specific software you are using, then guess and hope that we have configured it in a typical way, or a way that has poor performance. That usually takes 1-4 hours depending on your use case, and once it is done, we must then determine what regular performance is like and what kind of slowdown your use case is experiencing. In some cases, it is not obvious which operation is even slow in your specific development workflow. The additional set-up needed to reproduce the problem means we have less time to fix bugs, develop analysis tools, or improve performance. So, please include simple, immediate reproduction test cases for performance issues. The test case contributed by @hirowatari in the forums thread is a great example.

This example originally provided:

- A version-controlled repository, so any changes/improvements to the test case can be easily tracked
- A Dockerfile which constructs the exact image to run
- A command-line invocation of how to start the container
- A straightforward way to measure the performance of the use case
- A clear explanation (README) of how to run the test case

What you can expect

We continue to work toward an optimized shared file system implementation on the Edge channel of Docker for Mac.

You can expect some of the performance improvement work mentioned above to reach the Edge channel in the coming release cycles. We plan to eventually open source all of our shared file system components. At that time, we would be very happy to collaborate with you on improving the implementation of osxfs and related software. We also plan to write up and publish further details of shared file system performance analysis and improvement on the Docker blog. Keep an eye out for those articles, or nudge @dsheets about them; they should serve as a jumping-off point for understanding the system, measuring it, or contributing to it.

Wrapping Up

We hope this gives you a rough idea of where osxfs performance is and where it's going. We are treating good performance as a top-priority feature of the file system sharing component, and we are actively working on improving it through a number of different avenues. The osxfs project started in December.

Since the first integration into Docker for Mac in February 2016, we’ve improved performance by 50x or more for many workloads while achieving nearly complete POSIX compliance and without compromising coherence (it is shared and not simply synced). Of course, in the beginning there was lots of low-hanging fruit and now many of the remaining performance improvements require significant engineering work on custom low-level components. We appreciate your understanding as we continue development of the product and work on all dimensions of performance. We want to continue to work with the community on this, so continue to report issues as you find them. We look forward to collaborating with you on ideas and on the source code itself.