[Linux] Fix deadlock when running `LaunchProcess()` from a thread (#4833)

`LaunchProcess()` on Linux works by calling `fork` then `execvpe`. `fork` is used to copy a running process, generating a new child process. The new child starts running from the location where the parent was running, from whatever thread from the parent called `fork`. The child process only gets one thread, however. If a different thread in the parent process had locked a mutex, that mutex is also locked in the child process. Since that separate thread is not present in the child, the mutex remains locked in the child, with no way to unlock it. So it is important that as little work as possible happens between the call to `fork` and to `execvpe`. Previously, this code was trying to report an error that may have occurred from calling `execvpe`. It was doing that by calling `AZ_TracePrintf`. That function does lots of things, including trying to make an EBus call, which looks up a variable in the `AZ::Environment` instance, which has a global mutex. If there was some other thread that had that mutex locked when the `fork` call was made, the subprocess would deadlock, and the parent process would also deadlock waiting for the child to finish. This solves that issue by removing the call to `AZ_TracePrintf` from the subprocess code path. Instead, the parent process sets up a pipe for the child process to write to in case the call to `execvpe` fails (the self-pipe trick). The parent then reads from that pipe. If it reads no data, `execvpe` worked and there's no error. If it does read data, the data to be read is the errno from the failed `execvpe` call made by the child. The parent can then use `strerror()` to report the error. Fixes #4702. Signed-off-by: Chris Burel <burelc@amazon.com>
4 years ago · ab86c9961e
parent 7a14a21377
commit ab86c9961e
2 changed files with 73 additions and 26 deletions
--- a/Code/Framework/AzFramework/AzFramework/Process/ProcessWatcher.cpp
+++ b/Code/Framework/AzFramework/AzFramework/Process/ProcessWatcher.cpp
@ -22,7 +22,7 @@ namespace AzFramework
        AZStd::scoped_ptr<ProcessWatcher> pWatcher(LaunchProcess(processLaunchInfo, communicationType));
        if (!pWatcher)
        {
-            AZ_TracePrintf("Process Watcher", "ProcessWatcher::LaunchProcessAndRetrieveOutput: Unable to launch process '%s %s'", processLaunchInfo.m_processExecutableString.c_str(), processLaunchInfo.m_commandlineParameters.c_str());
+            AZ_TracePrintf("Process Watcher", "ProcessWatcher::LaunchProcessAndRetrieveOutput: Unable to launch process '%s %s'\n", processLaunchInfo.m_processExecutableString.c_str(), processLaunchInfo.m_commandlineParameters.c_str());
            return false;
        }
        else
@ -31,7 +31,7 @@ namespace AzFramework
            ProcessCommunicator* pCommunicator = pWatcher->GetCommunicator();
            if (!pCommunicator || !pCommunicator->IsValid())
            {
-                AZ_TracePrintf("Process Watcher", "ProcessWatcher::LaunchProcessAndRetrieveOutput: No communicator for watcher's process (%s %s)!", processLaunchInfo.m_processExecutableString.c_str(), processLaunchInfo.m_commandlineParameters.c_str());
+                AZ_TracePrintf("Process Watcher", "ProcessWatcher::LaunchProcessAndRetrieveOutput: No communicator for watcher's process (%s %s)!\n", processLaunchInfo.m_processExecutableString.c_str(), processLaunchInfo.m_commandlineParameters.c_str());
                return false;
            }
            else
--- a/Code/Framework/AzFramework/Platform/Linux/AzFramework/Process/ProcessWatcher_Linux.cpp
+++ b/Code/Framework/AzFramework/Platform/Linux/AzFramework/Process/ProcessWatcher_Linux.cpp
@ -91,31 +91,42 @@ namespace AzFramework
            return processId == 0;
        }

-        /*! Executes a command in the child process after the fork operation has been executed.
-         * This function will never return. If the execvp command fails this will call _exit with
-         * the errno value as the return value since continuing execution after a execvp command
-         * is invalid (it will be running the parent's code and in its address space and will 
-         * cause many issues).
+        /*! Executes a command in the child process after the fork operation
+         * has been executed. This function will never return. If the execvpe
+         * command fails this will call _exit since continuing execution after
+         * a execvpe command is invalid (it will be running the parent's code
+         * and in its address space and will cause many issues).
+         *
+         * This function runs after a `fork()` call. `fork()` creates a copy of
+         * the current process, including the current state of the process's
+         * memory, at the time the call is made. However, it only creates a
+         * copy of the one thread that called `fork()`. This means that if any
+         * mutexes are locked by other threads at the time of `fork()`, those
+         * mutexes will remain locked in the child process, with no way to
+         * unlock them. So this function needs to ensure that it does as little
+         * work as possible.
         *
         * \param commandAndArgs - Array of strings that has the command to execute in index 0 with any args for the command following. Last element must be a null pointer.
-         * \param envionrmentVariables - Array of strings that contains environment variables that command should use. Last element must be a null pointer.
+         * \param environmentVariables - Array of strings that contains environment variables that command should use. Last element must be a null pointer.
         * \param processLaunchInfo - struct containing information about luanching the command
         * \param startupInfo - struct containing information needed to startup the command
+         * \param errorPipe - a pipe file descriptor used to communicate a failed execvpe call's error code to the parent process
         */
-        void ExecuteCommandAsChild(char** commandAndArgs, char** environmentVariables, const ProcessLauncher::ProcessLaunchInfo& processLaunchInfo, StartupInfo& startupInfo)
+        [[noreturn]] static void ExecuteCommandAsChild(char** commandAndArgs, char** environmentVariables, const ProcessLauncher::ProcessLaunchInfo& processLaunchInfo, StartupInfo& startupInfo, const AZStd::array<int, 2>& errorPipe)
        {
+            close(errorPipe[0]);
+
            if (!processLaunchInfo.m_workingDirectory.empty())
            {
                int res = chdir(processLaunchInfo.m_workingDirectory.c_str());
                if (res != 0)
                {
-                    std::cerr << strerror(errno) << std::endl;
-                    AZ_TracePrintf("Process Watcher", "ProcessWatcher::LaunchProcessAndRetrieveOutput: Unable to change the launched process' directory to '%s'.", processLaunchInfo.m_workingDirectory.c_str());
+                    write(errorPipe[1], &errno, sizeof(int));
                    // We *have* to _exit as we are the child process and simply
                    // returning at this point would mean we would start running
                    // the code from our parent process and that will just wreck
                    // havoc.
-                    _exit(errno);
+                    _exit(0);
                }
            }

@ -135,15 +146,17 @@ namespace AzFramework

            startupInfo.SetupHandlesForChildProcess();

-            execve(commandAndArgs[0], commandAndArgs, environmentVariables);
+            execvpe(commandAndArgs[0], commandAndArgs, environmentVariables);
+            const int errval = errno;

-            // If we get here then execve failed to run the requested program and
+            // If we get here then execvpe failed to run the requested program and
            // we have an error. In this case we need to exit the child process
-            // to stop it from continuing to run as a clone of the parent
-            AZ_TracePrintf("Process Watcher", "ProcessWatcher::LaunchProcessAndRetrieveOutput: Unable to launch process %s : errno = %s ", commandAndArgs[0], strerror(errno));
-            std::cerr << strerror(errno) << std::endl;
+            // to stop it from continuing to run as a clone of the parent.
+            // Communicate the error code back to the parent via a pipe for the
+            // parent to read.
+            write(errorPipe[1], &errval, sizeof(errval));

-            _exit(errno);
+            _exit(0);
        }
    }

@ -212,9 +225,8 @@ namespace AzFramework
        
        AZStd::string outputString;
        bool inQuotes = false;
-        for (size_t pos = 0; pos < processLaunchInfo.m_commandlineParameters.size(); ++pos)
+        for (const char currentChar : processLaunchInfo.m_commandlineParameters)
        {
-            char currentChar = processLaunchInfo.m_commandlineParameters[pos];
            if (currentChar == '"')
            {
                inQuotes = !inQuotes;
@ -231,7 +243,7 @@ namespace AzFramework
                outputString.push_back(currentChar);
            }
        }
-        
+
        if (!outputString.empty())
        {
            commandTokens.push_back(outputString);
@ -249,10 +261,10 @@ namespace AzFramework
            return false;
        }

-        // Because of the way execve is defined we need to copy the strings from
+        // Because of the way execvpe is defined we need to copy the strings from
        // AZ::string (using c_str() returns a const char*) into a non-const char*

-        // Need to add one more as exec requires the array's last element to be a null pointer
+        // Need to add one more as execvpe requires the array's last element to be a null pointer
        char** commandAndArgs = new char*[commandTokens.size() + 1];
        for (int i = 0; i < commandTokens.size(); ++i)
        {
@ -275,7 +287,7 @@ namespace AzFramework
                azstrcat(environmentVariable.get(), envVarString.size() + 1, envVarString.c_str());
                environmentVariablesVector.emplace_back(environmentVariable.get());
            }
-            // Adding one more as exec expects the array to have a nullptr as the last element
+            // Adding one more as execvpe expects the array to have a nullptr as the last element
            environmentVariablesVector.emplace_back(nullptr);
            environmentVariables = environmentVariablesVector.data();
        }
@ -288,15 +300,50 @@ namespace AzFramework
            AZ_Assert(environmentVariables, "Environment variables for current process not available\n");
        }

+        // Set up a pipe to communicate the error code from the subprocess's execvpe call
+        AZStd::array<int, 2> childErrorPipeFds{};
+        pipe(childErrorPipeFds.data());
+
+        // This configures the write end of the pipe to close on calls to `exec`
+        fcntl(childErrorPipeFds[1], F_SETFD, fcntl(childErrorPipeFds[1], F_GETFD) | FD_CLOEXEC);
+
        pid_t child_pid = fork();
        if (IsIdChildProcess(child_pid))
        {
-            ExecuteCommandAsChild(commandAndArgs, environmentVariables, processLaunchInfo, processData.m_startupInfo);
+            ExecuteCommandAsChild(commandAndArgs, environmentVariables, processLaunchInfo, processData.m_startupInfo, childErrorPipeFds);
        }
-        processData.m_childProcessId = child_pid;

        // Close these handles as they are only to be used by the child process
        processData.m_startupInfo.CloseAllHandles();
+        close(childErrorPipeFds[1]);
+
+        {
+            int errorCodeFromChild = 0;
+            int count = 0;
+            // Read from the error pipe.
+            // * If the child's call to execvpe succeeded, then the pipe will
+            // be closed due to setting FD_CLOEXEC on the write end of the
+            // pipe. `read()` will return 0.
+            // * If the child's call to execvpe failed, the child will have
+            // written the error code to the pipe. `read()` will return >0, and
+            // the data to be read is the error code from execvpe.
+            while ((count = read(childErrorPipeFds[0], &errorCodeFromChild, sizeof(errorCodeFromChild))) == -1)
+            {
+                if (errno != EAGAIN && errno != EINTR)
+                {
+                    break;
+                }
+            }
+            if (count)
+            {
+                AZ_TracePrintf("Process Watcher", "ProcessLauncher::LaunchProcess: Unable to launch process %s : errno = %s\n", commandAndArgs[0], strerror(errorCodeFromChild));
+                processData.m_childProcessIsDone = true;
+                child_pid = -1;
+            }
+        }
+        close(childErrorPipeFds[0]);
+
+        processData.m_childProcessId = child_pid;

        for (int i = 0; i < commandTokens.size(); i++)
        {