DaisyChain is a node-based dependency graph for executing programs which typically involve file processing. In addition to command-line support, a GUI application can be used for executing scripts and programs without the need to interact with a terminal. The GUI supports drag-n-drop for files (both graphs and file inputs). Nodes in the graph can be executed in serial or parallel.
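The project consists of the following components: the library (libdaisychain), the command-line application (daisy), the GUI application (chain), and the Python bindings (pydaisychain).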
DaisyChain is not a data flow graph[^1] and simply passes string tokens along to nodes. Executables run by the graph are not required to support UNIX pipeline semantics (e.g. read stdin, write stdout). The graph functions like xargs in that regard. However, standard output can be captured and used as input.
[^1]: the NodeEditor project used by DaisyChain IS a data flow graph.
DaisyChain uses a directed-acyclic-graph (DAG) to control the order of operations. Nodes in the graph read inputs, do some processing, and then write the outputs. Inputs are typically file paths but could be any string token (e.g. a range of numbers). A node will execute a process once per token until all tokens have been received. Tokens can be modified as they pass through the graph.
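To make the ordering idea concrete, here is a minimal sketch of how a DAG yields an execution order, using Kahn's algorithm. This is not DaisyChain's scheduler (nodes actually run as separate processes); the `Edge` type, the `execution_order` function, and the node names in the usage note are hypothetical and only illustrate that upstream nodes always come before the nodes that consume their tokens.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical edge type: (upstream node name, downstream node name).
using Edge = std::pair<std::string, std::string>;

// Kahn's algorithm: returns node names so that every upstream node precedes
// its downstream nodes. Returns an empty vector if the edges contain a cycle.
std::vector<std::string> execution_order(const std::vector<Edge>& edges) {
    std::map<std::string, std::vector<std::string>> downstream;
    std::map<std::string, int> pending_inputs;
    for (const Edge& edge : edges) {
        downstream[edge.first].push_back(edge.second);
        pending_inputs[edge.first] += 0;  // registers nodes that have no inputs
        pending_inputs[edge.second] += 1;
    }

    // Nodes with no unprocessed inputs are ready to run.
    std::queue<std::string> ready;
    for (const auto& entry : pending_inputs) {
        if (entry.second == 0) ready.push(entry.first);
    }

    std::vector<std::string> order;
    while (!ready.empty()) {
        const std::string name = ready.front();
        ready.pop();
        order.push_back(name);
        for (const std::string& next : downstream[name]) {
            if (--pending_inputs[next] == 0) ready.push(next);
        }
    }

    // If some node never became ready, the graph was not acyclic.
    if (order.size() != pending_inputs.size()) order.clear();
    return order;
}
```

For a diamond-shaped graph such as list -> resize, list -> thumb, resize -> pack, thumb -> pack, the function returns list first and pack last, with the two middle nodes in between.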
Every node runs as a child process (via fork()). All synchronization between nodes is handled using named pipes and multiplexed I/O. This allows processes to run in parallel.
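The combination of fork(), named pipes, and multiplexed I/O can be sketched roughly as follows. This is an illustration under stated assumptions, not code from libdaisychain: the `gather_tokens` function, the temporary directory layout, and the newline-delimited token framing are all hypothetical. Each child plays the role of an upstream node writing tokens into its own FIFO, and the parent waits on all FIFOs at once with poll().

```cpp
#include <fcntl.h>
#include <poll.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// Hypothetical helper: forks one child per token list, has each child write
// its tokens (newline-terminated) into its own named pipe, and returns the
// tokens read by the parent, grouped by producer.
std::vector<Tokens> gather_tokens(const std::vector<Tokens>& producers) {
    char dir[] = "/tmp/dc_sketch_XXXXXX";
    if (mkdtemp(dir) == nullptr) throw std::runtime_error("mkdtemp failed");

    std::vector<std::string> paths;
    for (std::size_t i = 0; i < producers.size(); ++i) {
        paths.push_back(std::string(dir) + "/pipe" + std::to_string(i));
        if (mkfifo(paths.back().c_str(), 0600) != 0) throw std::runtime_error("mkfifo failed");
    }

    std::vector<pid_t> children;
    for (std::size_t i = 0; i < producers.size(); ++i) {
        const pid_t pid = fork();
        if (pid < 0) throw std::runtime_error("fork failed");
        if (pid == 0) {
            // Child: open() blocks until the parent opens the read end.
            const int out = open(paths[i].c_str(), O_WRONLY);
            if (out < 0) _exit(1);
            for (const std::string& token : producers[i]) {
                const std::string line = token + "\n";
                if (write(out, line.data(), line.size()) < 0) _exit(1);
            }
            close(out);
            _exit(0);
        }
        children.push_back(pid);
    }

    // Parent: open every read end, then wait on all of them at once.
    std::vector<pollfd> fds;
    std::size_t open_count = 0;
    for (const std::string& path : paths) {
        fds.push_back({open(path.c_str(), O_RDONLY), POLLIN, 0});
        if (fds.back().fd >= 0) ++open_count;
    }

    std::vector<std::string> buffers(paths.size());
    char chunk[4096];
    while (open_count > 0) {
        if (poll(fds.data(), fds.size(), -1) < 0) break;
        for (std::size_t i = 0; i < fds.size(); ++i) {
            if (fds[i].fd < 0 || fds[i].revents == 0) continue;
            const ssize_t n = read(fds[i].fd, chunk, sizeof chunk);
            if (n > 0) {
                buffers[i].append(chunk, static_cast<std::size_t>(n));
            } else {  // 0: the writer closed its end; negative: read error
                close(fds[i].fd);
                fds[i].fd = -1;  // poll() ignores negative descriptors
                --open_count;
            }
        }
    }

    // Clean up descriptors, children, and the temporary FIFOs.
    for (const pollfd& p : fds) if (p.fd >= 0) close(p.fd);
    for (pid_t pid : children) waitpid(pid, nullptr, 0);
    for (const std::string& path : paths) unlink(path.c_str());
    rmdir(dir);

    // Split each producer's byte stream back into tokens.
    std::vector<Tokens> result(buffers.size());
    for (std::size_t i = 0; i < buffers.size(); ++i) {
        std::istringstream lines(buffers[i]);
        std::string token;
        while (std::getline(lines, token)) result[i].push_back(token);
    }
    return result;
}
```

Because the children run concurrently, the order in which data arrives across pipes is not deterministic; only the order within each pipe is preserved, which is why the sketch groups results by producer.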
Graphs are stored as JSON in a *.dcg file.
The current set of executable nodes includes:
Options are passed to nodes via variables that can be set in the Variables panel in the GUI or set using flags from the command-line tool.
Processing happens one string token at a time. Nodes loop over tokens and execute once per token. This behavior can be changed by checking the batch checkbox (when a node supports it), in which case the node blocks until it has received all inputs, which are then concatenated into one large string and set as the input for the node.
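The difference between the default and batch behavior can be sketched as below. The `node_inputs` function is hypothetical, and the single-space separator used when joining tokens is an assumption; the text above only says the inputs are concatenated into one string.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: returns the input string for each execution of a node.
// Default mode yields one execution per token; batch mode yields a single
// execution whose input is every token joined together.
std::vector<std::string> node_inputs(const std::vector<std::string>& tokens, bool batch) {
    if (!batch) return tokens;  // one execution per token

    std::vector<std::string> inputs;
    if (tokens.empty()) return inputs;  // nothing received, nothing to run

    std::string joined;
    for (const std::string& token : tokens) {
        if (!joined.empty()) joined += ' ';  // assumed separator
        joined += token;
    }
    inputs.push_back(joined);
    return inputs;
}
```

With three tokens, the default mode produces three executions, while batch mode produces one execution whose input contains all three tokens.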
Parallel processing can be achieved by duplicating a set of nodes and using a distro node to distribute tokens across each group of nodes. This would typically be followed by a concat node to bring the inputs back into a single stream.
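As a rough model of that pattern, the sketch below splits a token stream across groups and merges the groups again. The function names `distribute` and `concat` are hypothetical, and round-robin assignment is an assumption; the distro node's actual distribution policy is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// Assumed policy: hand tokens out round-robin, one group at a time.
std::vector<Tokens> distribute(const Tokens& tokens, std::size_t groups) {
    assert(groups > 0);
    std::vector<Tokens> out(groups);
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        out[i % groups].push_back(tokens[i]);
    }
    return out;
}

// Merge the per-group streams back into a single stream, group by group.
Tokens concat(const std::vector<Tokens>& groups) {
    Tokens merged;
    for (const Tokens& group : groups) {
        merged.insert(merged.end(), group.begin(), group.end());
    }
    return merged;
}
```

When the groups really run in parallel, the merged stream's order depends on which group finishes a token first, so downstream nodes should not rely on the original token order.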
I/O between nodes consists of string tokens represented by the ${INPUT} and ${OUTPUT} variables.
The ${OUTPUT} variable is automatically set equal to the ${INPUT} variable. This leaves the string token intact as it passes through the graph. However, the ${OUTPUT} variable can be changed via shell string substitution patterns in the ${OUTPUT} field of a node (if present). Additionally, for CommandLine nodes, ${OUTPUT} can be set to ${STDOUT}.
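To get a feel for what shell substitution patterns do to a token, the following sketch exports a token as INPUT and asks /bin/sh to expand a pattern. It does not show how DaisyChain evaluates the ${OUTPUT} field internally; the `expand_output` helper and the sample patterns are hypothetical, and the only thing relied on is standard POSIX parameter expansion.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: sets INPUT in the environment and lets /bin/sh expand
// `pattern`. The pattern is handed to the shell unescaped, so this is only
// suitable for trusted patterns.
std::string expand_output(const std::string& input, const std::string& pattern) {
    setenv("INPUT", input.c_str(), 1);
    const std::string command = "printf '%s' \"" + pattern + "\"";

    FILE* shell = popen(command.c_str(), "r");
    if (shell == nullptr) return input;  // fall back to OUTPUT = INPUT

    std::string output;
    char chunk[256];
    std::size_t n = 0;
    while ((n = fread(chunk, 1, sizeof chunk, shell)) > 0) {
        output.append(chunk, n);
    }
    pclose(shell);
    return output;
}
```

For the token photos/cat.jpg, the pattern ${INPUT} leaves the token unchanged, ${INPUT%.*}.png swaps the extension to give photos/cat.png, and ${INPUT##*/} strips the directory to give cat.jpg.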
Download the source here. The project was primarily developed for macOS and Linux.
It is possible to build on Windows 11 using WSL2 (e.g. Ubuntu). This may require building Qt6.x manually depending on the distribution used. Very limited testing has been done on WSL2. For best performance, make sure all working files are located on the Linux file system.
The project attempts to leverage as many existing technologies as possible without creating too many run-time dependencies. Most dependencies are included in the project as git submodules. It is highly recommended to use CMake to build the project.
Dependencies:
Git Submodules (header-only dependencies):
GUI Dependencies:
git clone git@github.com:threadkill/daisychain.git
cd daisychain && git submodule update --init --recursive
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=<install dir>
make -j5 install
individual build targets: daisy, chain, libdaisychain, pydaisychain
The configure scripts currently only support building the library (libdaisychain) and the command-line application (daisy). When building the GUI application (chain) or the Python bindings (pydaisychain), use the CMake build scripts.
git clone git@github.com:threadkill/daisychain.git
cd daisychain && git submodule update --init --recursive
./bootstrap
mkdir build && cd build
../configure --prefix=<install dir>
make -j5 install
individual build targets: daisy, libdaisychain
If CMake cannot find Qt6, set CMAKE_PREFIX_PATH on the CMake command-line to the root of your Qt6 installation (e.g. -DCMAKE_PREFIX_PATH=<Qt6_Root>/<arch>).
Debugging forked processes can be challenging and typically requires a debugger that can follow forks. The graph uses an initial fork to establish the process leader for the group and subsequent forks for each node in the graph. This is how tasks are parallelized, and it facilitates signal handling via the process group.
The utils.h header includes a function called m_debug_wait() that does not require following forks in the debugger. Place that function in the node's Execute() method and call it with a true value; the node will wait and the log will print the pid for the node. You can then attach the debugger to that process (via pid), hit pause/break, set the wait variable to false, and continue by stepping. Wrap the m_debug_wait() call in an if statement against the node name to be sure you're in the exact node you're trying to debug.
if (name_ == "Command1") { m_debug_wait(true); }
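For readers curious why clearing a variable from the debugger releases the node, here is a minimal sketch of a wait helper of this kind. It is not the implementation in utils.h; the `debug_wait_sketch` name, its return value, and the message format are hypothetical. The key idea is a volatile flag that the process re-reads on every loop iteration.

```cpp
#include <unistd.h>

#include <cstdio>

// Hypothetical stand-in for a debug-wait helper; the real m_debug_wait() may
// differ. Returns true if the call actually paused the process.
bool debug_wait_sketch(bool enabled) {
    // volatile forces a fresh read each iteration, so a debugger can end the
    // loop by setting `wait` to false in the paused process.
    volatile bool wait = enabled;
    if (wait) {
        std::fprintf(stderr, "waiting for debugger, pid %ld\n",
                     static_cast<long>(getpid()));
    }

    bool waited = false;
    while (wait) {
        waited = true;
        sleep(1);  // attach by pid, set wait = false, then step out
    }
    return waited;
}
```

Calling it with false returns immediately, so a guarded call can stay in place during normal runs without changing behavior.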
DaisyChain - node-based dependency graph for file processing.

USAGE:

   ./daisy [-l <level>] [--stdin] [--keep] [-e <key=value>] ...
           [-s <sandbox directory>] -g <graph *.dcg> [--] [--version]
           [-h] <filename> ...

Where:

   -l <level>, --loglevel <level>
     off, info, warn, error, debug

   --stdin
     read from STDIN

   --keep
     keep sandbox

   -e <key=value>, --environ <key=value> (accepted multiple times)
     Shell variables inherited by the execution process.

   -s <sandbox directory>, --sandbox <sandbox directory>
     Working directory used for I/O and available as a shell variable
     during execution ${SANDBOX}.

   -g <graph *.dcg>, --graph <graph *.dcg>
     (required) DaisyChain graph file to execute.

   --, --ignore_rest
     Ignores the rest of the labeled arguments following this flag.

   --version
     Displays version information and exits.

   -h, --help
     Displays usage information and exits.

   <filename> (accepted multiple times)
     Inputs (typically files).