Vision-based welding status recognition (WSR) provides a basis for online welding quality control. Because of severe arc and fume interference in the welding area and the limited computational resources of welding edge nodes, it is challenging to mine the most discriminative features in welding images with a lightweight model. In this paper, we propose an improved three-dimensional convolutional neural network (3DCNN) with a separable structure and multi-dimensional attention (3DSMDA-Net) for WSR. The proposed 3DSMDA-Net uses a 3DCNN to adaptively extract abstract spatiotemporal features from the welding process and leverages this time-sequence information to improve recognition accuracy. In addition, we decompose the classical 3D convolution into a depthwise convolution and a pointwise convolution to produce a lightweight model. A multi-dimensional attention mechanism is further proposed to compensate for the accuracy loss caused by this separation. Experimental results show that the proposed method reduces the model size to 1/7 of that of the classical 3DCNN without sacrificing accuracy. Comparison experiments indicate that the proposed method is more accurate and more noise-resistant than the conventional model. © 2021 The Society of Manufacturing Engineers
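
To illustrate why splitting a 3D convolution into depthwise and pointwise parts yields a lighter model, the following minimal sketch counts the weights of a single layer in both forms. The layer sizes (`c_in`, `c_out`, `k`) and the function names are hypothetical choices for illustration, not values from the paper, and biases are ignored. It covers one layer only, so it is not expected to reproduce the 1/7 whole-model figure reported above.

```python
def conv3d_params(c_in, c_out, k):
    """Weight count of a standard 3D convolution with a k x k x k kernel (no bias)."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in, c_out, k):
    """Weight count of a depthwise 3D convolution followed by a 1x1x1 pointwise one."""
    depthwise = c_in * k ** 3   # one k x k x k filter per input channel
    pointwise = c_in * c_out    # 1x1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 64, 128, 3

standard = conv3d_params(c_in, c_out, k)
separable = separable_conv3d_params(c_in, c_out, k)
ratio = separable / standard

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"ratio:     {ratio:.4f}")
```

For a single layer the ratio works out to 1/c_out + 1/k^3, so the saving grows with the number of output channels and with the kernel size. The reduction for a full network also depends on which layers are separated and on any parameters added by attention modules.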