We propose a forward-only multi-layer encoding-decoding framework based on the principle of Maximal Coding Rate Reduction (MCR$^2$), an information-theoretic measure of the statistical distance between two sets of feature vectors up to their second moments. The encoder transforms the data vectors directly via gradient ascent, maximizing the MCR$^2$ distance between different classes in the feature space and yielding class-wise mutually orthogonal subspace representations. The decoder follows a process symmetric to the encoder's: it transforms the subspace feature vectors via gradient descent to minimize the MCR$^2$ distance between the reconstructed data and the original data. We show that the encoder maps the data to linear discriminative representations without breaking the higher-order manifolds, and that the decoder reconstructs the data with high fidelity.
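To make the encoder's gradient-ascent step concrete, the following is a minimal NumPy sketch of the standard rate-reduction objective $\Delta R(Z) = R(Z) - \sum_j (n_j/n)\,R(Z_j)$ from the MCR$^2$ literature, together with one forward-only update that moves the feature vectors themselves uphill on $\Delta R$. The function names, feature dimensions, step size, and the choice of $\epsilon$ are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    # R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z Z^T), with Z of shape (d, n)
    d, n = Z.shape
    alpha = d / (n * eps**2)
    return 0.5 * np.linalg.slogdet(np.eye(d) + alpha * Z @ Z.T)[1]

def coding_rate_grad(Z, eps=0.5):
    # Analytic gradient: dR/dZ = alpha * (I + alpha * Z Z^T)^{-1} Z
    d, n = Z.shape
    alpha = d / (n * eps**2)
    return alpha * np.linalg.solve(np.eye(d) + alpha * Z @ Z.T, Z)

def mcr2_objective(Z, labels, eps=0.5):
    # Delta R: rate of the whole set minus the class-weighted per-class rates
    d, n = Z.shape
    per_class = sum(
        (Z[:, labels == c].shape[1] / n) * coding_rate(Z[:, labels == c], eps)
        for c in np.unique(labels)
    )
    return coding_rate(Z, eps) - per_class

def encoder_step(Z, labels, lr=0.05, eps=0.5):
    # One forward-only encoder update: gradient ascent on Delta R applied
    # directly to the feature vectors (expand the whole set, compress classes)
    d, n = Z.shape
    grad = coding_rate_grad(Z, eps)
    for c in np.unique(labels):
        mask = labels == c
        gamma = mask.sum() / n
        grad[:, mask] -= gamma * coding_rate_grad(Z[:, mask], eps)
    return Z + lr * grad
```

Iterating `encoder_step` layer by layer plays the role of the multi-layer encoder: each step increases the between-class coding-rate gap, pushing the classes toward mutually orthogonal subspaces; the decoder sketch would apply the mirror-image gradient-descent update to a reconstruction objective.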