Conventional deduplication systems face critical challenges such as excessive write amplification, high read/write latency, and sub-optimal storage utilization. These limitations often undermine the performance benefits of deduplication by slowing down I/O acknowledgements, due to amplified deduplication I/Os, excessive data chunk replication and strict consistency requirements. To address these issues, we present Speed-Dedup, a novel deduplication framework that employs a deduplicated primary - semi deduplicated replica object approach. This strategy reduces write amplification by restricting deduplication to the primary object while maintaining a semi-deduplicated replica object used for immediate read/write acknowledgements, thus enhancing I/O latency and storage efficiency. Speed-Dedup also replaces traditional strong consistency models with eventual consistency, allowing for non-blocking read operations and improving overall system throughput. Experimental results demonstrate that Speed-Dedup significantly outperforms traditional methods like GRATE and CAO, showing up to 21% improvement in I/O performance under low deduplication ratios and maintaining 14% or more gains under higher ratios. Additionally, write amplification is substantially reduced, and latency improves by over 100%, with faster recovery times during system failures. These findings highlight the effectiveness of Speed-Dedup as a scalable and efficient solution.