Optimized operation scheme of flash-memory-based neural network online training with ultra-high endurance

Yang Feng; Zhaohui Sun; Yueran Qi; Xuepeng Zhan; Junyu Zhang; Jing Liu; Masaharu Kobayashi; Jixuan Wu; Jiezhi Chen

doi:10.1088/1674-4926/45/1/012301

J. Semicond., 2024, Volume 45, Issue 1, 012301

Corresponding authors: Jixuan Wu, jixuanwu@sdu.edu.cn; Jiezhi Chen, chen.jiezhi@sdu.edu.cn


Abstract: With the rapid development of machine learning, the demand for highly efficient computing is becoming increasingly urgent. To break the bottleneck of the traditional von Neumann architecture, computing-in-memory (CIM) has attracted increasing attention in recent years. In this work, to provide a feasible CIM solution for large-scale neural networks (NNs) that require continuous weight updating during online training, a flash-based CIM with high endurance (10⁹ cycles) and ultra-fast programming speed is investigated. On the one hand, the proposed programming scheme, which combines channel hot electron injection (CHEI) and hot hole injection (HHI), demonstrates highly linear and symmetric potentiation and depression processes, which help to improve training speed and accuracy. On the other hand, the low-damage programming scheme and memory window (MW) optimization effectively suppress cell degradation and improve computing accuracy. Even after 10⁹ cycles, the leakage current (I_off) of the cells remains below 10 pA, ensuring the large-scale computing capability of the memory. Read-disturb characterization further demonstrates robust reliability. On CIFAR-10 tasks, ~90% accuracy is achieved after 10⁹ cycles with both ResNet50 and VGG16 NNs. Our results suggest that flash-based CIM has great potential to overcome the limitations of traditional von Neumann architectures and to enable high-performance NN online training, paving the way for further development of artificial intelligence (AI) accelerators.
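The abstract links the sub-10 pA I_off to large-scale computing capability. The missing step is that every off-state cell on a shared bitline adds its leakage to the summed read signal, so total leakage grows linearly with the number of cells per bitline. A minimal sketch of that arithmetic follows; the 1024-cell bitline, the 1 µA on-state read current and the 1% leakage budget are hypothetical values chosen for illustration, not figures from this work.

```python
def bitline_leakage(n_cells: int, i_off: float) -> float:
    """Worst-case summed off-state current (A) of n_cells sharing one bitline."""
    return n_cells * i_off


def leakage_to_signal(n_cells: int, i_off: float, i_on: float) -> float:
    """Summed leakage expressed as a fraction of one on-state cell current.

    i_on is a hypothetical read current, not a value reported in the paper.
    """
    return bitline_leakage(n_cells, i_off) / i_on


def max_cells_per_bitline(i_off: float, i_on: float, budget: float) -> int:
    """Largest number of off-state cells whose summed leakage stays within
    `budget` (a fraction of one on-state cell current)."""
    # The small epsilon guards against floating-point round-off before truncation.
    return int(budget * i_on / i_off + 1e-9)


if __name__ == "__main__":
    i_off = 10e-12  # upper bound on I_off after 10^9 cycles (from the abstract)
    i_on = 1e-6     # hypothetical on-state read current
    print(f"{bitline_leakage(1024, i_off):.3e} A")               # 1.024e-08 A
    print(f"{leakage_to_signal(1024, i_off, i_on):.2%}")         # 1.02%
    print(max_cells_per_bitline(i_off, i_on, budget=0.01))
```

Under these assumptions, 1024 off-state cells leak about 1% of a single on-cell current; a larger I_off would shrink the usable array height proportionally.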

Key words: NOR flash memory, computing-in-memory, endurance, neural network, online training

References

[1]	Yao P, Wu H Q, Gao B, et al. Fully hardware-implemented memristor convolutional neural network. Nature, 2020, 577, 641 doi: 10.1038/s41586-020-1942-4
[2]	Khwa W S, Akarvardar K, Chen Y S, et al. MLC PCM techniques to improve neural network inference retention time by 105X and reduce accuracy degradation by 10.8X. Proc IEEE Symp VLSI Technol, 2020, 1
[3]	Zhang W Y, Wang S C, Li Y, et al. Few-shot graph learning with robust and energy-efficient memory-augmented graph neural network (MAGNN) based on homogeneous computing-in-memory. 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2022, 224
[4]	Kumar S, Wang X X, Strachan J P, et al. Dynamical memristors for higher-complexity neuromorphic computing. Nat Rev Mater, 2022, 7, 575 doi: 10.1038/s41578-022-00434-z
[5]	Lu Y M, Li X, Yan B N, et al. In-memory realization of eligibility traces based on conductance drift of phase change memory for energy-efficient reinforcement learning. Adv Mater, 2022, 34, 2107811 doi: 10.1002/adma.202107811
[6]	Huang P, Zhou Z, Zhang Y, et al. Dual-configuration in-memory computing bitcells using SiO_x RRAM for binary neural networks. APL Mater, 2019, 7, 081105 doi: 10.1063/1.5116863
[7]	Chang C C, Chen P C, Chou T, et al. Mitigating asymmetric nonlinear weight update effects in hardware neural network based on analog resistive synapse. IEEE J Emerg Sel Top Circuits Syst, 2018, 8, 116 doi: 10.1109/JETCAS.2017.2771529
[8]	Ravsher T, Garbin D, Fantini A, et al. Enhanced performance and low-power capability of SiGeAsSe-GeSbTe 1S1R phase-change memory operated in bipolar mode. 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2022, 312
[9]	Ielmini D, Ghetti A, Spinelli A S, et al. A study of hot-hole injection during programming drain disturb in flash memories. IEEE Trans Electron Devices, 2006, 53, 668 doi: 10.1109/TED.2006.870280
[10]	Wu W, Wu H Q, Gao B, et al. A methodology to improve linearity of analog RRAM for neuromorphic computing. 2018 IEEE Symposium on VLSI Technology, 2018, 103 doi: 10.1109/VLSIT.2018.8510690
[11]	Wang Q W, Park Y, Lu W D. Device variation effects on neural network inference accuracy in analog In-memory computing systems. Adv Intell Syst, 2022, 4, 2100199 doi: 10.1002/aisy.202100199
[12]	Ogawa S, Shiono N. Interface-trap generation induced by hot-hole injection at the Si-SiO₂ interface. Appl Phys Lett, 1992, 61, 807 doi: 10.1063/1.107751
[13]	Choi W, Kwak M, Heo S, et al. Hardware neural network using hybrid synapses via transfer learning: WO_x nano-resistors and TiO_x RRAM synapse for energy-efficient edge-AI sensor. 2021 IEEE International Electron Devices Meeting (IEDM), 2021, 23.1.1
[14]	Ali T, Seidel K, Kühnel K, et al. A novel dual ferroelectric layer based MFMFIS FeFET with optimal stack tuning toward low power and high-speed NVM for neuromorphic applications. 2020 IEEE Symposium on VLSI Technology, 2020, 1 doi: 10.1109/VLSITechnology18217.2020.9265111
[15]	Lue H T, Hsu P K, Wei M L, et al. Optimal design methods to transform 3D NAND flash into a high-density, high-bandwidth and low-power nonvolatile computing in memory (nvCIM) accelerator for deep-learning neural networks (DNN). 2019 IEEE International Electron Devices Meeting (IEDM), 2020, 38.1.1 doi: 10.1109/IEDM19573.2019.8993652
[16]	Malavena G, Spinelli A S, Compagnoni C M. Implementing spike-timing-dependent plasticity and unsupervised learning in a mainstream NOR flash memory array. 2018 IEEE International Electron Devices Meeting (IEDM), 2019, 2.3.1 doi: 10.1109/IEDM.2018.8614561

Fig. 1. (Color online) Schematics of the flash-based CIM architecture. The pulse duration of V_g and the threshold voltage are mapped to the input vector and the weight matrix, respectively, so that the accumulated charge represents the result of the matrix-vector multiplication (MVM).
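The mapping in Fig. 1 can be read as Q_j = Σ_i I(V_th,ij)·t_i: each input is encoded as the duration t_i of the read pulse on V_g of wordline i, each weight as a cell threshold voltage V_th,ij, and the bitline charge integrates cell current over time. The sketch below evaluates that sum. The square-law cell current, the factor k, the leakage floor and the 3 V read level are hypothetical placeholders, not device parameters from this work.

```python
from typing import List


def cell_current(v_read: float, v_th: float, k: float = 1e-5, i_off: float = 1e-11) -> float:
    """Hypothetical cell model: square-law current above threshold plus a fixed
    leakage floor. k (A/V^2) and i_off (A) are illustrative values only."""
    overdrive = v_read - v_th
    if overdrive <= 0:
        return i_off
    return k * overdrive ** 2 + i_off


def flash_mvm(pulse_times: List[float], v_th: List[List[float]], v_read: float = 3.0) -> List[float]:
    """Return the charge (C) collected on each bitline.

    pulse_times[i]: duration (s) of the read pulse on wordline i (input vector)
    v_th[i][j]:     threshold voltage (V) of the cell at wordline i, bitline j (weights)
    """
    n_rows, n_cols = len(v_th), len(v_th[0])
    if len(pulse_times) != n_rows:
        raise ValueError("one pulse time per wordline is required")
    charges = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Charge is current integrated over the pulse: Q_j += I(V_th[i][j]) * t_i
            charges[j] += cell_current(v_read, v_th[i][j]) * pulse_times[i]
    return charges


if __name__ == "__main__":
    # Two wordlines (10 ns and 20 ns pulses) by two bitlines.
    print(flash_mvm([1e-8, 2e-8], [[2.0, 4.0], [2.0, 2.0]]))
```

A lower V_th lets a cell conduct more during the same pulse, so it contributes a larger weight; a cell whose V_th exceeds the read level contributes only leakage.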


Fig. 2. (Color online) (a) Schematic of the adopted CHEI and HHI programming scheme. (b) Energy band diagrams of the CHEI and HHI programming scheme.


Fig. 3. (Color online) Architectures of the (a) ResNet50 and (b) VGG16 convolutional neural networks.
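To give a sense of the array size behind the large-scale NNs in Fig. 3, the sketch below counts the convolutional weights of the standard VGG16 stack (thirteen 3×3 convolutions, configuration D of the original VGG design) for a 3-channel input. It assumes, hypothetically, one flash cell per weight; the classifier head is omitted because the CIFAR-10 variants drawn in Fig. 3 are not specified in this text, and ResNet50 is not covered.

```python
# Standard VGG16 convolutional configuration: integers are output channels,
# "M" marks a 2x2 max-pooling layer, which has no weights.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]


def conv_weight_count(cfg, in_channels: int = 3, kernel: int = 3,
                      include_bias: bool = False) -> int:
    """Total number of convolution weights (optionally plus biases) in cfg."""
    total = 0
    c_in = in_channels
    for layer in cfg:
        if layer == "M":
            continue  # pooling layers carry no trainable weights
        c_out = layer
        total += c_in * c_out * kernel * kernel
        if include_bias:
            total += c_out
        c_in = c_out
    return total


if __name__ == "__main__":
    n = conv_weight_count(VGG16_CONV)
    print(f"{n:,} conv weights -> {n:,} cells at one cell per weight (hypothetical)")
```

The stack holds 14,710,464 convolution weights (14,714,688 with biases), so roughly 14.7 million cells under the one-cell-per-weight assumption; a differential two-cell encoding would double that.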


Fig. 4. (Color online) (a) The proposed operation scheme, optimized to improve both endurance and speed for NN online training. (b) Comparison of the V_th tuning speed of Fowler-Nordheim (FN) tunneling and HHI. (c) Highly linear and symmetric potentiation and depression processes achieved by combining CHEI and HHI.
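Fig. 4(c) describes the potentiation and depression processes as highly linear and symmetric. One simple way to put numbers on those two properties is sketched below: each pulse-by-pulse trace (V_th or read current) is normalized to [0, 1], nonlinearity is the largest deviation from the ideal straight line, and asymmetry is the largest pulse-by-pulse gap between the normalized potentiation and depression traces. These definitions are an illustrative choice, not the metrics used by the authors; other formulations, such as fitted nonlinearity factors, are also common.

```python
from typing import List, Sequence


def _normalize(trace: Sequence[float]) -> List[float]:
    """Map a monotonic update trace onto [0, 1]; works for rising or falling traces."""
    start, end = trace[0], trace[-1]
    span = end - start
    if span == 0:
        raise ValueError("trace must change between first and last pulse")
    return [(x - start) / span for x in trace]


def nonlinearity(trace: Sequence[float]) -> float:
    """Max deviation of the normalized trace from the ideal straight line (0 = linear)."""
    norm = _normalize(trace)
    n = len(norm) - 1
    return max(abs(v - k / n) for k, v in enumerate(norm))


def asymmetry(potentiation: Sequence[float], depression: Sequence[float]) -> float:
    """Max pulse-by-pulse gap between normalized potentiation and depression
    traces (0 = mirror-symmetric update)."""
    if len(potentiation) != len(depression):
        raise ValueError("traces need the same number of pulses")
    p, d = _normalize(potentiation), _normalize(depression)
    return max(abs(a - b) for a, b in zip(p, d))


if __name__ == "__main__":
    linear_up, linear_down = [0, 1, 2, 3, 4], [4, 3, 2, 1, 0]
    saturating_up = [0, 2, 3, 3.5, 4]  # hypothetical saturating trace
    print(nonlinearity(linear_up), asymmetry(linear_up, linear_down))
    print(nonlinearity(saturating_up), asymmetry(saturating_up, linear_down))
```

An ideal linear, symmetric pair scores 0 on both metrics, while the saturating trace deviates by a quarter of the full range after two pulses.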


Fig. 5. (Color online) (a) I–V curves of the programmed/erased states before and after 10⁹ cycles. (b) Endurance enhancement at lower MW, showing the trade-off between MW and endurance. (c) Subthreshold swing (SS) and (d) I_off for different MWs and cycle counts compared with the traditional programming scheme; each box contains 15 different memory cells.
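Fig. 5(c) reports the subthreshold swing (SS). As a reminder of how SS is commonly extracted from an I–V curve such as those in panel (a), the sketch below takes finite differences SS = ΔV_g/Δlog₁₀(I_d) between adjacent points and returns the minimum in mV/dec. Taking the minimum over adjacent points is an assumption; the authors' extraction window is not stated, and some studies evaluate SS at a fixed current level instead.

```python
import math
from typing import Sequence


def subthreshold_swing(v_g: Sequence[float], i_d: Sequence[float]) -> float:
    """Minimum subthreshold swing (mV/decade) from an I_d-V_g sweep.

    SS = dV_g / d(log10 I_d), estimated by finite differences between
    adjacent points; the smallest positive value is returned.
    """
    if len(v_g) != len(i_d) or len(v_g) < 2:
        raise ValueError("need matching V_g and I_d arrays with at least two points")
    best = math.inf
    for k in range(len(v_g) - 1):
        if i_d[k] <= 0 or i_d[k + 1] <= 0:
            continue  # log10 undefined: skip noise-floor or negative readings
        d_log = math.log10(i_d[k + 1]) - math.log10(i_d[k])
        if d_log <= 0:
            continue  # current not increasing: outside the subthreshold region
        ss = (v_g[k + 1] - v_g[k]) / d_log * 1000.0  # V/dec -> mV/dec
        best = min(best, ss)
    if math.isinf(best):
        raise ValueError("no increasing-current segment found")
    return best


if __name__ == "__main__":
    # Synthetic sweep: one decade of current per 100 mV of gate voltage.
    print(subthreshold_swing([0.0, 0.1, 0.2, 0.3], [1e-12, 1e-11, 1e-10, 1e-9]))
```

Cycling-induced interface traps typically flatten the subthreshold slope, which shows up as a larger SS in such an extraction.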


Fig. 6. (Color online) (a) Comparison between the proposed scheme and the traditional scheme. (b) Read disturb of different states after 10⁹ cycles. (c) Application to CIFAR-10 using ResNet50 and VGG16. Even after 10⁹ cycles, ~90% accuracy can be achieved on the CIFAR-10 task.


Table 1. Benchmark of this work against various non-volatile CIM devices.

Ref.	Cell type	On/off ratio	Pgm. speed	Endurance (cycles)	Data retention (s)
[8]	PCM	10⁴	–	10⁸	10⁶
[13]	RRAM	10³	1 μs	–	10⁴
[14]	FeFET	10⁵	300 ns	10⁵	10⁴
[15]	3D flash	10⁵	–	10⁵	10⁵
[16]	Flash	10²	10 μs	10⁵	–
This work (optimized operation)	Flash	10⁶ (before cycles)	10 ns	10⁹	10⁵
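The endurance column of Table 1 matters most for online training, where weights are rewritten throughout training. The sketch below estimates a worst-case update count per weight. It assumes every weight is updated once per mini-batch and that each update consumes one program/erase cycle (both hypothetical), with a batch size of 128 and 100 epochs chosen only for illustration; the 50,000-image training split is that of CIFAR-10.

```python
import math


def updates_per_training_run(n_train: int, batch_size: int, epochs: int) -> int:
    """Weight updates per run if every mini-batch updates every weight (worst case)."""
    return math.ceil(n_train / batch_size) * epochs


def runs_supported(endurance_cycles: float, updates_per_run: int,
                   cycles_per_update: float = 1.0) -> float:
    """Full training runs a cell survives; cycles_per_update is a hypothetical
    number of program/erase cycles consumed by one weight update."""
    return endurance_cycles / (updates_per_run * cycles_per_update)


if __name__ == "__main__":
    per_run = updates_per_training_run(n_train=50_000, batch_size=128, epochs=100)
    print(per_run)
    for label, endurance in [("10^5 cycles", 1e5), ("10^9 cycles", 1e9)]:
        print(label, round(runs_supported(endurance, per_run), 1))
```

With these assumptions one run needs 391 × 100 = 39,100 updates per weight, so 10⁵ cycles cover only about 2.6 runs, whereas 10⁹ cycles cover about 25,575.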

Received: 14 July 2023; Revised: 09 September 2023; Accepted manuscript online: 10 November 2023; Uncorrected proof: 11 December 2023; Published: 10 January 2024
