All Posts (259)

[Paper Study] Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism Original paper: https://arxiv.org/abs/1909.08053 Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory constraints. In this wo.. Overview: In this po..
[Paper Study] LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention Original paper: https://arxiv.org/abs/2010.01057 Entity representations are useful in natural language tasks involving entities. In this paper, we propose new pretrained contextualized representations of words and entities based on the b..
[Paper Study] BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension Original paper: https://arxiv.org/abs/1910.13461 We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text..
[Paper Study] ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators Original paper: https://arxiv.org/abs/2003.10555 Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good res..
[Paper Study] ALBERT: A Lite BERT for Self-supervised Learning of Language Representations Original paper: https://arxiv.org/abs/1909.11942 Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due t..
Notes for myself: from CUDA installation to PyTorch installation. A post I am writing for my own reference after fumbling through installing CUDA and then PyTorch on an air-gapped network. **Most important**: if you are on an air-gapped network, the way to make your life easy is to build a Docker image on a machine with internet access and bring it over. Simple commands: to check your Windows version, press the Windows key + R (Run dialog) and enter winver. Save the conda package list: conda list --export > list.txt. Install from that list: conda install --file list.txt. Create a conda virtual environment: conda create -n test python=3.9. List conda virtual environments: conda env list. Up to installing PyTorch: 1. Check your graphics card, either from the NVIDIA info panel or however you like ..
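To keep the commands from that post in one place, here is a minimal shell sketch of the offline workflow it describes. The environment name test, the Python version 3.9, and the file name list.txt come from the post; the Docker image name and the docker save / docker load step are assumptions about how "build the image online and bring it over" would typically be done, not something the post spells out.

    # On a machine with internet access: create the environment and record its packages.
    conda create -n test python=3.9          # environment name and Python version from the post
    conda list --export > list.txt           # save the installed-package list

    # Route the post calls "most important": build a Docker image online and carry it over.
    # The image name here is hypothetical.
    # docker save cuda-pytorch:latest -o cuda-pytorch.tar   # export on the online machine
    # docker load -i cuda-pytorch.tar                       # import on the air-gapped machine

    # On the air-gapped machine: check environments and reinstall from the saved list
    # (the package files themselves must still be reachable, e.g. from a local channel or cache).
    conda env list                           # list existing conda environments
    conda activate test                      # switch into the target environment
    conda install --file list.txt            # install packages recorded in list.txt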
[Paper Study] ELMo: Deep contextualized word representations Original paper: https://arxiv.org/abs/1802.05365 We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are..
[Paper Study] RoBERTa: A Robustly Optimized BERT Pretraining Approach Original paper: https://arxiv.org/abs/1907.11692 Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show,..