Jump2Paper Archive

Records: 19 · Languages: 1 · Updated: 2026-04-07
Document 001

Agent Lightning: Train ANY AI Agents with Reinforcement Learning

multi-turn GRPO · veRL · multi-agent LLM · agent frameworks · DeepSeek-R1
Korean
Document 002

Automatic Curriculum Learning for Deep RL: A Short Survey

Multi-Goal RL · Intrinsic Motivation · PCG for RL · Sim2Real Transfer · Self-Play
Korean
Document 003

Born-Again Neural Networks

Label Smoothing · Knowledge Distillation · Dark Knowledge · Knowledge Distillation Survey · Self-Distillation
Korean
Document 004

Curriculum Learning for LLM Pretraining: An Analysis of Learning Dynamics

curriculum learning · LLM pretraining · Pythia scaling · HMM training trajectory · pretraining data mixture
Korean
Document 005

DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training

multi-armed bandit · distribution sampling · data mixture training · UCB bandit · curriculum learning
Korean
Document 006

Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining

data-efficient training · Sample Efficiency LLM · Data Selection for LLMs · SlimPajama · DRO Machine Learning
Korean
Document 007

Exclusive Self Attention

Attention Mechanism · Long Context · SA-FFN division of roles · Attention Sink · Transformer improvements
Korean
Document 008

KV Cache Transform Coding for Compact Storage in LLM Inference

PCA decorrelation · adaptive quantization · learned transform coding · speculative decoding · transform coding
Korean
Document 009

KVzap: Fast, Adaptive, and Faithful KV Cache Pruning

LLM inference efficiency · long-context LLM · KV quantization · surrogate model · reasoning efficiency
Korean
Document 010

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Mamba-2 · S4 · linear-time sequence modeling · state space model · HyenaDNA
Korean
Document 011

PivotRL: High Accuracy Agentic Post-Training at Low Compute Cost

curriculum learning RL · agentic post-training · BC-to-RL bridge · natural gradient RL · SWE-Bench
Korean
Document 012

RegMix: Data Mixture as Regression for Language Model Pre-Training

pre-training data selection · data mixture · gradient boosting · Dirichlet sampling · proxy model methods
Korean
Document 013

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Hierarchical Vision Transformer · Shifted Window Attention · Semantic Segmentation · Hierarchical Feature Map · Masked Image Modeling
Korean
Document 014

The Forward-Forward Algorithm: Some Preliminary Investigations

predictive coding · Noise Contrastive Estimation · contrastive learning · backpropagation alternatives · Forward-Forward Algorithm
Korean
Document 015

Think Anywhere in Code Generation

inline reasoning · interleaved thinking · chain-of-thought · code generation LLM · LoRA
Korean
Document 016

TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training

domain adaptation · curriculum learning · data mixture · dynamic data mixture · proxy-free data selection
Korean
Document 017

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

linear attention · Transformer–RNN equivalence · kernel self-attention · RetNet · linear language models
Korean
Document 018

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

random rotation quantization · online quantization · KV cache quantization · vector quantization · online vector quantization
Korean
Document 019

Very Large-Scale Multi-Agent Simulation in AgentScope

AgentScope · Generative Agents · Distributed AI · Game Theory + LLM · Large-Scale Multi-Agent Simulation
Korean