Co-citation based data augmentation for contrastive learning of scientific domains

2025/04/29 14:08

Data compilation

We used co-citations as a similarity heuristic to generate sufficiently large training datasets for contrastive learning over scientific domains. Our strategy enabled the production of large training datasets from small amounts of data because citation graphs scale nonlinearly: a single paper citing N other papers produces $\binom{N}{2} = N(N-1)/2$ co-citation pairs. For context, a dataset of 10,000 individual papers can produce well over 125,000 co-citation pairs. While this measure of similarity is not perfect, co-citations have generally been shown to imply a high degree of similarity between papers [21]. For modeling purposes, we assume that two co-cited papers are more similar to each other than two randomly chosen papers, even when all are drawn from the same field.
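
The scaling described above can be seen in a small sketch. Here `reference_lists` is a hypothetical list of per-paper reference lists; this is a minimal illustration of co-citation counting, not the paper's actual pipeline.

```python
from itertools import combinations
from collections import Counter

def cocitation_pairs(reference_lists):
    """Count co-citation pairs: two papers cited together by the same
    citing paper form one candidate positive pair, so a paper citing N
    others contributes C(N, 2) = N*(N-1)/2 pairs."""
    counts = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts

# One citing paper with 4 references already yields 6 co-citation pairs.
print(len(cocitation_pairs([["p1", "p2", "p3", "p4"]])))  # -> 6
```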

To build our dataset, we randomly chose five biomedical subfields with little overlap: cardiovascular disease (CVD), chronic obstructive pulmonary disease (COPD), parasitic diseases, autoimmune diseases, and skin cancers. PubMed Central was queried with Medical Subject Heading (MeSH) terms for each domain, requiring each paper to have at least one citation and an abstract and to be published between 2010 and 2022. Within that time window, of the possible $\binom{N}{2}$ co-citations per citing paper, we kept only the pairs returned under the same MeSH terms. When constructing the final dataset, we preferentially sampled pairs that had been co-cited more often.
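
The paper does not specify exactly how the preferential sampling was weighted; the following is a minimal sketch that simply weights each pair by its co-citation count (`pair_counts` is a hypothetical mapping from pair to count, and sampling is with replacement).

```python
import random

def sample_positive_pairs(pair_counts, k, seed=0):
    """Draw k positive pairs, preferring pairs that were co-cited more often
    by weighting each pair proportionally to its co-citation count."""
    rng = random.Random(seed)
    pairs = list(pair_counts)
    weights = [pair_counts[p] for p in pairs]
    return rng.choices(pairs, weights=weights, k=k)  # sampling with replacement
```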

For evaluation, we constructed “negative” examples of abstract pairs that were not co-cited. The training dataset was split randomly in a 99:1 ratio, followed by deduplication. We built negative pairs by pairing abstracts that had never been co-cited and had each been cited at least 15 times. These criteria allowed us to construct a representative, class-balanced evaluation set for binary classification, labeled 1 for co-cited pairs and 0 otherwise. The exact dataset counts are outlined in Table 1.
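
A minimal sketch of the negative-pair construction under assumed inputs: `citation_counts` is a hypothetical mapping from paper id to citation count and `cocited` a set of (sorted) co-cited pairs.

```python
import random

def build_negative_pairs(citation_counts, cocited, n_pairs, min_citations=15, seed=0):
    """Pair abstracts that were never co-cited but are each cited at least
    15 times; these serve as label-0 examples against label-1 co-cited pairs."""
    rng = random.Random(seed)
    eligible = [p for p, c in citation_counts.items() if c >= min_citations]
    negatives = set()
    while len(negatives) < n_pairs:
        a, b = rng.sample(eligible, 2)
        pair = tuple(sorted((a, b)))
        if pair not in cocited:   # keep only pairs that were never co-cited
            negatives.add(pair)
    return list(negatives)
```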

Transformer neural networks

The transformer architecture is adept at sequential processing and is state-of-the-art for various natural language processing (NLP) and vision tasks [24,25,26,27,28,29,30]. A transformer block comprises a self-attention layer and a multi-layer perceptron (MLP) interleaved with skip connections. A full transformer is made of T transformer blocks stacked together [1].

Prior to the transformer blocks is the token embedding process, in which tokenization maps an input string to a list of L integers drawn from a dictionary. These integers serve as row indices into a matrix $W_e \in \mathbb{R}^{v \times d}$, where each row is a learnable representative vector for that token, $v$ is the total number of unique tokens in the vocabulary, and $d$ is an arbitrarily chosen hidden dimension. The initial embedding therefore lies in $\mathbb{R}^{L \times d}$.
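
A minimal PyTorch sketch of this embedding step; the vocabulary size, hidden dimension, and sequence length below are arbitrary placeholders.

```python
import torch
import torch.nn as nn

v, d, L = 30_000, 768, 128             # vocabulary size, hidden dim, sequence length
embed = nn.Embedding(v, d)             # W_e in R^{v x d}: one learnable row per token
token_ids = torch.randint(0, v, (L,))  # tokenizer output: L integer indices
x0 = embed(token_ids)                  # initial embedding in R^{L x d}
print(x0.shape)                        # torch.Size([128, 768])
```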

Each block in the transformer then transforms this embedding, i.e., the $i$-th transformer block maps the embedding $X^{(i-1)} = [x_1^{(i-1)}, \ldots, x_L^{(i-1)}]^\top \in \mathbb{R}^{L \times d}$ to $X^{(i)} = [x_1^{(i)}, \ldots, x_L^{(i)}]^\top \in \mathbb{R}^{L \times d}$ [1,31,32]. $X^{(T)}$ is the last hidden state of the network. The first part of this map is self-attention, which mixes information across the $L$ token vectors, followed by the MLP, which mixes information across the hidden dimension $d$ [31,33].

Including the MLP, the entire transformer block can be written as:

$$\tilde{X}^{(i)} = X^{(i-1)} + \mathrm{SelfAttention}\left(X^{(i-1)}\right), \qquad X^{(i)} = \tilde{X}^{(i)} + \sigma\left(\tilde{X}^{(i)} W_1 + b_1\right) W_2 + b_2,$$

where $b_1$ and $b_2$ are the biases associated with the learned linear transformations $W_1 \in \mathbb{R}^{d \times I}$ and $W_2 \in \mathbb{R}^{I \times d}$, with $I > d$. The activation function $\sigma$, e.g., ReLU or GeLU, introduces non-linearity [1]. More recent models often omit the biases, which improves training stability, throughput, and final performance. Improvements such as SwiGLU activation functions and rotary positional embeddings are also commonly used [3,4,34,35].
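
A minimal PyTorch sketch of one such block. It uses the pre-norm arrangement with LayerNorm and GELU that many modern implementations adopt; the equation above omits normalization, and the paper's exact configuration may differ.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Self-attention mixes information across the L positions; the MLP
    (W1: d -> I, W2: I -> d, with I > d) mixes information across the
    hidden dimension; skip connections wrap both sub-layers."""
    def __init__(self, d=768, n_heads=12, I=3072):
        super().__init__()
        self.ln1 = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d)
        self.mlp = nn.Sequential(nn.Linear(d, I), nn.GELU(), nn.Linear(I, d))

    def forward(self, x):                                    # x: (batch, L, d)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]    # residual around attention
        return x + self.mlp(self.ln2(x))                     # residual around MLP
```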

GPT (Generative Pretrained Transformer) models, such as OpenAI’s GPT series (GPT-3, GPT-4, etc.), are designed for generative tasks and use transformer decoders [36,37,38]. They employ causal (unidirectional) attention, meaning each token attends only to previous tokens in the sequence, enabling autoregressive generation during inference. This allows them to predict the next word in a sequence without direct access to future words.
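
Causal attention can be illustrated with a mask over the attention scores; this is a generic sketch, not any specific model's implementation.

```python
import torch

L = 6
# True above the diagonal: position i is forbidden from attending to any j > i,
# so each token never "sees" future tokens, enabling autoregressive decoding.
causal_mask = torch.triu(torch.ones(L, L), diagonal=1).bool()
print(causal_mask)
```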

In contrast, BERT models utilize transformer encoders with bidirectional attention, meaning they can attend to all tokens within an input simultaneously. This structure enables them to capture additional contextual dependencies, making them well-suited for tasks like text classification and sentence similarity [39]. Unlike GPT models, BERT is trained using a masked language modeling (MLM) objective, where some tokens are randomly hidden, requiring the model to predict them based on the surrounding context.
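
A simplified sketch of MLM input corruption (real BERT training also replaces some selected tokens with random or unchanged tokens; that detail is omitted here, and `mask_id` is a placeholder for the tokenizer's mask token).

```python
import torch

def mask_tokens(token_ids, mask_id, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens; the model must predict the originals
    from both left and right context. Labels are -100 where no prediction
    is required (the ignore value used by common loss implementations)."""
    g = torch.Generator().manual_seed(seed)
    selected = torch.rand(token_ids.shape, generator=g) < mask_prob
    labels = torch.where(selected, token_ids, torch.full_like(token_ids, -100))
    inputs = torch.where(selected, torch.full_like(token_ids, mask_id), token_ids)
    return inputs, labels
```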

Mixture of Experts

Mixture of
