BiCA-base / README.md
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:95253
  - loss:MultipleNegativesRankingLoss
base_model: thenlper/gte-base
widget:
  - source_sentence: Molecular phylogenetic resolution of the mega-diverse clade Apoditrysia
    sentences:
      - >-
        In a previous study of higher-level arthropod phylogeny, analyses of
        nucleotide sequences from 62 protein-coding nuclear genes for 80
        panarthopod species yielded significantly higher bootstrap support for
        selected nodes than did amino acids. This study investigates the cause
        of that discrepancy. The hypothesis is tested that failure to
        distinguish the serine residues encoded by two disjunct clusters of
        codons (TCN, AGY) in amino acid analyses leads to this discrepancy. In
        one test, the two clusters of serine codons (Ser1, Ser2) are
        conceptually translated as separate amino acids. Analysis of the
        resulting 21-amino-acid data matrix shows striking increases in
        bootstrap support, in some cases matching that in nucleotide analyses.
        In a second approach, nucleotide and 20-amino-acid data sets are
        artificially altered through targeted deletions, modifications, and
        replacements, revealing the pivotal contributions of distinct Ser1 and
        Ser2 codons. We confirm that previous methods of coding nonsynonymous
        nucleotide change are robust and computationally efficient by
        introducing two new degeneracy coding methods. We demonstrate for
        degeneracy coding that neither compositional heterogeneity at the level
        of nucleotides nor codon usage bias between Ser1 and Ser2 clusters of
        codons (or their separately coded amino acids) is a major source of
        non-phylogenetic signal. The incongruity in support between amino-acid
        and nucleotide analyses of the forementioned arthropod data set is
        resolved by showing that "standard" 20-amino-acid analyses yield lower
        node support specifically when serine provides crucial signal. Separate
        coding of Ser1 and Ser2 residues yields support commensurate with that
        found by degenerated nucleotides, without introducing phylogenetic
        artifacts. While exclusion of all serine data leads to reduced support
        for serine-sensitive nodes, these nodes are still recovered in the ML
        topology, indicating that the enhanced signal from Ser1 and Ser2 is not
        qualitatively different from that of the other amino acids.
      - >-
        Recent molecular phylogenetic studies of the insect order Lepidoptera
        have robustly resolved family-level divergences within most
        superfamilies, and most divergences among the relatively species-poor
        early-arising superfamilies. In sharp contrast, relationships among the
        superfamilies of more advanced moths and butterflies that comprise the
        mega-diverse clade Apoditrysia (ca. 145,000 spp.) remain mostly poorly
        supported. This uncertainty, in turn, limits our ability to discern the
        origins, ages and evolutionary consequences of traits hypothesized to
        promote the spectacular diversification of Apoditrysia. Low support
        along the apoditrysian "backbone" probably reflects rapid
        diversification. If so, it may be feasible to strengthen resolution by
        radically increasing the gene sample, but case studies have been few. We
        explored the potential of next-generation sequencing to conclusively
        resolve apoditrysian relationships. We used transcriptome RNA-Seq to
        generate 1579 putatively orthologous gene sequences across a broad
        sample of 40 apoditrysians plus four outgroups, to which we added two
        taxa from previously published data. Phylogenetic analysis of a
        46-taxon, 741-gene matrix, resulting from a strict filter that
        eliminated ortholog groups containing any apparent paralogs, yielded
        dramatic overall increase in bootstrap support for deeper nodes within
        Apoditrysia as compared to results from previous and concurrent 19-gene
        analyses. High support was restricted mainly to the huge subclade
        Obtectomera broadly defined, in which 11 of 12 nodes subtending multiple
        superfamilies had bootstrap support of 100%. The strongly supported
        nodes showed little conflict with groupings from previous studies, and
        were little affected by changes in taxon sampling, suggesting that they
        reflect true signal rather than artifacts of massive gene sampling. In
        contrast, strong support was seen at only 2 of 11 deeper nodes among the
        "lower", non-obtectomeran apoditrysians. These represent a much harder
        phylogenetic problem, for which one path to resolution might include
        further increase in gene sampling, together with improved orthology
        assignments. 
      - >-
        One of the major challenges in cell implantation therapies is to promote
        integration of the microcirculation between the implanted cells and the
        host. We used adipose-derived stromal vascular fraction (SVF) cells to
        vascularize a human liver cell (HepG2) implant. We hypothesized that the
        SVF cells would form a functional microcirculation via vascular assembly
        and inosculation with the host vasculature. Initially, we assessed the
        extent and character of neovasculatures formed by freshly isolated and
        cultured SVF cells and found that freshly isolated cells have a higher
        vascularization potential. Generation of a 3D implant containing fresh
        SVF and HepG2 cells formed a tissue in which HepG2 cells were entwined
        with a network of microvessels. Implanted HepG2 cells sequestered
        labeled LDL delivered by systemic intravascular injection only in
        SVF-vascularized implants demonstrating that SVF cell-derived
        vasculatures can effectively integrate with host vessels and interface
        with parenchymal cells to form a functional tissue mimic. 
  - source_sentence: Exosomes as drug delivery systems for gastrointestinal cancers
    sentences:
      - >-
        Gastrointestinal cancer is one of the most common malignancies with
        relatively high morbidity and mortality. Exosomes are nanosized
        extracellular vesicles derived from most cells and widely distributed in
        body fluids. They are natural endogenous nanocarriers with low
        immunogenicity, high biocompatibility, and natural targeting, and can
        transport lipids, proteins, DNA, and RNA. Exosomes contain DNA, RNA,
        proteins, lipids, and other bioactive components, which can play a role
        in information transmission and regulation of cellular physiological and
        pathological processes during the progression of gastrointestinal
        cancer. In this paper, the role of exosomes in gastrointestinal cancers
        is briefly reviewed, with emphasis on the application of exosomes as
        drug delivery systems for gastrointestinal cancers. Finally, the
        challenges faced by exosome-based drug delivery systems are discussed.
      - >-
        Background In the myocardium, pericytes are often confused with other
        interstitial cell types, such as fibroblasts. The lack of
        well-characterized and specific tools for identification, lineage
        tracing, and conditional targeting of myocardial pericytes has hampered
        studies on their role in heart disease. In the current study, we
        characterize and validate specific and reliable strategies for labeling
        and targeting of cardiac pericytes. Methods and Results Using the
        neuron-glial antigen 2 (NG2)
      - >-
        Exosomes are small extracellular vesicles with diameters of 30-150 nm.
        In both physiological and pathological conditions, nearly all types of
        cells can release exosomes, which play important roles in cell
        communication and epigenetic regulation by transporting crucial protein
        and genetic materials such as miRNA, mRNA, and DNA. Consequently,
        exosome-based disease diagnosis and therapeutic methods have been
        intensively investigated. However, as in any natural science field, the
        in-depth investigation of exosomes relies heavily on technological
        advances. Historically, the two main technical hindrances that have
        restricted the basic and applied researches of exosomes include, first,
        how to simplify the extraction and improve the yield of exosomes and,
        second, how to effectively distinguish exosomes from other extracellular
        vesicles, especially functional microvesicles. Over the past few
        decades, although a standardized exosome isolation method has still not
        become available, a number of techniques have been established through
        exploration of the biochemical and physicochemical features of exosomes.
        In this work, by comprehensively analyzing the progresses in exosome
        separation strategies, we provide a panoramic view of current exosome
        isolation techniques, providing perspectives toward the development of
        novel approaches for high-efficient exosome isolation from various types
        of biological matrices. In addition, from the perspective of
        exosome-based diagnosis and therapeutics, we emphasize the issue of
        quantitative exosome and microvesicle separation.
  - source_sentence: >-
      Comparison of pesticide active substances in conventional agriculture and
      organic agriculture in Europe
    sentences:
      - >-
        Total concentrations of metals in soil are poor predictors of toxicity.
        In the last decade, considerable effort has been made to demonstrate how
        metal toxicity is affected by the abiotic properties of soil. Here this
        information is collated and shows how these data have been used in the
        European Union for defining predicted-no-effect concentrations (PNECs)
        of Cd, Cu, Co, Ni, Pb, and Zn in soil. Bioavailability models have been
        calibrated using data from more than 500 new chronic toxicity tests in
        soils amended with soluble metal salts, in experimentally aged soils,
        and in field-contaminated soils. In general, soil pH was a good
        predictor of metal solubility but a poor predictor of metal toxicity
        across soils. Toxicity thresholds based on the free metal ion activity
        were generally more variable than those expressed on total soil metal,
        which can be explained, but not predicted, using the concept of the
        biotic ligand model. The toxicity thresholds based on total soil metal
        concentrations rise almost proportionally to the effective cation
        exchange capacity of soil. Total soil metal concentrations yielding 10%
        inhibition in freshly amended soils were up to 100-fold smaller (median
        3.4-fold, n = 110 comparative tests) than those in corresponding aged
        soils or field-contaminated soils. The change in isotopically
        exchangeable metal in soil proved to be a conservative estimate of the
        change in toxicity upon aging. The PNEC values for specific soil types
        were calculated using this information. The corrections for aging and
        for modifying effects of soil properties in metal-salt-amended soils are
        shown to be the main factors by which PNEC values rise above the natural
        background range.
      - >-
        There is much debate about whether the (mostly synthetic) pesticide
        active substances (AS) in conventional agriculture have different
        non-target effects than the natural AS in organic agriculture. We
        evaluated the official EU pesticide database to compare 256 AS that may
        only be used on conventional farmland with 134 AS that are permitted on
        organic farmland. As a benchmark, we used (i) the hazard classifications
        of the Globally Harmonized System (GHS), and (ii) the dietary and
        occupational health-based guidance values, which were established in the
        authorization procedure. Our comparison showed that 55% of the AS used
        only in conventional agriculture contained health or environmental
        hazard statements, but only 3% did of the AS authorized for organic
        agriculture. Warnings about possible harm to the unborn child, suspected
        carcinogenicity, or acute lethal effects were found in 16% of the AS
        used in conventional agriculture, but none were found in organic
        agriculture. Furthermore, the establishment of health-based guidance
        values for dietary and non-dietary exposures were relevant by the
        European authorities for 93% of conventional AS, but only for 7% of
        organic AS. We, therefore, encourage policies and strategies to reduce
        the use and risk of pesticides, and to strengthen organic farming in
        order to protect biodiversity and maintain food security.
      - >-
        Herpes simplex virus 1 (HSV-1) encodes Us3 protein kinase, which is
        critical for viral pathogenicity in both mouse peripheral sites (e.g.,
        eyes and vaginas) and in the central nervous systems (CNS) of mice after
        intracranial and peripheral inoculations, respectively. Whereas some Us3
        substrates involved in Us3 pathogenicity in peripheral sites have been
        reported, those involved in Us3 pathogenicity in the CNS remain to be
        identified. We recently reported that Us3 phosphorylated HSV-1 dUTPase
        (vdUTPase) at serine 187 (Ser-187) in infected cells, and this
        phosphorylation promoted viral replication by regulating optimal
        enzymatic activity of vdUTPase. In the present study, we show that the
        replacement of vdUTPase Ser-187 by alanine (S187A) significantly reduced
        viral replication and virulence in the CNS of mice following
        intracranial inoculation and that the phosphomimetic substitution at
        vdUTPase Ser-187 in part restored the wild-type viral replication and
        virulence. Interestingly, the S187A mutation in vdUTPase had no effect
        on viral replication and pathogenic effects in the eyes and vaginas of
        mice after ocular and vaginal inoculation, respectively. Similarly, the
        enzyme-dead mutation in vdUTPase significantly reduced viral replication
        and virulence in the CNS of mice after intracranial inoculation, whereas
        the mutation had no effect on viral replication and pathogenic effects
        in the eyes and vaginas of mice after ocular and vaginal inoculation,
        respectively. These observations suggested that vdUTPase was one of the
        Us3 substrates responsible for Us3 pathogenicity in the CNS and that the
        CNS-specific virulence of HSV-1 involved strict regulation of vdUTPase
        activity by Us3 phosphorylation.
  - source_sentence: >-
      Load-dependent detachment and reattachment kinetics of kinesin-1, -2 and 3
      motors
    sentences:
      - >-
        Bidirectional cargo transport by kinesin and dynein is essential for
        cell viability and defects are linked to neurodegenerative diseases.
        Computational modeling suggests that the load-dependent off-rate is the
        strongest determinant of which motor 'wins' a kinesin-dynein tug-of-war,
        and optical tweezer experiments find that the load-dependent detachment
        sensitivity of transport kinesins is kinesin-3 > kinesin-2 > kinesin-1.
        However, in reconstituted kinesin-dynein pairs vitro, all three kinesin
        families compete nearly equally well against dynein. Modeling and
        experiments have confirmed that vertical forces inherent to the large
        trapping beads enhance kinesin-1 dissociation rates. In vivo, vertical
        forces are expected to range from negligible to dominant, depending on
        cargo and microtubule geometries. To investigate the detachment and
        reattachment kinetics of kinesin-1, 2 and 3 motors against loads
        oriented parallel to the microtubule, we created a DNA tensiometer
        comprising a DNA entropic spring attached to the microtubule on one end
        and a motor on the other. Kinesin dissociation rates at stall were
        slower than detachment rates during unloaded runs, and the complex
        reattachment kinetics were consistent with a weakly-bound 'slip' state
        preceding detachment. Kinesin-3 behaviors under load suggested that long
        KIF1A run lengths result from the concatenation of multiple short runs
        connected by diffusive episodes. Stochastic simulations were able to
        recapitulate the load-dependent detachment and reattachment kinetics for
        all three motors and provide direct comparison of key transition rates
        between families. These results provide insight into how kinesin-1, -2
        and -3 families transport cargo in complex cellular geometries and
        compete against dynein during bidirectional transport.
      - >-
        AP-1 and AP-2 adaptor protein (AP) complexes mediate clathrin-dependent
        trafficking at the trans-Golgi network (TGN) and the plasma membrane,
        respectively. Whereas AP-1 is required for trafficking to plasma
        membrane and vacuoles, AP-2 mediates endocytosis. These AP complexes
        consist of four subunits (adaptins): two large subunits (β1 and γ for
        AP-1 and β2 and α for AP-2), a medium subunit μ, and a small subunit σ.
        In general, adaptins are unique to each AP complex, with the exception
        of β subunits that are shared by AP-1 and AP-2 in some invertebrates.
        Here, we show that the two putative Arabidopsis thaliana AP1/2β adaptins
        co-assemble with both AP-1 and AP-2 subunits and regulate exocytosis and
        endocytosis in root cells, consistent with their dual localization at
        the TGN and plasma membrane. Deletion of both β adaptins is lethal in
        plants. We identified a critical role of β adaptins in pollen wall
        formation and reproduction, involving the regulation of membrane
        trafficking in the tapetum and pollen germination. In tapetal cells, β
        adaptins localize almost exclusively to the TGN and mediate exocytosis
        of the plasma membrane transporters such as ATP-binding cassette (ABC)G9
        and ABCG16. This study highlights the essential role of AP1/2β adaptins
        in plants and their specialized roles in specific cell types.
      - >-
        A single kinesin molecule can move "processively" along a microtubule
        for more than 1 micrometer before detaching from it. The prevailing
        explanation for this processive movement is the "walking model," which
        envisions that each of two motor domains (heads) of the kinesin molecule
        binds coordinately to the microtubule. This implies that each kinesin
        molecule must have two heads to "walk" and that a single-headed kinesin
        could not move processively. Here, a motor-domain construct of KIF1A, a
        single-headed kinesin superfamily protein, was shown to move
        processively along the microtubule for more than 1 micrometer. The
        movement along the microtubules was stochastic and fitted a biased
        Brownian-movement model.
  - source_sentence: >-
      Phylogenetic analysis of mitochondrial genes in Macquarie perch from three
      river basins
    sentences:
      - >-
        Sedentary behavior is an emerging risk factor for cardiovascular disease
        (CVD) and may be particularly relevant to the cardiovascular health of
        older adults. This scoping review describes the existing literature
        examining the prevalence of sedentary time in older adults with CVD and
        the association of sedentary behavior with cardiovascular risk in older
        adults. We found that older adults with CVD spend >75 % of their waking
        day sedentary, and that sedentary time is higher among older adults with
        CVD than among older adults without CVD. High sedentary behavior is
        consistently associated with worse cardiac lipid profiles and increased
        cardiac risk scores in older adults; the associations of sedentary
        behavior with blood pressure, CVD incidence, and CVD-related mortality
        among older adults are less clear. Future research with larger sample
        sizes using validated methods to measure sedentary behavior are needed
        to clarify the association between sedentary behavior and cardiovascular
        outcomes in older adults.
      - >-
        An improved Bayesian method is presented for estimating phylogenetic
        trees using DNA sequence data. The birth-death process with species
        sampling is used to specify the prior distribution of phylogenies and
        ancestral speciation times, and the posterior probabilities of
        phylogenies are used to estimate the maximum posterior probability (MAP)
        tree. Monte Carlo integration is used to integrate over the ancestral
        speciation times for particular trees. A Markov Chain Monte Carlo method
        is used to generate the set of trees with the highest posterior
        probabilities. Methods are described for an empirical Bayesian analysis,
        in which estimates of the speciation and extinction rates are used in
        calculating the posterior probabilities, and a hierarchical Bayesian
        analysis, in which these parameters are removed from the model by an
        additional integration. The Markov Chain Monte Carlo method avoids the
        requirement of our earlier method for calculating MAP trees to sum over
        all possible topologies (which limited the number of taxa in an analysis
        to about five). The methods are applied to analyze DNA sequences for
        nine species of primates, and the MAP tree, which is identical to a
        maximum-likelihood estimate of topology, has a probability of
        approximately 95%.
      - >-
        Genetic variation in mitochondrial genes could underlie metabolic
        adaptations because mitochondrially encoded proteins are directly
        involved in a pathway supplying energy to metabolism. Macquarie perch
        from river basins exposed to different climates differ in size and
        growth rate, suggesting potential presence of adaptive metabolic
        differences. We used complete mitochondrial genome sequences to build a
        phylogeny, estimate lineage divergence times and identify signatures of
        purifying and positive selection acting on mitochondrial genes for 25
        Macquarie perch from three basins: Murray-Darling Basin (MDB),
        Hawkesbury-Nepean Basin (HNB) and Shoalhaven Basin (SB). Phylogenetic
        analysis resolved basin-level clades, supporting incipient speciation
        previously inferred from differentiation in allozymes, microsatellites
        and mitochondrial control region. The estimated time of lineage
        divergence suggested an early- to mid-Pleistocene split between SB and
        the common ancestor of HNB+MDB, followed by mid-to-late Pleistocene
        splitting between HNB and MDB. These divergence estimates are more
        recent than previous ones. Our analyses suggested that evolutionary
        drivers differed between inland MDB and coastal HNB. In the cooler and
        more climatically variable MDB, mitogenomes evolved under strong
        purifying selection, whereas in the warmer and more climatically stable
        HNB, purifying selection was relaxed. Evidence for relaxed selection in
        the HNB includes elevated transfer RNA and 16S ribosomal RNA
        polymorphism, presence of potentially mildly deleterious mutations and a
        codon (ATP6
pipeline_tag: sentence-similarity
library_name: sentence-transformers
license: mit

SentenceTransformer based on thenlper/gte-base

This is a sentence-transformers model fine-tuned from thenlper/gte-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: thenlper/gte-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
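
The three modules listed above form a pipeline: the Transformer produces one vector per token, the Pooling module averages them (`pooling_mode_mean_tokens` is the only mode set to True), and Normalize rescales the result to unit length. The following is a minimal pure-Python sketch of the last two steps on toy data. The helper names `mean_pool` and `l2_normalize` are hypothetical and are not part of the library, which performs the same operations on batched tensors.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padded positions (mask value 0)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    count = max(len(kept), 1)  # guard against an all-padding input
    return [sum(vec[d] for vec in kept) / count for d in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)    # [2.0, 3.0]; the padded vector is ignored
embedding = l2_normalize(pooled)    # unit-length version of the pooled vector
print(pooled, embedding)
```

Because every output vector has unit length, the cosine similarity between two embeddings reduces to their dot product.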

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

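# Download from the 🤗 Hub
# NOTE: "sentence_transformers_model_id" is the card's placeholder;
# replace it with this model's repository ID on the Hub.
model = SentenceTransformer("sentence_transformers_model_id")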
# Run inference
sentences = [
    'Phylogenetic analysis of mitochondrial genes in Macquarie perch from three river basins',
    'Genetic variation in mitochondrial genes could underlie metabolic adaptations because mitochondrially encoded proteins are directly involved in a pathway supplying energy to metabolism. Macquarie perch from river basins exposed to different climates differ in size and growth rate, suggesting potential presence of adaptive metabolic differences. We used complete mitochondrial genome sequences to build a phylogeny, estimate lineage divergence times and identify signatures of purifying and positive selection acting on mitochondrial genes for 25 Macquarie perch from three basins: Murray-Darling Basin (MDB), Hawkesbury-Nepean Basin (HNB) and Shoalhaven Basin (SB). Phylogenetic analysis resolved basin-level clades, supporting incipient speciation previously inferred from differentiation in allozymes, microsatellites and mitochondrial control region. The estimated time of lineage divergence suggested an early- to mid-Pleistocene split between SB and the common ancestor of HNB+MDB, followed by mid-to-late Pleistocene splitting between HNB and MDB. These divergence estimates are more recent than previous ones. Our analyses suggested that evolutionary drivers differed between inland MDB and coastal HNB. In the cooler and more climatically variable MDB, mitogenomes evolved under strong purifying selection, whereas in the warmer and more climatically stable HNB, purifying selection was relaxed. Evidence for relaxed selection in the HNB includes elevated transfer RNA and 16S ribosomal RNA polymorphism, presence of potentially mildly deleterious mutations and a codon (ATP6',
    'An improved Bayesian method is presented for estimating phylogenetic trees using DNA sequence data. The birth-death process with species sampling is used to specify the prior distribution of phylogenies and ancestral speciation times, and the posterior probabilities of phylogenies are used to estimate the maximum posterior probability (MAP) tree. Monte Carlo integration is used to integrate over the ancestral speciation times for particular trees. A Markov Chain Monte Carlo method is used to generate the set of trees with the highest posterior probabilities. Methods are described for an empirical Bayesian analysis, in which estimates of the speciation and extinction rates are used in calculating the posterior probabilities, and a hierarchical Bayesian analysis, in which these parameters are removed from the model by an additional integration. The Markov Chain Monte Carlo method avoids the requirement of our earlier method for calculating MAP trees to sum over all possible topologies (which limited the number of taxa in an analysis to about five). The methods are applied to analyze DNA sequences for nine species of primates, and the MAP tree, which is identical to a maximum-likelihood estimate of topology, has a probability of approximately 95%.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9449, 0.8056],
#         [0.9449, 1.0000, 0.7868],
#         [0.8056, 0.7868, 1.0000]])
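
For semantic search, the first row of the similarity matrix above can be read as the query's score against each candidate passage. The sketch below ranks candidates by that score. It reuses the two off-diagonal values printed above; the function name `rank_candidates` and the short passage labels are hypothetical stand-ins for illustration.

```python
def rank_candidates(query_scores, candidates):
    """Return (candidate, score) pairs sorted from most to least similar."""
    pairs = list(zip(candidates, query_scores))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Scores copied from the first row of the printed matrix,
# skipping the query's similarity with itself (1.0000).
scores = [0.9449, 0.8056]
# Hypothetical short labels for the two abstracts in `sentences`.
labels = ["Macquarie perch mitogenome abstract", "Bayesian phylogenetics abstract"]

ranking = rank_candidates(scores, labels)
for label, score in ranking:
    print(f"{score:.4f}  {label}")
```

With these values, the abstract that matches the query title ranks first.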

Training Details

Training Dataset

Unnamed Dataset

  • Size: 95,253 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
              sentence_0      sentence_1       sentence_2
    type      string          string           string
    min       6 tokens        3 tokens         51 tokens
    mean      19.51 tokens    223.97 tokens    309.24 tokens
    max       56 tokens       512 tokens       512 tokens
  • Samples:
    Sample 1
      • sentence_0: Sox5 modulates the activity of Sox10 in the melanocyte lineage
      • sentence_1: The transcription factor Sox5 has previously been shown in chicken to be expressed in early neural crest cells and neural crest-derived peripheral glia. Here, we show in mouse that Sox5 expression also continues after neural crest specification in the melanocyte lineage. Despite its continued expression, Sox5 has little impact on melanocyte development on its own as generation of melanoblasts and melanocytes is unaltered in Sox5-deficient mice. Loss of Sox5, however, partially rescued the strongly reduced melanoblast generation and marker gene expression in Sox10 heterozygous mice arguing that Sox5 functions in the melanocyte lineage by modulating Sox10 activity. This modulatory activity involved Sox5 binding and recruitment of CtBP2 and HDAC1 to the regulatory regions of melanocytic Sox10 target genes and direct inhibition of Sox10-dependent promoter activation. Both binding site competition and recruitment of corepressors thus help Sox5 to modulate the activity of Sox10 in the melano...
      • sentence_2: Transcripts for a new form of Sox5, called L-Sox5, and Sox6 are coexpressed with Sox9 in all chondrogenic sites of mouse embryos. A coiled-coil domain located in the N-terminal part of L-Sox5, and absent in Sox5, showed >90% identity with a similar domain in Sox6 and mediated homodimerization and heterodimerization with Sox6. Dimerization of L-Sox5/Sox6 greatly increased efficiency of binding of the two Sox proteins to DNA containing adjacent HMG sites. L-Sox5, Sox6 and Sox9 cooperatively activated expression of the chondrocyte differentiation marker Col2a1 in 10T1/2 and MC615 cells. A 48 bp chondrocyte-specific enhancer in this gene, which contains several HMG-like sites that are necessary for enhancer activity, bound the three Sox proteins and was cooperatively activated by the three Sox proteins in non-chondrogenic cells. Our data suggest that L-Sox5/Sox6 and Sox9, which belong to two different classes of Sox transcription factors, cooperate with each other in expression of Col2a1 a...
    Sample 2
      • sentence_0: are asgard archaea related to eukaryotes
      • sentence_1: Asgard archaea are considered to be the closest known relatives of eukaryotes. Their genomes contain hundreds of eukaryotic signature proteins (ESPs), which inspired hypotheses on the evolution of the eukaryotic cell
      • sentence_2: Eukaryotes evolved from a symbiosis involving alphaproteobacteria and archaea phylogenetically nested within the Asgard clade. Two recent studies explore the metabolic capabilities of Asgard lineages, supporting refined symbiotic metabolic interactions that might have operated at the dawn of eukaryogenesis.
    Sample 3
      • sentence_0: Fanconi Anemia in Pediatric Medulloblastoma and Fanconi Anemia
      • sentence_1: The outcome of children with medulloblastoma (MB) and Fanconi Anemia (FA), an inherited DNA repair deficiency, has not been described systematically. Treatment is complicated by high vulnerability to treatment-associated side effects, yet structured data are lacking. This study aims to give a comprehensive overview of clinical and molecular characteristics of pediatric FA MB patients.
      • sentence_2: The Sonic Hedgehog (SHH) signaling pathway is indispensable for development, and functions to activate a transcriptional program modulated by the GLI transcription factors. Here, we report that loss of a regulator of the SHH pathway, Suppressor of Fused (Sufu), resulted in early embryonic lethality in the mouse similar to inactivation of another SHH regulator, Patched1 (Ptch1). In contrast to Ptch1+/- mice, Sufu+/- mice were not tumor prone. However, in conjunction with p53 loss, Sufu+/- animals developed tumors including medulloblastoma and rhabdomyosarcoma. Tumors present in Sufu+/-p53-/- animals resulted from Sufu loss of heterozygosity. Sufu+/-p53-/- medulloblastomas also expressed a signature gene expression profile typical of aberrant SHH signaling, including upregulation of N-myc, Sfrp1, Ptch2 and cyclin D1. Finally, the Smoothened inhibitor, hedgehog antagonist, did not block growth of tumors arising from Sufu inactivation. These data demonstrate that Sufu is essential for deve...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
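As a rough illustration of what these two parameters do, the sketch below computes an in-batch-negatives ranking loss in plain Python: cosine similarities are multiplied by `scale` and fed to a softmax cross-entropy in which the i-th positive is the correct candidate for the i-th anchor. The function names and the toy 2-D vectors are made up for this example; this is not the library's implementation, only a minimal sketch of the same idea. Explicit hard negatives, when supplied, are assumed to be appended to the candidate pool.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, negatives=(), scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i].

    Every other positive in the batch (and any extra hard negative)
    acts as a negative candidate for that anchor.
    """
    candidates = list(positives) + list(negatives)
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

# Toy embeddings, invented for illustration only.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]   # each positive is close to its own anchor
swapped = [[0.1, 1.0], [1.0, 0.1]]   # each positive is close to the wrong anchor

loss_aligned = mnr_loss(anchors, aligned)
loss_swapped = mnr_loss(anchors, swapped)
print(f"aligned: {loss_aligned:.4f}  swapped: {loss_swapped:.4f}")
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity produce larger differences in loss.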
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • max_steps: 20
  • multi_dataset_batch_sampler: round_robin
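In the Hugging Face `Trainer`, a positive `max_steps` overrides `num_train_epochs`, so the values above, read at face value, describe a run of 20 optimizer steps rather than a full epoch. The short calculation below estimates how many training examples such a run would touch. The single-device setting is an assumption (the card does not state the device count), `examples_seen` is a hypothetical helper, and the dataset size comes from the card's `dataset_size:95253` tag.

```python
def examples_seen(max_steps, per_device_batch_size, num_devices=1, grad_accum_steps=1):
    """Number of training examples consumed by a step-limited run."""
    return max_steps * per_device_batch_size * num_devices * grad_accum_steps

dataset_size = 95_253  # from the card's dataset_size tag

# num_devices=1 is an assumption; gradient_accumulation_steps is 1 per the card.
seen = examples_seen(max_steps=20, per_device_batch_size=16)
fraction = seen / dataset_size
print(f"{seen} examples, about {fraction:.2%} of one pass over the data")
```

If the run used more devices, the count scales linearly with `num_devices`.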

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: 20
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
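The combination `lr_scheduler_type: linear`, `warmup_steps: 0`, `learning_rate: 5e-05`, and `max_steps: 20` implies a learning rate that starts at its peak and decays linearly to zero by the last step. The sketch below reproduces that shape in plain Python; `linear_lr` is a hypothetical helper written for this example, not a library function.

```python
def linear_lr(step, base_lr=5e-5, total_steps=20, warmup_steps=0):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup (unused when warmup_steps == 0).
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate before each of the 20 steps, plus the final value.
schedule = [linear_lr(s) for s in range(21)]
for step in (0, 5, 10, 15, 20):
    print(step, schedule[step])
```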

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 5.0.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.6.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1
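To check a local environment against the versions listed above, something like the following could be used. It relies only on the standard library's `importlib.metadata`. The PyPI distribution names in `CARD_VERSIONS` and the helper names are assumptions made for this sketch, and an exact string match is a deliberately strict criterion (a different CUDA build of PyTorch, for instance, will be reported as differing).

```python
from importlib import metadata

# Versions copied from the card; keys are the assumed PyPI distribution names.
CARD_VERSIONS = {
    "sentence-transformers": "5.0.0",
    "transformers": "4.52.4",
    "torch": "2.6.0+cu124",
    "accelerate": "1.6.0",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def compare_versions(expected, lookup=installed_version):
    """Map each package to (expected, found, status)."""
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found is None:
            status = "missing"
        elif found == wanted:
            status = "match"
        else:
            status = "differs"
        report[package] = (wanted, found, status)
    return report

for name, (wanted, found, status) in compare_versions(CARD_VERSIONS).items():
    print(f"{name}: expected {wanted}, found {found} ({status})")
```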

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

If our work was helpful, please consider citing us ☺️

@misc{sinha2025bicaeffectivebiomedicaldense,
      title={BiCA: Effective Biomedical Dense Retrieval with Citation-Aware Hard Negatives}, 
      author={Aarush Sinha and Pavan Kumar S and Roshan Balaji and Nirav Pravinbhai Bhatt},
      year={2025},
      eprint={2511.08029},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2511.08029}, 
}