Tasks

Cell Generation

To generate cells conditioned on cell type with a C2S model, use the tasks.generate_cells_conditioned_on_cell_type() function. This function calls the batched generation function of the CSModel class with cell type generation prompts.

tasks.generate_cells_conditioned_on_cell_type(csmodel: CSModel, cell_types_list: list, n_genes: int = 200, organism: str = 'Homo sapiens', inference_batch_size: int = 8, max_num_tokens: int = 1024, use_flash_attn: bool = False, **kwargs)

Generate new cells using a C2S model, conditioned on cell type.

Parameters:
  • csmodel – a CSModel object wrapping the C2S model

  • cell_types_list – list of strings giving the cell type labels to condition generation on

  • n_genes – the number of genes to prompt the model to generate for each cell sentence

  • organism – the organism to generate cells for ('Homo sapiens', 'Mus musculus')

  • inference_batch_size – batch size of inference for text generation

  • max_num_tokens – maximum number of tokens to generate

  • use_flash_attn – if True, uses Flash Attention in model.generate() for faster inference

  • kwargs – additional arguments for Hugging Face model.generate(). For generation options, see the Hugging Face docs: https://huggingface.co/docs/transformers/en/main_classes/text_generation

Returns:

List of generated cells in the form of cell sentences
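
The returned cell sentences usually need some post-processing before downstream use. The sketch below assumes each cell sentence is a whitespace-separated string of gene names, which is an assumption about the output format rather than something stated above. The helper parse_cell_sentences() is hypothetical and not part of the tasks module, and the input strings are placeholders, not real model output.

```python
def parse_cell_sentences(cell_sentences, max_genes=None):
    """Split each cell sentence into a list of unique gene names, keeping order."""
    parsed = []
    for sentence in cell_sentences:
        seen = set()
        genes = []
        for gene in sentence.split():
            if gene not in seen:  # drop repeated gene names, keep first occurrence
                seen.add(gene)
                genes.append(gene)
        # Optionally truncate to the number of genes that was requested
        parsed.append(genes[:max_genes] if max_genes is not None else genes)
    return parsed


# Hypothetical call, following the signature documented above:
# generated = tasks.generate_cells_conditioned_on_cell_type(
#     csmodel=csmodel, cell_types_list=["T cell", "B cell"], n_genes=200)

# Placeholder strings standing in for model output (not real generations).
generated = ["CD3E CD3D IL7R CD3E", "MS4A1 CD79A CD74"]
gene_lists = parse_cell_sentences(generated, max_genes=3)
print(gene_lists)
```

Removing repeated gene names is a defensive choice here, since sampled text is not guaranteed to list each gene only once.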

Cell Type Annotation

To predict the cell types of a dataset, use the tasks.predict_cell_types_of_data() function:

tasks.predict_cell_types_of_data(csdata: CSData, csmodel: CSModel, n_genes: int = 200, **kwargs)

Predict cell types of data using a C2S model.

Parameters:
  • csdata – a CSData object wrapping the dataset whose cell types will be predicted

  • csmodel – a CSModel object wrapping the C2S model to predict cell types with

  • n_genes – the number of genes to use for each cell sentence

  • kwargs – additional arguments for Hugging Face model.generate(). For generation options, see the Hugging Face docs: https://huggingface.co/docs/transformers/en/main_classes/text_generation

Returns:

List of predicted cell types
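
Because the predictions come back as a plain list of strings, they can be tallied or compared against known labels with the standard library. The following is a minimal sketch, not part of the tasks module: summarize_predictions() and normalize_label() are hypothetical helpers, the normalization (trimming whitespace and a trailing period, lowercasing) is an assumption about how generated labels may vary, and the example labels are made up for illustration.

```python
from collections import Counter


def normalize_label(label):
    """Trim whitespace and a trailing period, then lowercase (assumed cleanup)."""
    return label.strip().rstrip(".").lower()


def summarize_predictions(predicted, reference=None):
    """Count predicted labels and, if reference labels are given, compute accuracy."""
    counts = Counter(normalize_label(p) for p in predicted)
    accuracy = None
    if reference is not None:
        if len(reference) != len(predicted):
            raise ValueError("predicted and reference must have the same length")
        matches = sum(
            normalize_label(p) == normalize_label(r)
            for p, r in zip(predicted, reference)
        )
        accuracy = matches / len(predicted) if predicted else None
    return counts, accuracy


# Hypothetical call, following the signature documented above:
# predicted = tasks.predict_cell_types_of_data(csdata=csdata, csmodel=csmodel, n_genes=200)

# Made-up labels standing in for model predictions and ground truth.
predicted = ["T cell.", "B cell", "T cell"]
reference = ["T cell", "B cell", "NK cell"]
counts, accuracy = summarize_predictions(predicted, reference)
print(counts, accuracy)
```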

Cell Embedding

To embed cells using C2S models, use the tasks.embed_cells() function. This function loads a CSModel object and uses the C2S model to embed cell sentences from the CSData object into embedding vectors.

tasks.embed_cells(csdata: CSData, csmodel: CSModel, n_genes: int = 200, inference_batch_size: int = 8)

Embed cells using a C2S model.

Parameters:
  • csdata – a CSData object wrapping the dataset to embed

  • csmodel – a CSModel object wrapping the C2S model to embed cells with

  • n_genes – the number of genes to use for each cell sentence

  • inference_batch_size – batch size for inference

Returns:

Numpy array of embedded cells
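
A common use of the embeddings is comparing cells by similarity. The sketch below is illustrative only and assumes the returned array has one row per cell. It uses plain Python so it runs without extra dependencies and works on any sequence of rows, including the rows of a NumPy array. The helpers cosine_similarity() and nearest_cell() are hypothetical, and the toy vectors are not real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_cell(embeddings, query_index):
    """Return (index, similarity) of the cell most similar to the query cell."""
    best_index, best_score = None, -math.inf
    for i, row in enumerate(embeddings):
        if i == query_index:
            continue  # skip the query cell itself
        score = cosine_similarity(embeddings[query_index], row)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


# Hypothetical call, following the signature documented above:
# embeddings = tasks.embed_cells(csdata=csdata, csmodel=csmodel, n_genes=200)

# Toy 2-dimensional vectors standing in for real cell embeddings.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
neighbor, score = nearest_cell(embeddings, 0)
print(neighbor, round(score, 3))
```

For large datasets, a vectorized NumPy computation or an approximate nearest-neighbor index would be a more practical choice than this linear scan.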