Prompt Formatter
A PromptFormatter object formats prompts for specific tasks, such as cell generation and cell type prediction.
- prompt_formatter.get_cell_sentence_str(ds_sample, num_genes: int | None = None)
Helper function for formatting cell sentences. Returns a cell sentence string of space-separated gene names. If num_genes is not None, the number of genes is capped at num_genes.
- Parameters:
ds_sample – Hugging Face dataset sample, assumed to follow the C2S data schema.
num_genes – if not None, an integer giving the maximum number of genes to include in the cell sentence.
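As a rough illustration of the behavior described above, here is a minimal sketch, not the library's actual implementation. It assumes the sample is dict-like and stores its cell sentence under a "cell_sentence" key as space-separated gene names; the key name is an assumption about the C2S schema.

```python
from __future__ import annotations


def get_cell_sentence_str(ds_sample: dict, num_genes: int | None = None) -> str:
    """Sketch: return the sample's cell sentence, keeping at most num_genes genes."""
    # Assumption: the cell sentence is stored under "cell_sentence" as
    # space-separated gene names (hypothetical key name).
    genes = ds_sample["cell_sentence"].split()
    if num_genes is not None:
        # Keep only the first num_genes entries of the sentence.
        genes = genes[:num_genes]
    return " ".join(genes)


sample = {"cell_sentence": "MALAT1 TMSB4X B2M CD74 HLA-DRA"}
print(get_cell_sentence_str(sample))               # MALAT1 TMSB4X B2M CD74 HLA-DRA
print(get_cell_sentence_str(sample, num_genes=3))  # MALAT1 TMSB4X B2M
```

If num_genes exceeds the sentence length, slicing simply returns every gene, so no error is raised.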
- prompt_formatter.PromptFormatter()
Abstract base class for prompt formatting.
Subclasses should implement the format_hf_ds method, which takes a Hugging Face dataset and formats it with the prompts chosen for the desired task. It should return a new Hugging Face dataset with at least the following columns:
  - model_input: str, the formatted model input
  - response: str, the formatted model response
The tokenizer uses these columns to format the data for the model.
- prompt_formatter.PromptFormatter.__init__(self, /, *args, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
- prompt_formatter.PromptFormatter.format_hf_ds(self, hf_ds)
Format a Hugging Face dataset with prompts.
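The subclassing contract above can be sketched as follows. This is a hypothetical example, not a formatter shipped with the library: the class name CellTypePredictionFormatter, the prompt template, and the "cell_sentence" and "cell_type" column names are all assumptions. To keep the sketch self-contained, a minimal stand-in for the abstract base class is defined; in practice you would subclass prompt_formatter.PromptFormatter instead. format_hf_ds relies on the Hugging Face datasets Dataset.map and Dataset.column_names API.

```python
from abc import ABC, abstractmethod


class PromptFormatter(ABC):
    """Minimal stand-in for prompt_formatter.PromptFormatter so the sketch runs alone."""

    @abstractmethod
    def format_hf_ds(self, hf_ds):
        """Return a dataset with at least 'model_input' and 'response' columns."""


class CellTypePredictionFormatter(PromptFormatter):
    """Hypothetical subclass that builds cell type prediction prompts."""

    # Hypothetical prompt template, not taken from the library.
    TEMPLATE = (
        "The following is a list of {num_genes} gene names ordered by descending "
        "expression level in a cell: {cell_sentence}. The cell type of this cell is:"
    )

    def __init__(self, num_genes: int = 100):
        self.num_genes = num_genes

    def format_sample(self, sample: dict) -> dict:
        # Assumption: each sample has "cell_sentence" and "cell_type" fields.
        genes = sample["cell_sentence"].split()[: self.num_genes]
        return {
            "model_input": self.TEMPLATE.format(
                num_genes=len(genes), cell_sentence=" ".join(genes)
            ),
            "response": sample["cell_type"],
        }

    def format_hf_ds(self, hf_ds):
        # Dataset.map applies format_sample to every row and adds the returned
        # keys as columns; remove_columns drops the original columns afterwards,
        # leaving only model_input and response.
        return hf_ds.map(self.format_sample, remove_columns=hf_ds.column_names)
```

With the datasets library installed, a dataset built with Dataset.from_dict from "cell_sentence" and "cell_type" lists can be passed to CellTypePredictionFormatter(num_genes=3).format_hf_ds to get a dataset with only the model_input and response columns. Dropping the original columns is a design choice; the contract only requires that the two output columns be present.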