
A Full Walkthrough of Setting Up Stable Diffusion to Generate Your Own Custom Art Portraits
2022.11.08, Liaoning


Introduction

Recently, Silicon Star (硅星人) has reported on AI image-generation technology several times, covering well-known products such as DALL·E, Midjourney, DALL·E mini (now renamed Craiyon), Imagen, and TikTok's AI green screen.

Stable Diffusion, for its part, combines strong generative capability with a wide range of possible uses: the model runs directly on consumer-grade graphics cards and generates images remarkably quickly. And because it is released free and open, AI image generation is no longer a toy reserved for a handful of industry insiders.

In an AI image-generation field crowded with strong players and big-company entrants, Stability AI, the somewhat mysterious organization behind Stable Diffusion, comes across as a reclusive master. Its founder is not especially famous, and neither its founding story nor its funding details are public information; add the charitable act of open-sourcing Stable Diffusion for free, and curiosity about this secretive AI research outfit only grows.

About Stable Diffusion

The project was led by two people: Patrick Esser of Runway, an AI video-editing startup, and Robin Rombach of the Machine Vision and Learning group at the University of Munich. Its technical foundation is largely the Latent Diffusion Model research the two co-authored and presented at the computer vision conference CVPR 2022.

For training, the model used a cluster of 4,000 A100 GPUs and took about a month. The training data comes from LAION-Aesthetics, an aesthetics-focused subset curated by LAION (the Large-scale Artificial Intelligence Open Network) out of the nearly 5.9 billion image-text pairs of LAION-5B.

Although training demands enormous compute, using Stable Diffusion is quite accessible: it runs on ordinary graphics cards, and even with less than 10 GB of VRAM it can produce high-resolution results within seconds.

訓(xùn)練擴(kuò)散模型,預(yù)測每一步對樣本進(jìn)行輕微去噪的方法,經(jīng)過幾次迭代,得到結(jié)果。擴(kuò)散模型已經(jīng)應(yīng)用于各種生成任務(wù),例如圖像、語音、3D 形狀和圖形合成。

A diffusion model involves two processes:

  • Forward diffusion: the input data is gradually perturbed until it is mapped to noise. Formally this is a simple stochastic process that starts from a data sample and iteratively produces noisier samples using a simple Gaussian diffusion kernel. It is used only during training, never at inference time.
  • Parameterized reverse process: undoes the forward diffusion and performs iterative denoising. This is the data-synthesis half; it is trained to generate data by turning random noise into realistic samples (see the equations right after this list).
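The article does not write these two processes out explicitly; as a rough sketch in standard DDPM-style notation (textbook material, not something taken from the Stable Diffusion code or paper), with \beta_t the noise schedule and \epsilon_\theta the network trained to predict the injected noise:

q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\big)

p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)

L_{\text{simple}} = \mathbb{E}_{x_0,\ \epsilon \sim \mathcal{N}(0,\mathbf{I}),\ t}\big[\,\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\,\big]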

All of this is quite costly, and it is exactly on this point that Stable Diffusion builds its diffusion model in a more efficient way, as laid out in the model's paper (the architecture figure is not reproduced here):
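Paraphrasing the Latent Diffusion Model paper rather than the article: a pretrained autoencoder first compresses the image x into a latent z = \mathcal{E}(x), and diffusion then runs in this much smaller latent space, conditioned on the prompt y through a text encoder \tau_\theta, which is what keeps training and sampling cheap:

L_{\text{LDM}} = \mathbb{E}_{\mathcal{E}(x),\ y,\ \epsilon \sim \mathcal{N}(0,1),\ t}\big[\,\lVert \epsilon - \epsilon_\theta(z_t, t, \tau_\theta(y)) \rVert_2^2\,\big]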

Setting up the Stable Diffusion model: a working log

stable-diffusion-v1-1: environment setup

Why keep the v1.1 environment separate from the v1.4 one below? From what I can tell, the v1.1 repository is more of a test release: it does not carry the full v1.4 code, and both its model weights and its installation are far lighter.

  • sd-v1-1.ckpt: 237k steps at resolution 256x256 on laion2B-en. 194k steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).
  • sd-v1-2.ckpt: Resumed from sd-v1-1.ckpt. 515k steps at resolution 512x512 on laion-aesthetics v2 5+ (a subset of laion2B-en with estimated aesthetics score > 5.0, and additionally filtered to images with an original size >= 512x512, and an estimated watermark probability < 0.5. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using the LAION-Aesthetics Predictor V2).
  • sd-v1-3.ckpt: Resumed from sd-v1-2.ckpt. 195k steps at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
  • sd-v1-4.ckpt: Resumed from sd-v1-2.ckpt. 225k steps at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.

The above is from GitHub. In short, sd-v1-1.ckpt is only around 1.3 GB, whereas sd-v1-4.ckpt is about 4 GB and the full v1.4 checkpoint about 7.4 GB, so let's start with the v1.1 environment install.

pip install --upgrade diffusers transformers scipy

That's it, one command. The v1.1 environment is just a stripped-down version of the v1.4 one; v1.4 is the complete setup.

stable-diffusion-v1-4: environment setup

This one has considerably more issues, partly because of restricted network access and partly because some packages really are awkward to install; a proxy would likely speed things up a lot. Since I was working on a server, here are the pitfalls I ran into.

git clone https://github.com/CompVis/stable-diffusion.git
cd stable-diffusion
conda env create -f environment.yaml
conda activate ldm

The main trouble is in the second step: the download is very slow. Here are a few workarounds. The channels set in the yaml point to the default pytorch and conda sources; without a proxy this is not only slow but also far more likely to time out. One option is to switch the channels to a mirror, for example:

name: ldm
channels:
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda/
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
  # - defaults

I'm not sure whether it was just me, but I then hit Solving environment: failed together with ResolvePackageNotFound, which looked like this:


I couldn't pin down exactly what the error meant, but it smelled like a package conflict, so I went manual: I created a fresh virtual environment on Python 3.8 and installed the packages by hand. Apart from CLIP and taming-transformers, nothing else caused problems.

Those last two packages failed with error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function., and the hint in the output was note: This error originates from a subprocess, and is likely not a problem with pip.


The cause turned out to be that the pip in my hand-built environment was the latest version, while these two packages want pip==20.3; after downgrading pip, both installed fine.

Requesting access to Stable Diffusion on Hugging Face

First of all, to download the Stable Diffusion weights you have to accept the download agreement on Hugging Face. The links are:

stable-diffusion-v1-1:
https://huggingface.co/CompVis/stable-diffusion-v1-1

stable-diffusion-v1-4:
https://huggingface.co/CompVis/stable-diffusion-v1-4

Opening either of these brings up the license terms, roughly: no commercial use, nothing illegal, and so on. That said, after reading QbitAI's piece 《Stable Diffusion火到被藝術家集體舉報,網友科普背后機制被LeCun點贊》, my feeling is that companies intent on commercializing it will do so under some wrapper anyway, simply because it is so popular. Back to the point: only after you click to accept the agreement can you download the weights on the server side.

On the server, run:

huggingface-cli login

which brings up the login prompt:


Then go to the website and open Settings (much as you would on GitHub), choose User Access Tokens, copy the token, and paste it into the prompt above to log in. If you don't have a User Access Token yet, create one:

Once logged in with the token, you can start testing the models.
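As an aside, if the interactive prompt is awkward (in a notebook, say), the token can also be passed in code. A minimal sketch, assuming a reasonably recent huggingface_hub; the token string below is a placeholder, not a real value:

from huggingface_hub import login

# Non-interactive login; replace the placeholder with your own User Access Token.
login(token="hf_xxxxxxxxxxxxxxxx")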

stable-diffusion-v1-1: testing

import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

model_id = "CompVis/stable-diffusion-v1-1"
device = "cuda"


pipe = StableDiffusionPipeline.from_pretrained(model_id, use_auth_token=True)
pipe = pipe.to(device)

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt, guidance_scale=7.5)["sample"][0]

image.save("astronaut_rides_horse.png")

If nothing goes wrong you'll see a progress bar as the model downloads; I won't show it here. Even though this model is only about 1.3 GB, my connection is poor, and having already downloaded v1.4 my patience was running thin.

當(dāng)然,上述只是最原始的模型下載方式,還有其余選項下載不同權(quán)重:

"""
如果您受到 GPU 內(nèi)存的限制并且可用的 GPU RAM 少于 10GB,請確保以 float16 精度加載 StableDiffusionPipeline,而不是如上所述的默認(rèn) float32 精度。
"""
import torch

pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, revision="fp16", use_auth_token=True)
pipe = pipe.to(device)

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt, guidance_scale=7.5)["sample"][0]  
    
image.save("astronaut_rides_horse.png")

"""
要換出噪聲調(diào)度程序,請將其傳遞給from_pretrained:
"""
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

model_id = "CompVis/stable-diffusion-v1-1"
# Use the K-LMS scheduler here instead
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, use_auth_token=True)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt, guidance_scale=7.5)["sample"][0]  
    
image.save("astronaut_rides_horse.png")

Finally, if your connection really is too slow, you can also download the checkpoint from the web page directly, at:
https://huggingface.co/CompVis/stable-diffusion-v-1-1-original
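On a server it may be easier to fetch that same file with huggingface_hub instead of a browser. A sketch under the assumption that you have accepted the license and are logged in; the repo id and filename correspond to the link above:

from huggingface_hub import hf_hub_download

# Downloads sd-v1-1.ckpt into the local Hugging Face cache and returns the local path.
ckpt_path = hf_hub_download(
    repo_id="CompVis/stable-diffusion-v-1-1-original",
    filename="sd-v1-1.ckpt",
    use_auth_token=True,
)
print(ckpt_path)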

stable-diffusion-v1-4: testing

Same as with 1.1, the first step is downloading the model, and again there are several options; I won't list them all:

# make sure you're logged in with `huggingface-cli login`
from torch import autocast
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        use_auth_token=True
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt)["sample"][0]

image.save("astronaut_rides_horse.png")


# device = "cuda"
# model_path = "CompVis/stable-diffusion-v1-4"
# 
# # Using DDIMScheduler as an example; this also works with PNDMScheduler
# # uncomment this line if you want to use it.
# 
# # scheduler = PNDMScheduler.from_config(model_path, subfolder="scheduler", use_auth_token=True)
# 
# scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
# pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
#     model_path,
#     scheduler=scheduler,
#     revision="fp16", 
#     torch_dtype=torch.float16,
#     use_auth_token=True
# ).to(device)

I went with the very first download method above, default 32-bit precision, other parameters untouched, which means pulling a model of a bit over 4 GB:


The download broke off a few times, and every interruption was maddening; a poor connection really hurts here. It did finish in the end, and just like PyTorch's model zoo the weights end up in a local cache directory:
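The screenshot of that cache directory is not reproduced here. By default the Hugging Face libraries cache downloads under ~/.cache/huggingface on Linux (the exact subfolder layout depends on the diffusers version); a quick way to check, assuming the standard environment variables:

import os

# Default Hugging Face cache root; the HF_HOME environment variable overrides it if set.
cache_root = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
print(cache_root)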


當(dāng)前目錄生成了prompt的話內(nèi)容相似的圖:

Honestly, it has a certain comedic quality. While waiting, I also hedged my bets and downloaded the checkpoint straight from the official page, just in case. The address is: https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/blob/main/sd-v1-4.ckpt

Either way, as long as it works. Next comes the text-to-image example. I wrote two prompts myself and also borrowed from the write-up 模型方法–Stable Diffusion 中的prompt和運行命令 (on prompts and run commands for Stable Diffusion), which seemed fairly comprehensive. For example:

python txt2img.py --prompt "Asia girl, glossy eyes, face, long hair, fantasy, elegant, highly detailed, digital painting, artstation, concept art, smooth, illustration, renaissance, flowy, melting, round moons, rich clouds, very detailed, volumetric light, mist, fine art, textured oil over canvas, epic fantasy art, very colorful, ornate intricate scales, fractal gems, 8 k, hyper realistic, high contrast" \
                  --plms \
                  --outdir ./output/ \
                  --ckpt ./models/sd-v1-4.ckpt \
                  --ddim_steps 100 \
                  --H 512 \
                  --W 512 \
                  --seed 8

The command is split across lines with backslashes purely for readability; the parameters are all explained on GitHub and none of them are hard to follow. Once it is running in the terminal, it still needs to download a HardNet model:


After that download finishes, the results come out. The image:

Two more prompts I threw together:

prompt = "women, pink hair, ArtStation, on the ground, open jacket, video game art, digital painting, digital art, video game girls, sitting, game art, artwork"

prompt = "fantasy art, women, ArtStation, fantasy girl, artwork, closed eyes, long hair. 4K, Alec Tucker, pipes, fantasy city, fantasy art, ArtStation"

Something odd seems to have crept into the output... I have no idea why, either.

That was the text-to-image use case; there is also image+text-to-image, launched like this:

python img2img.py --prompt "magic fashion girl portrait, glossy eyes, face, long hair, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, renaissance, flowy, melting, round moons, rich clouds, very detailed, volumetric light, mist, fine art, textured oil over canvas, epic fantasy art, very colorful, ornate intricate scales, fractal gems, 8 k, hyper realistic, high contrast" \
                          --init-img ./ceshi/33.jpg \
                          --strength 0.8 \
                          --outdir ./output/ \
                          --ckpt ./models/sd-v1-4.ckpt \
                          --ddim_steps 100

I had assumed the demo run would wrap up smoothly here, but sadly the GPU ran out of memory, short by just a few GB (in other words, v1.4 at this precision needs more than the 15 GB of VRAM I had):

    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: CUDA out of memory. Tried to allocate 2.44 GiB (GPU 0; 14.75 GiB total capacity; 11.46 GiB already allocated; 1.88 GiB free; 11.75 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

So rather than fight it, I switched straight to FP16 precision and followed the Colab experiments; I had seen people succeed on a T4 there, so without further ado, over to a Jupyter notebook.

Imports first:

import inspect
import warnings
from typing import List, Optional, Union

import torch
from torch import autocast
from tqdm.auto import tqdm

from diffusers import (
    AutoencoderKL,
    DDIMScheduler,
    DiffusionPipeline,
    PNDMScheduler,
    UNet2DConditionModel,
)
from diffusers.pipelines.stable_diffusion import StableDiffusionSafetyChecker
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer

Then add the pipeline source code, download the pretrained weights, and load the model in float16:

class StableDiffusionImg2ImgPipeline(DiffusionPipeline):
    def __init__(
        self,
        vae: AutoencoderKL,
        text_encoder: CLIPTextModel,
        tokenizer: CLIPTokenizer,
        unet: UNet2DConditionModel,
        scheduler: Union[DDIMScheduler, PNDMScheduler],
        safety_checker: StableDiffusionSafetyChecker,
        feature_extractor: CLIPFeatureExtractor,
    ):
        super().__init__()
        scheduler = scheduler.set_format("pt")
        self.register_modules(
            vae=vae,
            text_encoder=text_encoder,
            tokenizer=tokenizer,
            unet=unet,
            scheduler=scheduler,
            safety_checker=safety_checker,
            feature_extractor=feature_extractor,
        )

    @torch.no_grad()
    def __call__(
        self,
        prompt: Union[str, List[str]],
        init_image: torch.FloatTensor,
        strength: float = 0.8,
        num_inference_steps: Optional[int] = 50,
        guidance_scale: Optional[float] = 7.5,
        eta: Optional[float] = 0.0,
        generator: Optional[torch.Generator] = None,
        output_type: Optional[str] = "pil",
    ):

        if isinstance(prompt, str):
            batch_size = 1
        elif isinstance(prompt, list):
            batch_size = len(prompt)
        else:
            raise ValueError(f"`prompt` has to be of type `str` or `list` but is {type(prompt)}")

        if strength < 0 or strength > 1:
            raise ValueError(f"The value of strength should be in [0.0, 1.0] but is {strength}")

        # set timesteps
        accepts_offset = "offset" in set(inspect.signature(self.scheduler.set_timesteps).parameters.keys())
        extra_set_kwargs = {}
        offset = 0
        if accepts_offset:
            offset = 1
            extra_set_kwargs["offset"] = 1

        self.scheduler.set_timesteps(num_inference_steps, **extra_set_kwargs)

        # encode the init image into latents and scale the latents
        init_latents = self.vae.encode(init_image.to(self.device)).sample()
        init_latents = 0.18215 * init_latents

        # prepare init_latents noise to latents
        init_latents = torch.cat([init_latents] * batch_size)
        
        # get the original timestep using init_timestep
        init_timestep = int(num_inference_steps * strength) + offset
        init_timestep = min(init_timestep, num_inference_steps)
        timesteps = self.scheduler.timesteps[-init_timestep]
        timesteps = torch.tensor([timesteps] * batch_size, dtype=torch.long, device=self.device)
        
        # add noise to latents using the timesteps
        noise = torch.randn(init_latents.shape, generator=generator, device=self.device)
        init_latents = self.scheduler.add_noise(init_latents, noise, timesteps)

        # get prompt text embeddings
        text_input = self.tokenizer(
            prompt,
            padding="max_length",
            max_length=self.tokenizer.model_max_length,
            truncation=True,
            return_tensors="pt",
        )
        text_embeddings = self.text_encoder(text_input.input_ids.to(self.device))[0]

        # here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)
        # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
        # corresponds to doing no classifier free guidance.
        do_classifier_free_guidance = guidance_scale > 1.0
        # get unconditional embeddings for classifier free guidance
        if do_classifier_free_guidance:
            max_length = text_input.input_ids.shape[-1]
            uncond_input = self.tokenizer(
                [""] * batch_size, padding="max_length", max_length=max_length, return_tensors="pt"
            )
            uncond_embeddings = self.text_encoder(uncond_input.input_ids.to(self.device))[0]

            # For classifier free guidance, we need to do two forward passes.
            # Here we concatenate the unconditional and text embeddings into a single batch
            # to avoid doing two forward passes
            text_embeddings = torch.cat([uncond_embeddings, text_embeddings])


        # prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
        # eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.
        # eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502
        # and should be between [0, 1]
        accepts_eta = "eta" in set(inspect.signature(self.scheduler.step).parameters.keys())
        extra_step_kwargs = {}
        if accepts_eta:
            extra_step_kwargs["eta"] = eta

        latents = init_latents
        t_start = max(num_inference_steps - init_timestep + offset, 0)
        for i, t in tqdm(enumerate(self.scheduler.timesteps[t_start:])):
            # expand the latents if we are doing classifier free guidance
            latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents

            # predict the noise residual
            noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings)["sample"]

            # perform guidance
            if do_classifier_free_guidance:
                noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
                noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

            # compute the previous noisy sample x_t -> x_t-1
            latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs)["prev_sample"]

        # scale and decode the image latents with vae
        latents = 1 / 0.18215 * latents
        image = self.vae.decode(latents)

        image = (image / 2 + 0.5).clamp(0, 1)
        image = image.cpu().permute(0, 2, 3, 1).numpy()

        # run safety checker
        safety_checker_input = self.feature_extractor(self.numpy_to_pil(image), return_tensors="pt").to(self.device)
        image, has_nsfw_concept = self.safety_checker(images=image, clip_input=safety_checker_input.pixel_values)

        if output_type == "pil":
            image = self.numpy_to_pil(image)

        return {"sample": image, "nsfw_content_detected": has_nsfw_concept}

device = "cuda"
model_path = "CompVis/stable-diffusion-v1-4"

# Using DDIMScheduler as an example; this also works with PNDMScheduler
# uncomment this line if you want to use it.

# scheduler = PNDMScheduler.from_config(model_path, subfolder="scheduler", use_auth_token=True)

scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_path,
    scheduler=scheduler,
    revision="fp16", 
    torch_dtype=torch.float16,
    use_auth_token=True
).to(device)

This pulls down close to another 3 GB of weights. Once it loads without errors, read in an image and preprocess it so it can be passed to the pipeline. You can start with the official test image:

預(yù)處理:

import PIL
from PIL import Image
import numpy as np

def preprocess(image):
    w, h = image.size
    w, h = map(lambda x: x - x % 32, (w, h))  # resize to integer multiple of 32
    image = image.resize((w, h), resample=PIL.Image.LANCZOS)
    image = np.array(image).astype(np.float32) / 255.0
    image = image[None].transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return 2.*image - 1.

Load the official image; you can download it manually and upload it, or just fetch it over the network:

import requests
from io import BytesIO

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_img = Image.open(BytesIO(response.content)).convert("RGB")
init_img = init_img.resize((768, 512))
init_img


Finally, feed the prompt into the pipeline and you get the same kind of result as shown on GitHub:

init_image = preprocess(init_img)

prompt = "A fantasy landscape, trending on artstation"

generator = torch.Generator(device=device).manual_seed(1024)
with autocast("cuda"):
    images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5, generator=generator)["sample"]

Here, though, I used a different prompt:

prompt = "Anime, Comic, pink hair, ArtStation, on the ground,cartoon, Game "

結(jié)果為:

That looks passable, so I downloaded a few anime pictures and reused the prompt above, mainly for the pink hair keyword; the characters that popped into my head were Mirai Kuriyama and Megumi Kato (I only noticed the mix-up while proofreading, but the sakura-and-Megumi combination left a deep impression). In the end, even though the notebook only has a handful of code cells, it ran close to 80 times, and more than 60 of those runs were me fiddling with the prompt. My vocabulary ran dry; the prompt is clearly part of the problem, but so be it. The best-tuned result was:

Compared with what others post online, theirs are genuinely much nicer. As for why: first, I probably picked the lower-precision model; second, my prompt vocabulary is a bit thin. I tuned this example while writing the post, with other things competing for my time, so it got a little tiresome, but I'm reasonably satisfied. (Not that being unsatisfied would change anything.)

Everything above was about building the environment yourself and tuning the model by hand, steering it in the direction you want. Below are a couple of already-tuned online options I tried, on Hugging Face and on one commercial platform.

Trying Stable Diffusion online

Two addresses are worth recommending. The first is the official demo:

https://huggingface.co/spaces/stabilityai/stable-diffusion

I entered Anime, Comic, on the ground, cartoon, Game, and the result felt indescribable; the official online deployment is presumably a smaller model, and generation is also quite slow.

https://huggingface.co/spaces/huggingface/diffuse-the-rest


Not bad; apparently my sketch is quite lifelike, hmm. After a few more tries I also noticed that for Asian faces, or specifically Chinese ones, the results look noticeably off compared with Western faces regardless of gender; most likely there simply isn't enough domestic data in the training set.

Finally, I came across a closed-source project called stable-diffusion-animation:

https://replicate.com/andreasjansson/stable-diffusion-animation


This one is more down-to-earth: it turned 24 frames into a 20-second video, which lines up neatly with that viral origin-of-life video; I wonder whether that was made with this project. And with that, this post is done.
