四时宝库

程序员的知识宝库

Spring Boot+VUE集成科大讯飞语音在线合成解决方案

在项目中需要用到将景点文字合成语音,通过语音方式向用户介绍景点信息,需要用到文字转语音的在线合成解决方案。通过对各种文字转语音合成方案与效果比较,觉得讯飞的效果最好,语音拟人效果、文章断词都非常不错,并且有一年10万次的免费使用量,因此对比后决定使用讯的在线语音合成解决方案。由于这信主题网上教程非常少,只找到了一个没提供完整源代码的参考案例,结合官网资料,搞定的完整解决方案和效果图如下:



一、注册讯飞开发者,获取访问Key

到讯飞开发者平台(https://console.xfyun.cn/app/myapp),用实名注册好讯飞开发者用户,添加好自己拟开发的应用系统,申请好应用访问讯飞平台的访问参数。由于自己的服务器性能不行,是使用在线语音合成,采用的是WebAPI方式。此外,读飞还提供了一相demo程序,新手需要下载后反复研读。


二、创建在线语音转换的Util

讯飞demo提供的是一个独立运行的方法,将且用PCM格式,将转换后文件保存在服务器上,需要对此程序进行改造,我主要进行了以下几个方面的改造:

1、将访问参数放到properties文件,以免放在程序代码段被泄漏,由于这部分代码定义的是静态变量,参数注入方式有变化

2、对前端传递过来的TEXT文本,转换后使用MP3格式输出给前端,完整代码如下

//静态参数注入,必须增加@Component注解
@Component
public class XunFeiUtil {
    protected static final Logger log = LoggerFactory.getLogger(XunFeiUtil.class);
    //讯飞四个注入参数,保存在配置文件,便于复用和避免代码上传gitee后泄漏
    private static String hostUrl;
    @Value("${xunfei.hostUrl}")
    public void setHostUrl(String hostUrl) {
        XunFeiUtil.hostUrl = hostUrl;
    }

    private static String appid;
    @Value("${xunfei.appid}")
    public void setAppid(String appid) {
        XunFeiUtil.appid = appid;
    }

    private static String apiSecret;
    @Value("${xunfei.apiSecret}")
    public void setApiSecret(String apiSecret) {
        XunFeiUtil.apiSecret = apiSecret;
    }

    private static String apiKey;
    @Value("${xunfei.apiKey}")
    public void setApiKey(String apiKey) {
        XunFeiUtil.apiKey = apiKey;
    }

    public static final Gson json = new Gson();
    private static String base64 = "";
    private static volatile boolean lock = true;

    /**
     * 将文本转换为MP3格语音base64文件
     *
     * @param text 要转换的文本(如JSON串)
     * @return 转换后的base64文件
     * @throws IOException 异常
     */
    public static String convertText(String text) throws Exception {
        lock = true;
        base64 = "";
        // 构建鉴权url
        String authUrl = getAuthUrl(hostUrl, apiKey, apiSecret);
        OkHttpClient client = new OkHttpClient.Builder().build();
        //将url中的 schema http://和https://分别替换为ws:// 和 wss://
        String url = authUrl.toString().replace("http://", "ws://").replace("https://", "wss://");
        Request request = new Request.Builder().url(url).build();
        List<byte[]> list = Lists.newArrayList();
        WebSocket webSocket = client.newWebSocket(request, new WebSocketListener() {
            @Override
            public void onOpen(WebSocket webSocket, Response response) {
                super.onOpen(webSocket, response);
               try {
                    System.out.println(response.body().string());
                } catch (IOException e) {
                    e.printStackTrace();
                }
                //发送数据
                JsonObject frame = new JsonObject();
                JsonObject business = new JsonObject();
                JsonObject common = new JsonObject();
                JsonObject data = new JsonObject();
                // 填充common
                common.addProperty("app_id", appid);
                //填充business,AUE属性lame是MP3格式,raw是PCM格式
                business.addProperty("aue", "lame");
                business.addProperty("sfl", 1);
                business.addProperty("tte", "UTF8");//小语种必须使用UNICODE编码
                business.addProperty("vcn", "xiaoyan");//到控制台-我的应用-语音合成-添加试用或购买发音人,添加后即显示该发音人参数值,若试用未添加的发音人会报错11200
                business.addProperty("pitch", 50);
                business.addProperty("speed", 50);
                //填充data
                data.addProperty("status", 2);//固定位2
                try {
                    data.addProperty("text", Base64.getEncoder().encodeToString(text.getBytes("utf8")));
                    //使用小语种须使用下面的代码,此处的unicode指的是 utf16小端的编码方式,即"UTF-16LE"”
                    //data.addProperty("text", Base64.getEncoder().encodeToString(text.getBytes("UTF-16LE")));
                } catch (UnsupportedEncodingException e) {
                    e.printStackTrace();
                }
                //填充frame
                frame.add("common", common);
                frame.add("business", business);
                frame.add("data", data);
                webSocket.send(frame.toString());
            }

            @Override
            public void onMessage(WebSocket webSocket, String text) {
                super.onMessage(webSocket, text);
                //处理返回数据
                System.out.println("receive=>");
                ResponseData resp = null;
                try {
                    resp = json.fromJson(text, ResponseData.class);
                } catch (Exception e) {
                    e.printStackTrace();
                }
                if (resp != null) {
                    if (resp.getCode() != 0) {
                        System.out.println("error=>" + resp.getMessage() + " sid=" + resp.getSid());
                        return;
                    }
                    if (resp.getData() != null) {
                        String result = resp.getData().audio;
                        byte[] audio = Base64.getDecoder().decode(result);
                        list.add(audio);
                        // 说明数据全部返回完毕,可以关闭连接,释放资源
                        if (resp.getData().status == 2) {
                            String is = base64Concat(list);
                            base64 = is;
                            lock = false;
                            webSocket.close(1000, "");
                        }
                    }
                }
            }

            @Override
            public void onMessage(WebSocket webSocket, ByteString bytes) {
                super.onMessage(webSocket, bytes);
            }

            @Override
            public void onClosing(WebSocket webSocket, int code, String reason) {
                super.onClosing(webSocket, code, reason);
                System.out.println("socket closing");
            }

            @Override
            public void onClosed(WebSocket webSocket, int code, String reason) {
                super.onClosed(webSocket, code, reason);
                System.out.println("socket closed");
            }

            @Override
            public void onFailure(WebSocket webSocket, Throwable t, Response response) {
                super.onFailure(webSocket, t, response);
                System.out.println("connection failed" + response.message());
            }
        });
        while (lock) {
        }
        return base64;
    }

    /**
     *  * base64拼接
     *  
     */
    static String base64Concat(List<byte[]> list) {
        int length = 0;
        for (byte[] b : list) {
            length += b.length;
        }
        byte[] retByte = new byte[length];
        for (byte[] b : list) {
            retByte = ByteUtils.concat(retByte, b);
        }
        return cn.hutool.core.codec.Base64.encode(retByte);
    }

    /**
     *  * 获取权限地址
     *  *
     *  * @param hostUrl
     *  * @param apiKey
     *  * @param apiSecret
     *  * @return
     *  
     */
    private static String getAuthUrl(String hostUrl, String apiKey, String apiSecret) throws Exception {
        URL url = new URL(hostUrl);
        SimpleDateFormat format = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss z", Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        String date = format.format(new Date());
        StringBuilder builder = new StringBuilder("host: ").append(url.getHost()).append("\n").
                append("date: ").append(date).append("\n").
                append("GET ").append(url.getPath()).append(" HTTP/1.1");
        Charset charset = Charset.forName("UTF-8");
        Mac mac = Mac.getInstance("hmacsha256");
        SecretKeySpec spec = new SecretKeySpec(apiSecret.getBytes(charset), "hmacsha256");
        mac.init(spec);
        byte[] hexDigits = mac.doFinal(builder.toString().getBytes(charset));
        String sha = Base64.getEncoder().encodeToString(hexDigits);
        String authorization = String.format("hmac username=\"%s\", algorithm=\"%s\", headers=\"%s\", signature=\"%s\"", apiKey, "hmac-sha256", "host date request-line", sha);
        HttpUrl httpUrl = HttpUrl.parse("https://" + url.getHost() + url.getPath()).newBuilder().
                addQueryParameter("authorization", Base64.getEncoder().encodeToString(authorization.getBytes(charset))).
                addQueryParameter("date", date).
                addQueryParameter("host", url.getHost()).
                build();
        return httpUrl.toString();
    }


    public static class ResponseData {
        private int code;
        private String message;
        private String sid;
        private Data data;

        public int getCode() {
            return code;
        }

        public String getMessage() {
            return this.message;
        }

        public String getSid() {
            return sid;
        }

        public Data getData() {
            return data;
        }
    }

    private static class Data {
        //标志音频是否返回结束  status=1,表示后续还有音频返回,status=2表示所有的音频已经返回
        private int status;
        //返回的音频,base64 编码
        private String audio;
        // 合成进度
        private String ced;
    }
}

三、构建前后端接口控制文件

构建建前后端访问的接口,由于前后端通过VUE参数方式传递过来,使用Post方式,我另有文章介绍为什么要用Post,这是VUE的约定,用Get有问题。

@RestController
@RequestMapping("/data/xunfei")
public class TextToAudioController {

    private static final Logger log = LoggerFactory.getLogger(TextToAudioController.class);

    @ApiOperation(value = "文字转语音", notes = "文字转语音")
    @PostMapping(value = "text_to_audio")
    public void textToAudio(String text, HttpServletRequest request , HttpServletResponse response) throws IOException {
        if (StringUtils.isNotBlank(text)) {
            //过滤图片,h5标签
            text = text.replaceAll("\\&[a-zA-Z]{1,10};", "").replaceAll("<[^>]*>", "").replaceAll("[(/>)<]", "").trim();
            //调用后台服务接口获取音频base64
            String result = "";
            try {
                result = XunFeiUtil.convertText(text);
            } catch (Exception e) {
                log.error("【文字转语音接口调用异常】", e);
            }
            //音频数据
            byte[] audioByte = Base64.getDecoder().decode(result);
          //以@RestController
@RequestMapping("/data/xunfei")
public class TextToAudioController {

    private static final Logger log = LoggerFactory.getLogger(TextToAudioController.class);

    @ApiOperation(value = "文字转语音", notes = "文字转语音")
    @PostMapping(value = "text_to_audio")
    public void textToAudio(String text, HttpServletRequest request , HttpServletResponse response) throws IOException {
        if (StringUtils.isNotBlank(text)) {
            //过滤图片,h5标签
            text = text.replaceAll("\\&[a-zA-Z]{1,10};", "").replaceAll("<[^>]*>", "").replaceAll("[(/>)<]", "").trim();
            //调用微服务接口获取音频base64
            String result = "";
            try {
                result = XunFeiUtil.convertText(text);
            } catch (Exception e) {
                log.error("【文字转语音接口调用异常】", e);
            }
            //音频数据
            byte[] audioByte = Base64.getDecoder().decode(result);
            response.setContentType("application/octet-stream;charset=UTF-8");
            OutputStream os = new BufferedOutputStream(response.getOutputStream());
            try {
            //音频流
                os.write(audioByte);
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                os.flush();
                os.close();
            }
        }
    }
}
            response.setContentType("application/octet-stream;charset=UTF-8");
            OutputStream os = new BufferedOutputStream(response.getOutputStream());
            try {
            //音频流
                os.write(audioByte);
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                os.flush();
                os.close();
            }
        }
    }
}

四、前端访问请求修改

在request.js拦截器中增加语音处理部分,让前端能正确接收语音语件

// 响应拦截器
service.interceptors.response.use(res => {
  //语音处理
    const headers = res.headers
    if (headers['content-type'] === 'application/octet-stream;charset=UTF-8') {
      return res.data
    }
}

五、构建前端访问接口

// 讯飞语音获取
/**
 * 文字转语音接口
 */
export function textToAudio(data) {
  return request({
    url: '/data/xunfei/text_to_audio',
    method: 'post',
    data: data,
    responseType: "blob"//后台返回的为语音的流数据
  })
}

六、前端程序

为了 便于介绍,单独做了一个独立,完整的程序,这个组件功能就是选取textarea中的内容,进行后台转换为MP3语音,并且增加了避免重复转换判断,增加暂停与继续播放按钮,效果如后面图。

<!--语音播放-->
<template>
  <div class="audio">
    <div>
      <el-input
        type="textarea"
        :autosize="{minRows:3,maxRows:5}"
        placeholder="请输入内容"
        v-model="textarea">
      </el-input>
      <el-badge class="item" style="margin-right: 12px" v-loading="audioLoading">
        <el-button v-if="!audioPlay" icon="el-icon-video-play" size="medium" style="margin: 10px 10px;" circle
                   @click="getAudio(textarea)"></el-button>
        <el-button v-if="audioPlay" icon="el-icon-video-pause" size="medium" style="margin: 10px 10px;" circle
                   @click="audioPause"></el-button>
      </el-badge>
    </div>
  </div>
</template>
<script>
  import {textToAudio} from '@/api/data/xunfei'
  export default {
    name: "audioPlay",
    props: {},
    components: {},
    data() {
      return {
        //文件组件
        textarea: '',
        //语音组件
        audioObj: null,
        //转换时loading设置
        audioLoading: false,
        audioPlay: false,
      }
    },
    mounted() {
      this.audioObj = new Audio();//在VUE中使用audio标签
    },

    methods: {

      //调用后台讯飞语音转换
      getAudio(text) {
        //已有声音直接播放
        if (this.audioObj.src) {
          this.audioObj.play()
        } else {
          if (text) {
            this.audioLoading = true
            let formData = new FormData()
            formData.append('text', text)
            textToAudio(formData).then(response => {
              let url = URL.createObjectURL(response)//通过这个API让语音数据转为成一个url地址
              this.audioObj.src = url//设置audio的src为上面生成的url
              let playPromiser = this.audioObj.play()//进行播放
              //在谷歌内核中,audio.play()会返回一个promise的值,在IE内核中就不会返回任何的值
              //所以如果你要分浏览器,可以判断playPromiser的值来进行操作哦
              this.audioObj.onended = () => {
              }
              this.audioLoading = false
            }).catch(err => {
            })
          }
        }
        this.audioPlay = true
      },
      audioPause() {
        this.audioObj.pause()
        this.audioPlay = false
      }
    }
  }
</script>

<style>
  .audio{
    width: 90%;
    position: absolute;
    top: 20px;
    left: 20px;
    font-size: 26px;
  }
</style>


语音转文字的完整方案还是比较少,经过几天努力搞定了此问题,特完整地记录一下,后续将增加转换MP3保存在OSS功能,避免重复调服务,毕竟后续服务要费用,OSS还是比较便宜。

发表评论:

控制面板
您好,欢迎到访网站!
  查看权限
网站分类
最新留言
    友情链接