I experiment and I write
Fine-tuning VITS on Common Voice Dhivehi
Text-to-speech has interested and involved me for some time. The last publicly available models are from the 2021 era, and there have been lots of improvements in the area since then.
I decided to cook some up and see how it goes.
The model we are gonna be training is the VITS implementation from Coqui (check them out, they are the coolest fr).
What's VITS and why VITS out of all
VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) is a non-autoregressive, end-to-end model. Most other architectures, such as Tacotron, rely on a two-stage method in which a mel spectrogram is first generated and then converted to speech by a vocoder. VITS uses a single model that takes in text and returns the raw audio waveform.…
Read more ⟶
Real-time Vehicle Detection and Speed Estimation using YOLO and DeepSORT: Part II
New features added
Pushing image detections to the db
How?
The Detection object already has a list of features for the detections, so what needs to be done is for the Detection object to also hold a list of image crops of the detections, taken using the bounding-box data from the model.
This is done by adding a function that crops the detections and adds the crops to the Detections object.
    def _get_im_crops(self, bbox_xywh, ori_img):
        im_crops = []
        for box in bbox_xywh:
            x1, y1, x2, y2 = self.…
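For readers who want to see the cropping idea end to end, here is a minimal standalone sketch. It is not the post's actual code: it assumes each box is given as (center x, center y, width, height), stands in a nested list for the image, and the helper names `xywh_to_xyxy` and `get_im_crops` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Rows of pixel values; a stand-in for a real image array (assumption).
Image = List[List[int]]


def xywh_to_xyxy(box: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a (center_x, center_y, w, h) box to corner coordinates,
    clamped so the corners never fall outside the image."""
    cx, cy, w, h = box
    x1 = max(int(cx - w / 2), 0)
    y1 = max(int(cy - h / 2), 0)
    x2 = min(int(cx + w / 2), width)
    y2 = min(int(cy + h / 2), height)
    return x1, y1, x2, y2


def get_im_crops(bbox_xywh: Sequence[Sequence[float]], ori_img: Image) -> List[Image]:
    """Return one cropped sub-image per bounding box."""
    height = len(ori_img)
    width = len(ori_img[0]) if height else 0
    im_crops = []
    for box in bbox_xywh:
        x1, y1, x2, y2 = xywh_to_xyxy(box, width, height)
        # Slice rows first, then columns within each row.
        im_crops.append([row[x1:x2] for row in ori_img[y1:y2]])
    return im_crops
```

Each crop can then be stored next to the matching detection's features, so that whatever pushes detections to the db also has the image to send along.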
Read more ⟶
Real-time Vehicle Detection and Speed Estimation using YOLO and DeepSORT
Having a good view of the road in front and an extra Android phone lying around, I wondered:
Can I log every passing vehicle?
Can I get their types (car, motorbike, truck)?
Can I get their speed?
Can I visualize how busy the road is and generate live reports based on the logged data?
If so, that would look darn cool.
This could technically be achieved through:
a live stream of the road
a server to process the stream and run object detection and tracking
a backend to save the data
a frontend to do cool visuals
Final dashboard
But how?…
Read more ⟶